Monday, 7 May 2018

Domains, IPs, Ports and Virtual Hosts - How it All Fits Together

Developing web applications might seem fairly simple, and writing a basic web page is. If all you are doing is writing a static page, you can freely go ahead and start writing code. But what if you want to develop a dynamic web application or mimic a production server?


Any single computer on any network, whether a local network or the Internet, is associated with an address. An address may be assigned to only one computer at any given time but multiple addresses could potentially point to the same server. Without going into the semantics of addresses, two common ones that will be encountered are:
  • 192.168.X.X - These are local, intranet addresses. Each computer on a home network would have an address in this range.
  • 127.0.0.1 - This is a special address called the loop-back address. It is used by a machine in order to allow it to contact itself. Hence, loop-back.
An IP (Internet Protocol) address is a 32-bit number that is usually displayed as a series of numbers separated by dots, as seen above.
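To make this concrete, Go's standard net package can classify addresses for us. The two sample addresses below are just that, samples:

```go
package main

import (
	"fmt"
	"net"
)

// classify reports whether an address is the loop-back address and
// whether it falls in a private (intranet) range such as 192.168.0.0/16.
func classify(s string) (loopback, private bool) {
	ip := net.ParseIP(s)
	return ip.IsLoopback(), ip.IsPrivate()
}

func main() {
	for _, s := range []string{"192.168.1.10", "127.0.0.1"} {
		loopback, private := classify(s)
		fmt.Printf("%s loopback=%t private=%t\n", s, loopback, private)
	}
	// Output:
	// 192.168.1.10 loopback=false private=true
	// 127.0.0.1 loopback=true private=false
}
```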

There are a lot of complexities to how addressing works, including internal (intranet) to external network addresses and how they interact.


To serve files or data, a program has to be set up to listen for new connections. However, if a computer has only a single address, how does it know which program to associate with any given connection?

Ports allow connections to be routed to the correct program. Ports for a few common web services include:
  • 20 and 21 - File Server, FTP (File Transfer Protocol)
  • 25 - Mail Server, SMTP (Simple Mail Transfer Protocol)
  • 80 - Web Server, HTTP (Hyper-Text Transfer Protocol)
  • 443 - Web Server, HTTPS (Hyper-Text Transfer Protocol over SSL/TLS)
A list of reserved ports can be found on Wikipedia.

Only one program may listen on any one port but, for web servers at least, there are ways to work around this limitation.

Also of note is that even though many clients/services don't explicitly require you to add a port, one is always used. If no port is supplied, web browsers will default to port 80 for HTTP (and 443 for HTTPS). When testing web applications from a local system, it is common to use port 8080 or 8880, like so: http://localhost:8080
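The one-listener-per-port rule is easy to demonstrate with Go's net package. This sketch asks the operating system for any free port rather than hard-coding 8080, so it won't collide with a server you already have running:

```go
package main

import (
	"fmt"
	"net"
)

// secondBindFails binds a listener to addr and then reports whether a
// second attempt to bind the very same address and port is refused.
func secondBindFails(addr string) bool {
	first, err := net.Listen("tcp", addr)
	if err != nil {
		panic(err)
	}
	defer first.Close()

	// Try to bind the exact address the OS gave the first listener.
	second, err := net.Listen("tcp", first.Addr().String())
	if err == nil {
		second.Close()
		return false
	}
	return true
}

func main() {
	// Port 0 tells the OS to pick any free port for us.
	fmt.Println("second listener refused:", secondBindFails("127.0.0.1:0"))
	// Output: second listener refused: true
}
```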


A domain is usually a human-readable string of characters that acts as an alias for an IP address. At its simplest, any single domain is associated with a single IP address. Domains on the Internet must be registered with an authority called a Domain Registrar. You supply the registrar with the address of your host and they will associate it with the name of your domain.

Top-level domains (TLDs) are those such as com, gov, net and dev, to name just a few. There are many others. As long as a name is only used on your local machine and a client, like a web browser, never asks a name server to resolve it (as it would for domains on the Internet), there are no real restrictions.

Unfortunately, some web browsers, Chrome and Firefox for example, treat certain TLDs specially and can complicate matters, so it is generally recommended to steer away from common TLDs for local development.

Multiple domains may point to the same address.

Sub Domains

Any other domains are sub-domains. Without going into detail, when someone refers to a subdomain they usually mean the prefix, namely www or mobile. The hierarchy reads from right to left: the most significant domain is on the right and the least significant is on the left.

  • www.example.com - reading right to left: root, com (top level domain), example, www (least significant domain)
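Splitting a name on the dots makes the hierarchy easy to see. The sketch below uses www.example.com, the customary placeholder domain:

```go
package main

import (
	"fmt"
	"strings"
)

// labels breaks a domain name into its dot-separated labels.
func labels(name string) []string {
	return strings.Split(name, ".")
}

func main() {
	// Print the labels from most to least significant.
	ls := labels("www.example.com")
	for i := len(ls) - 1; i >= 0; i-- {
		fmt.Println(ls[i])
	}
	// Output:
	// com
	// example
	// www
}
```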

Virtual Hosts

A web server can host many different websites. If a web server is listening on port 80 but multiple different domains point to the same address, how would the web server know which site to route the request to?

Virtual hosts associate domain names with locations on the server. Even though two different domains might resolve to the same address, the web server is still given the name of the requested domain (via the HTTP Host header). The web server takes that domain and, provided an entry for it exists in its configuration, routes communication for each domain to the correct web directory.

Hosts File

When a client like a web browser is supplied a domain name, it normally contacts a service called DNS (the Domain Name System) to resolve the address the name is associated with. When doing local development, this isn't desirable as an internal web server is usually not publicly accessible.

To overcome this limitation, most operating systems have a file that is used to define local domain names. If a domain exists in this list, then the associated IP address is used without ever connecting to a DNS server to resolve the name.

In this capacity, you could even do things like route a real, public domain back at your own web server.

On most GNU/Linux systems you'll find the hosts file under the /etc directory. You would add an entry like:
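For example, to point a made-up local development domain (the name here is purely illustrative) at your own machine:

```
127.0.0.1    www.mysite.test
```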

When putting that address into your browser, the system will first check for a matching entry in /etc/hosts and, if it finds one, use the given address. In this case, that is the loop-back address, which sends the connection to the corresponding port on your own system. Remember, a port is always used even though it often doesn't need to be supplied explicitly.

Bringing It All Together

Armed with this knowledge, a developer can setup a testing environment that mimics a production server.

If you are developing web sites and need a full stack set up for each one (by that I mean a web server, database server and any other accoutrements), then a process similar to this could be followed:

  1. Install/copy any files from either an existing production site or a framework.
  2. Add a VHost entry for your web server.
  3. Add a Domain entry to your hosts file.
  4. Create or Import a database.
  5. Work like crazy.
With such a setup, a developer can connect to a local web server as if it were a remote server.

Of course, there are a lot more things that can be done to provide even better environments. Here are a few closing thoughts to whet your appetite.


You can use Docker to mimic a production server almost exactly or, in some cases, run Docker both in production and in development to ensure 100% compatibility. This can be very useful for avoiding incompatible versions of software packages and libraries.

Reverse Proxy

Some environments, PHP for instance, may require a technique called a reverse proxy. Go development can also benefit in instances where you need both a Go application and a web server like nginx or Apache running. This involves having one server accept an incoming request and then forward it to the same address but on a different port.
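Go's standard library ships a ready-made reverse proxy in net/http/httputil. This sketch wires one up in front of a stand-in application server, with both on throwaway test ports:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
)

// fetchThroughProxy starts a stand-in application server, puts a
// reverse proxy in front of it, and returns the body a client sees
// when it talks only to the proxy.
func fetchThroughProxy() string {
	app := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "hello from the app")
	}))
	defer app.Close()

	// The front-end server forwards every request to the app's address.
	target, err := url.Parse(app.URL)
	if err != nil {
		panic(err)
	}
	front := httptest.NewServer(httputil.NewSingleHostReverseProxy(target))
	defer front.Close()

	resp, err := http.Get(front.URL)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	return string(body)
}

func main() {
	fmt.Println(fetchThroughProxy()) // hello from the app
}
```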

Sunday, 23 April 2017

Been a While

It's been a long while since my last post. That's not to say I have been idle. In fact, it's quite the opposite.

After working 10 years first as a sales representative and another 10 years as a credit professional I decided it was time to finally pursue my true passion: programming. I have been programming since I was in my early teens and it has always been a passion of mine.

At one point I thought maybe I'd like to write games but I soon became disillusioned by the video game industry. I became confused as to what I wanted to do and, one thing leading to another, my career path headed in another direction.

I was quickly approaching a time in my life where either I commit to my path or pursue my passions. It was time to make a choice. However, being married with kids, making a life change comes with a tremendous amount of pressure and fear. If I fail or make the wrong choice, it's not just me who suffers the consequences.

The opportunity that presented itself also required us moving roughly 300km away. We had been wanting to move back to the Okanagan Valley for quite some time and now we could do it. My children are still young enough that lasting friendships haven't formed yet but still it would be hard on them.

So, I accepted a position at Acro Media where we primarily build custom Drupal e-commerce websites. I, specifically, work on supporting existing sites and maintaining our in-house infrastructure. Luckily, this means I don't exclusively program in PHP, Javascript, HTML and CSS. I also get to work in C++ and Go.

I wasn't sure how I would feel about working in PHP for much of my day. I am not a fan of the language for many reasons but I've been pleasantly surprised to discover that the joy of programming has thus far eclipsed working in a language I dislike. I think this is mainly due to the fact that my day is quite varied. My job is about solving problems. It's not so much about what language I'm using.

So where does that leave the blog?

Well, I hope that it means I'll be writing more again. Compilers are still a passion and I'm working on another project. It's rather ambitious so we'll see what happens there.

On that note, I think I'll call it a day. Until next time!

Sunday, 21 February 2016

The Accumulator Register

At all curious about the accumulator register? I was. Here's a little summation of what I discovered.


In modern, general purpose CPUs, the accumulator register has lost some of its meaning and much of its use has been relegated to convention rather than necessity.

I got to wondering why the ax/eax/rax register was called the accumulator and why it was used so much since it seemed that any of the general purpose registers would do. Does it just come down to convention or is there more to the story?

All the MIPS documentation I’ve read to date indicates the only register considered the accumulator is a hidden, double-word register accessed via the MFHI and MFLO instructions. This special register is used when doing multiplication and division.

In developing my own virtual machine, and delving into electrical engineering, I got to be more intimately familiar with this happy little register.

A Little History

If my sources are correct, the first commercially available CPU from Intel was the 4004. It had a single register that was implicitly used in the vast majority of its instructions. You can see for yourself on the data sheet (PDF). Can you guess which one?

The 4004 was designed in such a way that the output of the ALU fed directly into this single register to accumulate data. Operations that expect two operands always implicitly use the accumulator as the first operand.

The ADD instruction, for example, takes a single register as an operand. It performs addition on the value stored in the accumulator plus the given register and then stores the result back into the accumulator.

To my knowledge, the 4004 did not pioneer the use of an accumulator but it serves as a nice example.


The design seems simple but consider how calculators or adding machines work. How about an example?

Assume the calculator has just been turned on or a clear instruction has been issued, so that the accumulator has a value of zero.

The user types in a value then hits the addition key. The accumulator and the value on screen get added together. There are no operands, just a simple ADD instruction: ACC += DISPLAY.

Arithmetic instructions would not need explicit operands at this level of design. It makes the accumulator a very important facet in the design of these machines.
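A toy version of such a machine is easy to sketch. The instruction set below is invented for illustration, but the shape is the point: every operation implicitly reads and writes the one accumulator:

```go
package main

import "fmt"

// machine models an accumulator architecture: there is exactly one
// working register, and every arithmetic instruction uses it.
type machine struct{ acc int }

func (m *machine) clear()    { m.acc = 0 }
func (m *machine) add(v int) { m.acc += v } // ACC += operand
func (m *machine) sub(v int) { m.acc -= v } // ACC -= operand

func main() {
	var m machine
	m.clear() // power on / clear key
	m.add(7)  // user enters 7, hits +
	m.add(5)  // user enters 5, hits +
	m.sub(2)  // user enters 2, hits -
	fmt.Println(m.acc) // 10
}
```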

In modern implementations, even if a CPU can use any general purpose register to execute a particular instruction, it may be optimized to perform certain operations in fewer clock cycles when using the accumulator. You’ll need to read documentation to find out.


The result is that the importance of this register is largely historic, living on through convention. We probably no longer need to do it this way but it does have a logical, and historic, significance.

Happy programming!

Saturday, 14 November 2015

Compiler 3 Part 8 - Finale

This will end up being the longest, and final, post in the series. So let's tuck in!

Testing and Everything Else

The scanner and most of the infrastructure stayed exactly as it was.

The parser and AST did see some changes. The parser, in particular, saw several bug fixes and simplification. Some of the error checking that had been done during parsing has been passed on to the semantic analyzer. Similarly, anything related to type checking could be removed.

Also, the AST-related scoping code could be simplified and confined to the parser since it is discarded during semantic analysis.

Detecting “true” and “false” was added for the boolean type.

With those changes out of the way, I can say that if there is one area that Calc has truly lacked, it was in testing.

A compiler project should, in my opinion at least, be concerned with correctness. Correctness and consistency. The IR was one step in shoring up Calc in this area and testing is the other. Calc has always incorporated at least a little testing but it has never been as comprehensive as it should be.

Go provides great testing facilities right out of the box. The cover tool has also proven helpful in sussing out corner cases that need testing.

I have, I think, made large strides in making Calc a much more stable compiler.

That isn’t to say that it’s perfect. I’m sure within 5 minutes of playing with it someone will discover some bugs. Hopefully, those who find them will take the time to submit an issue so I can create tests and fix them.

This is the value of having an open source project.

The Future...

Calc has continued to evolve along with my own skills and I hope that I’ve helped someone, in some way, to further their own growth. Whether it be teaching someone how to, or how NOT to code, the end result remains the same. Someone, somewhere, has benefitted from my efforts.

Now that I’ve had a chance to regroup from the herculean effort of writing and implementing the Calc2 spec and teaching series, I feel like I can continue to produce more content.

I do hope to continue talking about compiler construction. In the future, I hope that by using a smaller format, like this one, that I can produce more posts more quickly. I can certainly say that this third series has been much more enjoyable to write!

I have also been working on another side project to create a virtual machine with a complete software stack. It has an assembler, linker and virtual machine running its own bytecode. If there is interest, I’ll write some blog posts about it. You can find the project here.

I have concluded both previous series talking about future plans and I shall do no less today. In series two I had a list of things I wanted to implement and change. Let’s go through the list:

  • growable stack - well, this is null and void. Calc no longer manages its own stack.
  • simple garbage collector - I no longer plan on implementing a garbage collector myself. See below.
  • loops - not implemented
  • generated code optimizations - done! 
  • library/packages - not implemented
  • assembly - null and void. I don’t plan on generating assembly.
  • objects/structures - not implemented
Only one of the seven features I wanted to implement was actually completed. Of the remaining six, three no longer make sense. The groundwork for the other three has been laid down. In particular, loops and structures only await language design since implementation should be almost trivial.

Incorporating the remaining three features, I now present you with an updated list:

  • #line directives for source-debug mapping - I think this is a very reasonable and easily obtainable goal. It will make debugging Calc with the GNU debugger easier.
  • logical && and || operators - are easily added with minimal work
  • importing libraries - thanks to the x/debug/elf package, importing libraries into code may not be unreasonable to achieve in the near future. The scope of adding imports is likely a series in and of itself.
  • garbage collection - Calc 2.x does not need garbage collection. It does not have pointers and it does no memory allocation. However, when it does, I will probably use the Boehm garbage collector rather than attempting to spin my own. Additional info can be found on Wikipedia. This feature will be far in the future so don't expect it any time soon.
  • structs, loops and arrays - with the new code generator, implementing structs and arrays on the back end ought to be trivial. Work on this has actually already begun and you can view some of the changes here. Sorry, the specification is not published publicly right now.
  • stack tracing - there isn’t much holding me back from implementing stack traces on crashes now that the new code generator is in place. Time and effort are all that remain.
And that, as they say, is that! Thank you for taking the time to read through this series! At some point I’d like to put everything into a PDF but that would be a large task since large parts of both series would need to be re-written entirely. In fact, I might even want to start again from scratch.

Until next time!

Friday, 13 November 2015

Compiler 3 Part 7 - Code Generation

Here we are at the final step.

As mentioned in the introduction, I cut out a massive amount of code by offloading much of the work on the IR.

So here is what has changed:

I introduced a small function to map Calc types to C types. This could probably exist in ir.Types, too, but I chose to keep it coupled with the code generator since that’s the only code that uses it.
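The actual function isn't reproduced here, but a sketch of such a mapping might look like this. The Calc type names and their C equivalents below are my guesses for illustration, not the real table:

```go
package main

import "fmt"

// ctype maps a hypothetical Calc type name to a C type name. In the
// real compiler this table lives alongside the code generator.
func ctype(calc string) string {
	switch calc {
	case "int":
		return "int32_t"
	case "bool":
		return "bool"
	default:
		return "void"
	}
}

func main() {
	fmt.Println(ctype("int"), ctype("bool")) // int32_t bool
}
```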

Any object with an ID has its original name stripped and replaced by a generic variable name. These names start with an underscore and the lower-case letter “v” followed by the ID of the object.

Each binary operation is assigned to a new variable (one reason why C99 is required). Don’t worry about this being wasteful. Even if you choose to output a chain of infix arithmetic (1 + 2 + … + N), the underlying assembly instructions usually take no more than two operands anyway. This is why the above method works so well: it more closely matches the machine code.

Note: I've spent a lot of time comparing the assembly generated from C to see how things work. This is a fun (am I sick?) and interesting exercise to try yourself.
Even if you didn't follow along with the last series, I encourage you to view the previous binary code generation function. 58 lines of confusing mess now pared down to a simple 5 line function. Perhaps more importantly, the generated code is easier to read and follow even though it’s not really intended for visual parsing.

Another important change is using C-style function calling and function declarations. This was made possible by the new IR and type system. With every object being assigned a valid type, we can easily create proper C function prototypes and definitions.

By utilizing the SSA-like code generation and the new IR it also becomes trivial to use C-style calling convention. Types have already been checked, the number of arguments verified, and sub-expressions have been assigned ID’s. Therefore, only raw values and object IDs are passed into the function.

All in all, fairly simple.

Tuesday, 10 November 2015

Compiler 3 Part 6 - Tagging

This step is crucial to Calc’s code generator but may not exist at all in other compilers. Regardless, it makes code generation for Calc dead simple. Before I can get into the process, you do need a little background information.

In the introduction I mentioned something about 3AC and SSA. Make sure you check out the articles, but the short version is this:

In SSA, each calculation is given a unique ID. These ID’s replace variable names and other objects.

So, what does this mean to us?

Consider the following two example infix arithmetic operations:

  1. a + b + c + d 
  2. a * (b + c) / d 

These two operations could be translated into something like the following:

Example 1: a + b + c + d
r1 := a + b
r2 := r1 + c
r3 := r2 + d

Example 2: a * (b + c) / d
r1 := b + c
r2 := a * r1
r3 := r2 / d

As you can see, the result of each binary operation is assigned to a new, unique variable. While verbose, it is much more akin to assembly and easy for C compilers to optimize.

It also ensures that calculations are done in the correct order and removes the necessity of pushing and popping operands to the stack.

Armed with this knowledge, we can now get on with what I call tagging.

Tagging is the process by which we attach these unique identifiers to each operation. Variables, parameters, binary operations and unary operations all need to be tagged.

Oddly, perhaps, even if statements need to get tagged, and you may wonder why that is. Well, if statements in Calc are not statements. They’re expressions. Like if expressions in functional languages, if expressions in Calc return a value.

As part of escape analysis and type checking, the value of any branch in an if expression must be checked and tagged since its result may be used elsewhere in the code.

As you can see, folding constants can potentially save us time by reducing the number of operations needing an ID.

We can traverse the tree in any manner we choose provided that every ID is unique.
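A sketch of what tagging can look like. The node type below is a bare-bones stand-in for Calc's IR objects, so only the depth-first numbering matters:

```go
package main

import "fmt"

// node is a stripped-down binary expression node.
type node struct {
	op          string // "" marks a leaf operand
	left, right *node
	id          int
}

// tag walks the tree depth first, handing each operation the next
// unique ID; leaf operands keep an ID of zero in this sketch.
func tag(n *node, next *int) {
	if n == nil {
		return
	}
	tag(n.left, next)
	tag(n.right, next)
	if n.op != "" {
		*next++
		n.id = *next
	}
}

func main() {
	// a + b + c  =>  r1 := a + b; r2 := r1 + c
	tree := &node{op: "+",
		left:  &node{op: "+", left: &node{}, right: &node{}},
		right: &node{},
	}
	n := 0
	tag(tree, &n)
	fmt.Println(tree.left.id, tree.id) // 1 2
}
```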

One more step left to go. Code generation!

Monday, 9 November 2015

Compiler 3 Part 5 - Constant Folding

The only optimization included in series 2 was constant folding. Unlike in the previous series, we’re now much better equipped to handle this optimization.

I feel that I should point out that most C compilers do this step, too. I was somewhat reluctant to keep it in at first. However, my reasoning for keeping it is two-fold: one, I think learning a bit about optimizations on an IR is a worthy lesson; and two, it can help make the next step a little quicker.

Once again, the IR comes to the rescue! In the first step of transforming the AST into the IR we created constants. These were objects representing actual values. This crucial step makes it much easier on us now.

Unary and Binary operations are ideal candidates for folding. We have already done the work of verifying types and correctness so we can ignore all that now. When we check if a value is a certain type we can be confident that information is correct.

Looking at the code you can see that it’s pretty simple. Traverse the tree depth first.  If both operands of a binary object, or the single operand of a unary object, are values of the correct type we can fold them together. We then return the result as a new constant value and replace the previous object with it.

Moving back up the tree we repeat the process until we exhaust all the foldable values.
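That walk can be sketched in a few lines. The expr type below stands in for Calc's IR; only + and * are folded and everything is an int, purely to keep the example short:

```go
package main

import "fmt"

// expr is either a constant (op == "") or a binary operation.
type expr struct {
	val         int
	op          string
	left, right *expr
}

// fold walks depth first; whenever both operands of a binary node have
// been reduced to constants, the node is replaced by a new constant.
func fold(e *expr) *expr {
	if e.op == "" {
		return e
	}
	e.left, e.right = fold(e.left), fold(e.right)
	if e.left.op == "" && e.right.op == "" {
		switch e.op {
		case "+":
			return &expr{val: e.left.val + e.right.val}
		case "*":
			return &expr{val: e.left.val * e.right.val}
		}
	}
	return e
}

func main() {
	// (2 + 3) * 4 folds all the way down to a single constant.
	tree := &expr{op: "*",
		left:  &expr{op: "+", left: &expr{val: 2}, right: &expr{val: 3}},
		right: &expr{val: 4},
	}
	fmt.Println(fold(tree).val) // 20
}
```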

You can see the value in converting basic literals into constants when building the initial IR tree. It makes folding constants together much, much simpler.

Onward and upward!