Monday, 7 May 2018

Domains, IPs, Ports and Virtual Hosts - How it All Fits Together

Developing web applications might seem fairly simple, and writing a basic web page is. If all you are doing is writing a static web page, you can go ahead and start writing code. But what if you want to develop a dynamic web page or mimic a production server?

Servers

Any single computer on any network, whether a local network or the Internet, is associated with an address. An address may be assigned to only one computer at any given time, but multiple addresses can point to the same server. Without going into the details of addressing, two common kinds that you will encounter are:
  • 192.168.X.X - These are local, intranet addresses. Computers on a home network typically have an address in this range.
  • 127.0.0.1 - This is a special address called the loop-back address. It allows a computer to contact itself. Hence, loop back.
An IP (Internet Protocol) address, in its common IPv4 form, is a 32-bit number that is usually displayed as four decimal numbers separated by dots, as seen above.
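
As a small illustration of the dotted notation being a 32-bit number underneath, here is a sketch using Python's standard ipaddress module. The describe function and its labels are made up for this example.

```python
import ipaddress


def describe(address):
    """Return the 32-bit integer form of an IPv4 address and a rough label for it."""
    ip = ipaddress.IPv4Address(address)
    if ip.is_loopback:
        kind = "loopback"
    elif ip.is_private:
        kind = "private"
    else:
        kind = "other"
    return int(ip), kind


for text in ("127.0.0.1", "192.168.0.1"):
    number, kind = describe(text)
    print(f"{text} -> {number} ({kind})")
# 127.0.0.1 -> 2130706433 (loopback)
# 192.168.0.1 -> 3232235521 (private)
```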

Addressing has many more complexities, including how internal (intranet) and external network addresses interact.

Ports

To serve files or data, a program has to be set up to listen for new connections. However, if a computer has only a single address, how does it know which program a given connection belongs to?

Ports allow connections to be routed to the correct program. Ports for a few common services include:
  • 20 and 21 - File Server, FTP (File Transfer Protocol)
  • 25 - Mail Server, SMTP (Simple Mail Transfer Protocol)
  • 80 - Web Server, HTTP (Hyper-Text Transfer Protocol)
  • 443 - Web Server, HTTPS (Hyper-Text Transfer Protocol over SSL/TLS)
A list of reserved ports can be found on Wikipedia.

Normally, only one program may listen on any one port at a time. For web servers, at least, there is a way around this limitation: virtual hosts, described further down.
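
The one-listener-per-port rule is easy to see for yourself. The following sketch, which assumes default socket options (no SO_REUSEADDR or SO_REUSEPORT), asks the operating system for a free port, listens on it, and then tries to bind a second socket to the same port. The function name is made up for this example.

```python
import socket


def second_listener_fails():
    """Return True if a second socket is refused the port a first one is listening on."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    first.bind(("127.0.0.1", 0))  # port 0 asks the OS to pick any free port
    first.listen()
    port = first.getsockname()[1]

    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        second.bind(("127.0.0.1", port))  # same address, same port
    except OSError:
        return True  # typically "Address already in use"
    else:
        return False
    finally:
        second.close()
        first.close()


print(second_listener_fails())
```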

Also of note is that even though many clients and services don't require you to specify a port explicitly, one is always used. If no port is supplied, web browsers use port 80 for HTTP and port 443 for HTTPS. When testing web applications on a local system, it is common to use a port such as 8080 or 8880, like so: www.example.com:8080.
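
To make the "a port is always there" point concrete, here is a minimal sketch that works out which port a URL implies. The effective_port helper and the DEFAULT_PORTS table are assumptions for this example and only cover http and https.

```python
from urllib.parse import urlsplit

# Default ports for the two schemes a browser normally uses.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url):
    """Return the explicit port in a URL, or the scheme's default if none is given."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    # Raises KeyError for schemes this sketch does not know about.
    return DEFAULT_PORTS[parts.scheme]


print(effective_port("http://www.example.com"))       # 80
print(effective_port("https://www.example.com"))      # 443
print(effective_port("http://www.example.com:8080"))  # 8080
```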

Domains

A domain is usually a human-readable string of characters that acts as an alias for an IP address. In the simplest case, a domain is associated with a single IP address. Domains on the Internet must be registered through an organisation called a domain registrar. Once the name is registered, DNS records (often managed through the registrar) associate it with the address of your host.

Top-level domains (TLDs) are those such as com, gov, net and dev, to name just a few. There are many others. For local development you can choose almost any name: as long as it is only used on your local machine and a client, like a web browser, never asks a name server to resolve it (as it would for domains on the Internet), there are no real restrictions.

Unfortunately, some web browsers, Chrome and Firefox for example, can complicate matters. Both force HTTPS for every .dev domain, for instance, so it is generally recommended to steer away from real TLDs for local names. A reserved TLD such as .test is a safer choice.

Multiple domains may point to the same address.

Sub Domains

Any domain below a top-level domain is a sub-domain. Without going into detail, when someone refers to a subdomain they usually mean the prefix, such as www or mobile. A domain name is read from right to left: the most significant domain is on the right and the least significant is on the left.

  • www.example.com - root, com (top level domain), example, www (least significant domain)
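
The ordering in the example above can be reproduced with a few lines of code. This is only a sketch, and the function name is made up for illustration.

```python
def labels_by_significance(domain):
    """Split a domain into its labels, most significant (right-most) first."""
    # A fully qualified name may end in a dot for the root; drop it first.
    return list(reversed(domain.rstrip(".").split(".")))


print(labels_by_significance("www.example.com"))  # ['com', 'example', 'www']
```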

Virtual Hosts

A web server can host many different websites. If a web server is listening on port 80 but multiple different domains point to the same address, how would the web server know which site to route the request to?

Virtual hosts associate domain names with locations on the server. Even though both www.example.com and mobile.example.com might resolve to 192.168.0.1 and connect on port 80, the web server is told which domain was requested: the browser sends it in the Host header of each HTTP request. The web server takes that domain and, provided an entry for it exists in its configuration, routes the request to the correct web directory.
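
Real web servers such as Apache and nginx do this through their own configuration files, but the idea can be sketched in a few lines. In the sketch below the VHOSTS table, the directory paths and the document_root function are all hypothetical. It simply looks up the name from an HTTP Host header.

```python
# Hypothetical virtual host table: domain name -> web directory.
VHOSTS = {
    "www.example.com": "/var/www/example",
    "mobile.example.com": "/var/www/example-mobile",
}


def document_root(host_header, vhosts=VHOSTS):
    """Return the web directory for a request's Host header, or None if no entry exists."""
    # The header may carry a port ("www.example.com:8080"); only the name matters here.
    # IPv6 literals such as "[::1]:80" are not handled by this sketch.
    hostname = host_header.split(":", 1)[0].strip().lower()
    return vhosts.get(hostname)


print(document_root("www.example.com"))          # /var/www/example
print(document_root("mobile.example.com:8080"))  # /var/www/example-mobile
print(document_root("unknown.example.com"))      # None
```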

Hosts File

When a client like a web browser is supplied a domain name, it normally contacts a service called DNS (Domain Name System) to resolve the address the name is associated with. When doing local development, this isn't desirable, as an internal web server is usually not publicly accessible.

To overcome this limitation, most operating systems have a file that is used to define local domain names. If a domain exists in this list, then the associated IP address is used without ever connecting to a DNS server to resolve the name.

With this file, you could even do things like point www.google.com at your own web server.

On most GNU/Linux systems you'll find the hosts file under the /etc directory. Each entry lists the IP address first, followed by the domain name. You would add an entry like:

127.0.0.1 www.example.com

When you put that domain into your browser, the system first checks for a matching entry in /etc/hosts and, if it finds one, provides the given address. In this case that is the loop-back address, which sends the connection to the corresponding port on your own system. Remember, a port is always required even though it often doesn't need to be supplied explicitly.
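
The lookup the system performs can be approximated as follows. This is a simplified sketch, not how any particular resolver is implemented: it assumes the usual hosts file layout of an address followed by one or more names, with # starting a comment. The SAMPLE text and the parse_hosts function are made up for the example.

```python
def parse_hosts(text):
    """Build a name -> address table from hosts-file text (the first entry for a name wins)."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            table.setdefault(name.lower(), address)
    return table


SAMPLE = """\
# address    name(s)
127.0.0.1    localhost
127.0.0.1    www.example.com mobile.example.com  # local development
"""

hosts = parse_hosts(SAMPLE)
print(hosts.get("www.example.com"))  # 127.0.0.1
print(hosts.get("www.google.com"))   # None, so a real system would fall back to DNS
```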

Bringing It All Together

Armed with this knowledge, a developer can set up a testing environment that mimics a production server.

If you are developing web sites and need a full stack for each one (by that I mean a web server, a database server and any other accoutrements), then a process similar to this could be followed:

  1. Install or copy the files, either from an existing production site or from a framework.
  2. Add a VHost entry to your web server's configuration.
  3. Add a domain entry to your hosts file.
  4. Create or import a database.
  5. Work like crazy.
With such a setup, a developer can connect to a local web server as if it were a remote server.


Of course, there are a lot more things that can be done to provide even better environments. Here are a few closing thoughts to whet your appetite.

Docker

You can use Docker to mimic a production server very closely or, in some cases, run Docker both in production and in development so that the two environments match. This can be very useful for avoiding issues caused by incompatible versions of software packages and libraries.

Reverse Proxy

Some environments, PHP for instance, may call for a technique called a reverse proxy. Go development can also benefit from one in instances where you need both a Go application and a web server like nginx or Apache running. This involves having one server accept an incoming request and then forward it to another server, often at the same address but on a different port.
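
In practice nginx or Apache would play the proxy role, but the mechanics can be sketched with Python's standard library alone. Everything named below (AppHandler, make_proxy_handler, start, demo) is hypothetical. Both servers run on the loop-back address on ports picked by the operating system, and only GET requests are handled.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class QuietHandler(BaseHTTPRequestHandler):
    """Shared helpers: silence request logging and send a simple response."""

    def log_message(self, format, *args):
        pass

    def reply(self, status, body):
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


class AppHandler(QuietHandler):
    """Stands in for the application, for example a Go or PHP process."""

    def do_GET(self):
        self.reply(200, f"app handled {self.path}".encode())


def make_proxy_handler(backend_port):
    """Build a handler that forwards GET requests to the same address on backend_port."""

    class ProxyHandler(QuietHandler):
        def do_GET(self):
            conn = http.client.HTTPConnection("127.0.0.1", backend_port, timeout=5)
            try:
                conn.request("GET", self.path)
                upstream = conn.getresponse()
                self.reply(upstream.status, upstream.read())
            finally:
                conn.close()

    return ProxyHandler


def start(handler):
    """Start a server on a free loop-back port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def demo():
    """Send one request to the proxy and return what came back from the application."""
    app = start(AppHandler)
    proxy = start(make_proxy_handler(app.server_address[1]))
    conn = http.client.HTTPConnection("127.0.0.1", proxy.server_address[1], timeout=5)
    try:
        conn.request("GET", "/hello")
        return conn.getresponse().read().decode()
    finally:
        conn.close()
        for server in (proxy, app):
            server.shutdown()
            server.server_close()


if __name__ == "__main__":
    print(demo())  # app handled /hello
```

The client only ever talks to the proxy's port, and the application's port stays an internal detail. That is the property that makes this arrangement useful.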