Thursday, 6 September 2018

C++ Text Template Engine - Part 1: Overview

This post will be the first in a series of undetermined length in my journey to write a text template engine in C++. No doubt, something like this already exists, but those I happened to look at didn't really whet my appetite. Not only that, there are aspects of C++ I want to explore, and this would give me an excellent opportunity to do just that.

Motivation

I would classify myself as an intermediate programmer overall and a somewhat novice to C++. Certainly, I am drastically behind the times of modern C++ and I was never a strong programmer in the language to begin with. So, the first motivation is to help improve my C++ chops a bit. Why?

Well, that leads me to my second motivation: not long after I started working for my current employer, I was tasked with maintaining our in-house systems, which were authored almost 20 years ago. The main software functions much like a modern web server except that it runs as a CGI script. Each page load invokes a program to build the page and output it over HTTP. Some of these pages are ludicrously complex, and they are all built through print (std::cout/std::ostream) statements. It's horrific to maintain.

As part of the effort to make the code base more maintainable, I want a template engine so I can create templates and inject data into them. Due to budgets and our main development focus, completely replacing the system isn't currently in the cards, but we are frequently making changes to it. So, in the meantime, I need to maintain it.

Recently, I created a stop-gap by using regex to perform search and replace on a template document. It has drastically improved the situation but falls far short of the mark. I'd like to further improve the system.

General Overview

So, the plan is to document the entire process, including the mistakes. I want to give a sort of free-form exploration as I meander through the process, and hopefully you, the reader, will learn something along the way.

The idea here is that I want to demonstrate a little about what agile programming looks like and demonstrate how the thought process works when scoping and building out a moderately sized piece of software.

I've already gone on long enough, so it's time to find a place to start.

User Stories

A good place would be to create user stories. These are just statements that describe what the user/client (in this case, me) wants. I used to think them quite silly and pointless but I've recently come to appreciate them.

They give you a starting point to begin a discussion and they help keep you on track. Too often I've found myself over-engineering or falling short of client expectations and it usually has been a result of not fully understanding their needs. That's where the user story comes in. It serves as a jumping off point to understanding what they want.

Story 1: As a developer, I would like to create a document, to act as a template, that can have portions of it replaced dynamically to create a final document. This would ease creating pages for my application by avoiding the need to construct these pages with print statements.

Story 2: As a developer, I will need to be able to use any basic data type as well as some standard library containers.

Story 3: As a developer, I need a method to perform conditionals and loops. I have lists of users and accounts I will need to loop over and print.

Story 4: As a front end developer, I want the syntax to be easy to learn and familiar to me. I don't want to spend a lot of time learning another templating language.

Story 5: As a developer, it would be useful that any classes I currently have in my code base be compatible or could be integrated easily into this system.

Initial Thoughts

Perfect. So, what can we glean from that?

I need to have a source document, maybe a file on disk or a string. Reading files was not part of the user story, so I think I should dismiss that for now. All I care about is the actual source, and either a data stream or a string will suffice, since that lets me take data from any source and transform it. I worry a little that a data stream might be a pain to work with, so I'm thinking I'll want the source to be passed in as a string. That more or less covers story number one.
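To make that concrete, here's a rough sketch of the kind of entry point I have in mind. The names Template and Render are placeholders, not a final design, and the sketch doesn't substitute anything yet; it just pins down the string-in, string-out shape:

```cpp
#include <string>
#include <utility>

// Hypothetical entry point: the engine takes its source as a plain
// string, so the caller can read it from a file, a database, or build
// it in memory before handing it over.
class Template {
public:
    explicit Template(std::string source) : source_(std::move(source)) {}

    // Eventually this will substitute values into the source; for now
    // the sketch just echoes the source back unchanged.
    std::string Render() const { return source_; }

private:
    std::string source_;
};
```

Keeping the constructor to a single string means file handling stays the caller's problem, which matches the decision above.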

In the second story, I need to be able to access basic data types like an integer, float or character string. I also need to be concerned with more complex data structures like vectors or maps. Classes will be addressed in the last user story so I'll forget about that for now. Off the top of my head, I'm thinking I'll need to have some kind of base value type with several derived classes for each basic type. I'm thinking ahead a bit here so I'll leave it at that for the moment.
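A first pass at that base-type-plus-derived-classes idea might look something like this. All names here are invented for illustration; the one real decision it captures is that every value the engine understands can reduce itself to a string for output:

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical value hierarchy: one abstract base, one derived class
// per basic type the engine supports.
struct Value {
    virtual ~Value() = default;
    virtual std::string String() const = 0;
};

struct IntValue : Value {
    explicit IntValue(int v) : v(v) {}
    std::string String() const override { return std::to_string(v); }
    int v;
};

struct StringValue : Value {
    explicit StringValue(std::string v) : v(std::move(v)) {}
    std::string String() const override { return v; }
    std::string v;
};

// Containers hold further values, which covers vectors and the like.
struct ListValue : Value {
    std::vector<std::unique_ptr<Value>> items;
    std::string String() const override {
        std::string out;
        for (const auto& item : items) out += item->String();
        return out;
    }
};
```

Whether maps and classes fit this same shape is exactly the open question for the later stories.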

User story number three asks about conditionals. That could be something like an if statement or a switch statement. Since I want to keep things as simple as possible, I will likely just use an if statement with an else clause. This story could use some fleshing out.

For loops, I feel I am faced with the classic for and while statements. In the interest of simplicity, I will only pick one. Since I will likely be passing data structures into the template engine, a loop with an each or range behaviour likely makes the most sense. Go's text/template package eschews for and while in favour of a range keyword, and I kind of like that, but it may be problematic for user story four. Perhaps a keyword of foreach might be apropos.

A familiar syntax for front end developers would be one like Mustache, Django or Twig. I think if I adopt something similar, that will ease adoption. While some systems use different tokens for commands and identifiers, I think I can safely use one and just enforce some simple naming rules common to almost every programming language.
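Purely as a hypothetical illustration, a Mustache/Twig-flavoured syntax for this engine might end up looking something like the following. None of these keywords are final:

```
Hello, {{ name }}!

{{ if premium }}
  Thanks for subscribing.
{{ else }}
  Consider upgrading.
{{ endif }}

{{ foreach user in users }}
  {{ user }}
{{ endforeach }}
```

A single delimiter pair for both commands and identifiers keeps the scanner simple, which is the point of the naming-rules remark above.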

Finally, I need to be able to support existing classes. To my knowledge, C++ does not have reflection, and converting large classes of data into maps would not be fun or efficient. I'm thinking that I can maybe create an interface that my existing classes could implement in order to make them compatible with this system. I'm going to have to think more on it but I need more information first.
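One possible shape for that idea, sketched with invented names: a Renderable interface that an existing class implements to answer field lookups itself, so no conversion to a map is ever needed.

```cpp
#include <string>
#include <utility>

// Hypothetical bridge interface: the class itself resolves field
// names to string values. Everything here is a placeholder name.
struct Renderable {
    virtual ~Renderable() = default;
    // Returns the string form of the named field, or empty if unknown.
    virtual std::string Field(const std::string& name) const = 0;
};

// An existing domain class opts in by implementing the interface.
class User : public Renderable {
public:
    User(std::string name, int id) : name_(std::move(name)), id_(id) {}

    std::string Field(const std::string& name) const override {
        if (name == "name") return name_;
        if (name == "id") return std::to_string(id_);
        return "";
    }

private:
    std::string name_;
    int id_;
};
```

The cost is a little boilerplate per class; the benefit is that the engine never needs to know anything about the class's internals.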

The Stage is Set

That completes the initial breakdown and user stories. The next phase will be discovery, and I'll cover it in the next article.

Sunday, 19 August 2018

Writing Interfaces in C++

After my last article, I got to thinking about utilizing interfaces in C++. I am not an expert in C++ by any means and most of the code I have to work on is both antiquated (C++ 98) and poorly written (even for its time). Most of my time is spent writing PHP and Go, so using interfaces is quite common.

Interfaces, abstract classes in C++, are not used at all in the code bases I work on regularly. It got me to thinking: "could Go or PHP style interfaces be done in C++?"

Virtual Recap

A virtual function is one that is defined in a class that is intended to be redefined by derived classes. When a base class calls a virtual function, the derived version (if it exists) is called.

A key point to make is that virtual functions do NOT need to be defined by a derived class.

To see an example, check out my previous post.

Pure Virtual Goodness

In PHP, you would call these abstract functions. In C++, they are called pure virtual functions. Perhaps a better thing to call them would be purely virtual. I feel that term makes the purpose of this feature clearer: declare that a function exists as a method on a class but leave the definition for later. It exists purely in a virtual sense.

Abstract functions are those that are declared in a base class and implemented by a derived class. Class A provides a prototype for a function but class B, which extends class A, actually implements it.

A pure virtual function is a contract. A class which extends a class with an abstract, pure virtual function must implement it before it can be instantiated. This guarantees that the function will exist somewhere in the class hierarchy. Otherwise, it would be no different than calling a function that has not been defined.

A pure virtual function looks like this:
virtual void Read(std::string& s) = 0;
This declares a virtual function Read. The notation of assigning zero to the function designates that the function is purely virtual.

Abstract classes as Interfaces

A class with only pure virtual functions is considered to be an abstract class. Since C++ does not have interfaces in the same way other languages do, abstract classes can fulfill the same role.
class Reader {
public:
    virtual void Read(std::string& s) = 0;
};
The above class is fully abstract. Any class that extends it will need to implement the method Read.

Or will it?

Compound Interfaces

I am a proponent of small, simple interfaces that can be combined to create ever more complex ones. This is the interface segregation principle from the SOLID design principles. The wrinkle is that C++ requires a class to implement every inherited pure virtual function before it can be instantiated. Thankfully, that obligation can be deferred.

A derived class that leaves the pure virtual functions unimplemented simply becomes abstract as well; no special keyword is needed for that. Extending with the virtual keyword adds a further guarantee: the compiler shares a single copy of each base class, so the same interface can appear more than once in a hierarchy without ambiguity. This allows multiple interfaces to be combined.
class Reader {
public:
    virtual void Read(std::string& s) = 0;
};

class Writer {
public:
    virtual void Write(const std::string& s) = 0;
};

class ReadWriter : public virtual Reader, public virtual Writer {};
Here, the interfaces Reader and Writer get combined into a third abstract class, ReadWriter. It, too, could add further pure virtual functions if desired.

Implementing an Interface

Implementing the interface is the same as deriving from any class. So, to tie everything together, here's a complete example:
#include <iostream>
#include <string>

class Reader {
public:
    virtual void Read(std::string& s) = 0;
};

class Writer {
public:
    virtual void Write(const std::string& s) = 0;
};

class ReadWriter : public virtual Reader, public virtual Writer {};

class SomeClass : public ReadWriter {
    std::string buf;

public:
    void Read(std::string& s) override { s = buf; }
    void Write(const std::string& s) override { buf = s; }
};

void readAndWrite(ReadWriter& rw) {
    rw.Write("Hello");
    std::string buf;
    rw.Read(buf);
    std::cout << buf << std::endl;
}

int main() {
    SomeClass c;
    readAndWrite(c);
    return 0;
}
The one caveat is that an abstract class must be handled through some kind of indirection, either a pointer or a reference. This requirement makes sense since it cannot be instantiated directly.

Summary

Using pure virtual functions and virtual inheritance, it is indeed possible to describe behaviour as would be done in other languages with interfaces. Pure virtual functions have a further use in multiple inheritance. To learn more, check out this StackOverflow answer.

Happy programming!

Saturday, 18 August 2018

Understanding Virtual Functions in C++

Until I had to explain it to someone, I never appreciated how confusing the virtual keyword can be. After all, in certain situations there seems to be no functional difference between virtual and non-virtual functions except that your IDE or editor might complain at you.

This article is aimed at programmers who are new to C++ or an initiate to programming in general.

Virtual

The term conjures up a vision of something non-tangible; something that doesn't exist in the physical world. In C++, this meaning is extended to describe functions which may not exist. By that, we mean they may not be defined where they are declared.

A virtual function, therefore, is a function which may not be real. While a class may call a function it defines, a virtual function could also exist somewhere else in its object hierarchy. The class doesn't know which function it's actually calling until compile time or possibly until runtime.

Essentially, the virtual keyword signals to a developer that the function is intended to be able to be implemented or overridden in a derived class.

The Base Class (Non-Virtual)

To demonstrate, start by creating a simple base class with two public functions: Foo() and Bar(). Inside the Foo() function, call Bar(). It may be helpful to add some output so you know which function is being called.

Something like this:


class Base {
public:
    void Foo() { cout << "Base::Foo" << endl; Bar(); }
    void Bar() { cout << "Base::Bar" << endl; }
};
In a main function, instantiate a new Base class and call Foo(). You should get output similar to this:
Base::Foo
Base::Bar
This is pretty standard fare and works as expected.

The Derived Class

Next, create a derived class that inherits from Base. In it, create a function with the same name and signature as Bar(). If printing out a statement, make sure you update it to report the new class name.
class Derived : public Base {
public:
    void Bar() { cout << "Derived::Bar" << endl; }
};
Call Foo() on this new class and you'll see that you get the same output. The function Bar() is only called on the Base class.

The Base Class (Virtual)

Add the virtual keyword to the function Bar() in the Base class then try running the code again. The code should look like this:
class Base {
public:
    void Foo() { cout << "Base::Foo" << endl; Bar(); }
    virtual void Bar() { cout << "Base::Bar" << endl; }
};
You should get output like this:
Base::Foo
Derived::Bar
This time, Bar() was called in the Derived class instead of the Base class. Why?

Virtual Explained

The virtual keyword changed how the call is resolved. A non-virtual call is resolved at compile time using the type the compiler sees, so Foo() in Base simply calls Base::Bar. A virtual call is dispatched at runtime based on the actual type of the object, so when the object is really a Derived, the call finds Derived::Bar instead.
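The same dispatch applies when calling through a base pointer or reference. Here is a self-contained sketch of the example above, reworked to trace the calls into a string instead of printing, so the dispatch order is easy to check:

```cpp
#include <string>

// Each function appends its name to a log so we can see the order.
class Base {
public:
    virtual ~Base() = default;
    void Foo(std::string& log) { log += "Base::Foo;"; Bar(log); }
    virtual void Bar(std::string& log) { log += "Base::Bar;"; }
};

class Derived : public Base {
public:
    void Bar(std::string& log) override { log += "Derived::Bar;"; }
};

// Calls Foo() through a Base pointer whose dynamic type is Derived.
std::string Trace() {
    std::string log;
    Derived d;
    Base* b = &d;   // static type Base*, dynamic type Derived
    b->Foo(log);    // Foo resolves statically; Bar dispatches dynamically
    return log;
}
```

Trace() produces "Base::Foo;Derived::Bar;": the non-virtual Foo is found on Base, but the virtual Bar inside it reaches the Derived override.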

Summary

Virtual functions are those that are intended to be overridden in derived classes, and the overriding version is the one called if it is defined.

Monday, 7 May 2018

Domains, IPs, Ports and Virtual Hosts - How it All Fits Together

Developing web applications might seem fairly simple, and writing a basic web page is. If all you are doing is writing a static web page, you can freely go ahead and start writing code. But what if you want to develop a dynamic web page or mimic a production server?

Servers

Any single computer on any network, whether a local network or the Internet, is associated with an address. An address may be assigned to only one computer at any given time but multiple addresses could potentially point to the same server. Without going into the semantics of addresses, two common ones that will be encountered are:
  • 192.168.X.X - These are local, intranet addresses. Each computer on a home network would have an address in this range.
  • 127.0.0.1 - This is a special address called the loop-back address. It is used by a server in order to allow it to contact itself. Hence, loop back.
An IP (Internet Protocol) address is a 32 bit number that is usually displayed as a series of numbers separated by dots, as seen above.

There are a lot of complexities to how addressing works, including internal (intranet) to external network addresses and how they interact.

Ports

To serve files or data, a program has to be set up to listen for new connections. However, if a computer has only a single address, how does it know how to associate any given connection with a program?

Ports allow connections to be routed to the correct program. Ports for a few common web services include:
  • 20 and 21 - File Server, FTP (File Transfer Protocol)
  • 25 - Mail Server, SMTP (Simple Mail Transfer Protocol)
  • 80 - Web Server, HTTP (Hyper-Text Transfer Protocol)
  • 443 - Web Server, HTTPS (Hyper-Text Transfer Protocol over SSL/TLS)
A list of reserved ports can be found on Wikipedia.

Only one program may listen on any one port, but for web servers at least, there is a way to work around this limitation.

Also of note is that even though many clients/services don't explicitly require you to add a port, it is always there. If no port is supplied, web browsers will use port 80 (or 443 for HTTPS). When testing web applications on a local system, it is common to use port 8080 or 8880, like so: www.example.com:8080.

Domains

A domain is usually a human readable string of characters that acts as an alias for an IP address. Any single domain is typically associated with a single IP address. Domains on the Internet must be registered with an authority called a domain registrar. You supply the registrar with the address of your host and they will associate it with your domain name.

Top level domains (TLDs) are those such as com, gov, net and dev, to name just a few. There are many others. As long as they are used on your local machine and a client, like a web browser, never accesses a name server to resolve it (as it would for domains on the Internet), then there are no real restrictions.

Unfortunately, some web browsers, Chrome and Firefox for example, can complicate matters and it is generally recommended to steer away from common TLDs.

Multiple domains may point to the same address.

Sub Domains

Any other domains are sub-domains. Without going into detail, when someone refers to a subdomain they usually mean the leftmost label, namely www or mobile. Reading from right to left, you move from the most significant domain (the TLD) to the least significant one.

  • www.example.com - root, com (top level domain), example, www (least significant domain)

Virtual Hosts

A web server can host many different websites. If a web server is listening on port 80 but multiple different domains point to the same address, how would the web server know which site to route the request to?

Virtual hosts associate domain names to locations on the server. Even though both www.example.com and mobile.example.com might resolve to 192.168.0.1:80 the web server is given the value of the domain. The web server takes that domain and, provided an entry for it exists in its configuration, routes communication for each domain to the correct web directory.
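As a sketch, an Apache virtual host configuration pairing those two domains with separate web directories might look like this. The paths are invented for illustration:

```apache
<VirtualHost *:80>
    ServerName www.example.com
    DocumentRoot /var/www/example/desktop
</VirtualHost>

<VirtualHost *:80>
    ServerName mobile.example.com
    DocumentRoot /var/www/example/mobile
</VirtualHost>
```

Both entries listen on the same address and port; the ServerName directive is what lets the server pick the right DocumentRoot for each request.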

Hosts File

When a client, like a web browser, is supplied a domain name, it normally contacts a service called DNS (Domain Name System) to resolve the address the name is associated with. When doing local development, this isn't desirable, as an internal web server is usually not publicly accessible.

To overcome this limitation, most operating systems have a file that is used to define local domain names. If a domain exists in this list, then the associated IP address is used without ever connecting to a DNS server to resolve the name.

In this capacity, you could even do things like route www.google.com back to your own web server.

On most GNU/Linux systems you'll find the hosts file under the /etc directory. You would add an entry like:

127.0.0.1    www.example.com

When putting that address into your browser, the system will first check for a matching entry in /etc/hosts and, if it finds one, use the given address. In this case, that's the loop-back address, which sends the connection to the corresponding port on your own system. Remember, a port is always required even though it often doesn't need to be supplied explicitly.

Bringing It All Together

Armed with this knowledge, a developer can setup a testing environment that mimics a production server.

If you are developing web sites and need a full stack for each one (by that I mean a web server, database server and any other accoutrements), then a process similar to this could be followed:

  1. Install/Copy any files from either an existing production or a framework.
  2. Add a VHost entry for your web server.
  3. Add a Domain entry to your hosts file.
  4. Create or Import a database.
  5. Work like crazy.
With such a setup, a developer can connect to a local web server as if it were a remote server.


Of course, there are a lot more things that can be done to provide even better environments. Here are a few closing thoughts to whet your appetite.

Docker

You can use Docker to mimic a production server almost exactly or, in some cases, run docker on both production and in development to ensure 100% compatibility. This can be very useful for addressing issues with software packages and libraries being incompatible versions.

Reverse Proxy

Some environments, like PHP for instance, may require using a technique called a reverse proxy. Go development can also benefit in instances where you need both a Go application and a web server like nginx or Apache running, too. This involves having one server accept an incoming request and then forward it to the same address but on a different port.
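A minimal nginx sketch of that pattern might look like the following. The upstream port 8080 is invented for illustration:

```nginx
server {
    listen 80;
    server_name www.example.com;

    # Forward every request to the application listening locally
    # on a different port.
    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
    }
}
```

The web server handles the public port; the application behind it never needs to be exposed directly.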

Sunday, 23 April 2017

Been a While

It's been a long while since my last post. That's not to say I have been idle. In fact, it's quite the opposite.

After working 10 years first as a sales representative and another 10 years as a credit professional I decided it was time to finally pursue my true passion: programming. I have been programming since I was in my early teens and it has always been a passion of mine.

At one point I thought maybe I'd like to write games but I soon became disillusioned by the video game industry. I became confused as to what I wanted to do and, one thing leading to another, my career path headed in another direction.

I was quickly approaching a time in my life where either I commit to my path or pursue my passions. It was time to make a choice. However, being married with kids, making a life change comes with a tremendous amount of pressure and fear. If I fail or make the wrong choice, it's not just me who suffers the consequences.

The opportunity that presented itself also required us moving roughly 300km away. We had been wanting to move back to the Okanagan Valley for quite some time and now we could do it. My children are still young enough that lasting friendships haven't formed yet but still it would be hard on them.

So, I accepted a position at Acro Media where we primarily build custom Drupal e-commerce websites. I, specifically, work on supporting existing sites and maintaining our in-house infrastructure. Luckily, this means I don't exclusively program in PHP, Javascript, HTML and CSS. I also get to work in C++ and Go.

I wasn't sure how I would feel about working in PHP for much of my day. I am not a fan of the language for many reasons but I've been pleasantly surprised to discover that the joy of programming has thus far eclipsed working in a language I dislike. I think this is mainly due to the fact that my day is quite varied. My job is about solving problems. It's not so much what language I'm using.

So where does that leave the blog?

Well, I hope it means that I'll be writing more again. Compilers are still a passion and I'm working on another project. It's rather ambitious so we'll see what happens there.

On that note, I think I'll call it a day. Until next time!

Sunday, 21 February 2016

The Accumulator Register

At all curious about the accumulator register? I was. Here's a little summation of what I discovered.

Overview

In modern, general purpose CPUs, the accumulator register has lost some of its meaning and much of its use has been relegated to convention rather than necessity.

I got to wondering why the ax/eax/rax register was called the accumulator and why it was used so much since it seemed that any of the general purpose registers would do. Does it just come down to convention or is there more to the story?

All the MIPS documentation I’ve read to date indicates the only register considered the accumulator is a hidden, double-word register accessed via the MFHI and MFLO instructions. This special register is used when doing multiplication and division.

In developing my own virtual machine, and delving into electrical engineering, I got to be more intimately familiar with this happy little register.

A Little History

If my sources are correct, the first commercially available CPU from Intel was the 4004. It had a single register that was implicitly used in the vast majority of its instructions. You can see for yourself on the data sheet (PDF). Can you guess which one?

The 4004 was designed in such a way that the output of the ALU fed directly into this single register to accumulate data. Operations that expect two operands always implicitly use the accumulator as the first operand.

The ADD instruction, for example, takes a single register as an operand. It performs addition on the value stored in the accumulator plus the given register and then stores the result back into the accumulator.

To my knowledge, the 4004 did not pioneer the use of an accumulator, but it serves as a nice example.

Example

The design seems simple but consider how calculators or adding machines work. How about an example?

Assume the calculator has just been turned on or a clear instruction has been set so that the accumulator has a value of zero.

The user types in a value then hits the addition key. The accumulator and the value on screen get added together. There are no operands, just a simple ADD instruction: ACC += DISPLAY.

Arithmetic instructions would not need explicit operands at this level of design. It makes the accumulator a very important facet in the design of these machines.
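The accumulator-style ADD described above can be modeled in a few lines of C++. This is a toy model, not an accurate 4004, but it captures the key idea: the accumulator is always the implicit first operand and the destination.

```cpp
#include <cstdint>

// Toy accumulator machine: every arithmetic result lands in acc.
struct Machine {
    uint8_t acc = 0;
    uint8_t reg[4] = {0, 0, 0, 0};

    // ADD r: acc <- acc + reg[r], mirroring the one-operand form where
    // the accumulator never appears explicitly in the instruction.
    void Add(int r) { acc = static_cast<uint8_t>(acc + reg[r]); }

    // Clear the accumulator, as a calculator's clear key would.
    void Clear() { acc = 0; }
};
```

Notice that Add names only the source register; the destination is fixed by the architecture, which is exactly what made single-register designs so compact.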

In modern implementations, even if a CPU can use any general purpose register to execute a particular instruction, it may be optimized to perform certain operations in fewer clock cycles when using the accumulator. You’ll need to read documentation to find out.

Legacy

The result is that the importance of this register is historic, and it thereby remains important by convention. We probably no longer need to do it this way but it does have a logical, and historic, significance.

Happy programming!

Saturday, 14 November 2015

Compiler 3 Part 8 - Finale

This will end up being the longest, and final, post in the series. So let's tuck in!

Testing and Everything Else

The scanner and most of the infrastructure stayed exactly as it was.

The parser and AST did see some changes. The parser, in particular, saw several bug fixes and simplification. Some of the error checking that had been done during parsing has been passed on to the semantic analyzer. Similarly, anything related to type checking could be removed.

Also, the AST-related scoping code could be simplified and confined to the parser since it is discarded during semantic analysis.

Detecting “true” and “false” was added for the boolean type.

With those changes out of the way, I can say that if there is one area that Calc has truly lacked, it was in testing.

A compiler project should, in my opinion at least, be concerned with correctness. Correctness and consistency. The IR was one step in shoring up Calc in this area and testing is the other. Calc has always incorporated at least a little testing but it has never been as comprehensive as it should be.

Go provides great testing facilities right out of the box. The cover tool has also proven helpful in sussing out corner cases that need testing.

I have, I think, made large strides in making Calc a much more stable compiler.

That isn’t to say that it’s perfect. I’m sure within 5 minutes of playing with it someone will discover some bugs. Hopefully, those who find them will take the time to submit an issue so I can create tests and fix them.

This is the value of having an open source project.

The Future...

Calc has continued to evolve along with my own skills and I hope that I’ve helped someone, in some way, to further their own growth. Whether it be teaching someone how to, or how NOT to code, the end result remains the same. Someone, somewhere, has benefitted from my efforts.

Now that I’ve had a chance to regroup from the herculean effort of writing and implementing the Calc2 spec and teaching series, I feel like I can continue to produce more content.

I do hope to continue talking about compiler construction. In the future, I hope that by using a smaller format, like this one, that I can produce more posts more quickly. I can certainly say that this third series has been much more enjoyable to write!

I have also been working on another side project to create a virtual machine with a complete software stack. It has an assembler, linker and virtual machine running its own bytecode. If there is interest, I’ll write some blog posts about it. You can find the project here.

I have concluded both previous series talking about future plans and I shall do no less today. In series two I had a list of things I wanted to implement and change. Let’s go through the list:

  • growable stack - well, this is null and void. Calc no longer manages its own stack.
  • simple garbage collector - I no longer plan on implementing a garbage collector myself. See below.
  • loops - not implemented
  • generated code optimizations - done! 
  • library/packages - not implemented
  • assembly - null and void. I don’t plan on generating assembly.
  • objects/structures - not implemented
Only one of the seven features I wanted to implement was actually completed. Of the remaining six features, three of them no longer make sense. The groundwork for the other three has been laid down. In particular, loops and structures only await language design since implementation should be almost trivial.

Incorporating the remaining three features, I now present you with an updated list:

  • #line directives for source-debug mapping - I think this is a very reasonable and easily obtainable goal. It will make debugging Calc with the GNU debugger easier.
  • logical && and || operators - are easily added with minimal work
  • importing libraries - thanks to the x/debug/elf package, importing libraries into code may not be unreasonable to achieve in the near future. The scope of adding imports is likely a series in and of itself.
  • garbage collection - Calc 2.x does not need garbage collection. It does not have pointers and it does no memory allocation. However, when it does, I will probably use the Boehm garbage collector rather than attempting to spin my own. Additional info can be found on Wikipedia. This feature will be far in the future so don't expect it any time soon.
  • structs, loops and arrays - with the new code generator, implementing structs and arrays on the back end ought to be trivial. Work on this has actually already begun and you can view some of the changes here. Sorry, the specification is not published publicly right now.
  • stack tracing - there isn’t much holding me back from implementing stack traces on crashes now that the new code generator is in place. Time and effort are all that remain.
And that, as they say, is that! Thank you for taking the time to read through this series! At some point I’d like to put everything into a PDF but that would be a large task since large parts of both series would need to be re-written entirely. In fact, I might even want to start again from scratch.


Until next time!