Tuesday 23 December 2014

Compiler 2 Part 13: The Front End

This is it! You've made it!

The final post in this series merely covers the user-facing code and installation of the compiler and isn't really part of the compiling process.


I chose to use a Makefile for installing and building calcc and it’s runtime. The install target will not only install calcc but ensure the runtime is built before doing so.

It’s a nice improvement.

I considered the GNU autotools but it felt like bringing a howitzer to a knife fight. Just a little bit of overkill for marginal gain and a much slower system overall.

There is a test target too for generating a simple test program for the runtime.

It was a pain making it portable to Windows.


Calcc has had a few changes too.

First, I added a function called findRuntime to make an attempt to locate the C runtime. I wanted to ensure that the runtime would not be installed on the developers system so I needed a method of locating it. I had two reasons for this: a) this is an instructional compiler and once someone is finished with it I would like for them to be able to delete the source tree and the runtime with it; and, b) the runtime is statically linked into the final binary so there’s no point to having it hanging around in la-la land.

This function does not guarantee the runtime is there when it comes time to link it. All it does is do a sanity check to see if it can find it prior to trying to link it.

I added a cheeky flag to output “assembly”. Really, it outputs pseudo-assembly for the instructions in the C runtime. It’s nice to have when you want to see the C output to verify correctness.

The balance of the changes relate to multiple files and qualifying the path. For one, we want the absolute path rather than the relative one. Also, we have to do some work when compiling a directory because the path is part of the binary name.

It also strips the file extension, if need be, so that the filename itself can be used for the intermediate and final files. A file called foo.calc will generate the files foo.c, foo.o and a binary named foo. On Windows, the binary will be called foo.exe.

Learning as We Go

In writing the compiler and these articles I came across a few minor issues with the Calc 2 spec. These were unexpected and only reared their head when actually implementing them.

Making the return type of a function mandatory was one example. Since the Calc 2 spec doesn't have anything that doesn't require a side effect it made no sense to be able to call a function in an imperative form. That is, a function which executes the expressions in the body but does not return any result. The result of such a function would be what’s called a no-op.

A no-op is an operation with no effect. Look at this example:

(decl incr (n int) (= n (+ n 1)))
(decl main int (
(call incr(2)))

What does this program do? The function ‘incr’ does not return a value. Essentially, it does nothing. Yes, the value of ‘n’ is incremented within ‘incr’ but nothing is actually produced as output. No value is returned.

So why did I start with an optional return value in the first place? Well, because in the future, perhaps even Calc 3, I hope to include external libraries. I would like for there to be a print function so that an imperative function could produce output without a side-effect, without a return value.

Still keeping with functions, I had to revert my original idea of allowing nested function declarations. I may enable this again in future versions but it was problematic. For one, you could assign a declaration to a variable but there are no pointers in the language. So, what would that mean? What would it do?

I could have allowed declarations just within other declarations but it likely would have meant a bunch more code to make work and I wasn't sure it wouldn't feel clunky. So, if I couldn't do it right, I better not do it at all.

Those are a couple examples of how I had to tweak the spec slightly as I wrote. Unintended consequences of my decisions.

Again, I was not completely re-writing the spec. I was clarifying and tweaking it slightly. Any larger of an adjustment may have meant massive changes to the code which would be unacceptable. You need to stick to your plan!

The Future

So what do I have in store for Calc 3 and beyond?

I can’t say for certain just yet but I have a few ideas. Here’s my wishlist:

  • growable stack (high)
  • simple garbage collector (unknown; dependant on other features)
  • loops (moderate-high)
  • generated code optimizations (low-moderate)
  • library/packages(moderate)
  • assembly (moderate)
  • allocations and simple garbage collector (moderate-high)
  • objects/structures (high)

Some things, like packages, I’d very much like to do but would require a lot of engineering to get right. I think if I added it, I probably couldn't do much else.

Optimizing seems unlikely. I want to teach how language features work rather than building an efficient binary.

Closing Statement

This series was far too big and were I to do it all over again I would change a great many things.

First off, I would implement fewer features in one go. When drafting the outline of the series I did not anticipate just how big it would get. Some of the new features were very large topics and I’m not sure I did them all justice. I think I could have easily split this series into two to make it more manageable.

Multiple source file support was a bad idea. Not that adding the feature was bad in its own right but that it makes no sense with the rest of the lessons and adds unnecessary bloat to the series.

Next, I would write the C Runtime a bit differently to make it easier for the C compiler to optimize the assembly instructions. While my intentions of using pseudo-assembly were noble I don’t think they were the correct answer to the problem.

I am really not happy with the type-checking implementation. It’s sloppy and buggy.

I don't know if I would create another LISP-like language. I would consider retaining prefix notation for basic arithmetic operations but I would implement the rest of the language in a more C-like way. It’s just what makes me feel comfortable.

I am not entirely sure whether I will continue with another series, or not. I would like to, I think, provided it was much, much smaller. I think I would do minor, incremental steps from now on. With my current workload and the effort required to put this together I am, quite frankly, exhausted!

Thanks for reading and I hope you enjoyed it!

Compiler 2 Part 12: Compiler

Now, to look at code generation.

The vast majority of this file is completely new. Including newlines, this file has ballooned from 87 lines to over 490 lines!

So, where to start?

Compiler Object

May as well talk about the man behind the mask! Rather than a token.File it now has a token.FileSet.

The most important changes are the offset and curScope fields. Since we now need to declare variables in our code we also need to know what offset they are from the base pointer of the active frame. The offset field is reset each time a new function declaration is made but is retained across scopes within that declaration so that successive variables are each given a unique offset.

The curScope field holds a pointer to the current, active scope.

The Interface

CompileFile and CompileDir are fairly straight forward. Each creates a new FileSet, parses the file or files needed and tries to compile them to a C file.


The description summarizes this function nicely.

Rounding Up!

The roundUp16 function has to do with alignment. When requesting memory from the stack, we want to try and keep it aligned. However, rather than go on a long discourse on alignment I'm going to save that discussion for another time.

For this function to have any meaningful use the stack should be aligned first but there’s no need in our current implementation. Our stack has 32 bit address spaces and we only deal with 32 bit variables. Therefore, the stack is always in alignment. This function will be more useful at a later date.

If you’re keen on learning about alignment prior to the next series, a couple searches will serve you well.

Offsets a Plenty

nextOffset is another utility function that is hard to understand without more context. Remember in the previous post we were dealing with ebp+0 and ebp+4? This function helps us get those numbers we’re adding by determining the next available offset.

Each variable within a stack frame has it’s own offset. This allows variables within different scopes by the same name to have their own values. This function returns the next offset to use


openScope and closeScope do almost exactly what their parser equivalents do.


Rather than return a value to optimize away the complexity of compiling, compNode has been adjusted and expanded to deal with all the new object types.


The first thing to do is to lookup the variable we’re assigning a value to and verify it actually exists.

If the object has no type, then the type is being inferred from the value being assigned. If the object has a type, we must match it with the variable being assigned to make sure everything is type-safe.

The balance of the code moves the value into the object’s offset.


A binary expression, if you recall, is an expression with two operands. In fact, our language can have more than two operands, depending on the operation, but each individual operation only takes two.

The first thing is check to see if we can optimize the code. At the cost of potentially making a full pass through all the operands in the expression, we check if we can optimize the code into a single operation. More on that later.

If you recall from part 11, the first operand is always assigned to the ‘eax’ register. We call compNode to do that.

The next section is a bit trickier. Each variable needs to be assigned to the ‘edx’ register and then apply the operation to both eax and edx and store the result in eax. To complicate matters, depending on the type of expression, we may need to temporarily store the value stored in ‘eax’ via a call to ‘push’, compile another subexpression, then recall the value via ‘pop’ and store the result back in eax.


When calling a function, we need to store any arguments being passed to the function on the stack prior to calling the function. The first argument corresponds the the first parameter, and so on.

When entering into a function call, the base pointer will be 4 bytes greater than the current value of the stack pointer because that’s where the return address (offset in our case) will be stored. So the offset starts at 4 bytes.

Again, we check to make sure that the function exists, that we not attempting to call the main function again, that what we’re calling is indeed a function and that the number of arguments matches the number of parameters.

With me so far?

Once done, we make sure the types of the arguments match the types of the parameters. If all is well, we write the arguments into the offset of the variable and increase the offset by another four bytes.

Finally, the function is called.

Notice anything odd? The function name has an underscore prepended to it. That’s because C functions, most specifically the main function, are not expected to have an underscore before their name. The C compiler will be looking for a function called ‘main’ but our language also requires a function called ‘main’. To make sure the C compiler can differentiate between the two we use the underscore.


The first thing I did was open up the scope of the function declaration and make it current.

Next, assign offsets to each of the parameters of the function, if any.

The compiler then outputs a line of C code representing the function signature. All functions have the same signature, which will look like: void _funcname(void)

Our C functions take no arguments and return no values. That’s all handled by our code.

It then counts the number of variables, parameters included, in order to allocate space for them on the stack. It then multiples that value by four (bytes).

If there were any variables or function parameters declared we then use roundUp16 to find the nearest multiple of 16. Using the enter instruction to allocate the space we compile the function body. We close off by calling the leave instruction.
If the function had no variables whatsoever we can just compile the body of the function.

Last, we make an attempt to make sure that the return value of the function matches the value returned by the function.

If all is well, the scope is returned to the previous scope.

compFile and compPackage

Both of these do the same thing. They set the current scope to be the top-level scope and start the wheels in motion.


Compiling an identifier is pretty simple. First, we look up the identifier. If the identifier doesn’t have an object associated with it we panic. Otherwise, print out the variable using the supplied format and object’s offset.


First things, first. Is the value of the condition an integer? While there is only one type of variable in our language we know that will eventually have more. Also, we want to ensure that the expression used as a condition at least has a side effect, though that should have been caught during parsing.
No matter what the node is we compile it and test to see if the result is true.

We can now open up the private scope of the if statement and process both the then clause and the optional else clause.

Close the scope when finished.


Compiling an integer is pretty straight forward. We already know that we have an integer of some kind. Try to convert the literal string into a number, report an error if one occurs, and assign it to the supplied register.


What this guy does is just provide function prototypes for all our functions. I used a handy trick in Go to defer compiling the actual implementations.


This function is unique to the compilation process and must only ever be called once.

As with many of our other functions, we check a bunch of conditions to ensure everything is sane.

The balance is simply outputting C code. I’ll step you through it.

We include stdio.h so we have access to the printf function. Strictly speaking, we don’t NEED to use printf but but its handy to have our program output something since it doesn’t have any built-in print functions.

We include runtime.h to access our runtime functions.

Next we have the standard C main function, initialize the stack, call our program’s main function and finally clean up the stack. Of course, C expects us to return a value and since our program has no fail conditions we always return 0 for success.


I think it’s pretty clear what it does. It multiplies the attendant expression by -1 to make it negative.


Lookup the object for the variable to get it’s offset. If the assignment has a value then evaluate the assignment expression.


Yet another pretty straight-forward function. We simply cycle through all the elements in a function implementation to count up how many parameters and local variables were declared.


I had to do a bit of thinking to get this one right. Originally, I wasn’t going to include any optimization in Calc 2 but I decided that since I had it in Calc 1 it would be only fair to have it in Calc 2.
Unlike Calc 1, we now have to deal with things that are not numbers we can readily deal with. The only way we can optimize is if we have a binary expression that consists of only numbers. This includes all sub expressions.

The compiler is less efficient as a result because it may visit the same sub-expression multiple times but at the advantage that our compiled code will be a bit faster.

So this function returns the result of the calculation and a boolean indicating whether the value is optimized or not. The value is ignore if false is returned.


Our final function! This method gets the type of each node passed to it. If the type is unknown, invalid or doesn’t match then an error is generated.


typeOf  is separated out of comp.go. It consists of a collection of little utility functions to get the type of an object or expression. It is also bugged. It does not do proper escape analysis and determine the value of sub-expressions. I have deliberately not fixed this issue as it is just too much to deal with in this series.

The only function I feel is worth talking about is also the shortest.

validType verifies that any type is equal to “int”. That’s because it’s the only type in our language. As more types are added, more type names will need to be checked. Extra work will be involved when, or if, we allow custom type names within the language.


That's it for now! Stay tuned for the final installation of this mega series!

Sunday 21 December 2014

Compiler 2 Part 11: Using the Runtime

The compiler, like the parser, has seen some major changes. The largest change comes by way of the use of the C runtime and the pseudo-assembly calls now emitted.

To follow the code in the compiler I fear we have to delve back into assembly, the assembly-like runtime instructions, and theory.

Basic Arithmetic

To understand how the code from the compiler works I need to cover a little more assembly.

Consider adding the numbers two and three. On a piece of paper, you can just write “2+3=5” and be done with it. You might be tempted, therefore, to transpose that into C or whatever intermediary language you’re using.

It’s not always that easy. For one, you might not get the entire equation at once. For another, how do you handle variables? Or variables which point to another expression?

In assembly, and the C runtime, we’ll use the registers to hold our values prior to doing the actual arithmetic.

setl(2, eax);
setl(3, edx);
addl(edx, eax);

The result of the addition is stored in the eax register so that subsequent operands can easily be added. To extend the previous example, consider 2 + 3 + 4 + 5:

setl(2, eax); /* eax is 2 */
setl(3, edx);
addl(edx, eax); /* eax is now 5 */
setl(4, edx);
addl(edx, eax); /* eax is now 9 */
setl(5, edx);
addl(edx, eax); /* eax is now 14 */

Notice a pattern? After the first operand is assigned to the eax register, all subsequent operands are only ever assigned to the edx register since the eax register always holds the result of the last instruction. We can use this pattern to our advantage when compiling our code.

Nesting Arithmetic Operations

What about nested expressions like this: 1 + (5 - 2)?

If we use the above pattern, we have a problem because each nested expression starts the process anew. Since the first operand is always assigned to the eax register, the value of 1 stored in it would be overwritten by the value 5.

setl(1, eax);
/* now the inner expression, (5 - 2) */
setl(5, eax); /* uh oh */
setl(2, edx);
subl(edx, eax); /* eax is now 3 */
/* now what? The 1 is lost! */
addl(edx, eax); /* result is 5, which is clearly wrong */

Remember, we’re not writing this code by hand. If we were, we could be clever in our use of the registers.

Maybe we could ensure we run the nested expression first? It’s still not that simple. What if you have two nested expressions at the same level. Since we’re not writing this by hand we need an algorithm or some other technique to solve our problem.

Enter the push and pop instructions. We can temporarily store intermediate values on the stack.

setl(1, eax);
pushl(eax); /* store the value in eax on the stack */
setl(5, eax);
setl(2, edx);
subl(edx, eax); /* eax is now 3 */
movl(eax, edx); /* move the value of 3 stored in eax to edx */
popl(eax); /* store the last value pushed (1) in eax */
addl(edx, eax); /* eax is now 4 */


Note that the value of 1 was originally stored in the eax register before it was pushed. In order to preserve the order of the values (1 was the first operand in the addition expression) we have to first move the value of eax to edx and then pop the stored value back into eax after we’re done handling the expression of higher precedence. This is to preserve the order of the operands.

This is pretty hard, I know.

Another thing I’d like to bring to your attention is more of a curiosity than anything. If you recall in Part 9 I said that the stack was more elastic than a standard stack data structure. Push and pop both move the stack pointer, increasing and decreasing the size of the active stack frame. The frame size is only frozen during a further function calls or the frame exits.

While you should probably always pop any information you push, strictly speaking it doesn't actually matter in the end. The location of the stack pointer doesn't matter when the function goes to exit because the leave instruction automatically moves it to the location of the base pointer. Anything you pushed, but haven’t popped, is lost and becomes unreachable.

Writing By Hand and Learning by Example

I spent a lot of time examining the assembly output from GCC and writing out assembly by hand for the expressions I’d created.

I have pages and pages of notes where I worked out the order of instructions to make sure I got the correct end result. I computed everything, step by step, by hand. I wanted to ensure its correctness before ever laying down a piece of code.

With GCC you can use the -S option to output assembly. Ensure that most optimizations are turned off using the -O0 option. You may have to use variables to view how equations are calculated because even with optimizations turned off, simple arithmetic is still optimized out.

Stack Frame

When entering a function it may have parameters and local variables that are declared. Let’s look at a simple function to add two numbers:

(decl add (a b int) int (+ a b))

Let’s assume that both the stack pointer (ESP register) and the base pointer (EBP register) both point at the bottom of the stack. Call it index 0 (zero).

The declared function ‘add’ takes two integer arguments and has no local variables. That means we need at least 8 bytes of stack space. To reserve that space, we need to call the enter instruction.


Eight bytes are now reserved for the add function.

If you recall from Part 10, when enter is called the current location of the base pointer is stored on the stack and its position is moved up by four bytes. The ebp register is now at index 4 and the esp register is still at index 0. The stack pointer is then moved to the base pointer and moved a further 8 bytes up the stack, making its resting place index 12.

Hopefully, an illustration will help:

Step 1: original positions of stack and base pointers
0 <- base pointer, stack pointer

Step 2: store address (index) of base pointer
0 <- base pointer, stack pointer

Step 3: move the base pointer up 4 bytes
0 <- stack pointer
4 <- base pointer

Step 4: move stack pointer to base pointer
4 <- base pointer, stack pointer

Step 5: increase stack pointer by 8 bytes, as requested via call to enter()
0 <- original base pointer value; the return value, is stored here
4 <- base pointer
12 <- stack pointer

We have now established the active frame. It consists of the addresses between the base pointer and the stack pointer, bytes four through eleven (a total of 8 bytes).

The data for the parameters ‘a’ and ‘b’ have already been stored at the bottom of the frame by some code you haven’t seen, yet. Just assume that it’s true. Since integers are four bytes we can use this knowledge to retrieve these values.

By moving the base pointer by a relative amount we can retrieve these stored values. The first four bytes are located in the bytes between ebp+0 and ebp+3. That is the variable ‘a’. The second four bytes are located between ebp+4 and ebp+7. That’s variable ‘b’.

4 <- base pointer, bottom of the active frame & location of ‘a’ variable (ebp+0)
8 <- location of ‘b’ variable (ebp+4)
12 <- stack pointer, top of the active frame

Knowing that, we can use the knowledge we have for doing arithmetic to compute the answer!

movl(ebp+0, eax);
movl(ebp+4, edx);
addl(edx, eax);

Finally, we call the leave instruction to return the base and stack pointers back to their previous locations. In our contrived example, they’d both be returned to index zero, the bottom of the stack.

Note that we don’t have to use ebp+0. We could just use the ebp pointer. The +0 is an artefact of the code used to generate the C output, as you will see later. I used the same code here as the compiler would emit for continuity.

The whole function will look like this:

movl(ebp+0, eax);
movl(ebp+4, edx);
addl(edx, eax);

That’s all there is to it!

No Return Value?

Actually, there is. The eax register holds the return value of the function. Since that was set when we did the addition there’s no more work to be done!

Storing Parameters

Keeping the above in mind we can now look at storing parameters when calling functions. The function call itself is nothing special, it looks just like a standard C function call. That said, storing the parameters is pretty interesting.

Before calling a function, we need to store any parameters we need to pass to it. Well, we know where those parameters will be stored: at the bottom of the stack frame. However, at this point, that frame hasn't been created yet; so what gives?

Based on the knowledge we already have we actually know where to put them. Remember that the stack pointer is already at the top of the current stack frame? Well, that’s our ticket to fame!

When the enter instruction is given, we know that four bytes are used to store the current location of the base pointer. So, by utilizing the stack pointer and adding four to it, we’ll be at the location of where the base pointer will be in the forthcoming frame for the function call!

(add 2 3)

setl(2, esp+4); /* ebp+0 in the next frame */
setl(3, esp+8); /* ebp+4 in the next frame */
/* call the add function */

Clear as mud? How about an illustration to compare with the ones given above.

0 <- stack pointer (esp)
4 <- location of ‘a’ parameter (esp+4/ebp+0)
8 <- location of ‘b’ parameter (esp+8/ebp+4)

Bare in mind that it is vitally important that when you are storing parameters that you do nothing else prior to calling the function!


There are four, standard, general purpose registers that all x86 architectures have as discussed in the section on assembly.

We only really need two of those registers for the vast majority of our code. The question you might be wonder is, why only eax and edx and not the others?

I'm not aware of any specific reason other than a historical one. I’d be happy to be proved wrong! It may be that the eax register has been optimized in some way to make accumulating data faster or easier. Of that knowledge I am, unfortunately, woefully ignorant.

Regardless of that, it just makes sense to follow suit to the intended purpose of the registers. If anything, it makes looking at the generated code much easier for anyone else familiar with assembly. They'll expect those registers to be used.

I can tell you that in assembly, eax always stores the return value of the function. In C, main must always return a value. This value is passed to the OS when the program exits and that value must be stored in the eax register.

So other than consistency and familiarity I have no other good reasons for you. There are no rules that I know of that specify you MUST use those registers. What I will caution you on is that you would be breaking an established norm which may or may not matter.

On to Compiling!

You’ll finally get to see this code in action in the next section.

If you've gotten this far the next, and final, stage shouldn't be too hard to follow.

Compiler 2 Part 10: C Runtime

The C runtime is a small utility library that has been written to help facilitate compiling. It mimics the underlying assembly calls while still giving us some of the advantages of C.

Surprisingly few instructions are needed to achieve our goal.

For those of you who may be strong C programmers you might think this runtime a little nuts. If I were compiling strictly into C I most certainly would not do it this way. The intent of this library is to mimic assembly calls using pseudo-assembly rather than output proper C code.

Important Note: If you’re not strong at C, try not to worry about the implementation of the runtime. Focus more on the instructions themselves as they relate to assembly.

Registers and Stack

There are numerous ways to implement the stack and registers. I, personally, found it easiest to use the char data type.

The char type in C is one byte. Unlike the void type, which is also 1 byte, the char type can be de-referenced. This makes it ideal for our task.

We also want to be as efficient as possible. First, assume the largest type available to you is a 64 bit variable. Knowing that, you create a 1024 element array of an 8 byte data type. Maybe it looks something like this:

int64_t stack[1024];

Second, let’s assume you want to store a bunch of 32 bit values. Let’s say...512 of them:

for (int i = 0; i < 512; i++) { stack[i] = (int64_t) i; }

The stack size is 1024 * 8 bytes = 8192 bytes. You’ve just stored 512, 4 byte values into the stack. 512 * 4 = 2048 bytes. However, you had to access 512 elements of an 8 byte array to store them which means you’ve consumed 4096 bytes of memory to store 2048 bytes of data!

There’s no question that that is horribly inefficient. Thankfully, there’s a better way.

Enter the char type. The char type is one byte which means we can put any size of data into it provided we’re clever enough. You use 4 bytes to store a 32 bit value right? Well, it turns out we can copy an integer into four char bytes because they take up the same amount of space!

Our code then becomes:

char stack[8192];

for (int i = 0; i < 512; i++) { memmove(&stack[i*4], &i, sizeof (int32_t)); }

Now we've taken up exactly the amount of space we need to store our data without any gaps or waste.


The rest of the runtime is concerned with instructions.

Since memory addressing and copying is a bit different than assembly I’ve split the assembly mov instruction into two: mov and set.
  • mov copies the data at one address and copies it into another address.
  • set copies a value into an address.
The add, div, mul, rem and sub instructions should all be straight fairly forward as are the comparison functions

Enter and Leave

These two instructions are a bit difficult to grasp. It took me a long time to wrap my head around them. First, you need a little background.

If you recall, I said that it was easier to not use C-style function prototypes. This is because your language may not determine the type or the number of parameters until runtime. To accommodate that, we’ll use the stack to transfer data between functions and to keep track of local variables.

When you enter a function, you need to reserve space for the variables used within the function. The ast.Scope from the ast.DeclExpr object has everything we need to determine how much space we need. If there are four local variables of 32-bit integers then we need 4*4 = 16 bytes, of space.

This space is gained via a call to enter and released by a call to leave.

Enter starts by determining the offset of the current base pointer and the start of the stack. While it would be handy to simply just store it’s address (as you would in assembly) taking it’s offset is actually preferable, as you’ll see in a second. We copy that value into the space starting at the current location of the stack pointer.

Eight kilobytes of stack space is quite small so it stands to reason that we’d want to grow the stack once we run out of space. If we have to allocate a new stack, copy the data from the old to the new, then any stored memory addresses would no longer be valid. However, the offsets will remain unchanged so we can copy the memory and still retain our offsets with relative safety.

Since the stack pointer always needs to be at the top of the current frame, we increment its address by the same number of bytes as the address structure we’re using. In this case, it’s 32 bits or 4 bytes.

Now that we know the location of the old base pointer (we just stored it) we can adjust for the new frame we’re creating. Move the base pointer to point at the same location as the top of the stack (stack pointer). Then increment the stack pointer’s address by the amount of memory needed for the function. Done!

Leave works in reverse. Move the stack pointer to the base pointer since it’s the bottom of the frame. Move it down by the same number of bytes as was incremented in enter and now we can access the offset of the previous base pointer. Copy that address into the base pointer and we’re back to the previous frame!

Push and Pop

These two sister functions are pretty simple. Push first tests to see if a stack overflow will occur. If everything is okay, the value of src is copied into the stack and the stack pointer is moved up four bytes.

A stack overflow occurs when the data on the stack outgrows the available memory. This will later be addressed by attempting to grow the stack but at this point our program will simply fail.

Pop does the exact opposite. It first detects whether a stack underflow will occur and, if not, move the stack pointer down four bytes and copy the value found into dest.

A stack underflow occurs when the stack pointer would be moved below the first address in the stack. This happens if you try to pop data when there is no more data to retrieve.

Stack Memory

I think the code for the enter and leave instructions make it pretty clear why the memory in the stack is automatically allocated and freed. Once the frame pointer goes out of scope it is automatically overwritten the next time that area of the stack is needed. The old data just sits there as useless bits, not bothering anyone.

It also exemplifies the usefulness of a garbage collector. If you allocate heap memory to a variable inside a function and then that stack frame goes out of scope that memory is lost forever until the program exists. There’s no way to retrieve that address to free it unless you have a garbage collector.

C Compiler Optimizations

By virtue of how the runtime was written, there’s not much room for the C compiler to optimize the generated code. This was intentional.

What I want to make clear that if you truly intended to use C as an intermediate language you would probably do things a bit differently.

Let’s take a simple expression:

(+ 2 3)

Using the runtime, we would do something like this:

setl(2, eax);
setl(3, edx);
addl(edx, eax);

There’s not much the C compiler can do to optimize this code. We could request these functions be inlined, which does help, but that’s about the extent of things.

If I were truly wanting write this in C, and retain the use of pseudo-registers, I would instead do:

*eax = 2;
*edx = 3;
*eax += *edx;

With GCC 4.8.1 and -O2 optimization it produces (almost) one-to-one C to assembly. Were we to write ‘2 + 3’ directly, the C compiler would optimize the answer to 5. Unfortunately, the compiler can’t do that because we’re using the eax and edx pseudo-registers.

There are a couple of other changes I could make to allow the C compiler to generate even more efficient code. That’s not the point of this series, though.

Done For Now

That’s it for the runtime. I’ll explain how to actually use the runtime in the next few articles.

It’s worthy note that the runtime covers only a teeny-tiny subset of assembly. If writing code in actual assembly, there are optimizations we can make by utilizing other instructions not introduced here. Again, that’s not really the point of this series. I just want to get you on the right track!

Saturday 20 December 2014

Compiler 2 Part 9: Assembly Primer

Assembly is a huge topic and there are entire classes dedicated to the subject. It would be impossible to teach assembly in a single post but what I hope to achieve is enough of an overview to make the C code that’s going to be output by the compiler, and the C runtime, make some sense.

This article will be concerned with the x86 instruction set and architecture. There are several different syntax’s (Intel and AT&T for example) for assembly but I’ll try and keep the code as language agnostic as possible.


Everyone ought to be familiar with a byte.

A byte is made up of a number of bits which represent the smallest unit of information. The vast majority of modern desktop PCs have 8 bit bytes. A single bit has two states: 1 or 0. A single 8 bit byte has 256 different combinations of bits.

Since a bit has two states, grouping multiple bits together can determine how many different combinations can be achived. In the case of an 8 bit byte, we calculate it by raising two to the power of eight, like so: 2^8 = 256. Therefore, a single byte has the ranges: -128 to 127 (signed) or 0 to 255 (unsigned). It is represented by a ‘b’.

A word is two bytes. In C this is the short type. It has 2^16 combinations or byte*byte. It has the ranges -32,768 to 32,787 or 0 to 65,535. It is represented by a ‘w’.

A long is two words or four bytes. In C this is usually the int or long type. It has 2^32 combinations and is represented by an ‘l’ (lower case L).

A quad is two longs, four words or eight bytes. In C it is usually the long long type. It has 2^64 combinations and is represented by a ‘q’.


There are eight basic, general purpose registers and they are follows:

Example Usage
arithmetic, return value
data pointer, temporary storage of intermediate value
temp storage
stack pointer
points to the top of the current stack frame
base pointer
points to the bottom of the current stack frame
source index
source array index
destination index
destination array index

These registers can be accessed in various sizes (byte, word, long and quad). A quad sized (64 bit) register has an ‘r’ prefix, a long sized (32 bit) register has an ‘e’ prefix and both the word and byte sized registers have no prefix. The target architecture restricts which registers are available. As an example, 64 bit registers are not available on 32 bit machines.

The four registers ending in x can also have their hi and low bytes accessed for storing byte sized information. In these cases, the x is replaced by an h for the high byte and an l for the low byte.

Using the ax register as an example:

byte (low)
byte (high)

The numbers 1 through 8 represent the bytes accessed.

64 Bit General Purpose Registers

In 64 bit mode, the registers R8 through R15 are also available. The eight general purpose registers discussed above constitute the first eight registers, 0 through 7. In 64 bit mode we have access to these additional eight registers, 8 through 15.

To access the 32-bit version of these registers requires the D (double-word) suffix (r8d), a W for 16-bit (r8w), and a B for byte access (r8b).

As an example of usage of these registers, the version of GCC installed on my machine will use the rcx, rdx, r8 and r9 registers as the 1st, 2nd, 3rd and 4th arguments to functions, respectively, when compiling for 64 bit.


There are a lot of instructions but here are the ones needed to do some basic arithmetic.

mov a, b
Copies the data stored in a to b
add a, b
Add b to a and store it in b
sub a, b
Subtract b from a and store it in b
mul a, b
Multiply b by a and store it in b
div a, b
Divide b by a and store it in b

Depending on the language, the operands might be reversed (mov might copy a into b in one language but b into a in another). The way the registered are addressed in one assembly language or another might be different but rest assured if you understand one you shouldn’t have much trouble figuring out another.

An example of 2+3 in pseudo-code:

mov 3, eax (set eax to 3)
mov 2, edx (set edx to 2)
add edx, eax (add 2 to 3 and store it in eax)

Each instruction might have a specific size it operates on. So the mov function would have either b, w, l or q appended to the instruction name (movb, movw, movl and movq), as would the arithmetic operations.

Fun Fact: Another name for an instruction is opcode. Opcodes are mnemonics for an instruction code. Movl is actually the number 0x89.


There are six segment registers:

stack segment, holds the address of the stack
code segment
data segment
extra segment
extra segment (e+1 = f)
extra segment (e+2 = g)

You don’t really have to worry about these too much. The stack segment is the only one we’ll be dealing with, and indirectly at that.


It took me some time to really wrap my head around how the stack works.

The stack segment holds a pointer to the first address in the stack. It doesn’t matter where this memory came from. It works like any other memory.

The stack space is separated into segments called frames. A frame is used to hold all the data needed for a function. This includes, but isn’t limited to, local variables, the return address of the function and a pointer to the previous frame.

The sp register, the stack pointer, holds the address of the top-most location on the stack. This is the top of the current, or active, frame. The bp register, the base pointer, holds the address of the bottom of the frame. The active frame is the address space between the sp and bp registers.

Stack Frame

In a stack data structure you push (add) and pop (remove) elements to and from the stack. I would describe manipulating the stack in assembly as more fluid or elastic.

A push or pop instruction will push a single element to the stack but that element might not necessarily be the same size as another element. Depending on other instructions you may need to align the stack pointer to the nearest 16 bits.

You can also request stack memory without a push or pop instruction at all. You simply take what you need and mark off the area with the bp and sp registers.


Assembly will let you do pretty much anything. It will hand you the gun, load a bullet into the chamber and hold your hand while you shoot yourself in the foot.

When adjusting the stack or base pointers, for example, you can set them to whatever memory addresses you want. Nothing will stop you from trying. That’s not to say it won’t have disastrous effects but it’ll let you do it.

Assembly has no notion of types. Want to add a character to a memory address? Ok! Want to overwrite part of a 64 bit value with a 16 bit value? Ok! It doesn’t care. It will obediently do what you tell it to. Consider this your forewarning.

Use extra, extra caution when programming in assembly.


Why am I telling you all this?

Well, you might want to compile directly to assembly in your own project and this serves as a short primer on the subject. More importantly, as already stated, it is going to serve as a basis for the C runtime that has been provided.

As it turns out, assembly is useful for compiling languages. For one, it deals with data in a type agnostic way. As previously stated, assembly deals with values and memory addresses but it has no notion of type. It’s all just bits and bytes.

Additionally, the way registers work and how the stack is accesses is also of use. I won’t get into any details right now but passing arguments and handling variables is much easier than trying to translate them into a proper C equivalent function prototype.

Consider this: A language like Python is dynamically typed. C is statically typed. How do you translate a Python function prototype into a C prototype without knowing the value of the variables being passed at runtime a priori?

One answer is to use the void pointer type but I think a better answer to not use parameters at all.

I can’t say I’ve really done assembly much justice. There are heaps more to understand before I think you can really “get” assembly. If you’re interested in the subject, I highly recommend doing some searches.