Saturday, 14 March 2015

Objects In Memory

Until recently, I can’t say that I ever gave much thought to how objects are represented in memory. I don’t write performance critical applications where a few milliseconds of extra speed might matter. I also don’t make programs for hardware with limited resources.

In short, I didn't feel like I needed to know.

That is, until I started learning about compilers and assembly. Suddenly, how data was represented in memory mattered and I began having to ask a lot of questions.

Questions like: What does a struct or array look like in memory? How are their elements accessed? Is extra space needed for the struct itself?

There are even more questions to ask about word boundaries, cache hits and paging but, for the sake of brevity, we’ll keep to just those posed.

Because Science!

OK, not really.

C is pretty close to the hardware, and it’s the language whose x86 assembly I understand best, so I chose to write an example in C and peeked at the assembly output.

struct { int a, b; } s = {1, 2};

A simple sizeof call revealed that the above struct was 8 bytes. Each integer takes up 4 bytes of space so this is a reasonable result.

Next, I inspected the assembly generated by gcc. The values 1 and 2 were moved onto the stack, side by side. We would call this segment of memory contiguous.

Finally, I wrote a third test which passed the struct as an argument to a function. Again, only the two values, represented by ‘a’ and ‘b’, were copied onto the stack as arguments to the function.

Other than how they were arranged in memory, the fields of a struct are treated like any other variable.

Conclusions, Struct Test

Based on my tests, I was able to draw the following conclusions:

  • The space needed for a C struct is the sum of its fields (see note below);
  • The memory for a C struct is kept in a contiguous block of memory for easy access.
Note: In truth, data types may need to be aligned and can have extra padding added, making a struct an unexpected size depending on its layout. If you're interested in learning more, this link was recommended to me.

In the end, a struct is only meaningful to the compiler and programmer. Each field is merely a memory address and nothing else about the struct is known to the CPU.

What about methods?

Function Pointers

Actually, before we can discuss object methods, we need to talk about functions. And to talk about functions, we need to cover pointers.

A pointer is just like any other variable except that the value it holds is a memory address.

The size of a pointer is the size of a memory address in the target system. If the platform is 64 bits, a pointer is 8 bytes. If it’s 32 bits, the pointer is 4 bytes and so on.

void func(void) {};
void (*f)(void) = func;


The sizeof f on my system, which is 64 bits, confirms that a pointer is 8 bytes.

Note: Taking the sizeof the function name, in this case func, gives the size 1 byte. To understand why, check out this response on Stack Overflow.

Methods

A method might seem a lot more complex than a function pointer but I’ll quickly demonstrate why it’s not.

In Go, you see something like this:

type myStruct struct {}
func (s myStruct) Method() {}


So why is it not so complex? Well, it’s because another way to express this same method is:

func Method(s myStruct) {}

Once you recognise that a method call is merely syntactic sugar, dispatching the call becomes much simpler. In both cases, move the data in argument s onto the stack or into a register and call the function as normal, using the first parameter as the struct data.

Arrays

Armed with this knowledge, we can deduce how an array works pretty easily. An array is merely a segment of contiguous memory.

int arr[5] = {0, 1, 2, 3, 4};

The assembly for this C array moves each value onto the stack one at a time, side by side. All the data for the array is kept packed together but, other than that, there’s nothing special going on.

All the CPU needs to know is where the first element of the array can be found. We can access the rest by using a byte offset from the first element.

Note: If you ever wondered why an array in C is a pointer (like I did) then there’s your answer.

To access a specific index, x86 assembly uses SIB (scaled index byte) addressing: base + (scale * index).

Base is the address of the start of the array, in this case the pointer arr. The scale is the size of each element. If using a standard word size element on a 32 bit machine, it would be 4 bytes. From the address of arr, add 4 bytes multiplied by i to get the ith element.

Note: In C, arr[i] is equivalent to *(arr+i) for this very reason. In both cases, you are accessing the ith element relative to arr’s address.

Note: A C array does not contain any information about the array itself, like its length. This is why C strings use a NUL character to terminate strings. It’s the only way that C functions can detect when they have reached the end of a character array. It’s also another reason the C specification does not require bounds checking.

Implementing

I hope my explanations will help you implement arrays, structures (or classes), function pointers and methods in your own language. These things can seem very daunting but they’re not actually that hard.

The hard part is wrapping your head around it!

In the end, the generated code is pretty simple. The complexity is, generally speaking, going to be in your compiler.

If you’re interested in what some Go types look like, check out this blog post by Russ Cox here.

Sunday, 8 March 2015

Concurrency vs Parallelism

Today, while I was reading up on instruction pipelines, I found my mind wandering slightly off on a tangent. I started to think about concurrency and parallelism.

It occurred to me that there is a really simple way to illustrate the difference if you’re struggling with the concept yourself. There is also a great talk by Rob Pike called “Concurrency is not Parallelism” that goes a lot deeper into the comparison and may help you, too.

Some Kind of Process

Let’s use an example of a computer assembly line. We can define a series of steps that must be followed to properly build our computer. Each stage is as follows:


  1. Place the empty case on the conveyor and put the cover beside it.
  2. Insert and fasten the motherboard to the case.
  3. Insert the CPU, Memory and a graphics card.
  4. Insert a disc drive and a hard drive.
  5. Finally, insert the power supply unit, hook up all the cables and close the case.


Note: A common word, and metaphor, for a process like this is a “pipeline” because the product of each stage flows into the next and only travels in one direction.

Concurrency

Each step is dependent on the one that comes before, but each can be worked on independently of the others. Each stage can have a separate person working on it, or a single person can move between the steps.

Concurrency, to my mind, is taking a process, a task that requires a specific order, and dividing it into steps that can be worked on independently of one another.

Parallelism

Adding a second line, completely separate from the first, is an example of parallelism. It requires, at minimum, a separate worker for each line.

Concurrency in Parallel

A single worker is like a single core CPU. This worker must manage its time between each task, bouncing between each stage.

This same process can have many workers each one attached to a stage in the process. This is a concurrent solution that has been parallelized.

Summation

Concurrency is a design decision and does not have direct hardware constraints. Concurrent solutions can run in serial but can often be easily run in parallel.

Monday, 19 January 2015

Fiddle and Bits and Bits and Bits

My last post didn't fully answer the question: "What are the possible values a variable can be?" I will complete the answer here.

I will use 8-bit bytes for the entirety of this article.

Least Significant Bit


The least significant bit is the right most bit. It is so named because it represents the smallest possible number in the binary system: zero or one. It is the least significant value of the eight.

Don't let that fool you. It can be quite useful. More on that at the end of the post.

Byte of Bits


Let’s look at an empty, zeroed byte: 00000000 (eight zeros).

The least significant bit is the right-most bit. So let's switch that bit on: 00000001.

We now have the number one. Move the bit up one slot and the value becomes two: 00000010.

How did that happen? Maths!

Math


Each bit in the byte represents a power of two (see my previous post).

Each bit's value is 2 raised to a power. Starting at zero, incrementing by one for each step, move right to left and raise 2 to that power. The least significant bit is 2^0. The second least significant bit is 2^1, then 2^2 and so on. Sum the bits turned on for the value the byte represents.

Check it out:

Binary | Equation                      | Decimal
-------|-------------------------------|--------
00     | (0 x 2^1) + (0 x 2^0) = 0 + 0 | 0
01     | (0 x 2^1) + (1 x 2^0) = 0 + 1 | 1
10     | (1 x 2^1) + (0 x 2^0) = 2 + 0 | 2
11     | (1 x 2^1) + (1 x 2^0) = 2 + 1 | 3

The value of each bit, from left to right, looks like this: 128, 64, 32, 16, 8, 4, 2, 1.

If you sum all these values, you get the value: 255. That's the maximum value of an unsigned byte!

Signedness


Ok, time to make things weird. Well, not finding-out-your-life-partner-is-your-cousin weird, but weird enough.

On the opposite side of the byte from the least significant bit is the most significant bit.

In many languages a type may be signed or unsigned. An unsigned byte can hold only positive whole numbers. A signed byte can hold positive or negative whole numbers.

Most Significant Bit


A computer doesn't have a notion of a negation sign. It can represent a negative number, sure, but all it sees are bits. Using two’s complement, how can a single bit be a sign when it must be a number?

The most significant bit does determine whether the value is negative or not. In an unsigned byte, however, the eighth bit simply has a value of 128.

Confused, yet? Good!

More Maths


When a computer interprets a signed byte, the most significant bit becomes a negative value, not a negative sign. Rather than being the value 128, it becomes -128.

Consider this: can you have a value of negative zero? Can zero ever be negative or is zero just zero?

If zero can’t be negative and the most significant bit merely represented the sign of the value, then what would 10000000 mean? It would be -0, and that is not a number.

To make the value -1, we need to add a value to -128 to get there. If we remove the eighth bit from the byte, the remaining seven bits consist of the values: 64, 32, 16, 8, 4, 2 and 1. Sum them all together and you get the value 127. -128 + 127 = -1.

Negative one is 11111111. There's a trick with this at the bottom of this post.

Let’s do another example of a negative number. Let’s shoot for -54. What number do we need to add to -128 to create this number in binary?

We just need to solve this equation: x = y + z; where x = -54 (the number we want), y = -128 (most significant bit) and z is the value we need.

-54 = -128 + z
-54 + 128 = -128 + z + 128
74 = z

Set the balance of the bits to 74 and the actual value will be -54: 11001010

Signed Values


We can now fully answer the question, “What values can a given C type have?”

A single byte has 256 different values. A C type that consists of a single byte is the char type.

An unsigned char can consist of the values 0 to 255.
A signed char can consist of the values -128 to 127.

Not a Dummy


Don’t feel bad if you have to read things over a few times for it to sink in. It took me a bit to really wrap my head around this but once I did, things really fell into place.

You can cheat, too. If an 8 bit byte has 256 values then divide it by two. That value negated is the lower bound. Reduce that same value by one for the upper bound.

256 / 2 = 128
Lower bound = -128
Upper Bound = 128 - 1 = 127

Fun Fact


Binary is how numerical permissions work for the ‘chmod’ command in Linux.

Bits | Value | Permission
-----|-------|---------------
000  | 0     | None - Denied!
001  | 1     | Execute
010  | 2     | Write
100  | 4     | Read
Combining the bits together we get the following table.

Bits | Value | Permission
-----|-------|------------------------------------
011  | 3     | Write (2) + Execute (1) = 3
101  | 5     | Read (4) + Execute (1) = 5
110  | 6     | Read (4) + Write (2) = 6
111  | 7     | Read (4) + Write (2) + Execute (1) = 7

This is the beauty of binary: you can combine any number of unique options together without overlap.

Cool Tricks


Now that you’re a pro at understanding binary, I thought I might show you a pair of nifty tricks before I leave you.

1. Want to test for whether a number is odd or even?

if ((n & 1) == 0) { /* even */ } else { /* odd */ }

Note: the parentheses are required in C, because == binds more tightly than &.

2. Want to test whether a signed type is negative or positive?

const unsigned char MSB_BYTE = 1u << 7;
if ((n & MSB_BYTE) == MSB_BYTE) { /* negative */ } else { /* positive */ }


3.  Want to set an unsigned variable to its maximum value but don't know what it might be on a particular machine? Many languages will let you do this little trick.

unsigned char c = -1;

It sets all the bits on, which is always the maximum value. Go back to the "More Maths" section to see why this works.

There are no guarantees these tricks will work everywhere so read the specification for your language and check the compiler implementation to know before using them!

If you have never done bit shifting or used logical operators, then these tricks probably make no sense. Once you learn about them, come back here, think about these tricks and see if you can figure out why they work.

Friday, 16 January 2015

Type Sizes and Max Values

Today, I want to take a step back and talk about something a bit more elementary.

If you've just started learning to program, understanding the value ranges of types can be hard. I hope that I can maybe impart some knowledge to you that will help you get a leg up.

In computing, we use a system called binary to count. Binary is a sequence of ones and zeros that are used to represent numbers.

We'll begin with...

Bits


I like to think of a bit like a gate in a yard; a gate that can only be in one of two states: open or closed.

An open bit is represented by the number 1. When open, it allows an electrical current to flow through it. A closed bit, which does not allow electricity to flow through it, is represented by the number zero.

This is the essence of binary. A grouping of numbers which have only two states.

Note: When a gate is open, it is in the “high state” (as it would be called in electrical engineering), allowing maximum electron flow through the gate. When a gate is closed, it is in the "low state".

Counting in Binary


As I just said, if something consists of a single bit, it has two states. 1 (one) or 0 (zero). So, if you add a second bit into the mix you have four possible states. Two states for the first bit and two states for the second.

All possible combinations of these two bits looks like this:

00 - bit one and bit two are both closed
01 - bit one is closed and bit two is open
10 - bit one is open and bit two is closed
11 - bit one and bit two are both open

If you add a third, you get 8 possible combinations. They are: 000, 001, 010, 100, 011, 110, 101, and 111.

Each bit you add grows the maximum number of possible combinations exponentially. For each bit you have, you raise two to that power. One bit is two to the power of one. Two bits is two to the power of two. Three bits is two to the power of three and so on.

This is the essence of counting in binary.

Bytes


A byte is a specific number of bits grouped together.

Different computer architectures can have a different number of bits per byte. The Intel 4004 CPU, for example, had a 4 bit word size. The ASCII character set is based on 7 bits. An 8 bit byte is most common and the remainder of this article will be based on this byte size.

We must raise two to the power of eight to find out how many values a single byte can represent. Two to the power of eight is two hundred and fifty-six.

Want proof? Enter the number two into any calculator. Multiply it by two seven more times. 2x2x2x2x2x2x2x2 = 256.

Looking at the progression, one step at a time: 2, 4, 8, 16, 32, 64, 128, 256.

A byte, then, can hold 256 different values but a single byte can’t actually hold the value 256. 

So, why not?

Zero is a Value Too!


Zero is one of the 256 values that a byte can be. Consider a single bit. What are its values? Zero or one.

When all the bits in a byte are closed it can only be one thing. Zero.

If zero is one of the 256 values a byte can be it means that the maximum value that can be held in a single byte is 255. 

Multiple Bytes


Bytes can be grouped together, too. Two, four and eight are common multi-byte groupings.

Each time, we raise the power up further. A single byte is 2^8. Two bytes is 2^16. Four bytes is 2^32. Eight bytes is 2^64.

Value Ranges of C Types


Let’s look at a C type of each typical byte size.

  • char = 1 byte
  • short = 2 bytes
  • long (int) = 4 bytes
  • long long = 8 bytes

So, based on the above pairings, can you deduce what the value ranges are for each of these types?

Well, we already know how high a byte goes since it has 256 values including the zero value, so it has a range of 0 to 255.

  • A short is 2 bytes (16 bits). 2^16 is 65536 for a value range of 0 to 65535.
  • A long or int is typically 4 bytes (32 bits). 2^32 is 4,294,967,296 for a value range of 0 to 4,294,967,295.
  • A long long is typically 8 bytes (64 bits). 2^64 is 18,446,744,073,709,551,616 for a value range of 0 to 18,446,744,073,709,551,615.

All you need to know is how many bytes a type consists of and you can work out the value yourself!

Stay tuned for more...

Tuesday, 23 December 2014

Compiler 2 Part 13: The Front End

This is it! You've made it!

The final post in this series merely covers the user-facing code and installation of the compiler and isn't really part of the compiling process.

Makefile


I chose to use a Makefile for installing and building calcc and its runtime. The install target will not only install calcc but ensure the runtime is built before doing so.

It’s a nice improvement.

I considered the GNU autotools but it felt like bringing a howitzer to a knife fight. Just a little bit of overkill for marginal gain and a much slower system overall.

There is a test target too for generating a simple test program for the runtime.

It was a pain making it portable to Windows.

Calcc


Calcc has had a few changes too.

First, I added a function called findRuntime to make an attempt to locate the C runtime. I wanted to ensure that the runtime would not be installed on the developer’s system, so I needed a method of locating it. I had two reasons for this: a) this is an instructional compiler, and once someone is finished with it I would like for them to be able to delete the source tree and the runtime with it; and b) the runtime is statically linked into the final binary, so there’s no point to having it hanging around in la-la land.

This function does not guarantee the runtime is there when it comes time to link it. All it does is do a sanity check to see if it can find it prior to trying to link it.

I added a cheeky flag to output “assembly”. Really, it outputs pseudo-assembly for the instructions in the C runtime. It’s nice to have when you want to see the C output to verify correctness.

The balance of the changes relate to multiple files and qualifying the path. For one, we want the absolute path rather than the relative one. Also, we have to do some work when compiling a directory because the path is part of the binary name.

It also strips the file extension, if need be, so that the filename itself can be used for the intermediate and final files. A file called foo.calc will generate the files foo.c, foo.o and a binary named foo. On Windows, the binary will be called foo.exe.

Learning as We Go


In writing the compiler and these articles I came across a few minor issues with the Calc 2 spec. These were unexpected and only reared their head when actually implementing them.

Making the return type of a function mandatory was one example. Since nothing in the Calc 2 spec produces a side effect, it made no sense to be able to call a function in an imperative form, that is, a function which executes the expressions in its body but does not return any result. Such a function would be what’s called a no-op.

A no-op is an operation with no effect. Look at this example:

(decl incr (n int) (= n (+ n 1)))
(decl main int (
  (call incr(2))))


What does this program do? The function ‘incr’ does not return a value. Essentially, it does nothing. Yes, the value of ‘n’ is incremented within ‘incr’ but nothing is actually produced as output. No value is returned.

So why did I start with an optional return value in the first place? Well, because in the future, perhaps even Calc 3, I hope to include external libraries. I would like for there to be a print function so that an imperative function could produce output as a side effect, without a return value.

Still keeping with functions, I had to revert my original idea of allowing nested function declarations. I may enable this again in future versions but it was problematic. For one, you could assign a declaration to a variable but there are no pointers in the language. So, what would that mean? What would it do?

I could have allowed declarations just within other declarations but it likely would have meant a bunch more code to make work and I wasn't sure it wouldn't feel clunky. So, if I couldn't do it right, I better not do it at all.

Those are a couple examples of how I had to tweak the spec slightly as I wrote. Unintended consequences of my decisions.

Again, I was not completely re-writing the spec. I was clarifying and tweaking it slightly. Any larger of an adjustment may have meant massive changes to the code which would be unacceptable. You need to stick to your plan!

The Future


So what do I have in store for Calc 3 and beyond?

I can’t say for certain just yet but I have a few ideas. Here’s my wishlist:

  • growable stack (high)
  • simple garbage collector (unknown; dependent on other features)
  • loops (moderate-high)
  • generated code optimizations (low-moderate)
  • library/packages (moderate)
  • assembly (moderate)
  • allocations and simple garbage collector (moderate-high)
  • objects/structures (high)

Some things, like packages, I’d very much like to do but would require a lot of engineering to get right. I think if I added it, I probably couldn't do much else.

Optimizing seems unlikely. I want to teach how language features work rather than building an efficient binary.

Closing Statement


This series was far too big and were I to do it all over again I would change a great many things.

First off, I would implement fewer features in one go. When drafting the outline of the series I did not anticipate just how big it would get. Some of the new features were very large topics and I’m not sure I did them all justice. I think I could have easily split this series into two to make it more manageable.

Multiple source file support was a bad idea. Not that adding the feature was bad in its own right but that it makes no sense with the rest of the lessons and adds unnecessary bloat to the series.

Next, I would write the C Runtime a bit differently to make it easier for the C compiler to optimize the assembly instructions. While my intentions of using pseudo-assembly were noble I don’t think they were the correct answer to the problem.

I am really not happy with the type-checking implementation. It’s sloppy and buggy.

I don't know if I would create another LISP-like language. I would consider retaining prefix notation for basic arithmetic operations but I would implement the rest of the language in a more C-like way. It’s just what makes me feel comfortable.

I am not entirely sure whether I will continue with another series, or not. I would like to, I think, provided it was much, much smaller. I think I would do minor, incremental steps from now on. With my current workload and the effort required to put this together I am, quite frankly, exhausted!

Thanks for reading and I hope you enjoyed it!

Compiler 2 Part 12: Compiler

Now, to look at code generation.

The vast majority of this file is completely new. Including newlines, this file has ballooned from 87 lines to over 490 lines!

So, where to start?

Compiler Object


May as well talk about the man behind the mask! Rather than a token.File, it now has a token.FileSet.

The most important changes are the offset and curScope fields. Since we now need to declare variables in our code we also need to know what offset they are from the base pointer of the active frame. The offset field is reset each time a new function declaration is made but is retained across scopes within that declaration so that successive variables are each given a unique offset.

The curScope field holds a pointer to the current, active scope.

The Interface


CompileFile and CompileDir are fairly straightforward. Each creates a new FileSet, parses the file or files needed and tries to compile them to a C file.

Error


The description summarizes this function nicely.

Rounding Up!


The roundUp16 function has to do with alignment. When requesting memory from the stack, we want to try and keep it aligned. However, rather than go on a long discourse on alignment I'm going to save that discussion for another time.

For this function to have any meaningful use the stack should be aligned first, but there’s no need in our current implementation. Our stack uses 32 bit slots and we only deal with 32 bit variables. Therefore, the stack is always in alignment. This function will be more useful at a later date.

If you’re keen on learning about alignment prior to the next series, a couple searches will serve you well.

Offsets a Plenty


nextOffset is another utility function that is hard to understand without more context. Remember in the previous post we were dealing with ebp+0 and ebp+4? This function helps us get those numbers we’re adding by determining the next available offset.

Each variable within a stack frame has its own offset. This allows variables with the same name in different scopes to have their own values. This function returns the next offset to use.

Scope


openScope and closeScope do almost exactly what their parser equivalents do.

compNode


Rather than return a value, which added complexity to compiling, compNode has been adjusted and expanded to deal with all the new object types.

compAssignExpr


The first thing to do is to lookup the variable we’re assigning a value to and verify it actually exists.

If the object has no type, then the type is being inferred from the value being assigned. If the object has a type, we must match it with the variable being assigned to make sure everything is type-safe.

The balance of the code moves the value into the object’s offset.

compBinaryExpr


A binary expression, if you recall, is an expression with two operands. In fact, our language can have more than two operands, depending on the operation, but each individual operation only takes two.

The first thing is to check whether we can optimize the code. At the cost of potentially making a full pass through all the operands in the expression, we check if we can collapse the code into a single operation. More on that later.

If you recall from part 11, the first operand is always assigned to the ‘eax’ register. We call compNode to do that.

The next section is a bit trickier. Each subsequent operand needs to be assigned to the ‘edx’ register; we then apply the operation to both eax and edx and store the result in eax. To complicate matters, depending on the type of expression, we may need to temporarily store the value in ‘eax’ via a call to ‘push’, compile another subexpression, then recall the value via ‘pop’ and store the result back in eax.

compCallExpr


When calling a function, we need to store any arguments being passed to the function on the stack prior to calling it. The first argument corresponds to the first parameter, and so on.

When entering into a function call, the base pointer will be 4 bytes greater than the current value of the stack pointer because that’s where the return address (offset in our case) will be stored. So the offset starts at 4 bytes.

Again, we check to make sure that the function exists, that we are not attempting to call the main function again, that what we’re calling is indeed a function and that the number of arguments matches the number of parameters.

With me so far?

Once done, we make sure the types of the arguments match the types of the parameters. If all is well, we write the arguments into the offset of the variable and increase the offset by another four bytes.

Finally, the function is called.

Notice anything odd? The function name has an underscore prepended to it. The C compiler will be looking for a function called ‘main’, but our language also requires a function called ‘main’. To make sure the C compiler can differentiate between the two, we prepend an underscore to our language’s functions.

compDeclExpr


The first thing I did was open up the scope of the function declaration and make it current.

Next, assign offsets to each of the parameters of the function, if any.

The compiler then outputs a line of C code representing the function signature. All functions have the same signature, which will look like: void _funcname(void)

Our C functions take no arguments and return no values. That’s all handled by our code.

It then counts the number of variables, parameters included, in order to allocate space for them on the stack. It then multiplies that value by four (bytes).

If there were any variables or function parameters declared, we then use roundUp16 to find the nearest multiple of 16. Using the enter instruction to allocate the space, we compile the function body. We close off by calling the leave instruction.

If the function had no variables whatsoever, we can just compile the body of the function.

Last, we make an attempt to ensure that the declared return type of the function matches the value actually returned by the function.

If all is well, the scope is returned to the previous scope.

compFile and compPackage


Both of these do the same thing. They set the current scope to be the top-level scope and start the wheels in motion.

compIdent


Compiling an identifier is pretty simple. First, we look up the identifier. If the identifier doesn’t have an object associated with it we panic. Otherwise, print out the variable using the supplied format and object’s offset.

compIfExpr


First things first. Is the value of the condition an integer? While there is only one type of variable in our language, we know we will eventually have more. Also, we want to ensure that the expression used as a condition at least has a result, though that should have been caught during parsing.

No matter what the node is, we compile it and test to see if the result is true.

We can now open up the private scope of the if statement and process both the then clause and the optional else clause.

Close the scope when finished.

compInt


Compiling an integer is pretty straightforward. We already know that we have an integer of some kind. Try to convert the literal string into a number, report an error if one occurs, and assign it to the supplied register.

compScopeDecls


What this guy does is just provide function prototypes for all our functions. I used a handy trick in Go to defer compiling the actual implementations.

compTopScope


This function is unique to the compilation process and must only ever be called once.

As with many of our other functions, we check a bunch of conditions to ensure everything is sane.

The balance is simply outputting C code. I’ll step you through it.

We include stdio.h so we have access to the printf function. Strictly speaking, we don’t NEED to use printf, but it’s handy to have our program output something since the language doesn’t have any built-in print functions.

We include runtime.h to access our runtime functions.

Next, we have the standard C main function: we initialize the stack, call our program’s main function, and finally clean up the stack. Of course, C expects us to return a value, and since our program has no fail conditions we always return 0 for success.
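Put together, a sketch of the emitter for that scaffold might look like this (the runtime function names stack_init and stack_end are placeholders of my own, as is _main for the compiled program's entry point):

```go
package main

import "fmt"

// genTopScope returns the C scaffold emitted around the compiled
// program. The runtime call names here are hypothetical.
func genTopScope() string {
	return "#include <stdio.h>\n" +
		"#include \"runtime.h\"\n\n" +
		"int main(void) {\n" +
		"\tstack_init();\n" + // hypothetical runtime call
		"\t_main();\n" + // the compiled program's entry point
		"\tstack_end();\n" + // hypothetical runtime call
		"\treturn 0;\n" + // no fail conditions, so always success
		"}\n"
}

func main() {
	fmt.Print(genTopScope())
}
```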

compUnaryExpr


I think it’s pretty clear what it does. It multiplies the attendant expression by -1 to negate it.

compVarExpr


Look up the object for the variable to get its offset. If the assignment has a value, then evaluate the assignment expression.

countVars


Yet another pretty straightforward function. We simply cycle through all the elements in a function implementation to count up how many parameters and local variables were declared.
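A sketch of that counting pass, using a simplified node list of my own in place of the real AST types:

```go
package main

import "fmt"

// kind stands in for the real AST node types.
type kind int

const (
	param kind = iota
	local
	other
)

// countVars tallies the parameters and local variables declared
// in a function implementation, mirroring the pass described above.
func countVars(nodes []kind) (params, locals int) {
	for _, k := range nodes {
		switch k {
		case param:
			params++
		case local:
			locals++
		}
	}
	return
}

func main() {
	p, l := countVars([]kind{param, param, local, other, local, local})
	fmt.Println(p, l) // 2 3
}
```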

compTryOptimizeBinaryOrInt


I had to do a bit of thinking to get this one right. Originally, I wasn’t going to include any optimization in Calc 2 but I decided that since I had it in Calc 1 it would be only fair to have it in Calc 2.
Unlike Calc 1, we now have to deal with things other than plain numbers. The only way we can optimize is if we have a binary expression that consists of only numbers, including all of its sub-expressions.

The compiler is less efficient as a result, because it may visit the same sub-expression multiple times, but with the advantage that our compiled code will be a bit faster.

So this function returns the result of the calculation and a boolean indicating whether the value is optimized or not. The value is ignored if false is returned.
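The shape of this optimization is classic constant folding over the expression tree. A minimal sketch using toy node types of my own (not Calc's AST):

```go
package main

import "fmt"

// expr is a toy expression tree: an integer literal, a variable,
// or a binary operation.
type expr struct {
	op          string // "" for a leaf; otherwise "+", "-", or "*"
	val         int    // literal value when this is a number leaf
	name        string // variable name when non-empty
	left, right *expr
}

// tryFold returns the computed value and true when the whole tree
// consists of number literals; otherwise false, and the value
// should be ignored.
func tryFold(e *expr) (int, bool) {
	if e.op == "" {
		if e.name != "" {
			return 0, false // variables can't be folded
		}
		return e.val, true
	}
	l, ok := tryFold(e.left)
	if !ok {
		return 0, false
	}
	r, ok := tryFold(e.right)
	if !ok {
		return 0, false
	}
	switch e.op {
	case "+":
		return l + r, true
	case "-":
		return l - r, true
	case "*":
		return l * r, true
	}
	return 0, false
}

func main() {
	lit := func(n int) *expr { return &expr{val: n} }
	// (1 + 2) * 3 folds to 9; (x + 2) cannot be folded.
	v, ok := tryFold(&expr{op: "*", left: &expr{op: "+", left: lit(1), right: lit(2)}, right: lit(3)})
	fmt.Println(v, ok) // 9 true
	_, ok = tryFold(&expr{op: "+", left: &expr{name: "x"}, right: lit(2)})
	fmt.Println(ok) // false
}
```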

matchTypes


Our final function! This method gets the type of each node passed to it. If the type is unknown, invalid or doesn’t match then an error is generated.

Types


typeOf is separated out of comp.go. It consists of a collection of little utility functions to get the type of an object or expression. It is also bugged: it does not do proper escape analysis or determine the type of sub-expressions. I have deliberately not fixed this issue as it is just too much to deal with in this series.

The only function I feel is worth talking about is also the shortest.

validType verifies that a given type is equal to “int”. That’s because it’s the only type in our language. As more types are added, more type names will need to be checked. Extra work will be involved when, or if, we allow custom type names within the language.
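A sketch of that check, generalized to a lookup table so a new type costs only one more entry (the map is my own addition; the original likely compares against “int” directly):

```go
package main

import "fmt"

// knownTypes holds every type name the language accepts.
// Today that's only "int".
var knownTypes = map[string]bool{"int": true}

// validType reports whether a type name is known to the language.
func validType(name string) bool {
	return knownTypes[name]
}

func main() {
	fmt.Println(validType("int"), validType("float")) // true false
}
```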

Adieu


That's it for now! Stay tuned for the final installment of this mega series!

Sunday, 21 December 2014

Compiler 2 Part 11: Using the Runtime

The compiler, like the parser, has seen some major changes. The largest change comes by way of the use of the C runtime and the pseudo-assembly calls now emitted.

To follow the code in the compiler I fear we have to delve back into assembly, the assembly-like runtime instructions, and theory.

Basic Arithmetic


To understand how the code from the compiler works I need to cover a little more assembly.

Consider adding the numbers two and three. On a piece of paper, you can just write “2+3=5” and be done with it. You might be tempted, therefore, to transpose that into C or whatever intermediary language you’re using.

It’s not always that easy. For one, you might not get the entire equation at once. For another, how do you handle variables? Or variables which point to another expression?

In assembly, and the C runtime, we’ll use the registers to hold our values prior to doing the actual arithmetic.

setl(2, eax);
setl(3, edx);
addl(edx, eax);


The result of the addition is stored in the eax register so that subsequent operands can easily be added. To extend the previous example, consider 2 + 3 + 4 + 5:

setl(2, eax); /* eax is 2 */
setl(3, edx);
addl(edx, eax); /* eax is now 5 */
setl(4, edx);
addl(edx, eax); /* eax is now 9 */
setl(5, edx);
addl(edx, eax); /* eax is now 14 */


Notice a pattern? After the first operand is assigned to the eax register, all subsequent operands are only ever assigned to the edx register since the eax register always holds the result of the last instruction. We can use this pattern to our advantage when compiling our code.
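In the compiler, that pattern becomes a simple loop: the first operand targets eax, and every later operand targets edx followed by the operation. A sketch of such an emitter, producing instruction strings (my own simplification; the real compiler writes output like this to a file):

```go
package main

import "fmt"

// compileSum emits the instruction sequence for summing a list of
// integer literals, following the eax/edx pattern described above.
func compileSum(nums []int) []string {
	var out []string
	for i, n := range nums {
		if i == 0 {
			// First operand goes straight into eax.
			out = append(out, fmt.Sprintf("setl(%d, eax);", n))
			continue
		}
		// Every later operand goes into edx, then gets added to eax.
		out = append(out, fmt.Sprintf("setl(%d, edx);", n), "addl(edx, eax);")
	}
	return out
}

func main() {
	for _, ins := range compileSum([]int{2, 3, 4, 5}) {
		fmt.Println(ins)
	}
}
```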

Nesting Arithmetic Operations


What about nested expressions like this: 1 + (5 - 2)?

If we use the above pattern, we have a problem because each nested expression starts the process anew. Since the first operand is always assigned to the eax register, the value of 1 stored in it would be overwritten by the value 5.

setl(1, eax);
/* now the inner expression, (5 - 2) */
setl(5, eax); /* uh oh */
setl(2, edx);
subl(edx, eax); /* eax is now 3 */
/* now what? The 1 is lost! */
addl(edx, eax); /* result is 5, which is clearly wrong */


Remember, we’re not writing this code by hand. If we were, we could be clever in our use of the registers.

Maybe we could ensure we run the nested expression first? It’s still not that simple. What if you have two nested expressions at the same level? Since we’re not writing this by hand, we need an algorithm or some other technique to solve our problem.

Enter the push and pop instructions. We can temporarily store intermediate values on the stack.

setl(1, eax);
pushl(eax); /* store the value in eax on the stack */
setl(5, eax);
setl(2, edx);
subl(edx, eax); /* eax is now 3 */
movl(eax, edx); /* move the value of 3 stored in eax to edx */
popl(eax); /* store the last value pushed (1) in eax */
addl(edx, eax); /* eax is now 4 */


Voilà!

Note that the value of 1 was originally stored in the eax register before it was pushed. To preserve the order of the operands (1 was the first operand in the addition expression), we first move the value in eax to edx and then pop the stored value back into eax after we’re done handling the expression of higher precedence.

This is pretty hard, I know.
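To convince yourself the sequence really yields 4, you can simulate the registers and stack with ordinary variables (a toy model of my own, not the actual runtime):

```go
package main

import "fmt"

// eval simulates the register and stack steps for 1 + (5 - 2).
func eval() int {
	var eax, edx int
	var stack []int
	push := func(v int) { stack = append(stack, v) }
	pop := func() int { v := stack[len(stack)-1]; stack = stack[:len(stack)-1]; return v }

	eax = 1
	push(eax)   // save the outer operand on the stack
	eax = 5     // the inner expression starts anew
	edx = 2
	eax -= edx  // eax is now 3
	edx = eax   // move the inner result aside
	eax = pop() // restore the outer operand (1)
	return eax + edx
}

func main() { fmt.Println(eval()) } // 4
```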

Another thing I’d like to bring to your attention is more of a curiosity than anything. If you recall, in Part 9 I said that the stack was more elastic than a standard stack data structure. Push and pop both move the stack pointer, increasing and decreasing the size of the active stack frame. The frame size is only frozen during further function calls or when the frame exits.

While you should probably always pop any information you push, strictly speaking it doesn't actually matter in the end. The location of the stack pointer doesn't matter when the function goes to exit because the leave instruction automatically moves it to the location of the base pointer. Anything you pushed, but haven’t popped, is lost and becomes unreachable.

Writing By Hand and Learning by Example


I spent a lot of time examining the assembly output from GCC and writing out assembly by hand for the expressions I’d created.

I have pages and pages of notes where I worked out the order of instructions to make sure I got the correct end result. I computed everything, step by step, by hand. I wanted to ensure its correctness before ever laying down a piece of code.

With GCC you can use the -S option to output assembly. Ensure that most optimizations are turned off using the -O0 option. You may have to use variables to view how equations are calculated because even with optimizations turned off, simple arithmetic is still optimized out.

Stack Frame


When entering a function it may have parameters and local variables that are declared. Let’s look at a simple function to add two numbers:

(decl add (a b int) int (+ a b))

Let’s assume that the stack pointer (ESP register) and the base pointer (EBP register) both point at the bottom of the stack. Call it index 0 (zero).

The declared function ‘add’ takes two integer arguments and has no local variables. That means we need at least 8 bytes of stack space. To reserve that space, we need to call the enter instruction.

enter(8);

Eight bytes are now reserved for the add function.

If you recall from Part 10, when enter is called the current location of the base pointer is stored on the stack and its position is moved up by four bytes. The ebp register is now at index 4 and the esp register is still at index 0. The stack pointer is then moved to the base pointer and moved a further 8 bytes up the stack, making its resting place index 12.

Hopefully, an illustration will help:

Step 1: original positions of stack and base pointers
0 <- base pointer, stack pointer
4
8
12

Step 2: store address (index) of base pointer
0 <- base pointer, stack pointer
4
8
12

Step 3: move the base pointer up 4 bytes
0 <- stack pointer
4 <- base pointer
8
12

Step 4: move stack pointer to base pointer
0
4 <- base pointer, stack pointer
8
12

Step 5: increase stack pointer by 8 bytes, as requested via call to enter()
0 <- the original base pointer value is stored here
4 <- base pointer
8
12 <- stack pointer

We have now established the active frame. It consists of the addresses between the base pointer and the stack pointer, bytes four through eleven (a total of 8 bytes).

The data for the parameters ‘a’ and ‘b’ have already been stored at the bottom of the frame by some code you haven’t seen, yet. Just assume that it’s true. Since integers are four bytes we can use this knowledge to retrieve these values.

By reading at an offset relative to the base pointer we can retrieve these stored values. The first four bytes, ebp+0 through ebp+3, hold the variable ‘a’. The second four bytes, ebp+4 through ebp+7, hold the variable ‘b’.

0
4 <- base pointer, bottom of the active frame & location of ‘a’ variable (ebp+0)
8 <- location of ‘b’ variable (ebp+4)
12 <- stack pointer, top of the active frame

Knowing that, we can use the knowledge we have for doing arithmetic to compute the answer!

movl(ebp+0, eax);
movl(ebp+4, edx);
addl(edx, eax);


Finally, we call the leave instruction to return the base and stack pointers back to their previous locations. In our contrived example, they’d both be returned to index zero, the bottom of the stack.

Note that we don’t have to use ebp+0. We could just use the ebp pointer. The +0 is an artefact of the code used to generate the C output, as you will see later. I used the same code here as the compiler would emit for continuity.

The whole function will look like this:

enter(8);
movl(ebp+0, eax);
movl(ebp+4, edx);
addl(edx, eax);
leave();


That’s all there is to it!
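The pointer bookkeeping can be checked with a toy model as well; here the pointers are plain byte indexes and enter/leave follow the steps described above (my own simulation, not the real runtime's implementation):

```go
package main

import "fmt"

// A tiny model of the stack: pointers are byte indexes, and mem
// stores 4-byte words, so mem[i/4] is the word at byte index i.
var (
	esp, ebp int
	mem      [16]int
)

// enter stores the old base pointer at the stack pointer, moves
// ebp up 4 bytes, then places esp a further size bytes above it.
func enter(size int) {
	mem[esp/4] = ebp
	ebp = esp + 4
	esp = ebp + size
}

// leave restores both pointers to their previous positions.
func leave() {
	esp = ebp - 4
	ebp = mem[esp/4]
}

func main() {
	enter(8)
	fmt.Println(ebp, esp) // 4 12, matching the illustration
	leave()
	fmt.Println(ebp, esp) // back to 0 0
}
```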

No Return Value?


Actually, there is. The eax register holds the return value of the function. Since that was set when we did the addition there’s no more work to be done!

Storing Parameters


Keeping the above in mind we can now look at storing parameters when calling functions. The function call itself is nothing special, it looks just like a standard C function call. That said, storing the parameters is pretty interesting.

Before calling a function, we need to store any parameters we need to pass to it. Well, we know where those parameters will be stored: at the bottom of the stack frame. However, at this point, that frame hasn't been created yet; so what gives?

Based on the knowledge we already have we actually know where to put them. Remember that the stack pointer is already at the top of the current stack frame? Well, that’s our ticket to fame!

When the enter instruction is given, we know that four bytes are used to store the current location of the base pointer. So, by utilizing the stack pointer and adding four to it, we’ll be at the location of where the base pointer will be in the forthcoming frame for the function call!

(add 2 3)

setl(2, esp+4); /* ebp+0 in the next frame */
setl(3, esp+8); /* ebp+4 in the next frame */
/* call the add function */


Clear as mud? How about an illustration to compare with the ones given above.

0 <- stack pointer (esp)
4 <- location of ‘a’ parameter (esp+4/ebp+0)
8 <- location of ‘b’ parameter (esp+8/ebp+4)
12

Bear in mind that it is vitally important that, when you are storing parameters, you do nothing else prior to calling the function!
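Putting the pieces together, the whole call to (add 2 3) can be run through the same kind of toy model (again a simulation of the described behaviour with my own helper names, not the real runtime):

```go
package main

import "fmt"

// Pointers are byte indexes; mem stores 4-byte words, so mem[i/4]
// is the word at byte index i.
var (
	esp, ebp, eax, edx int
	mem                [16]int
)

func enter(size int) { mem[esp/4] = ebp; ebp = esp + 4; esp = ebp + size }
func leave()         { esp = ebp - 4; ebp = mem[esp/4] }

// add models the compiled function: read 'a' at ebp+0 and 'b' at
// ebp+4, leaving the sum in eax as the return value.
func add() {
	enter(8)
	eax = mem[(ebp+0)/4] // parameter 'a'
	edx = mem[(ebp+4)/4] // parameter 'b'
	eax += edx           // return value lives in eax
	leave()
}

func main() {
	// Store the parameters where the next frame's ebp will land.
	mem[(esp+4)/4] = 2 // ebp+0 in the next frame
	mem[(esp+8)/4] = 3 // ebp+4 in the next frame
	add()
	fmt.Println(eax) // 5
}
```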

EAX and EDX


There are four, standard, general purpose registers that all x86 architectures have as discussed in the section on assembly.

We only really need two of those registers for the vast majority of our code. The question you might be wondering is: why only eax and edx and not the others?

I'm not aware of any specific reason other than a historical one. I’d be happy to be proved wrong! It may be that the eax register has been optimized in some way to make accumulating data faster or easier. Of that knowledge I am, unfortunately, woefully ignorant.

Regardless, it just makes sense to follow the intended purpose of the registers. If anything, it makes the generated code much easier to read for anyone else familiar with assembly. They'll expect those registers to be used.

I can tell you that in x86 assembly, by convention, the eax register holds the return value of a function. In C, main must always return a value. This value is passed to the OS when the program exits, and it is stored in the eax register.

So other than consistency and familiarity I have no other good reasons for you. There are no rules that I know of that specify you MUST use those registers. What I will caution you on is that you would be breaking an established norm which may or may not matter.

On to Compiling!


You’ll finally get to see this code in action in the next section.

If you've gotten this far the next, and final, stage shouldn't be too hard to follow.