Tuesday 23 December 2014

Compiler 2 Part 12: Compiler

Now, to look at code generation.

The vast majority of this file is completely new. Including newlines, this file has ballooned from 87 lines to over 490 lines!

So, where to start?

Compiler Object


May as well talk about the man behind the mask! Rather than a token.File, the compiler object now holds a token.FileSet.

The most important changes are the offset and curScope fields. Since we now need to declare variables in our code we also need to know what offset they are from the base pointer of the active frame. The offset field is reset each time a new function declaration is made but is retained across scopes within that declaration so that successive variables are each given a unique offset.

The curScope field holds a pointer to the current, active scope.

The Interface


CompileFile and CompileDir are fairly straightforward. Each creates a new FileSet, parses the file or files needed and tries to compile them to a C file.

Error


The description summarizes this function nicely.

Rounding Up!


The roundUp16 function has to do with alignment. When requesting memory from the stack, we want to try and keep it aligned. However, rather than go on a long discourse on alignment I'm going to save that discussion for another time.

For this function to have any meaningful use the stack should be aligned first but there’s no need in our current implementation. Our stack uses 32-bit addresses and we only deal with 32-bit variables. Therefore, the stack is always in alignment. This function will be more useful at a later date.
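
For the curious, rounding up to a multiple of 16 can be done with a little bit twiddling. This is a minimal sketch of one common way to do it, not necessarily how the function in comp.go is written, and it assumes the argument is never negative.

```go
package main

import "fmt"

// roundUp16 rounds n up to the nearest multiple of 16.
// Values that are already a multiple of 16 are returned unchanged.
// Assumes n >= 0.
func roundUp16(n int) int {
	// Adding 15 pushes n past the next boundary unless it is already
	// on one; clearing the low four bits then snaps it back down.
	return (n + 15) &^ 15
}

func main() {
	for _, n := range []int{0, 4, 16, 20} {
		fmt.Printf("%d -> %d\n", n, roundUp16(n))
	}
}
```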

If you’re keen on learning about alignment prior to the next series, a couple searches will serve you well.

Offsets a Plenty


nextOffset is another utility function that is hard to understand without more context. Remember in the previous post we were dealing with ebp+0 and ebp+4? This function helps us get those numbers we’re adding by determining the next available offset.

Each variable within a stack frame has its own offset. This allows variables within different scopes by the same name to have their own values. This function returns the next offset to use.
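
Here is a rough sketch of the idea. The compiler type below is a cut-down stand-in with only the offset field, and I'm assuming every variable takes one 4-byte slot, which fits a language where everything is a 32-bit integer.

```go
package main

import "fmt"

// compiler is a cut-down stand-in holding only the field this sketch needs.
type compiler struct {
	offset int // next free offset, in bytes, from the base pointer
}

// nextOffset returns the offset to use for a new variable and then
// advances by one 4-byte slot so the following variable gets its own.
func (c *compiler) nextOffset() int {
	off := c.offset
	c.offset += 4
	return off
}

func main() {
	c := &compiler{}
	first := c.nextOffset()
	second := c.nextOffset()
	fmt.Println(first, second)

	// A new function declaration starts a fresh frame.
	c.offset = 0
	fmt.Println(c.nextOffset())
}
```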

Scope


openScope and closeScope do almost exactly what their parser equivalents do.
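
As a hedged illustration, assume the parser has already built the scopes and linked each one to its parent. The scope type and its lookup method here are hypothetical stand-ins, not the real ones. Opening a scope just makes it current, and closing it steps back to the parent.

```go
package main

import "fmt"

// scope is a hypothetical stand-in for the scope type built by the parser.
type scope struct {
	parent *scope
	names  map[string]int // variable name -> offset from the base pointer
}

// lookup searches this scope first, then each enclosing scope in turn.
func (s *scope) lookup(name string) (int, bool) {
	for sc := s; sc != nil; sc = sc.parent {
		if off, ok := sc.names[name]; ok {
			return off, true
		}
	}
	return 0, false
}

type compiler struct{ curScope *scope }

// openScope makes s the current scope.
func (c *compiler) openScope(s *scope) { c.curScope = s }

// closeScope returns to the enclosing scope.
func (c *compiler) closeScope() { c.curScope = c.curScope.parent }

func main() {
	top := &scope{names: map[string]int{"a": 0}}
	inner := &scope{parent: top, names: map[string]int{"a": 4}}
	c := &compiler{curScope: top}

	c.openScope(inner)
	off, _ := c.curScope.lookup("a")
	fmt.Println("inner a at offset", off)

	c.closeScope()
	off, _ = c.curScope.lookup("a")
	fmt.Println("outer a at offset", off)
}
```

Because the two variables named a sit at different offsets, the inner one can shadow the outer one without clobbering its value.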

compNode


Rather than return a value to optimize away the complexity of compiling, compNode has been adjusted and expanded to deal with all the new object types.

compAssignExpr


The first thing to do is to lookup the variable we’re assigning a value to and verify it actually exists.

If the object has no type, then the type is being inferred from the value being assigned. If the object has a type, we must match it against the type of the value being assigned to make sure everything is type-safe.

The balance of the code moves the value into the object’s offset.

compBinaryExpr


A binary expression, if you recall, is an expression with two operands. In fact, our language can have more than two operands, depending on the operation, but each individual operation only takes two.

The first thing is to check whether we can optimize the code. At the cost of potentially making a full pass through all the operands in the expression, we check if we can optimize the code into a single operation. More on that later.

If you recall from part 11, the first operand is always assigned to the ‘eax’ register. We call compNode to do that.

The next section is a bit trickier. Each remaining operand needs to be assigned to the ‘edx’ register, after which the operation is applied to both eax and edx and the result is stored in eax. To complicate matters, depending on the type of expression, we may need to temporarily save the value held in ‘eax’ via a call to ‘push’, compile another subexpression, then restore the value via ‘pop’ before storing the result back in eax.

compCallExpr


When calling a function, we need to store any arguments being passed to the function on the stack prior to calling the function. The first argument corresponds to the first parameter, and so on.

When entering into a function call, the base pointer will be 4 bytes greater than the current value of the stack pointer because that’s where the return address (offset in our case) will be stored. So the offset starts at 4 bytes.

Again, we check to make sure that the function exists, that we are not attempting to call the main function again, that what we’re calling is indeed a function and that the number of arguments matches the number of parameters.

With me so far?

Once done, we make sure the types of the arguments match the types of the parameters. If all is well, we write the arguments into the offset of the variable and increase the offset by another four bytes.

Finally, the function is called.

Notice anything odd? The function name has an underscore prepended to it. That’s because C functions, most specifically the main function, are not expected to have an underscore before their name. The C compiler will be looking for a function called ‘main’ but our language also requires a function called ‘main’. To make sure the C compiler can differentiate between the two we use the underscore.
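
A small sketch of the bookkeeping described above. The helper names cName, argOffsets and checkCall are hypothetical and only cover the naming, the argument offsets and two of the sanity checks.

```go
package main

import "fmt"

// cName returns the name a user function gets in the generated C so
// that the language's main cannot collide with C's main.
func cName(name string) string { return "_" + name }

// argOffsets returns the offset at which each of n arguments is
// written. The first slot is 4 bytes in, leaving room for the return
// address, and each further argument sits 4 bytes after the previous.
func argOffsets(n int) []int {
	offs := make([]int, n)
	for i := range offs {
		offs[i] = 4 + 4*i
	}
	return offs
}

// checkCall rejects explicit calls to main and mismatched arity.
func checkCall(name string, nargs, nparams int) error {
	if name == "main" {
		return fmt.Errorf("main may not be called explicitly")
	}
	if nargs != nparams {
		return fmt.Errorf("%s expects %d argument(s), got %d", name, nparams, nargs)
	}
	return nil
}

func main() {
	fmt.Println(cName("add"), argOffsets(2), checkCall("add", 2, 2))
	fmt.Println(checkCall("add", 1, 2))
}
```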

compDeclExpr


The first thing I did was open up the scope of the function declaration and make it current.

Next, assign offsets to each of the parameters of the function, if any.

The compiler then outputs a line of C code representing the function signature. All functions have the same signature, which will look like: void _funcname(void)

Our C functions take no arguments and return no values. That’s all handled by our code.

It then counts the number of variables, parameters included, in order to allocate space for them on the stack. It then multiplies that value by four (bytes).

If there were any variables or function parameters declared we then use roundUp16 to find the nearest multiple of 16. Using the enter instruction to allocate the space we compile the function body. We close off by calling the leave instruction.
If the function had no variables whatsoever we can just compile the body of the function.
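
The frame size arithmetic can be sketched like this. frameSize is a hypothetical helper, and roundUp16 is repeated so the example stands alone.

```go
package main

import "fmt"

// roundUp16 rounds n up to the nearest multiple of 16 (n >= 0).
func roundUp16(n int) int { return (n + 15) &^ 15 }

// frameSize returns the number of bytes to reserve for a function with
// nvars parameters and local variables of 4 bytes each. A function
// with no variables needs no frame at all.
func frameSize(nvars int) int {
	if nvars == 0 {
		return 0
	}
	return roundUp16(nvars * 4)
}

func main() {
	for _, n := range []int{0, 1, 4, 5} {
		fmt.Printf("%d variable(s) -> %d bytes\n", n, frameSize(n))
	}
}
```

So one to four variables reserve 16 bytes, five to eight reserve 32, and so on.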

Last, we attempt to make sure that the declared return type of the function matches the type of the value its body actually returns.

If all is well, the scope is returned to the previous scope.

compFile and compPackage


Both of these do the same thing. They set the current scope to be the top-level scope and start the wheels in motion.

compIdent


Compiling an identifier is pretty simple. First, we look up the identifier. If the identifier doesn’t have an object associated with it we panic. Otherwise, print out the variable using the supplied format and object’s offset.

compIfExpr


First things first. Is the value of the condition an integer? While there is only one type of variable in our language we know that it will eventually have more. Also, we want to ensure that the expression used as a condition at least has a side effect, though that should have been caught during parsing.
No matter what the node is we compile it and test to see if the result is true.

We can now open up the private scope of the if statement and process both the then clause and the optional else clause.

Close the scope when finished.

compInt


Compiling an integer is pretty straightforward. We already know that we have an integer of some kind. Try to convert the literal string into a number, report an error if one occurs, and assign it to the supplied register.
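
In sketch form, using strconv.ParseInt with a 32-bit size so that oversized literals are rejected too. The signature and the emitted "mov" text are assumptions for illustration, not the real output.

```go
package main

import (
	"fmt"
	"strconv"
)

// compInt converts the literal text to a 32-bit integer and returns
// the instruction that loads it into reg. The mnemonic is made up.
func compInt(lit, reg string) (string, error) {
	v, err := strconv.ParseInt(lit, 10, 32)
	if err != nil {
		return "", fmt.Errorf("bad integer literal %q: %v", lit, err)
	}
	return fmt.Sprintf("mov %s, %d", reg, v), nil
}

func main() {
	fmt.Println(compInt("42", "eax"))
	fmt.Println(compInt("4x", "eax"))
}
```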

compScopeDecls


What this guy does is just provide function prototypes for all our functions. I used a handy trick in Go to defer compiling the actual implementations.
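
One way to get that effect, assuming the trick in question is Go's defer statement, is to write each prototype immediately and defer the matching implementation until the function returns. The sketch below collects lines in a slice rather than writing to a file, and the function bodies are placeholders.

```go
package main

import "fmt"

// compScopeDecls returns a prototype for every function followed by
// the implementations. Deferred calls run when the function returns,
// after every prototype has been written, and in last-in-first-out
// order, so the implementations come out in reverse.
func compScopeDecls(funcs []string) (out []string) {
	for _, name := range funcs {
		out = append(out, "void _"+name+"(void);")
		// Pass name as an argument so each deferred call keeps its own copy.
		defer func(n string) {
			out = append(out, "void _"+n+"(void) { /* body */ }")
		}(name)
	}
	return out
}

func main() {
	for _, line := range compScopeDecls([]string{"main", "add"}) {
		fmt.Println(line)
	}
}
```

Since every prototype appears before any body, the order in which the implementations are emitted doesn't matter to the C compiler.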

compTopScope


This function is unique to the compilation process and must only ever be called once.

As with many of our other functions, we check a bunch of conditions to ensure everything is sane.

The balance is simply outputting C code. I’ll step you through it.

We include stdio.h so we have access to the printf function. Strictly speaking, we don’t NEED to use printf but it’s handy to have our program output something since it doesn’t have any built-in print functions.

We include runtime.h to access our runtime functions.

Next we have the standard C main function, initialize the stack, call our program’s main function and finally clean up the stack. Of course, C expects us to return a value and since our program has no fail conditions we always return 0 for success.
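
Put together, the emitted skeleton might look something like the lines this sketch produces. The names stack_init and stack_end are placeholders I've made up for the runtime's setup and cleanup calls, so check runtime.h for the real ones.

```go
package main

import "fmt"

// topScopeLines returns the C boilerplate wrapped around the program.
// stack_init and stack_end are hypothetical names for the runtime
// setup and cleanup functions.
func topScopeLines() []string {
	return []string{
		"#include <stdio.h>",
		`#include "runtime.h"`,
		"int main(void) {",
		"\tstack_init();",
		"\t_main();",
		"\tstack_end();",
		"\treturn 0;",
		"}",
	}
}

func main() {
	for _, line := range topScopeLines() {
		fmt.Println(line)
	}
}
```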

compUnaryExpr


I think it’s pretty clear what it does. It multiplies the attendant expression by -1 to make it negative.

compVarExpr


Look up the object for the variable to get its offset. If the assignment has a value then evaluate the assignment expression.

countVars


Yet another pretty straightforward function. We simply cycle through all the elements in a function implementation to count up how many parameters and local variables were declared.

compTryOptimizeBinaryOrInt


I had to do a bit of thinking to get this one right. Originally, I wasn’t going to include any optimization in Calc 2 but I decided that since I had it in Calc 1 it would be only fair to have it in Calc 2.
Unlike Calc 1, we now have to deal with things that are not numbers we can readily deal with. The only way we can optimize is if we have a binary expression that consists of only numbers. This includes all sub-expressions.

The compiler is less efficient as a result because it may visit the same sub-expression multiple times but with the advantage that our compiled code will be a bit faster.

So this function returns the result of the calculation and a boolean indicating whether the value is optimized or not. The value should be ignored if false is returned.
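
The shape of that function can be sketched as a small recursive constant folder. The node types and operator strings are hypothetical, and I've chosen to refuse to fold a division by zero rather than guess at what the real compiler does.

```go
package main

import "fmt"

type node interface{}

type intLit struct{ val int }

type ident struct{ name string }

type binary struct {
	op   string
	list []node
}

// tryFold returns the value of n and true only when n is built entirely
// from integer literals. When it returns false the value is meaningless.
func tryFold(n node) (int, bool) {
	switch n := n.(type) {
	case intLit:
		return n.val, true
	case binary:
		if len(n.list) == 0 {
			return 0, false
		}
		acc, ok := tryFold(n.list[0])
		if !ok {
			return 0, false
		}
		for _, operand := range n.list[1:] {
			v, ok := tryFold(operand)
			if !ok {
				return 0, false
			}
			switch n.op {
			case "+":
				acc += v
			case "-":
				acc -= v
			case "*":
				acc *= v
			case "/":
				if v == 0 {
					return 0, false // leave it for run time
				}
				acc /= v
			default:
				return 0, false
			}
		}
		return acc, true
	}
	return 0, false // identifiers, calls and anything else
}

func main() {
	constant := binary{"+", []node{intLit{1}, binary{"*", []node{intLit{2}, intLit{3}}}}}
	fmt.Println(tryFold(constant))

	withVar := binary{"+", []node{intLit{1}, ident{"x"}}}
	fmt.Println(tryFold(withVar))
}
```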

matchTypes


Our final function! This method gets the type of each node passed to it. If the type is unknown, invalid or doesn’t match then an error is generated.

Types


typeOf is separated out of comp.go. It consists of a collection of little utility functions to get the type of an object or expression. It is also buggy: it does not do proper escape analysis or determine the value of sub-expressions. I have deliberately not fixed this issue as it is just too much to deal with in this series.

The only function I feel is worth talking about is also the shortest.

validType verifies that any type is equal to “int”. That’s because it’s the only type in our language. As more types are added, more type names will need to be checked. Extra work will be involved when, or if, we allow custom type names within the language.
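
In sketch form, assuming the type name arrives as a plain string, it amounts to a single comparison:

```go
package main

import "fmt"

// validType reports whether name is a type the language knows about.
// For now that is only "int".
func validType(name string) bool {
	return name == "int"
}

func main() {
	fmt.Println(validType("int"), validType("float"))
}
```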

Adieu


That's it for now! Stay tuned for the final installment of this mega series!
