Surprisingly few instructions are needed to achieve our goal.
For those of you who may be strong C programmers you might think this runtime a little nuts. If I were compiling strictly into C I most certainly would not do it this way. The intent of this library is to mimic assembly calls using pseudo-assembly rather than output proper C code.
Important Note: If you’re not strong at C, try not to worry about the implementation of the runtime. Focus more on the instructions themselves as they relate to assembly.
Registers and Stack
The char type in C is one byte. Unlike the void type, which is also 1 byte, the char type can be de-referenced. This makes it ideal for our task.
We also want to be as efficient as possible. First, assume the largest type available to you is a 64 bit variable. Knowing that, you create a 1024 element array of an 8 byte data type. Maybe it looks something like this:
int64_t stack[1024];
Second, let’s assume you want to store a bunch of 32 bit values. Let’s say...512 of them:
for (int i = 0; i < 512; i++) { stack[i] = (int64_t) i; }
The stack size is 1024 * 8 bytes = 8192 bytes. You’ve just stored 512, 4 byte values into the stack. 512 * 4 = 2048 bytes. However, you had to access 512 elements of an 8 byte array to store them which means you’ve consumed 4096 bytes of memory to store 2048 bytes of data!
There’s no question that that is horribly inefficient. Thankfully, there’s a better way.
Enter the char type. The char type is one byte which means we can put any size of data into it provided we’re clever enough. You use 4 bytes to store a 32 bit value right? Well, it turns out we can copy an integer into four char bytes because they take up the same amount of space!
Our code then becomes:
char stack[8192];
for (int i = 0; i < 512; i++) { memmove(&stack[i*4], &i, sizeof (int32_t)); }
Now we've taken up exactly the amount of space we need to store our data without any gaps or waste.
Instructions
The rest of the runtime is concerned with instructions.
Since memory addressing and copying is a bit different than assembly I’ve split the assembly mov instruction into two: mov and set.
- mov copies the data at one address and copies it into another address.
- set copies a value into an address.
The add, div, mul, rem and sub instructions should all be straight fairly forward as are the comparison functions.
Enter and Leave
These two instructions are a bit difficult to grasp. It took me a long time to wrap my head around them. First, you need a little background.
If you recall, I said that it was easier to not use C-style function prototypes. This is because your language may not determine the type or the number of parameters until runtime. To accommodate that, we’ll use the stack to transfer data between functions and to keep track of local variables.
When you enter a function, you need to reserve space for the variables used within the function. The ast.Scope from the ast.DeclExpr object has everything we need to determine how much space we need. If there are four local variables of 32-bit integers then we need 4*4 = 16 bytes, of space.
This space is gained via a call to enter and released by a call to leave.
Enter starts by determining the offset of the current base pointer and the start of the stack. While it would be handy to simply just store it’s address (as you would in assembly) taking it’s offset is actually preferable, as you’ll see in a second. We copy that value into the space starting at the current location of the stack pointer.
Eight kilobytes of stack space is quite small so it stands to reason that we’d want to grow the stack once we run out of space. If we have to allocate a new stack, copy the data from the old to the new, then any stored memory addresses would no longer be valid. However, the offsets will remain unchanged so we can copy the memory and still retain our offsets with relative safety.
Since the stack pointer always needs to be at the top of the current frame, we increment its address by the same number of bytes as the address structure we’re using. In this case, it’s 32 bits or 4 bytes.
Now that we know the location of the old base pointer (we just stored it) we can adjust for the new frame we’re creating. Move the base pointer to point at the same location as the top of the stack (stack pointer). Then increment the stack pointer’s address by the amount of memory needed for the function. Done!
Leave works in reverse. Move the stack pointer to the base pointer since it’s the bottom of the frame. Move it down by the same number of bytes as was incremented in enter and now we can access the offset of the previous base pointer. Copy that address into the base pointer and we’re back to the previous frame!
Push and Pop
These two sister functions are pretty simple. Push first tests to see if a stack overflow will occur. If everything is okay, the value of src is copied into the stack and the stack pointer is moved up four bytes.
A stack overflow occurs when the data on the stack outgrows the available memory. This will later be addressed by attempting to grow the stack but at this point our program will simply fail.
Pop does the exact opposite. It first detects whether a stack underflow will occur and, if not, move the stack pointer down four bytes and copy the value found into dest.
A stack underflow occurs when the stack pointer would be moved below the first address in the stack. This happens if you try to pop data when there is no more data to retrieve.
Stack Memory
I think the code for the enter and leave instructions make it pretty clear why the memory in the stack is automatically allocated and freed. Once the frame pointer goes out of scope it is automatically overwritten the next time that area of the stack is needed. The old data just sits there as useless bits, not bothering anyone.
It also exemplifies the usefulness of a garbage collector. If you allocate heap memory to a variable inside a function and then that stack frame goes out of scope that memory is lost forever until the program exists. There’s no way to retrieve that address to free it unless you have a garbage collector.
C Compiler Optimizations
By virtue of how the runtime was written, there’s not much room for the C compiler to optimize the generated code. This was intentional.
What I want to make clear that if you truly intended to use C as an intermediate language you would probably do things a bit differently.
Let’s take a simple expression:
(+ 2 3)
Using the runtime, we would do something like this:
setl(2, eax);
setl(3, edx);
addl(edx, eax);
There’s not much the C compiler can do to optimize this code. We could request these functions be inlined, which does help, but that’s about the extent of things.
If I were truly wanting write this in C, and retain the use of pseudo-registers, I would instead do:
*eax = 2;
*edx = 3;
*eax += *edx;
With GCC 4.8.1 and -O2 optimization it produces (almost) one-to-one C to assembly. Were we to write ‘2 + 3’ directly, the C compiler would optimize the answer to 5. Unfortunately, the compiler can’t do that because we’re using the eax and edx pseudo-registers.
There are a couple of other changes I could make to allow the C compiler to generate even more efficient code. That’s not the point of this series, though.
Done For Now
That’s it for the runtime. I’ll explain how to actually use the runtime in the next few articles.
It’s worthy note that the runtime covers only a teeny-tiny subset of assembly. If writing code in actual assembly, there are optimizations we can make by utilizing other instructions not introduced here. Again, that’s not really the point of this series. I just want to get you on the right track!
No comments:
Post a Comment