Tuesday, 7 October 2014

Compiler 2 Part 1: Introduction to Apprentice Compiler Design

Table of Contents
  1. Introduction to Apprentice Compiler Design
  2. Language Specification
  3. Infrastructure
  4. Language Design Decisions
  5. Tokens
  6. Scanner
  7. AST
  8. Parser
  9. Assembly
  10. C Runtime
  11. Using the C Runtime (Assembly Part 2)
  12. Compiler, Code Generation
  13. Front End

Introduction


Welcome! It's been a long time coming!

This series is the second I've written on compilers. The first one can be found here and if you've not already done so, I highly recommend starting there first because this series builds upon the code and ideas started there.

Calc 2 is an evolution of the language. Calc 1 achieved my goal of being a basic calculator and a teaching tool but it leaves much to be desired for being a much more useful, general purpose tool.

While Calc 1 demonstrated how to take raw source code and compile it into a working binary, it did little to demonstrate how languages are actually transformed into machine language. Short of doing some basic arithmetic, it does little else.

I also did not answer many questions I posed in the first series. Questions about different types of parsers, abstract syntax trees and language design decisions. I hope to address more of these questions while still keeping the discussion on the task at hand.

In this second series, I have set a new goal for Calc. We will give it the ability to calculate a Fibonacci number or a factorial.

So, hold on tight because here we go!

Forewarning


You need to understand the material covered in the first series before tackling anything covered here. This series takes a considerable jump in difficulty and complexity.

I’ll do my best to explain what I can but I will, like in the last series, be making some assumptions about your level of programming skill. A moderate level of competency in Go is a must.

I should also caution you that I am not a professional programmer nor am i an expert on compilers. I am merely trying to pass on some of the knowledge I’ve gained in the hopes of helping you learn how to write a compiler yourself.

Also, when we reach the topic of the C runtime and assembly you will find that having experience with pointers and understanding how memory addressing works will be an asset.

Consider yourself warned.

More Credit


Calc 2 deviates considerably further away from Go but the fact remains that there are parallels between this work and Go and it’s standard library package by the same name. As before, any and all credit must go to the Go developers for any code that resembles any of their collective works.

You can find the source code for Go at golang.org.

Runtime


To help make some of our tasks easier, I wrote a C runtime for Calc 2. This runtime is statically linked into the resulting binary. An end user need never know it even exists.

This runtime is very small and implements a tiny subset of pseudo-assembly instructions to make code generation easier. The library resides within Calc’s own directory so once Calc is purged from your system so is the runtime. It is never installed into the main OS’s system directories to lay forgotten, taking up space. Lurking...waiting...hungering…

You need never look at it if you don’t want to but you probably should.

Assembly


You got it! I’ll be talking a bit about the big, scary assembly language in this series!

Not to mislead you, because we won’t be actually using assembly, but in order to understand how the C runtime works you will need to know the strengths and advantages of assembly as it pertains to language implementation.

Language Features


With the goal of calculating a fibonacci number in mind, there are several features which will need to be added to Calc.

Calc 1 already does the arithmetic we require to make the needed calculations but it lacks the ability to actually facilitate our needs in a meaningful way.

In order to add a meaningful level of abstraction, Calc 2 will need to have the following implemented:
  • Functions
  • Variables
  • Looping
  • Branching
These four language features alone allow us to implement the design target for Calc 2.

Compiler Features


The compiler section of Calc 1 didn't actually do a whole lot. By optimizing away the complexity, Calc 1 merely prints the result of the arithmetic operations. Calc 2 will need to actually generate unoptimized code.

I will also be including the ability to compile a directory containing multiple source files.

Bugs


The potential for undiscovered bugs is always a spectre looming over every project’s shoulder. Compilers, especially, have the potential for odd and unpredictable bugs.

I have attempted to be as thorough as possible but I can’t make a 100% bug-free claim. If you discover a bug, please report it in the issue tracker.

Moving Forward


That concludes the overview of the series. In Part 2 we’ll dive right in to the language specification.

There is a lot to digest in this series. It is a considerable amount longer and a lot of new information will be introduced. I hope you bear with me and take the time to make it all the way through.

I wish you the best of luck!