No Effing Clue: October 2014

Saturday, 25 October 2014

Compiler 2 Part 6: Scanner

Much like tokens, the scanner hasn't changed much but mainly gained some new functionality. This should be a short but sweet post.

Scan

To scan for an identifier, we detect if the current character in the scanner is a letter of some kind. This is per our specification for an identifier. If a letter was found, we scan for an identifier of some kind and return the result.

A small change to the next section allows us to scan two characters at a time for our comparison operators. Consider assignment versus equality. One operator has a single equals sign and the other a double equals sign. After storing the current character in a temporary variable, the scanner is advanced again so we can look ahead by one character.

Once we find a valid token and determine that it might be one of two different values, we call selectToken to determine which token we want. Since the scanner has already advanced to the next character, we test to see if it matches a second character. If it matches, we return the first token and advance the scanner again. Otherwise, we return the second token.
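As a rough illustration of the lookahead described above, here is a minimal sketch in Go. The Token constants, the Scanner fields and the exact signature of selectToken are assumptions made for this example; only the name selectToken and the general behaviour come from the post.

```go
package main

import "fmt"

// Token is a hypothetical stand-in for the token type in Calc's token package.
type Token int

const (
	ILLEGAL Token = iota
	ASSIGN        // =
	EQL           // ==
	LSS           // <
	LTE           // <=
)

// Scanner is a pared-down, hypothetical scanner holding only what this sketch needs.
type Scanner struct {
	src    string
	offset int  // index of the character after ch
	ch     rune // current character; 0 means end of input
}

func NewScanner(src string) *Scanner {
	s := &Scanner{src: src}
	s.next()
	return s
}

// next advances to the following character (ASCII only, for brevity).
func (s *Scanner) next() {
	if s.offset >= len(s.src) {
		s.ch = 0
		return
	}
	s.ch = rune(s.src[s.offset])
	s.offset++
}

// selectToken assumes the scanner has already moved past the first character.
// If the current character matches r, the two-character token a is returned
// and the scanner advances again; otherwise the one-character token b is returned.
func (s *Scanner) selectToken(r rune, a, b Token) Token {
	if s.ch == r {
		s.next()
		return a
	}
	return b
}

// Scan returns the next operator token, looking ahead by one character.
func (s *Scanner) Scan() Token {
	ch := s.ch // store the current character
	s.next()   // advance so that s.ch is the lookahead character
	switch ch {
	case '=':
		return s.selectToken('=', EQL, ASSIGN)
	case '<':
		return s.selectToken('=', LTE, LSS)
	}
	return ILLEGAL
}

func main() {
	s := NewScanner("===<=<")
	for i := 0; i < 4; i++ {
		fmt.Println(s.Scan())
	}
}
```

Scanning "===<=<" with this sketch yields EQL, ASSIGN, LTE and LSS in that order, which shows that the longer operator wins whenever the lookahead character matches.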

The next function had a bug in Calc 1. A newline was never recorded as a result of the bug so the Position of a character was never properly reported. Since most of the source code passed to the Calc 1 scanner was always on a single line the bug wasn’t caught until after the compiler was released.

The bug has since been fixed and back-ported to Calc 1.
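To show why recording newlines matters for positions, here is a small sketch. The File and Scanner types below are simplified stand-ins invented for this example, not the actual Calc code; the idea is only that next must tell the file where each new line begins.

```go
package main

import "fmt"

// File is a hypothetical, pared-down file record: it only remembers where
// each line begins so that offsets can be turned into line/column pairs.
type File struct {
	lines []int // byte offset of the start of each line; lines[0] is 0
}

func NewFile() *File { return &File{lines: []int{0}} }

// AddLine records the offset at which a new line starts.
func (f *File) AddLine(offset int) { f.lines = append(f.lines, offset) }

// Position converts a byte offset into a 1-based line and column.
func (f *File) Position(offset int) (line, col int) {
	for i, start := range f.lines {
		if start > offset {
			break
		}
		line, col = i+1, offset-start+1
	}
	return line, col
}

// Scanner is a hypothetical scanner reduced to what next needs.
type Scanner struct {
	file     *File
	src      string
	rdOffset int  // offset of the character after ch
	ch       byte // current character; 0 means end of input
}

func NewScanner(f *File, src string) *Scanner {
	s := &Scanner{file: f, src: src}
	s.next()
	return s
}

// next advances one character and, on a newline, records where the
// following line starts. Skipping that step is the kind of bug described above.
func (s *Scanner) next() {
	if s.rdOffset >= len(s.src) {
		s.ch = 0
		return
	}
	s.ch = s.src[s.rdOffset]
	s.rdOffset++
	if s.ch == '\n' {
		s.file.AddLine(s.rdOffset)
	}
}

func main() {
	f := NewFile()
	s := NewScanner(f, "(+ 1\n 2)")
	for s.ch != 0 {
		s.next()
	}
	line, col := f.Position(6) // offset of the character '2'
	fmt.Printf("line %d, column %d\n", line, col)
}
```

If the AddLine call were missing, every offset would be reported on line 1, which goes unnoticed as long as the source fits on a single line.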

Identifiers

Like scanNumber, scanIdentifier continues to advance the scanner until the end of an identifier is found. As long as the first character of an identifier is a letter, the remaining characters can be letters or digits.

The next part of the code is a little quirky. It is there to protect against the end of a file in the middle of an identifier, which is obviously an error. It also is needed when an identifier is alone on a line because it is the return value for a function.

Last, it makes a call to token.Lookup. This code, if you recall, determines whether the identifier is actually a keyword or just an identifier. That’s why we had to change the default return value of Lookup to the IDENT token, rather than ILLEGAL.
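The following sketch shows one way scanIdentifier could look. The Scanner layout, the rune slice and the tiny Lookup function are assumptions for illustration; the real code lives in Calc's scanner and token packages and may differ.

```go
package main

import (
	"fmt"
	"unicode"
)

// Token and its constants are hypothetical stand-ins for the token package.
type Token int

const (
	IDENT Token = iota
	DECL
	IF
	VAR
)

var keywords = map[string]Token{"decl": DECL, "if": IF, "var": VAR}

// Lookup returns the keyword token for s, or IDENT when s is not a keyword.
func Lookup(s string) Token {
	if tok, ok := keywords[s]; ok {
		return tok
	}
	return IDENT
}

// Scanner is a simplified, hypothetical scanner.
type Scanner struct {
	src    []rune
	offset int  // index of ch within src
	ch     rune // current character; 0 means end of input
}

func NewScanner(src string) *Scanner {
	s := &Scanner{src: []rune(src)}
	if len(s.src) > 0 {
		s.ch = s.src[0]
	}
	return s
}

// next advances one character, clamping the offset at the end of the source.
func (s *Scanner) next() {
	s.offset++
	if s.offset >= len(s.src) {
		s.offset = len(s.src)
		s.ch = 0
		return
	}
	s.ch = s.src[s.offset]
}

// scanIdentifier consumes letters and digits, then asks Lookup whether the
// literal is a keyword. Reaching the end of input simply ends the loop.
func (s *Scanner) scanIdentifier() (Token, string) {
	start := s.offset
	for unicode.IsLetter(s.ch) || unicode.IsDigit(s.ch) {
		s.next()
	}
	lit := string(s.src[start:s.offset])
	return Lookup(lit), lit
}

func main() {
	s := NewScanner("var x1")
	tok, lit := s.scanIdentifier()
	fmt.Println(tok == VAR, lit)
	s.next() // skip the space
	tok, lit = s.scanIdentifier()
	fmt.Println(tok == IDENT, lit)
}
```

The caller is expected to have checked that the first character is a letter before calling scanIdentifier, as described in the Scan section.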

Summary

Again, not a lot to do here. There’s nothing particularly amazing in the additions.

On to parsing and the abstract syntax tree! This is where the major changes appear!

Sunday, 19 October 2014

Compiler 2 Part 5: Tokens

Finally! We get to look at some code!

In the token package, I had to add a plethora of new operators, some keywords and a few other tokens.

New Tokens

The IDENT token will be used for any type of identifier, be it a variable name or a type name.

The COMMA token is a special case which only sees use in function parameters in Calc 2. When multi-value assignment and multi-variable declarations come into play it will see more use.

The DECL, IF and VAR tokens are representative of the three keywords by the same names.

The balance of the new tokens are all new operators. ASSIGN for variable assignment, AND and OR for the corresponding logical operators and the other six are for comparisons.

The IsKeyword function was added to complement the other boolean testing functions.

Last, and perhaps most importantly, Lookup had a seemingly minor change with further reaching consequences. It now returns an IDENT token rather than ILLEGAL when looking up a string. This is useful in the scanner when determining whether an identifier is really an identifier or if it is actually a keyword.
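Here is a minimal sketch of how Lookup and IsKeyword might fit together. The marker constants keywordBeg and keywordEnd, the names table and the choice to make IsKeyword a method are assumptions borrowed from the general style of Go's own token package, not a copy of the Calc source.

```go
package main

import "fmt"

// Token is a hypothetical stand-in for Calc's token type.
type Token int

const (
	ILLEGAL Token = iota
	IDENT
	COMMA

	keywordBeg // marker: keywords are listed between the two markers
	DECL
	IF
	VAR
	keywordEnd
)

// names maps each keyword token to its spelling.
var names = map[Token]string{DECL: "decl", IF: "if", VAR: "var"}

// keywords is the reverse table, built once at start-up.
var keywords map[string]Token

func init() {
	keywords = make(map[string]Token)
	for tok := keywordBeg + 1; tok < keywordEnd; tok++ {
		keywords[names[tok]] = tok
	}
}

// IsKeyword reports whether t lies between the keyword markers.
func (t Token) IsKeyword() bool { return keywordBeg < t && t < keywordEnd }

// Lookup returns the keyword token for s, or IDENT if s is not a keyword.
// Returning IDENT rather than ILLEGAL is the change discussed above.
func Lookup(s string) Token {
	if tok, ok := keywords[s]; ok {
		return tok
	}
	return IDENT
}

func main() {
	fmt.Println(Lookup("if") == IF, Lookup("if").IsKeyword())
	fmt.Println(Lookup("total") == IDENT, Lookup("total").IsKeyword())
}
```

With this layout, adding a keyword only requires a new constant between the markers and an entry in the names table.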

Errors

Errors were moved from the scanner to the token package.

It felt like a better fit, here, considering tokens are used throughout the entire system whereas the scanner is only used by the parser. It now eliminates the need for an additional, essentially unnecessary, import.

The Add function was changed to make it easier to write error messages.

There are various other refinements to the error handling that I encourage you to look at but won’t go into here.

Files

The complete Calc source of the file is no longer stored in the File structure. It doesn't need to be there since the structure’s main purpose is to help with error reporting throughout the package. All the data it needs is already there and the source wasn't ever used for anything but the file size.

The length of the source code is now tracked by the size field.

The base field is now supplied as a parameter in NewFile. This is because the base value of the file may no longer be one when multiple files are involved.

FileSets

A file set holds the collective file information for a package. A new file set has a base of one for the same reason a file had a base of one in the previous series.

When the Add method is called, a new file is added to the set and the base is increased by the size of the source code added.

The Position of a token is now determined by cycling through the files in a file set. If the position (of type Pos) falls between the file’s base and its base + size then we know the position is in that file. We then call the Position method on the file to get the Position information from it for use in error reporting.
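The base arithmetic can be sketched as follows. The field names, the Add signature and the helper FileFor are hypothetical, and the range test uses a half-open interval so that neighbouring files never overlap; the real package may treat the boundary differently.

```go
package main

import "fmt"

// Pos is a position within a whole file set, as in the post.
type Pos int

// File is a hypothetical record of one source file's place in the set.
type File struct {
	name string
	base int // position of the file's first character within the set
	size int // length of the file's source code
}

// FileSet holds the files of a package and the base for the next file.
type FileSet struct {
	base  int
	files []*File
}

// NewFileSet starts with a base of one.
func NewFileSet() *FileSet { return &FileSet{base: 1} }

// Add registers a file and bumps the base by the size of its source.
func (fs *FileSet) Add(name, src string) *File {
	f := &File{name: name, base: fs.base, size: len(src)}
	fs.files = append(fs.files, f)
	fs.base += len(src)
	return f
}

// FileFor cycles through the files and returns the one containing p,
// or nil when p does not belong to any file.
func (fs *FileSet) FileFor(p Pos) *File {
	for _, f := range fs.files {
		if int(p) >= f.base && int(p) < f.base+f.size {
			return f
		}
	}
	return nil
}

func main() {
	fs := NewFileSet()
	fs.Add("a.calc", "(+ 1 2)") // occupies positions 1 through 7
	fs.Add("b.calc", "(* 3 4)") // occupies positions 8 through 14
	for _, p := range []Pos{1, 7, 8, 14} {
		f := fs.FileFor(p)
		fmt.Println(p, f.name, int(p)-f.base) // position, file, offset in file
	}
}
```

Once the file is known, subtracting its base from the position gives the offset inside that file, which is what a per-file Position method would need.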

NoPos

Every variable has a type. When inferring a type of a variable via assignment, there is no actual type keyword. So, a new ast.Ident is created with the type’s name and given a position of NoPos. Using the term NoPos is more indicative of its use than illegalPos used in Calc 1 so the variable was renamed.

Summary

That concludes the changes to the token package. Not much work needed to be done here and I think the changes are pretty straightforward.

Adding new tokens or keywords is incredibly simple. Some of the infrastructure to support things like multiple source files is a bit more difficult but not too bad.

Let’s move on to the scanner.

Thursday, 16 October 2014

Compiler 2 Part 4: Language Design - Variables

Table of Contents

Introduction to Apprentice Compiler Design
Language Specification
Infrastructure
Language Design Decisions
Tokens
Scanner
AST
Parser
Assembly
C Runtime
Using the C Runtime (Assembly Part 2)
Compiler, Code Generation
Front End

Previously, I discussed the new language features and corresponding infrastructure that will be needed to implement Calc 2. Now I’d like to discuss language design a bit more, focusing on variable declarations.

In the first series of articles I discussed some of the thought that goes into language design. In this second series, we’re going to talk a bit more about that.

Variables

At first glance, variables are easy. A word or letter equals another value. No problem!

Not so fast!

First, you must decide what the declaration is even going to look like so we can fit it into our grammar rules.

Before anything can be finalized, we need to start asking ourselves a bunch of questions. Here’s a small sample:

How is assignment going to look?
How will variable types be declared, if at all?
What is a valid variable name?
How will I determine the value of a variable?
Should I have pointers?
What happens when I take the address of a variable and assign it to itself, creating a circular reference?
What happens when I try to access a variable that hasn’t been declared?
How should I handle variable shadowing?
Strong or weak typing?
Dynamic or static typing?

That’s only a quick few off the top of my head. The questions feel endless sometimes but you need to keep asking them!

I realize you might not know what questions to ask. I also understand you might not know what type systems there are, what variable shadowing is and how addressing works. I can’t possibly teach you everything there is to know in a blog series but I’ll do my best to get you on the right track.

I leave it as an exercise for you to do a search for any of these terms or concepts you don’t understand. It’s a deep, deep rabbit hole so be prepared!

So let’s address the majority of these questions as they pertain to Calc 2.

Variable Names

Most languages restrict what kind of characters can be used for a variable name. Usually, the first character must be a letter so that the scanner doesn’t have to make a decision on whether a sequence of characters is an identifier, number or something else. The rest of the characters should be letters or numbers and probably not any symbols outside of an underscore.

It begs the question, “Why?”

Consider a number versus an identifier. Is 0xFF a number or an identifier? We know, by looking at it, it’s most likely intended to be a hexadecimal representation of a number but how does our scanner figure that out? How does it know 0xFF is a number but OxFF isn't?

Since a number can contain letters and identifiers can contain numbers it stands to reason that we need SOME way to differentiate between the two. The typical answer is to specify that a sequence of numbers and letters that start with a letter is an identifier and a similar sequence starting with a number may be a number representation of some kind.
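The first-character rule is small enough to sketch directly. The function classify and its return strings are made up for this example; a real scanner would dispatch to scanIdentifier or scanNumber instead of returning a label.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// classify is a hypothetical helper that decides what a lexeme is by
// looking only at its first character.
func classify(lexeme string) string {
	r, size := utf8.DecodeRuneInString(lexeme)
	switch {
	case size == 0:
		return "empty"
	case unicode.IsLetter(r):
		return "identifier"
	case unicode.IsDigit(r):
		return "number"
	}
	return "other"
}

func main() {
	fmt.Println(classify("0xFF")) // starts with the digit zero
	fmt.Println(classify("OxFF")) // starts with the letter O
}
```

The scanner never has to look past the first character to choose a path, which is exactly why the rule is so common.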

Why no symbols? Well, let’s take a look at another example: a+b. Are we adding two variables, “a” and “b”, or is it an identifier “a+b”? As people, we can usually use context to discover the meaning of the expression but it’s much harder for a compiler. Not only that, finding the correct context might be very time consuming, even for a human.

We might, for example, think that if neither a nor b were declared as variables then it must be an identifier. Can we safely assume that or is it reasonable to believe the programmer made a mistake and meant to declare each variable but forgot to?

Since the convenience of allowing symbols in an identifier name is outweighed by the complexity of parsing the result, it’s simpler to not include them.

Strong, Static Typing

Before talking about a variable declaration we need to take a quick detour into type systems.

If you don’t know much, or anything, about them then I encourage you to do some research. It may not be perfect but Wikipedia is a good start. You can find a place to start here.

In Calc's case, I chose strong, static typing. Once a variable is a certain type then it is that type forever. If “a” is a string then it can only ever be assigned strings. It can’t be redeclared either within the same scope.

Static typing means that every variable must have its type set when it is declared. This can be done either through an explicit type name or through inference.

A type is said to be inferred when the type is taken from an assigned value and not via an explicit type name.

Variable Declaration

Should the type name come before or after the variable name? In C, the type comes before. In Go, it comes after. Why does it matter? Well, let’s consider parsing the two following sets of code:

(int a)
(a int)

When it comes to parsing, neither really has a distinct advantage though I’d give a slight edge to option one for clarity of intent.

Let’s look at another set of examples, this time with the keyword ‘var’:

(var int a)
(var a int)

In this form, I actually like option two better because it looks nicer. Yes, that matters.

Look at the first option. After the var keyword the int looks out of place. Is it a keyword or a variable? Parsing-wise, it doesn't matter but this time it matters to the people who have to read it: us. To me, at least, option two is clearer. Declare a variable ‘a’ and make it of type ‘int’.

Assignment

In C based languages assignment looks like this: identifier = value. This seems sensible enough but there are other options, too:

(= a 5)
(set a 5)

As far as parsing goes, both are identical. The equal sign has the advantage of adding another operator and not introducing any new keywords. Unfortunately, it also might create ambiguity with equality, a common issue in many languages. Set does make our intentions more clear but at the cost of another keyword to parse.

Quite frankly, the cost is minimal but it’s not unreasonable to want to keep either the symbols in your language or the keywords to an absolute minimum. Languages like Scheme and Go do a good job. On the other end of the spectrum is C++, which has gotten out of hand.

I found this thread amusing. 357 keywords? That’s a level of insanity I’d never care to know.

Declaration with Assignment

It just keeps getting worse, doesn't it? First, assume that the language uses the var keyword to declare a variable and the equals sign for assignment. Here are a few different forms it could take:

(= (var a int) 5)
(var a int 5)
(var a 5) - using type inference
(var a 5 int)
(var a int = 5)
(var (= a 5))

I'm sure you could come up with more choices. So, which is the best?

That’s a tough question to answer. In my case, I decided to go with the last option and that has to do with declaring multiple variables and doing multiple assignment. Keep reading for an explanation.

Multi-Assignment

Calc 2 isn't going to have multi-assignment but it’s something to think about and you might want to add it to your own language. How might that look in Calc?

(= a b, 1 2)
(= (a b) (1 2))
(= (a 1) (b 2))

Just by looking at this you can see the added complexity. All three options are quite hideous and certainly give me pause about how to add it properly.

You need to think about how a feature like this is going to fit with the rest of the language. Go’s handling of multiple variables fits like a glove in most cases. Swapping two variables is as easy as:

a, b = b, a

Looks, and feels, good. Consequently, multi-assignment from a function is equally as elegant:

a, b, c = returnsThreeResults()

How good is that? Damn fine. You can also choose to disregard certain variables from function calls with a simple underscore.

It’s clear that the Go developers thought carefully about how to handle multiple assignment properly and I think they did a darned good job.

Looking to Go, I decided to use something similar.

(= a b, 1 2); assign 1 to a, 2 to b
(= a b, b a); swap the values of a and b

So, when declaring multiple variables, or declaring with assignment:

(var a b int)
(var (= a b, 1 2))
(var (= a b, (returnTwoValues)))

Pointers

Calc 2 isn't going to have pointers but there’s a good chance that future versions will. I think I would make an effort to hide much of the complexity of pointers away from the user but they have their uses.

Pointers and dynamic memory allocation go hand in hand even though they are separate concepts. Pointers can add a lot of complexity and the garbage collector that future versions of Calc are going to use will need to deal with that complexity.

You need to keep this in mind.

Look no further than C to see how bad pointer abuse can become. In Go, the developers decided that pointers were useful enough to include in the language but took efforts to prevent abuse. Case in point, Go doesn't have pointer arithmetic nor does Go allow casting of pointers to incompatible pointer types.

Undeclared Variables

Scoping and the symbol table allow us to detect when a variable is undeclared. Undeclared variables will result in an error in Calc 2.

You’re free to handle this however you want to. Perhaps you would prefer to create a new variable each time an undeclared variable is discovered. Bear in mind, however, that this means that if a programmer makes a typo a new variable will be created instead of warning them through a helpful error that the variable doesn't exist.

Something to think about.

Resolution

I hope I’ve demonstrated how something seemingly simple quickly becomes complex. These choices have far reaching consequences. We have to introduce new operators and keywords. We have to introduce new complexity for both ourselves and our users.

This was a fairly long post and I only covered something as simple as variable declarations!

Later on we’ll have to think about some special cases. For example, if a declaration is a valid expression then should it be allowed in variable assignment? Can you do something like (= a (var b int))? If not, how do we stop that?

I want to impress upon you the weight you bear. Seemingly simple decisions, like using LISP-like expressions, have huge impact on later decisions. Can I, for example, relegate these expressions to mathematical expressions only and use a C-like syntax for the rest of the language? Do I have to conform to encapsulating everything in parentheses?

The questions, and doubts, never end! I’ll just let you mull that over for a bit.

* Update - Fixed some typos *

Saturday, 11 October 2014

Compiler 2 Part 3: Language Infrastructure

Previously, I talked about the new language features to be implemented in Calc 2. Now, I’d like to talk about some of the thought process behind these features and the new infrastructure that will be needed to implement them.

Functions

Procedures introduce some fun new concepts.

Scope is a word we use to talk about visibility of symbols. A symbol may represent a function or variable. Scoping determines when a variable is unique to a function or whether it can be accessed lexically.

Lexical scoping is a term used to describe a scoping method where variables are accessible within child scopes. Most languages I've used have lexical scoping. It means that when a variable has been declared in a function, it is accessible anywhere within that function, including nested functions or if statements.

As the parser reads from top to bottom, it will place any function declarations within the top-most scope, usually called the global scope. When a function is called, the global scope can be checked to see if a corresponding symbol exists and then the function can be accessed.

Of course, then there will need to be an entry point. In most compiled languages, this is called the main function or object.

Variables and Assignment

Most everything needed for variable assignment will be handled by scoping, too.

A variable, in many languages, is merely a symbol which represents a value in memory. It has a type, a name and a value. When assigning a value to a variable we need to ensure their types match and that it is mutable.

Mutable, from the word mutate, is the ability to change. By contrast, an immutable object is one that cannot change.

To assign a value to a variable we search for it in the current scope. If there is a parent scope, we move up a level and search for it there, recursively repeating the process until there are either no more scopes to search or we find the correct symbol.
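The search through parent scopes can be sketched in a few lines. The Scope and Symbol types and the method names Declare and Lookup are assumptions for this example rather than the actual Calc infrastructure.

```go
package main

import "fmt"

// Symbol is a hypothetical symbol table entry for a variable.
type Symbol struct {
	Name  string
	Type  string
	Value int32
}

// Scope is a hypothetical lexical scope with a link to its parent.
type Scope struct {
	parent  *Scope
	symbols map[string]*Symbol
}

func NewScope(parent *Scope) *Scope {
	return &Scope{parent: parent, symbols: make(map[string]*Symbol)}
}

// Declare adds a symbol to this scope. It returns false if the name is
// already declared in the same scope.
func (s *Scope) Declare(name, typ string) bool {
	if _, exists := s.symbols[name]; exists {
		return false
	}
	s.symbols[name] = &Symbol{Name: name, Type: typ}
	return true
}

// Lookup searches this scope first and then each parent in turn,
// returning nil when no scope declares the name.
func (s *Scope) Lookup(name string) *Symbol {
	if sym, ok := s.symbols[name]; ok {
		return sym
	}
	if s.parent != nil {
		return s.parent.Lookup(name)
	}
	return nil
}

func main() {
	global := NewScope(nil)
	global.Declare("a", "int")

	fn := NewScope(global) // a child scope, such as a function body
	fn.Declare("b", "int")

	fmt.Println(fn.Lookup("a") != nil)     // found in the parent scope
	fmt.Println(global.Lookup("b") != nil) // children are not visible to parents
	fmt.Println(fn.Lookup("c") != nil)     // undeclared everywhere
}
```

A nil result from Lookup is the natural place to report an undeclared variable.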

Looping

Recursion is fun. For most of us, our first experience with looping comes from the venerable ‘for’ and ‘while’ statements. Calc will eventually have such constructs but in Calc 2 all we have is recursion.

In the most simple terms, a recursive function is one which calls itself.

Branching

The if statement is your typical, basic branching mechanism. A branch is like a fork in the road. Depending on a specific criterion, you choose to go down either one path or the other and they are mutually exclusive.

A branch has its own scope. You may declare a variable inside an if statement and it will be unique to that branch and will not be accessible outside of the branch. It uses normal scoping rules.

Types

Even though Calc 2 will still consist of a single type, the infamous integer, I will be introducing static typing and type-checking.

Static typing is a type system where a type is either explicitly or implicitly set when a variable is declared at compile time. What does this mean?

It means that the type of a variable is determined before a program is ever run and can be checked for type-correctness (types match or are compatible) when the program is compiled.

By contrast, a dynamic type system checks the type of a variable at runtime (while the program is running).

As previously stated, Calc 2 still only has one type. The basic “int” type is 32 bits. It is also signed, meaning that one bit is reserved to determine whether the number is positive or negative. Be aware of these limitations if you try to calculate a number that is too large to be held in this data type.
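To get a feel for the limits of a signed 32-bit integer, here is a short Go sketch. It shows how Go's int32 behaves when a sum exceeds the maximum value; how Calc 2's generated code behaves on overflow is not stated in this series, so treat this only as an illustration of the data type, with addInt32 being a helper invented for the example.

```go
package main

import (
	"fmt"
	"math"
)

// addInt32 adds two signed 32-bit integers. In Go, signed integer
// overflow at run time wraps around rather than raising an error.
func addInt32(a, b int32) int32 {
	return a + b
}

func main() {
	fmt.Println(math.MaxInt32)              // largest value an int32 can hold
	fmt.Println(addInt32(math.MaxInt32, 1)) // wraps to the smallest value
}
```

The largest value is 2147483647, and adding one wraps around to -2147483648. A factorial grows quickly enough to reach this limit after only a dozen or so steps.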

Negative Numbers

From a human standpoint a negative number is pretty easy. It’s a dash before a number.

That’s the key. It’s a dash before a number. The number itself is not negative. It’s a positive number with a dash prepended to it.

The scanner is only aware that a number is a series of uninterrupted digits; so, lexically, a negative number is a bit tougher. The dash is scanned first as a separate lexical element from the number. The parser receives both of them separately.

Enter what we call a unary expression. This type of expression has only a single operand. This is in contrast to a binary expression, which has two operands. During parsing we’ll need to remember that the dash may indicate subtraction or negation.

Multiple Source Files

Multiple sources bundled up into a single object brings some unique difficulties but is, overall, simpler than one might think.

The only real obstacle to overcome is resolving symbols which exist in other files. A function may be called that exists somewhere else or not at all. Naming collisions could also happen.

Next Up…

Language design decisions!

Friday, 10 October 2014

Compiler 2 Part 2: Language Specification

Part 1 gave an overview of the topics that will be discussed in this blog series.

With Calc 1, I bored you with some broad definitions of a compiler and basic overviews of how everything works. Since we've already done that, we can dispense with the niceties and delve right in!

I’ll now discuss the additions needed for Calc 2 in a bit finer detail to provide us the framework to implement our new language features. Later, in Part 4, I will explain some of the decisions behind the way I implemented these features.

Functions

Depending on the language, functions can have a few different features. They might:

Have a name, though you could also have anonymous functions.
Take parameters. Some languages may enforce rules about the parameters, like whether they’re optional, require a type and whether they’re variable.
Have a body. A function almost always has one, but you could have an empty implementation for a stub.
Have a return type.
Be a type in and of themselves, for use in function pointers, closures, etc.

Calc 2 will use the decl keyword to declare a function. Let’s look at what form it will take:

func-decl ::= “(“ “decl” identifier {arg-list} type expr “)”
arg-list ::= “(“ (ident whitespace)+ type {“,” (ident whitespace)+ type}* “)”
expr ::= expr | expr-list
expr-list ::= “(“ expr+ “)”
identifier ::= [a-zA-Z][a-zA-Z0-9]*
type ::= “int”

Example:

(decl add (a b int) int (+ a b))

Declare a function called add which takes two integer arguments, a and b, whose return type is an integer and returns the result of adding a to b.

A function declaration is special in that it may only occur at the top-level scope. This means that it can’t be nested within other expressions.

Function Call

A function is called like a binary expression is. It takes the form:

call-expr ::= “(“ identifier expr* “)”

Example:

(add 2 3)
=> 5

Call a function named ‘add’ with the arguments ‘2’ and ‘3’. It outputs the expected result of ‘5’.

Note that a function name may only contain alphanumeric characters and must start with a letter.

Assignment

It’s typical to have the ability to assign a value to a variable which has already been declared. It will look like this:

assign-expr ::= “(“ “=” ident value “)”

Example:

(= a 42)

Assign the variable ‘a’ the value ‘42’.

Variable Declaration

Since functions tend to take arguments it stands to reason to also have variable declarations. A nice, simple keyword will suffice.

var-expr ::= “(“ “var” (ident type) | (assign-expr {type})“)”
ident ::= [a-zA-Z][a-zA-Z0-9]*
type ::= “int”

Example 1:

(var a int)

Declare a variable called ‘a’ of the type int.

Example 2:

(var (= b 5))

Declare a variable called ‘b’ and assign it the value ‘5’. Since the optional type, when using an assign-expr, has been omitted the type is inferred.

Variables will have a default zero value.

Looping

To keep things simple, looping constructs like for and while will be held in reserve for Calc 3. To implement our required looping needs we’ll use recursion instead.

Branching

Everybody loves a good if statement!

if-expr ::= “(“ “if” binary-expr {type} expr {expr} “)”
operator ::= “>” | “>=” | “<” | “<=” | “==” | “!=” | “and” | “or”

If the condition resolves to 1 then the first expr is executed; otherwise, the second is executed.

Example:

(if (< a 2) int 2 a)

If ‘a’ is less than ‘2’ then return ‘2’, else return the value of ‘a’.

Of course, there are other constructs like elif chains and select-case. Neither will be implemented for Calc 2.

Putting it All Together

Taking the specification from our previous compiler, we can insert our new definitions.

package ::= file+
file ::= func-decl+
func-decl ::= “(“ “decl” identifier {arg-list} type expr “)”
expr ::= assign-expr | binary-expr | call-expr | expr-list | if-expr | basic-lit | var-expr
expr-list ::= “(“ expr+ “)”
arg-list ::= “(“ (ident whitespace)+ type {“,” (ident whitespace)+ type}* “)”
assign-expr ::= “(“ “=” ident value “)”
binary-expr ::= “(“ operator whitespace expr (whitespace expr)+ “)”
call-expr ::= “(“ identifier expr* “)”
if-expr ::= “(“ “if” bool-expr {type} expr-list {expr-list} “)”
var-expr ::= “(“ “var” (ident type) | (assign-expr {type})“)”
basic-lit ::= integer
identifier ::= [a-zA-Z][a-zA-Z0-9]*
operator ::= [+-*/%><=] | “>=” | “<=” | “==” | “!=” | “and” | “or”
whitespace ::= [ \t\n\r]+
integer ::= digit+
digit ::= [0-9]
type ::= “int”

Calc 2

We can now write the two functions fibonacci and factorial like this:

(decl fibonacci (n int) int (
    (if (<= n 0) int 0)
    (if (== n 1) int
        1
        (+ (fibonacci (- n 1)) (fibonacci (- n 2))))))

(decl factorial (n int) int (
    (if (<= n 0) int 0)
    (if (== n 1) int
        1
        (* n (factorial (- n 1))))))

Amusingly, aside from multiplication vs. addition, the two functions are exactly the same.
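For readers more comfortable with Go than with Calc's syntax, here is a rough translation of the two functions. It assumes that the first if expression ends the function with 0 when n is zero or negative, which is my reading of the Calc code above rather than something the specification spells out. The int32 type matches Calc's 32-bit int.

```go
package main

import "fmt"

// fibonacci mirrors the Calc definition: 0 for n <= 0, 1 for n == 1,
// otherwise the sum of the two preceding values.
func fibonacci(n int32) int32 {
	if n <= 0 {
		return 0
	}
	if n == 1 {
		return 1
	}
	return fibonacci(n-1) + fibonacci(n-2)
}

// factorial mirrors the Calc definition, including its choice to
// return 0 for n <= 0.
func factorial(n int32) int32 {
	if n <= 0 {
		return 0
	}
	if n == 1 {
		return 1
	}
	return n * factorial(n-1)
}

func main() {
	fmt.Println(fibonacci(10)) // 55
	fmt.Println(factorial(5))  // 120
}
```

Both functions rely on nothing but branching and recursion, which is why those two features are enough to reach the design target for Calc 2.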

I think you can agree that Calc will be a bit more useful in its next incarnation even though it still has a long way to go to be a proper, general purpose language.

That about sums up the new language additions in Calc 2. Next, we’ll discuss some of the architectural changes needed to support the user-facing features.

Wednesday, 8 October 2014

Compiler 2 Part 1: Introduction to Apprentice Compiler Design

Table of Contents

Introduction to Apprentice Compiler Design
Language Specification
Infrastructure
Language Design Decisions
Tokens
Scanner
AST
Parser
Assembly
C Runtime
Using the C Runtime (Assembly Part 2)
Compiler, Code Generation
Front End

Introduction

Welcome! It's been a long time coming!

This series is the second I've written on compilers. The first one can be found here and if you've not already done so, I highly recommend starting there first because this series builds upon the code and ideas started there.

Calc 2 is an evolution of the language. Calc 1 achieved my goal of being a basic calculator and a teaching tool but it leaves much to be desired as a useful, general purpose tool.

While Calc 1 demonstrated how to take raw source code and compile it into a working binary, it did little to demonstrate how languages are actually transformed into machine language. Short of doing some basic arithmetic, it does little else.

I also did not answer many questions I posed in the first series. Questions about different types of parsers, abstract syntax trees and language design decisions. I hope to address more of these questions while still keeping the discussion on the task at hand.

In this second series, I have set a new goal for Calc. We will give it the ability to calculate a Fibonacci number or a factorial.

So, hold on tight because here we go!

Forewarning

You need to understand the material covered in the first series before tackling anything covered here. This series takes a considerable jump in difficulty and complexity.

I’ll do my best to explain what I can but I will, like in the last series, be making some assumptions about your level of programming skill. A moderate level of competency in Go is a must.

I should also caution you that I am not a professional programmer nor am I an expert on compilers. I am merely trying to pass on some of the knowledge I’ve gained in the hopes of helping you learn how to write a compiler yourself.

Also, when we reach the topic of the C runtime and assembly you will find that having experience with pointers and understanding how memory addressing works will be an asset.

Consider yourself warned.

More Credit

Calc 2 deviates considerably further away from Go but the fact remains that there are parallels between this work and Go and its standard library package by the same name. As before, any and all credit must go to the Go developers for any code that resembles any of their collective works.

You can find the source code for Go at golang.org.

Runtime

To help make some of our tasks easier, I wrote a C runtime for Calc 2. This runtime is statically linked into the resulting binary. An end user need never know it even exists.

This runtime is very small and implements a tiny subset of pseudo-assembly instructions to make code generation easier. The library resides within Calc’s own directory so once Calc is purged from your system so is the runtime. It is never installed into the main OS’s system directories to lay forgotten, taking up space. Lurking...waiting...hungering…

You need never look at it if you don’t want to but you probably should.

Assembly

You got it! I’ll be talking a bit about the big, scary assembly language in this series!

Not to mislead you, because we won’t be actually using assembly, but in order to understand how the C runtime works you will need to know the strengths and advantages of assembly as it pertains to language implementation.

Language Features

With the goal of calculating a fibonacci number in mind, there are several features which will need to be added to Calc.

Calc 1 already does the arithmetic we require to make the needed calculations but it lacks the ability to actually facilitate our needs in a meaningful way.

In order to add a meaningful level of abstraction, Calc 2 will need to have the following implemented:

Functions
Variables
Looping
Branching

These four language features alone allow us to implement the design target for Calc 2.

Compiler Features

The compiler section of Calc 1 didn't actually do a whole lot. By optimizing away the complexity, Calc 1 merely prints the result of the arithmetic operations. Calc 2 will need to actually generate unoptimized code.

I will also be including the ability to compile a directory containing multiple source files.

Bugs

The potential for undiscovered bugs is always a spectre looming over every project’s shoulder. Compilers, especially, have the potential for odd and unpredictable bugs.

I have attempted to be as thorough as possible but I can’t make a 100% bug-free claim. If you discover a bug, please report it in the issue tracker.

Moving Forward

That concludes the overview of the series. In Part 2 we’ll dive right in to the language specification.

There is a lot to digest in this series. It is a considerable amount longer and a lot of new information will be introduced. I hope you bear with me and take the time to make it all the way through.

I wish you the best of luck!
