Friday, 10 October 2014

Compiler 2 Part 2: Language Specification

Part 1 gave an overview of the topics that will be discussed in this blog series.

With Calc 1, I bored you with some broad definitions of a compiler and basic overviews of how everything works. Since we've already done that, we can dispense with the niceties and delve right in!

I’ll now discuss the additions needed for Calc 2 in finer detail, to give us a framework for implementing our new language features. Later, in Part 4, I will explain some of the decisions behind the way I implemented these features.


Depending on the language, functions can have a few different features. They might:

Have a name, though a function can also be anonymous.
Take parameters. Some languages enforce rules about parameters, such as whether they’re optional, whether they require a type and whether they’re variable.
Have a body. A function almost always has one, but a stub may have an empty implementation.
Have a return type.
Be a type in and of themselves, for use in function pointers, closures, etc.

Calc 2 will use the decl keyword to declare a function. Let’s look at what form it will take:

func-decl ::= “(“ “decl” identifier {arg-list} type expr “)”
arg-list ::= “(“ (ident whitespace)+ type {“,” (ident whitespace)+ type}* “)”
expr ::= expr | expr-list
expr-list ::= “(“ expr+ “)”
identifier ::= [a-zA-Z][a-zA-Z0-9]*
type ::= “int”


(decl add (a b int) int (+ a b))

Declare a function called add that takes two integer arguments, a and b, has an integer return type, and returns the result of adding a to b.

A function declaration is special in that it may only occur at the top-level scope. This means that it can’t be nested within other expressions.
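To make the shape of a declaration concrete, here is a rough sketch in Python (used purely for illustration) that reads the example above into nested lists and picks out its parts. The names read and parse_decl are hypothetical helpers, and the sketch assumes a single group of parameters sharing one type; it does not handle the comma-separated groups the grammar allows.

```python
import re

TOKEN = re.compile(r"[()]|[^\s()]+")


def read(source):
    """Parse one parenthesised form into nested Python lists of strings."""
    tokens = TOKEN.findall(source)
    pos = 0

    def parse():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while tokens[pos] != ")":
                items.append(parse())
            pos += 1  # consume the closing ")"
            return items
        return tok

    return parse()


def parse_decl(form):
    """Split (decl name (params... type) type body) into its parts."""
    if form[0] != "decl":
        raise SyntaxError("expected 'decl'")
    name = form[1]
    rest = form[2:]
    params = []
    # The optional arg-list is a nested list; the return type is a bare word.
    if isinstance(rest[0], list):
        *names, param_type = rest[0]
        params = [(n, param_type) for n in names]
        rest = rest[1:]
    return_type, body = rest
    return {"name": name, "params": params, "returns": return_type, "body": body}


decl = parse_decl(read("(decl add (a b int) int (+ a b))"))
print(decl["name"], decl["params"], decl["returns"], decl["body"])
```

Because the arg-list is optional, the sketch tells it apart from the return type by checking whether the element after the name is a nested list or a bare word.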

Function Call

A function call looks just like a binary expression. It takes the form:

call-expr ::= “(“ identifier expr* “)”


(add 2 3)
=> 5

Call a function named ‘add’ with the arguments ‘2’ and ‘3’. It outputs the expected result of ‘5’.

Note that a function name may only contain alphanumeric characters and must start with a letter.
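As a sketch of how such a call might be evaluated, the Python below (illustration only) treats expressions as nested lists and looks the callee up in a table. The FUNCS table, evaluate and is_identifier are hypothetical names, and only a few arithmetic operators are included.

```python
import operator
import re

IDENT = re.compile(r"[a-zA-Z][a-zA-Z0-9]*")
ARITH = {"+": operator.add, "-": operator.sub, "*": operator.mul}

# Hypothetical function table: name -> (parameter names, body expression).
FUNCS = {"add": (["a", "b"], ["+", "a", "b"])}


def is_identifier(name):
    """True if name is alphanumeric and starts with a letter."""
    return IDENT.fullmatch(name) is not None


def evaluate(expr, env):
    if isinstance(expr, int):
        return expr                 # integer literal
    if isinstance(expr, str):
        return env[expr]            # variable or parameter lookup
    head, *args = expr
    values = [evaluate(arg, env) for arg in args]
    if head in ARITH:
        result = values[0]
        for value in values[1:]:
            result = ARITH[head](result, value)
        return result
    if not is_identifier(head) or head not in FUNCS:
        raise NameError(f"unknown function: {head}")
    params, body = FUNCS[head]
    if len(params) != len(values):
        raise TypeError(f"{head} expects {len(params)} arguments")
    # Bind arguments to parameters in a fresh scope, then evaluate the body.
    return evaluate(body, dict(zip(params, values)))


print(evaluate(["add", 2, 3], {}))  # prints 5
```

Arguments are evaluated before the call, so a nested call such as (add (add 1 2) 3) works without any extra machinery.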


It’s typical to have the ability to assign a value to a variable which has already been declared. It will look like this:

assign-expr ::= “(“ “=” ident value “)”


(= a 42)

Assign the variable ‘a’ the value ‘42’.
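The "already been declared" rule can be sketched in a few lines of Python (illustration only). Here env is a plain dictionary standing in for the current scope, and assign is a hypothetical helper; raising NameError for an undeclared variable is my assumption about how the error would surface.

```python
def assign(env, name, value):
    """Store value in an existing variable; reject undeclared names."""
    if name not in env:
        raise NameError(f"assignment to undeclared variable: {name}")
    env[name] = value
    return value


scope = {"a": 0}        # 'a' was declared earlier
assign(scope, "a", 42)  # (= a 42)
print(scope["a"])       # prints 42
```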

Variable Declaration

Since functions tend to take arguments, it stands to reason that we should also have variable declarations. A nice, simple keyword will suffice.

var-expr ::= “(“ “var” (ident type) | (assign-expr {type})“)”
ident ::= [a-zA-Z][a-zA-Z0-9]*
type ::= “int”

Example 1:

(var a int)

Declare a variable called ‘a’ of the type int.

Example 2:

(var (= b 5))

Declare a variable called ‘b’ and assign it the value ‘5’. Because the optional type that may follow an assign-expr has been omitted, the type is inferred.

Variables will have a default zero value.
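Both forms of declaration, along with the zero default and the inferred type, can be sketched in Python (illustration only). The declare and infer_type helpers are hypothetical, and the scope is a dictionary mapping each name to a (type, value) pair.

```python
def infer_type(value):
    """Calc 2 only has one type, so inference is trivial in this sketch."""
    if isinstance(value, int):
        return "int"
    raise TypeError(f"cannot infer a type for {value!r}")


def declare(env, form):
    """Handle ['var', name, type] and ['var', ['=', name, value], (type)]."""
    _, target, *rest = form
    if isinstance(target, list):
        _, name, value = target               # the nested assign-expr
        var_type = rest[0] if rest else infer_type(value)
    else:
        name, value = target, 0               # default zero value
        var_type = rest[0]
    env[name] = (var_type, value)
    return env[name]


scope = {}
declare(scope, ["var", "a", "int"])       # (var a int)
declare(scope, ["var", ["=", "b", 5]])    # (var (= b 5))
print(scope)
```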


To keep things simple, looping constructs like for and while will be held in reserve for Calc 3. For any looping we need, we’ll use recursion instead.


Everybody loves a good if statement!

if-expr ::= “(“ “if” binary-expr {type} expr {expr} “)”
operator ::= “>” | “>=” | “<” | “<=” | “==” | “!=” | “and” | “or”

If the condition resolves to 1 then the first expr is executed; otherwise the second expr, if present, is executed.


(if (< a 2) int 2 a)

If ‘a’ is less than ‘2’ then return ‘2’, else return the value of ‘a’.
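Here is a small Python sketch (illustration only) of how an if expression like this could be evaluated, with comparisons yielding 1 or 0. The evaluate function is hypothetical, only a subset of the comparison operators is included, and yielding 0 when the else branch is missing is my assumption rather than something the specification states.

```python
import operator

COMPARE = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "!=": operator.ne,
}


def evaluate(expr, env):
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        return env[expr]
    head = expr[0]
    if head == "if":
        cond, *rest = expr[1:]
        if rest and rest[0] == "int":
            rest = rest[1:]                   # skip the optional type
        if evaluate(cond, env) == 1:
            return evaluate(rest[0], env)
        # Assumption: a missing else branch yields 0.
        return evaluate(rest[1], env) if len(rest) > 1 else 0
    lhs, rhs = (evaluate(operand, env) for operand in expr[1:])
    return int(COMPARE[head](lhs, rhs))       # true is 1, false is 0


program = ["if", ["<", "a", 2], "int", 2, "a"]
print(evaluate(program, {"a": 1}))  # prints 2
print(evaluate(program, {"a": 5}))  # prints 5
```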

Of course, there are other constructs like elif chains and select-case. Neither will be implemented for Calc 2.

Putting it All Together

Taking the specification from our previous compiler, we can insert our new definitions.

package ::= file+
file ::= func-decl+
func-decl ::= “(“ “decl” identifier {arg-list} type expr “)”
expr ::= assign-expr | binary-expr | call-expr | expr-list | if-expr | basic-lit | var-expr
expr-list ::= “(“ expr+ “)”
arg-list ::= “(“ (ident whitespace)+ type {“,” (ident whitespace)+ type}* “)”
assign-expr ::= “(“ “=” ident value “)”
binary-expr ::= “(“ operator whitespace expr (whitespace expr)+ ”)”
call-expr ::= “(“ identifier expr* “)”
if-expr ::= “(“ “if” binary-expr {type} expr {expr} “)”
var-expr ::= “(“ “var” (ident type) | (assign-expr {type})“)”
basic-lit ::= integer
identifier ::= [a-zA-Z][a-zA-Z0-9]*
operator ::= [+-*/%><] | “>=” | “<=” | “==” | “!=” | “and” | “or”
whitespace ::= [ \t\n\r]+
integer ::= digit+
digit ::= [0-9]
type ::= “int”
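The terminal rules at the bottom of the grammar map fairly directly onto a regular-expression lexer. The Python sketch below is illustration only: the token names and the tokenize function are hypothetical, and “==” is included because the example programs use it.

```python
import re

TOKEN_SPEC = [
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("INTEGER", r"[0-9]+"),
    ("IDENT", r"[a-zA-Z][a-zA-Z0-9]*"),
    ("OPERATOR", r">=|<=|!=|==|[+\-*/%><=]"),
    ("COMMA", r","),
    ("SPACE", r"[ \t\n\r]+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))
KEYWORDS = {"decl", "var", "if", "int"}
WORD_OPERATORS = {"and", "or"}


def tokenize(source):
    """Turn source text into a list of (kind, text) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SPACE":
            continue                          # whitespace only separates tokens
        if kind == "IDENT" and text in KEYWORDS:
            kind = "KEYWORD"
        elif kind == "IDENT" and text in WORD_OPERATORS:
            kind = "OPERATOR"
        tokens.append((kind, text))
    return tokens


for token in tokenize("(decl add (a b int) int (+ a b))"):
    print(token)
```

Two-character operators are listed before the single-character class so that “>=” is not split into “>” and “=”.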

Calc 2

We can now write the two functions fibonacci and factorial like this:

(decl fibonacci (n int) int (
    (if (<= n 0) int 0)
    (if (== n 1) int 1
        (+ (fibonacci (- n 1)) (fibonacci (- n 2))))))

(decl factorial (n int) int (
    (if (<= n 0) int 0)
    (if (== n 1) int 1
        (* n (factorial (- n 1))))))

Amusingly, aside from multiplication vs. addition and the number of recursive calls, the two functions are nearly identical.
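For comparison, here is how I read the intended behaviour of the two functions, written out in Python (illustration only). It assumes that each early if returns from the function and that the n == 1 case yields 1; neither assumption is spelled out by the specification.

```python
def fibonacci(n):
    if n <= 0:
        return 0
    if n == 1:
        return 1
    return fibonacci(n - 1) + fibonacci(n - 2)


def factorial(n):
    if n <= 0:
        return 0
    if n == 1:
        return 1
    return n * factorial(n - 1)


print([fibonacci(n) for n in range(8)])     # [0, 1, 1, 2, 3, 5, 8, 13]
print([factorial(n) for n in range(1, 6)])  # [1, 2, 6, 24, 120]
```

Note that, following the Calc version, factorial returns 0 for an input of 0, whereas the usual mathematical convention is that 0! equals 1.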

I think you can agree that Calc will be a bit more useful in its next incarnation, even though it still has a long way to go to be a proper, general-purpose language.

That about sums up the new language additions in Calc 2. Next, I think we’ll discuss some of the architectural changes needed to support these user-facing features.