Sunday, 8 November 2015

Compiler 3 Part 4 - Type Checking

Whether you chose to produce assembly or an intermediate language, the job of the IR is to take us ever closer to generating actual output. We need to do the crucial step of type checking. Since Calc’s type system is strong and static it is necessary to ensure type safety before moving on.

Each struct in the IR tree has an underlying object. This object provides a Pos() function that maps the remaining constructs back into the original source code. This will be useful to us when we are doing type checking so we can provide well formed error messages.

We must traverse the tree and verify that expected types match. Not all objects currently have a type, like variable declarations using type inference. This means that we don’t actually know the type of the variable when parsing the source language since the type isn't explicitly declared but inferred from the object being assigned to the variable. 

Similarly, binary expressions do not have a defined type when parsing. Your language may allow adding (concatenating) strings, in which case an add binary operation on strings would have the type string. In the case of Calc, logical comparison operations make a binary expression type bool. Don't forget that most languages have different integer types so your binary expressions will need to take on and verify these types.

This requires we traverse the tree depth first to identify the types of certain structures prior to verifying their correctness.

In addition to verifying type correctness, the type-checker also does some additional tests…
  • checks that function parameters match the number of arguments in function calls;
  • verifies that an identifier matches the right kind (variable vs. function names);
  • checks for undeclared variable and function names.

While not true of the current version of Calc, many languages have varying bit sizes of integer types. Somewhere during this process you would will need to verify if a particular value fits within the bounds of a particular type. If the source has a value greater than 255 being stored into an unsigned 8 bit type, this is a problem. When and how you chose to handle this error is up to you.

Type checking may also involve a level of escape analysis. You need to verify that any value being returned anywhere in the tree conforms to the parent type. In imperative languages, this means looking for statements like “return”. For functional languages, you need to search for terminal points like the "then" and "else" clauses in an "if" expression.

Once we are sure that everything is type safe and correct, we can move on.