Saturday, 3 May 2014

Compiler Part 2: Compiling, Transpiling and Interpreting

Part 1 served as an introduction to this series.

In this second post, I want to go over some definitions before we dive into the actual steps of compiling.


Compiling


Compiling is the act of taking code written in one language and translating it into a lower-level language. A C compiler doesn't usually output machine code directly. Instead, it translates the C source code into assembly. The assembler takes that output and assembles it into machine code. C# and Java are translated into bytecode, which isn't transformed into machine code until it is executed by the virtual machine.

It’s important to understand this distinction: the output of a compiler is not necessarily machine code.

Compiling often involves an intermediate representation (IR) or intermediate language. Assembly is a common intermediate language. LLVM has an IR imaginatively called LLVM IR. C can also act as an intermediate language.
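
To make the idea of translating "down" a level concrete, here is a minimal sketch in Go. It assumes the source code has already been parsed into a small expression tree, and it targets an imaginary stack machine. The Node type, the Compile function and the instruction names (PUSH, ADD, SUB, MUL, DIV) are all made up for illustration and don't belong to any real compiler.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical expression tree node: either a number leaf
// (Op == 0) or a binary operation with two children.
type Node struct {
	Op          byte // '+', '-', '*' or '/'; 0 marks a number leaf
	Value       int  // only meaningful for number leaves
	Left, Right *Node
}

// Compile translates the tree into instructions for an imaginary stack
// machine. Operands are emitted first, then the operator that consumes them.
func Compile(n *Node) []string {
	if n.Op == 0 {
		return []string{fmt.Sprintf("PUSH %d", n.Value)}
	}
	names := map[byte]string{'+': "ADD", '-': "SUB", '*': "MUL", '/': "DIV"}
	out := Compile(n.Left)
	out = append(out, Compile(n.Right)...)
	return append(out, names[n.Op])
}

func main() {
	// The expression (1 + 2) * 3 as a tree.
	tree := &Node{
		Op:    '*',
		Left:  &Node{Op: '+', Left: &Node{Value: 1}, Right: &Node{Value: 2}},
		Right: &Node{Value: 3},
	}
	// Prints PUSH 1, PUSH 2, ADD, PUSH 3, MUL on separate lines.
	fmt.Println(strings.Join(Compile(tree), "\n"))
}
```

The output is lower level than the input in the sense that the nesting and operator precedence of the expression have been flattened into a plain sequence of steps.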


Transpiling


Transpiling, by contrast, is transforming code from one language to another language of equal or higher level. Translating Go to JavaScript is an example of this.

Do not confuse this with what languages like Scala or Clojure do. Both of these languages are compiled directly to Java bytecode and use the JVM to execute their code. Neither is considered to be transpiled because Java bytecode is a lower level of abstraction. Were they translated into Java first, prior to being compiled to bytecode, then they might be considered transpiled languages.

Translating Go or Lisp to C isn't really transpiling either, though it walks a fine line. C is a lower-level language than either, although it is certainly higher level than assembly.
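
As a rough sketch of what transpiling looks like, the Go program below turns a small expression tree into JavaScript source text rather than into lower-level instructions. The Node type and the ToJavaScript function are hypothetical names chosen for this example, and the sketch assumes the input has already been parsed.

```go
package main

import (
	"fmt"
	"strconv"
)

// Node is a hypothetical expression tree node: either a number leaf
// (Op == 0) or a binary operation with two children.
type Node struct {
	Op          byte // '+', '-' or '*'; 0 marks a number leaf
	Value       int  // only meaningful for number leaves
	Left, Right *Node
}

// ToJavaScript emits JavaScript source for the tree. Every operation is
// wrapped in parentheses so the original grouping survives the translation.
// Division is left out on purpose: JavaScript numbers are floating point,
// so integer division would need extra care.
func ToJavaScript(n *Node) string {
	if n.Op == 0 {
		return strconv.Itoa(n.Value)
	}
	return fmt.Sprintf("(%s %c %s)", ToJavaScript(n.Left), n.Op, ToJavaScript(n.Right))
}

func main() {
	// The expression (1 + 2) * 3 as a tree.
	tree := &Node{
		Op:    '*',
		Left:  &Node{Op: '+', Left: &Node{Value: 1}, Right: &Node{Value: 2}},
		Right: &Node{Value: 3},
	}
	// Prints: console.log(((1 + 2) * 3));
	fmt.Println("console.log(" + ToJavaScript(tree) + ");")
}
```

The result is still ordinary source code at roughly the same level of abstraction as the input, which is what separates this from compiling.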


Interpreting


Interpreters are usually associated with scripting. Rather than being translated into another language or IR, the source language is typically interpreted directly.

In some cases this means, quite literally, scanning the code and interpreting it on the fly and halting the moment something isn't right. In other cases, it means scanning through the entire code, validating it and then executing it from the internal tree data structure holding all the information.

In a sense, an interpreter of this kind must “compile” the script or source code each time it is run, making it much slower. This is because it needs to do all the steps a compiler does each time the script is executed rather than doing that process only once.
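
Here is a minimal sketch of the second kind of interpreter, the one that executes from an internal tree. It assumes scanning and validation have already produced the tree. The Node type and the Eval function are hypothetical names used only for this illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Node is a hypothetical expression tree node: either a number leaf
// (Op == 0) or a binary operation with two children.
type Node struct {
	Op          byte // '+', '-', '*' or '/'; 0 marks a number leaf
	Value       int  // only meaningful for number leaves
	Left, Right *Node
}

// Eval walks the tree and computes its value directly. Nothing is
// translated; execution stops as soon as something goes wrong.
func Eval(n *Node) (int, error) {
	if n.Op == 0 {
		return n.Value, nil
	}
	l, err := Eval(n.Left)
	if err != nil {
		return 0, err
	}
	r, err := Eval(n.Right)
	if err != nil {
		return 0, err
	}
	switch n.Op {
	case '+':
		return l + r, nil
	case '-':
		return l - r, nil
	case '*':
		return l * r, nil
	case '/':
		if r == 0 {
			return 0, errors.New("division by zero")
		}
		return l / r, nil
	}
	return 0, fmt.Errorf("unknown operator %q", n.Op)
}

func main() {
	// The expression (1 + 2) * 3 as a tree.
	tree := &Node{
		Op:    '*',
		Left:  &Node{Op: '+', Left: &Node{Value: 1}, Right: &Node{Value: 2}},
		Right: &Node{Value: 3},
	}
	result, err := Eval(tree)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(result) // prints 9
}
```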

Many modern interpreters don’t do this, however. You may want to investigate Just In Time (JIT) compiling if you’re interested in the details. The short and simple version is that the script is translated only once, much as a compiler would do, and is kept in some kind of intermediate form so it can be loaded and run faster. This process also allows the designers to incorporate optimizations to further speed up code.
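
The sketch below illustrates only the "translate once, run many times" part of that idea, not a real JIT. It assumes the script has already been turned into a list of instructions for an imaginary stack machine. The Instr type, the Run function and the instruction names are invented for this example.

```go
package main

import "fmt"

// Instr is one instruction of a hypothetical stack machine.
type Instr struct {
	Op  string // "PUSH", "ADD" or "MUL"
	Arg int    // only used by PUSH
}

// Run executes an already-translated program. No scanning or parsing
// happens here, so running the program again costs only the execution.
func Run(prog []Instr) (int, error) {
	var stack []int
	for _, in := range prog {
		switch in.Op {
		case "PUSH":
			stack = append(stack, in.Arg)
		case "ADD", "MUL":
			if len(stack) < 2 {
				return 0, fmt.Errorf("%s needs two operands", in.Op)
			}
			a, b := stack[len(stack)-2], stack[len(stack)-1]
			stack = stack[:len(stack)-2]
			if in.Op == "ADD" {
				stack = append(stack, a+b)
			} else {
				stack = append(stack, a*b)
			}
		default:
			return 0, fmt.Errorf("unknown instruction %q", in.Op)
		}
	}
	if len(stack) != 1 {
		return 0, fmt.Errorf("program left %d values on the stack", len(stack))
	}
	return stack[0], nil
}

func main() {
	// Intermediate form of (1 + 2) * 3, produced once ahead of time.
	prog := []Instr{{"PUSH", 1}, {"PUSH", 2}, {"ADD", 0}, {"PUSH", 3}, {"MUL", 0}}

	// The same stored program is executed three times; each run prints 9.
	for i := 0; i < 3; i++ {
		result, err := Run(prog)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Println(result)
	}
}
```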

Binaries, Compiling


The objective of compilers is often to create an executable binary, or at least some kind of object the machine can read and execute.

It’s not so simple to just create machine code. Compiling is actually just the first step.

Here are the basic steps that GCC and Go might use to compile their source code:

C using GCC:


  1. Translate C into assembly via the C compiler from the GNU Compiler Collection (gcc);
  2. Translate assembly into machine code via GNU Assembler (gas);
  3. Link machine code with standard library and produce an architecture specific binary via the GNU Linker (ld).


Go using Go Compiler on x86-32:


  1. Translate Go into assembly via Go compiler (8g);
  2. Translate assembly into machine code via Plan 9 Assembler (8a);
  3. Link machine code with standard library and produce an architecture specific binary via the Plan 9 Linker (8l).

As you can see, the compiler is just one step of the process. In these articles we’ll only be dealing with step 1. Assembling and linking are well beyond the scope of these articles.

Don’t fret though! Our compiler will produce an executable binary!


Next


Now that we've taken care of some definitions and understand the process of making executable code, we can look at a compiler in a little more detail in Part 3.