Wednesday, 24 August 2011

Compiling C Programs in Linux

A lot of beginner programmers, especially those new to the Linux world, have a lot of issues compiling programs. This isn't exclusive to C, of course, so hopefully you can take something away from this post and apply it other languages, too.

There are two primary stages to "compiling" a program. The first stage is the compilation stage. There are several intermediary steps but the end result is that your code, in this case C, is translated into a machine readable language. The second stage is called "linking". This is when each part of your program and any external libraries are linked together to create an actual executable binary file you run or a library file.

Disclaimer - This post may be hazardous to your health (not really). It is a non-exhaustive and non-authoritative introduction to compiling and linking programs on the command line in Linux. I can't emphasize enough that you should reference the documentation provided by the compiler you're using and that said documentation will always trump anything contained herein. Side effects may include: nausea, dry mouth, hair loss, severe depression and momentary blindness. If any of these side effects occur, discontinue use immediately and contact your physician.

Stage 1 - Compilation:

The Compiler:

First off, you have your choice of compiler. We are going to concern ourselves with the GNU C Compiler (gcc); currently, the most widely used C compiler for Linux. LLVM has been gaining ground but we won't be discussing it here though it stands to reason much of what you learn would still be applicable. The GNU C Compiler, usually simply referred to as GCC, is actually a collection of tools for compiling programming languages. Each tool has a unique name which reflects the language it is designed for. For example, compilers exist for the following languages in addition to C: C++ (g++), Java (gcj) and Fortran (gfortran). Only some, though possibly all, may be installed on your system by default.

Important: Do NOT use g++ to compile C language sources. There's a reason why they are separate compilers.

In the case of C, make sure you have a GCC installed on your system. Open a terminal and type:

gcc --version

If you get an error of some kind, chances are you don't have GCC installed and will need to do so before you can go any further.

Hint: For those of you running a Linux system like Arch, Debian, Fedora or Ubuntu you can install any of the tools discussed within this document quite easily. For example, Debian based systems usually have a 'build-essential' target via apt-get for installing commonly used tools for building programs from source. You can simply issue the command 'apt-get install build-essential' without the quotesCheck your specific distribution's instructions for installing these tools from their respective repositories.

Everything in this how-to should be run in a terminal. Make sure you change to the directory where your source files are contained and execute the supplied commands within than directory. There are many, many compiler flags to know but the following should be considered a minimum:

gcc -Wall -pedantic -std=c99 my_file.c -o my_program

...replacing "my_file.c" and "my_program" with the proper names of your file(s) and desired program name.

Compiling a source file with the above flags will produce a final binary, skipping the separate linking stage. However, as your programs get more complex it becomes advantageous to build intermediary objects first then link them together later.

Compiler Flags:

These flags are passed to the compiler to tell it what you want. There are some very important ones to know:

-std=c99 - This flag sets the specific standard for gcc to comply with. At the time of writing gcc defaults to a standard called gnu89, which is the c89 standard with GNU extensions. If you plan on writing a program which may be compiled on a system which does not use gcc as it's C compiler (LLVM, MS Visual Studio, Borland C, etc) then it is probably in your best interest to force the use the most current C standard and disable the GNU extensions. gcc is not 100% c99 compliant but the non-compliant features are rarely used. If you require any of the features of c99 which have not yet been implemented in gcc then you'll have to use another compiler anyway.

-o - Specify an output name for the object or binary. In the most basic case, you use it to specify the name of the program you are producing. If you don't use this flag, gcc will default to using the name of the source file and create a .o file for an intermediary object (myfile.c becomes myfile.o) or, in the case of creating a binary, it will default to a.out for the binary file.

Compiler Warnings:

Warnings should always be turned on. Here are some very important/common ones to know:

-Wall - all warnings; this is misleading because it doesn't actually turn on ALL warnings but just the most commonly desired ones.

-Wextra - turns on extra, more strict warnings.

-ansi - specifies that you want to adhere strictly to the C standard. It turns off all GNU extensions to the C language making it fully ANSI compliant. This is important for portability between compilers. This is automatically turned on when you specify the standards c89 or  c99 with the -std flag. However, as stated earlier, gcc defaults to gnu89 (c89 with GNU extensions) and the -ansi flag will explicitly disable these extensions. -std=gnu89 -ansi is equivilent to -std=c89.

-pedantic - makes ANSI warnings fatal, meaning that they'll be reported as errors instead or warnings and halt compilation. It is usually a good idea to always enable this flag when you give -ansi or -std=c89/99.

Extra Compiler Flags:

-c - compile object code but do not link. This produces a file with the .o extension. Object files, those ending with .o, are later linked together to create a final binary or library.

-I <directory> - This flag allows you to specify an additional location for the compiler to find your header (.h) files by replacing <directory> with the location of the headers being searched for. This is useful if you store your header files in a directory other than the one your regular source (.c) files are located. You can chain as many of these flags together to add as many directories as you need. If you don't know why you would need to use this then it's safe to just leave it out.

Stage 2 - Linking:

The Linker:

As noted earlier, building a program is a two step process. The second step, after compiling, is linking. In most, if not all, large projects sources are usually compiled first into object files. On their own, they do nothing and are essentially useless. In order to work, they need to be linked together to create a single binary file, an actual program, then made executable. This is where the linker, named 'ld', comes in. The linker has its own set of flags which may either be passed to gcc or to ld itself. It is probably easier, and in your best interest, to issue the flags to gcc for simplicity's sake.

Linker Flags:

-l<library> - where library is the name of the external library you need to link into your program, sans (minus) the 'lib' prefix. In other words, if you with to link the math library, libmath into your program, you drop the 'lib' part and use: -lmath. Actually, the linker can find libraries with a basic regex so you can link -lm for the math library.

Extra Linker Flags:

-static - Used to force libraries linked to your program statically. This means that the library itself is incorporated (bolted on) into your program. This increases the size of your program but does away with some of the issues associated with dynamic linking. A discussion on dynamic vs. static linking is WAY beyond the scope of this article.

-dynamic - Force libraries to be dynamically linked to your program. This means that the library is linked to your program but not actually loaded until the program is run.

-L <directory> - Specify a path to find extra/custom libraries by replacing <directory> with the location of the libraries being searched for. This can be used to link external libraries not in installed in the normal paths searched by the linker. It is also used to specify directories within your project if you are linking to internal libraries.

The Next Steps:

For compiling very small programs it is usually easiest just to run gcc from the command line. As programs get larger, though, it usually becomes necessary to remove some of the complexity. The next step would be to create a custom Makefile to build your program. You may then simply issue a single command, make, and the rest of the work is done for you. I mentioned earlier that it becomes simpler to compile sources into objects first then link them later. That is because as a project gets larger the longer it takes to compile and link. By utilizing a Makefile only files which have been modified are recompiled and then re-linking thereby speeding up the build process when changes are being made. This is especially helpful when debugging.

The next step would probably to use a full build system like the GNU autotools. Autotools is a general term for a suite of programs: aclocal, autoconf, autoheader, automake, autopoint and libtool. GNU autotools provides a method of making your code more portable and easier to distribute. They can even roll a tarball for you and compress it. By utilizing tools like GNU gettext and GNOME's intltool you can integrate translations into your project, too. The autotools are the backbone of other distribution methods, as well, like creating an .rpm or .deb in Linux and knowing how to use them tends to be an essential skill. There is plenty of help on the Internet on using GNU's autotools. Just do a web search and you'll find plenty of help.

If you're having trouble, read: How to Get Help