Compilation Process of a C Program
In everyday terminology, converting a C program into an executable file is simply called “compilation”, and the tool is called a “compiler”. However, there’s a lot more that goes on under the hood. The whole process of converting the code into something that can run on the ECU (electronic control unit) is managed by the build system. It involves many steps and several tools, depending on the target and the needs of the development process.
The entire build process can be controlled and automated using build systems like Make. These can be further extended by using sophisticated build system generators like CMake. In this article, we will look at only the part where the source file gets converted into an executable file.
Usually, the chip vendor provides you with compilation tools. However, toolchains for the chip may also be available from other vendors. In our case, we are looking at the Tasking compiler for Infineon’s Aurix TriCore platform. The compilation process is done in three significant steps: compilation, assembly and linking. These are done by three separate tools: the Compiler, the Assembler and the Linker. Let’s look at each step in some detail.
Compilation
During the compilation of a C program, the C compiler runs through several steps divided into two phases: the frontend phase and the backend phase.
Frontend phases
1. The preprocessor phase:
The preprocessor pulls in included files and replaces macros with C source text. It works purely by text manipulation on the C source. The syntax for the preprocessor is independent of the C syntax but is also described in the ISO/IEC 9899:1999(E) standard.
2. The scanner phase:
The scanner converts the preprocessor output to a stream of tokens.
3. The parser phase:
The tokens are fed to a parser for the C grammar. The parser performs a syntactic and semantic analysis of the program and generates an intermediate representation of the program. This representation is called MIL (Medium-level Intermediate Language).
4. The frontend optimization phase:
Optimizations that do not depend on the target processor are performed by transforming the intermediate code.
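To make the scanner phase a bit more concrete, here is a minimal sketch of a toy scanner that turns source text into a stream of tokens. It is not how the Tasking compiler does it; the token kinds and the names next_token and count_tokens are hypothetical and chosen only for illustration, and the sketch ignores string literals, comments and multi-character operators.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds for this toy scanner. */
typedef enum { TOK_IDENT, TOK_NUMBER, TOK_PUNCT, TOK_END } TokKind;

typedef struct {
    TokKind kind;
    const char *start;  /* first character of the token in the source */
    size_t len;         /* number of characters in the token */
} Token;

/* Reads one token starting at *src and advances *src past it. */
Token next_token(const char **src)
{
    const char *p = *src;
    Token t;

    while (isspace((unsigned char)*p))
        p++;                          /* whitespace separates tokens */

    t.start = p;
    if (*p == '\0') {
        t.kind = TOK_END;             /* end of input */
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        t.kind = TOK_IDENT;           /* identifier or keyword */
        while (isalnum((unsigned char)*p) || *p == '_')
            p++;
    } else if (isdigit((unsigned char)*p)) {
        t.kind = TOK_NUMBER;          /* decimal integer literal */
        while (isdigit((unsigned char)*p))
            p++;
    } else {
        t.kind = TOK_PUNCT;           /* single-character operator or separator */
        p++;
    }
    t.len = (size_t)(p - t.start);
    *src = p;
    return t;
}

/* Counts the tokens in src, not including the end marker. */
size_t count_tokens(const char *src)
{
    size_t n = 0;

    assert(src != NULL);
    while (next_token(&src).kind != TOK_END)
        n++;
    return n;
}
```

For the input "x = a + 42;" this scanner produces six tokens: x, =, a, +, 42 and ;. A real parser would then consume such a stream and check it against the C grammar.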
Backend phases
1. Instruction selector phase:
This phase reads the MIL input and translates it into Low-level Intermediate Language (LIL). Each LIL object corresponds to a processor instruction, with an opcode, operands and information used within the C compiler.
2. Peephole optimizer/instruction scheduler/software pipelining phase:
This phase replaces instruction sequences with equivalent but faster and/or shorter sequences, rearranges instructions and deletes unnecessary instructions.
3. Register allocator phase:
This phase chooses a physical register to use for each virtual register. When there are not enough physical registers, virtual registers are spilled onto the stack. Intermediate results of any optimization can live, for some time, on the stack or in physical registers.
4. The backend optimization phase:
This phase performs both target-independent and target-dependent optimizations, which operate on the Low-level Intermediate Language.
5. The code generation/formatter phase:
This phase reads through the LIL operations to generate assembly language output.
The backend part is not called for each C statement but starts after a complete C module or set of modules has been processed by the frontend (in memory). This allows better optimization.
The C compiler requires only one pass over the input file, which results in a relatively fast compilation.
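As an illustration of what the peephole optimizer in the backend does, here is a small sketch that works on a made-up instruction list. The Instr type, the opcodes and the function peephole are all hypothetical and have nothing to do with the real LIL format; the point is only to show instructions being deleted or replaced by cheaper equivalents.

```c
#include <assert.h>
#include <stddef.h>

/* Made-up opcodes: dst = src, dst = src + imm, dst = src * imm, dst = src << imm. */
typedef enum { OP_MOV, OP_ADDI, OP_MULI, OP_SHLI } Opcode;

typedef struct {
    Opcode op;
    int dst;   /* destination register number */
    int src;   /* source register number */
    int imm;   /* immediate operand, unused by OP_MOV */
} Instr;

/* Rewrites code[0..n) in place and returns the new instruction count. */
size_t peephole(Instr *code, size_t n)
{
    size_t out = 0;

    assert(code != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        Instr in = code[i];

        /* mov r, r has no effect, so drop it. */
        if (in.op == OP_MOV && in.dst == in.src)
            continue;

        /* r = r + 0 has no effect either. */
        if (in.op == OP_ADDI && in.dst == in.src && in.imm == 0)
            continue;

        /* Multiplying by 2 can be replaced by a shift left by 1. */
        if (in.op == OP_MULI && in.imm == 2) {
            in.op = OP_SHLI;
            in.imm = 1;
        }

        code[out++] = in;
    }
    return out;
}
```

A real backend works with many more patterns and has to respect instruction timing and side effects, but the basic idea of scanning a small window of instructions and rewriting it is the same.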
Assembly
The Assembler converts hand-written or compiler-generated assembly language programs into machine language, resulting in object files in the ELF/DWARF object format.
Its main tasks are:
• Instruction grouping and reordering
• Optimization (instruction size and generic instructions)
• Generation of the relocatable object file (*.o) and, optionally, a list file
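Since the object files are in the ELF format, you can tell a relocatable object file from an executable by looking at the ELF header. Here is a minimal sketch, assuming the first bytes of the file have already been read into a buffer. The function name elf_file_type and the ELF_ET_* constant names are hypothetical; the magic bytes, the EI_DATA byte at offset 5 and the e_type field at offset 16 come from the ELF specification.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* e_type values from the ELF specification (names are local to this sketch). */
enum { ELF_ET_REL = 1, ELF_ET_EXEC = 2 };

/* Returns the e_type field of the ELF header held in buf,
 * or -1 if buf does not start with a usable ELF header. */
int elf_file_type(const unsigned char *buf, size_t len)
{
    static const unsigned char magic[4] = { 0x7F, 'E', 'L', 'F' };

    assert(buf != NULL);

    /* e_type is a 2-byte field right after the 16-byte e_ident array. */
    if (len < 18 || memcmp(buf, magic, sizeof magic) != 0)
        return -1;

    /* buf[5] is EI_DATA: 1 = little-endian, 2 = big-endian. */
    if (buf[5] == 1)
        return buf[16] | (buf[17] << 8);
    if (buf[5] == 2)
        return (buf[16] << 8) | buf[17];
    return -1;
}
```

With this sketch, a relocatable object file coming out of the assembler would normally be expected to report ELF_ET_REL, while a fully located executable would normally report ELF_ET_EXEC.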
Linking
The linker phase combines relocatable object files (*.o), generated by the Assembler, and libraries into a single relocatable linker object file (*.out).
The locator phase assigns absolute addresses to the linker object file and creates an absolute object file which you can load into a target processor.
This is where the *.elf file is generated.
The Linker also outputs other useful files of the build process, such as the *.map and *.mdf files.
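To get a feel for what the locator phase does, here is a toy sketch that assigns absolute addresses to a list of sections. The Section type, the function locate and the base address used in the usage note are hypothetical; a real locator follows the memory layout described in the linker script and handles many more constraints.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char *name;
    uint32_t size;     /* size in bytes */
    uint32_t align;    /* required alignment, must be a power of two */
    uint32_t address;  /* absolute address, filled in by locate() */
} Section;

/* Rounds addr up to the next multiple of align (a power of two). */
static uint32_t align_up(uint32_t addr, uint32_t align)
{
    return (addr + align - 1u) & ~(align - 1u);
}

/* Places the sections one after another starting at base and
 * returns the first free address after the last section. */
uint32_t locate(Section *sections, size_t count, uint32_t base)
{
    uint32_t cursor = base;

    assert(sections != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        uint32_t a = sections[i].align;

        assert(a != 0 && (a & (a - 1u)) == 0);  /* power of two only */
        cursor = align_up(cursor, a);
        sections[i].address = cursor;
        cursor += sections[i].size;
    }
    return cursor;
}
```

For example, with an illustrative base address of 0x80000000, a 10-byte section aligned to 4 bytes is placed at 0x80000000, and a following 6-byte section aligned to 8 bytes is placed at 0x80000010. The map file produced by the Linker is where you would look up the addresses actually assigned in a real build.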
This is based on the Tasking compiler user manual. Refer to your compiler vendor's documents for details on the process and available tools. All the documents for the Tasking toolchain can be found here.
Get the full-resolution image of the compilation process here in PNG.
Let me know your thoughts on this.