[compiler-construction] What is compiler, linker, loader?

=====> COMPILATION PROCESS <======

                     |
                     |---->  Input is Source file(.c)
                     |
                     V
            +=================+
            |                 |
            | C Preprocessor  |
            |                 |
            +=================+
                     |
                     | ---> Pure C file ( comd:cc -E <file.name> )
                     |
                     V
            +=================+
            |                 |
            | Lexical Analyzer|
            |                 |
            +-----------------+
            |                 |
            | Syntax Analyzer |
            |                 |
            +-----------------+
            |                 |
            | Semantic Analyze|
            |                 |
            +-----------------+
            |                 |
            | Pre Optimization|
            |                 |
            +-----------------+
            |                 |
            | Code generation |
            |                 |
            +-----------------+
            |                 |
            | Post Optimize   |
            |                 |
            +=================+
                     |
                     |--->  Assembly code (comd: cc -S <file.name> )
                     |
                     V
            +=================+
            |                 |
            |   Assembler     |
            |                 |
            +=================+
                     |
                     |--->  Object file (.obj) (comd: cc -c <file.name>)
                     |
                     V
            +=================+
            |     Linker      |
            |      and        |
            |     loader      |
            +=================+
                     |
                     |--->  Executable (.Exe/a.out) (com:cc <file.name> ) 
                     |
                     V
            Executable file(a.out)

C preprocessor :-

C preprocessing is the first step in the compilation. It handles:

  1. #define statements.
  2. #include statements.
  3. Conditional statements.
  4. Macros

The purpose of the unit is to convert the C source file into Pure C code file.

C compilation :

There are Six steps in the unit :

1) Lexical Analyzer:

It combines characters in the source file, to form a "TOKEN". A token is a set of characters that does not have 'space', 'tab' and 'new line'. Therefore this unit of compilation is also called "TOKENIZER". It also removes the comments, generates symbol table and relocation table entries.

2) Syntactic Analyzer:

This unit check for the syntax in the code. For ex:

{
    int a;
    int b;
    int c;
    int d;

    d = a + b - c *   ;
}

The above code will generate the parse error because the equation is not balanced. This unit checks this internally by generating the parser tree as follows:

                            =
                          /   \
                        d       -
                              /     \
                            +           *
                          /   \       /   \
                        a       b   c       ?

Therefore this unit is also called PARSER.

3) Semantic Analyzer:

This unit checks the meaning in the statements. For ex:

{
    int i;
    int *p;

    p = i;
    -----
    -----
    -----
}

The above code generates the error "Assignment of incompatible type".

4) Pre-Optimization:

This unit is independent of the CPU, i.e., there are two types of optimization

  1. Preoptimization (CPU independent)
  2. Postoptimization (CPU dependent)

This unit optimizes the code in following forms:

  • I) Dead code elimination
  • II) Sub code elimination
  • III) Loop optimization

I) Dead code elimination:

For ex:

{
    int a = 10;
    if ( a > 5 ) {
        /*
        ...
        */
    } else {
       /*
       ...
       */
    }
}

Here, the compiler knows the value of 'a' at compile time, therefore it also knows that the if condition is always true. Hence it eliminates the else part in the code.

II) Sub code elimination:

For ex:

{
    int a, b, c;
    int x, y;

    /*
    ...
    */

    x = a + b;
    y = a + b + c;

    /*
    ...
    */
}

can be optimized as follows:

{
    int a, b, c;
    int x, y;

    /*
     ...
    */

    x = a + b;
    y = x + c;      // a + b is replaced by x

    /*
     ...
    */
}

III) Loop optimization:

For ex:

{
    int a;
    for (i = 0; i < 1000; i++ ) {

    /*
     ...
    */

    a = 10;

    /*
     ...
    */
    }
}

In the above code, if 'a' is local and not used in the loop, then it can be optimized as follows:

{
    int a;
    a = 10;
    for (i = 0; i < 1000; i++ ) {
        /*
        ...
        */
    }
}

5) Code generation:

Here, the compiler generates the assembly code so that the more frequently used variables are stored in the registers.

6) Post-Optimization:

Here the optimization is CPU dependent. Suppose if there are more than one jumps in the code then they are converted to one as:

            -----
        jmp:<addr1>
<addr1> jmp:<addr2>
            -----
            -----

The control jumps to the directly.

Then the last phase is Linking (which creates executable or library). When the executable is run, the libraries it requires are Loaded.

Examples related to compiler-construction

fatal error C1010 - "stdafx.h" in Visual Studio how can this be corrected? Compilation error: stray ‘\302’ in program etc What is difference between sjlj vs dwarf vs seh? What is the difference between a token and a lexeme? How to compile makefile using MinGW? C++ variable has initializer but incomplete type? It is more efficient to use if-return-return or if-else-return? Could not load file or assembly ... The parameter is incorrect How do I compile the asm generated by GCC? Visual Studio: LINK : fatal error LNK1181: cannot open input file

Examples related to linker

C compile : collect2: error: ld returned 1 exit status How to fix symbol lookup error: undefined symbol errors in a cluster environment gcc: undefined reference to libpthread.so.0: error adding symbols: DSO missing from command line Compilation fails with "relocation R_X86_64_32 against `.rodata.str1.8' can not be used when making a shared object" Multiple definition of ... linker error C Linking Error: undefined reference to 'main' ld cannot find -l<library> ldconfig error: is not a symbolic link Why am I getting "undefined reference to sqrt" error even though I include math.h header?

Examples related to terminology

The differences between initialize, define, declare a variable What is the difference between a web API and a web service? What does "opt" mean (as in the "opt" directory)? Is it an abbreviation? What's the name for hyphen-separated case? What is Bit Masking? What is ADT? (Abstract Data Type) What exactly are iterator, iterable, and iteration? What is a web service endpoint? What is the difference between Cloud, Grid and Cluster? How to explain callbacks in plain english? How are they different from calling one function from another function?

Examples related to loader

How to create multiple output paths in Webpack config Where to find Application Loader app in Mac? What is compiler, linker, loader? How to show Page Loading div until the page has finished loading?