[compiler-construction] What is compiler, linker, loader?

I want to know in depth the meaning and working of a compiler, a linker and a loader, with reference to any language, preferably C++.


The answer is


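A compiler checks your source code for errors and translates it into object code. Object code is machine code, but it is not yet a complete program that the operating system can run.

You often don't write a whole program in a single file, so the linker combines all your object files (and any library code they use) into one executable.

Your program can't be executed unless it is in main memory, and putting it there is the loader's job.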


Compiler: system software that translates a program's source code into an object file, reporting error messages and warnings along the way.

Linker: system software that combines one or more object files, and possibly some library code, into either an executable, a library or a list of errors.

Loader: a program that loads the executable file into the primary memory of the machine.


  • Compiler: A language translator that converts a complete program into machine language to produce a program that the computer can process in its entirety.
  • Linker: Utility program which takes one or more compiled object files and combines them into an executable file or another object file.
  • Loader: loads the executable code into memory, creates the program and data stack, initializes the registers and starts the code running.

Compiler: A program that translates a high-level language program into a machine language program. A compiler does more than an assembler: it checks types, limits, ranges and many other kinds of errors. Because it goes through the entire program before translating it into machine code, compiling takes more time and memory than assembling. If a compiler runs on a computer and produces machine code for that same computer, it is known as a self compiler or resident compiler (more commonly today, a native compiler). If it runs on one computer and produces machine code for another, it is known as a cross compiler.

Linker: High-level languages come with built-in libraries, whose functions are declared in header files. These libraries are predefined and contain basic functions that most programs need. Calls to these functions are connected to the library code by a program called the linker. If the linker cannot find a definition for a function in any object file or library, it reports the error itself (typically an "undefined reference" or "unresolved symbol" error); the compiler is not involved at that point. The compiler driver usually invokes the linker automatically as the last step of building a program. Besides built-in libraries, the linker also resolves calls to user-defined functions and user-defined libraries. A longer program is usually divided into smaller subprograms called modules, and these modules must be combined before the program can execute. The linker does that combining.

Loader: A loader is a program that loads the machine code of a program into system memory. In computing, a loader is the part of an operating system that is responsible for loading programs. It is one of the essential stages in starting a program, because it places programs into memory and prepares them for execution. Loading a program involves reading the contents of the executable file into memory. Once loading is complete, the operating system starts the program by passing control to the loaded program code. All operating systems that support program loading have loaders. In many operating systems the loader is permanently resident in memory.


  • Compiler: converts a human-readable format (source code) into a machine-readable format (object code)
  • Linker: combines object code into an executable file, in the format the operating system knows how to load
  • Loader: the entity that actually loads the program into RAM and starts it running

An interpreter works differently: it reads the code line by line and executes it line by line, so an interpreted program usually does not go through a separate link step.


Compiler: It converts the source code into object code.

Linker: It combines multiple object files into a single executable program file.

Loader: It loads the executable file into main memory.


Explained with respect to Linux/Unix-based systems, though the basic concept applies to other computing systems as well.

Linkers and Loaders from LinuxJournal explains this concept with clarity. It also explains how the classic name a.out came about (assembler output).

A quick summary,

c program --> [compiler] --> objectFile --> [linker] --> executable file (say, a.out)

We now have the executable; give this file to your friend or to a customer who needs this software :)

When they run this software, say by typing ./a.out at the command line:

./a.out typed at the command line --> [execve] --> [Loader] --> program is loaded in memory

Once the program is loaded into memory, control is transferred to it by making the PC (program counter) point to the first instruction of a.out.


Wikipedia ought to have a good answer; here are my thoughts:

  • Compiler: reads something.c source, writes something.o object.
  • Linker: joins several *.o files into an executable program.
  • Loader: code that loads an executable into memory and starts it running.

=====> COMPILATION PROCESS <======

                     |
                     |---->  Input is Source file(.c)
                     |
                     V
            +=================+
            |                 |
            | C Preprocessor  |
            |                 |
            +=================+
                     |
                     | ---> Pure C file ( comd:cc -E <file.name> )
                     |
                     V
            +=================+
            |                 |
            | Lexical Analyzer|
            |                 |
            +-----------------+
            |                 |
            | Syntax Analyzer |
            |                 |
            +-----------------+
            |                 |
            | Semantic Analyze|
            |                 |
            +-----------------+
            |                 |
            | Pre Optimization|
            |                 |
            +-----------------+
            |                 |
            | Code generation |
            |                 |
            +-----------------+
            |                 |
            | Post Optimize   |
            |                 |
            +=================+
                     |
                     |--->  Assembly code (comd: cc -S <file.name> )
                     |
                     V
            +=================+
            |                 |
            |   Assembler     |
            |                 |
            +=================+
                     |
                     |--->  Object file (.o/.obj) (comd: cc -c <file.name>)
                     |
                     V
            +=================+
            |     Linker      |
            |      and        |
            |     loader      |
            +=================+
                     |
                     |--->  Executable (.exe/a.out) (comd: cc <file.name>)
                     |
                     V
            Executable file(a.out)

C preprocessor :-

C preprocessing is the first step in the compilation. It handles:

  1. #define statements.
  2. #include statements.
  3. Conditional directives (#if, #ifdef, #else, #endif).
  4. Macros

The purpose of this unit is to convert the C source file into a pure C code file, with all preprocessor directives expanded.

C compilation :

There are six steps in this unit:

1) Lexical Analyzer:

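It combines characters in the source file to form a "TOKEN". A token is a sequence of characters that forms one meaningful unit, such as a keyword, identifier, constant or operator; spaces, tabs and newlines separate tokens but are not part of them. Therefore this unit of compilation is also called the "TOKENIZER". It also discards comments (if the preprocessor has not already done so) and creates symbol table entries for the identifiers it finds. Relocation table entries are produced later, when object code is generated.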

2) Syntactic Analyzer:

This unit checks the syntax of the code. For ex:

{
    int a;
    int b;
    int c;
    int d;

    d = a + b - c *   ;
}

The above code will generate a parse error because the expression is incomplete: the * operator is missing its right operand. This unit checks this internally by generating the parse tree as follows:

                            =
                          /   \
                        d       -
                              /     \
                            +           *
                          /   \       /   \
                        a       b   c       ?

Therefore this unit is also called the PARSER.

3) Semantic Analyzer:

This unit checks the meaning of the statements. For ex:

{
    int i;
    int *p;

    p = i;
    -----
    -----
    -----
}

The above code generates the error "Assignment of incompatible type", because an int is being assigned to a pointer to int.

4) Pre-Optimization:

There are two types of optimization, and this unit performs the first one, which is independent of the CPU:

  1. Preoptimization (CPU independent)
  2. Postoptimization (CPU dependent)

This unit optimizes the code in the following forms:

  • I) Dead code elimination
  • II) Sub code elimination (common subexpression elimination)
  • III) Loop optimization

I) Dead code elimination:

For ex:

{
    int a = 10;
    if ( a > 5 ) {
        /*
        ...
        */
    } else {
       /*
       ...
       */
    }
}

Here, the compiler knows the value of 'a' at compile time, so it also knows that the if condition is always true. Hence it eliminates the else part of the code.

II) Sub code elimination (common subexpression elimination):

For ex:

{
    int a, b, c;
    int x, y;

    /*
    ...
    */

    x = a + b;
    y = a + b + c;

    /*
    ...
    */
}

can be optimized as follows:

{
    int a, b, c;
    int x, y;

    /*
     ...
    */

    x = a + b;
    y = x + c;      // a + b is replaced by x

    /*
     ...
    */
}

III) Loop optimization:

For ex:

{
    int a, i;
    for (i = 0; i < 1000; i++ ) {

    /*
     ...
    */

    a = 10;

    /*
     ...
    */
    }
}

In the above code, the value assigned to 'a' does not depend on the loop. If 'a' is local and nothing else in the loop modifies it, the assignment can be moved out of the loop as follows:

{
    int a, i;
    a = 10;
    for (i = 0; i < 1000; i++ ) {
        /*
        ...
        */
    }
}

5) Code generation:

Here, the compiler generates the assembly code, allocating registers so that the more frequently used variables are kept in registers.

6) Post-Optimization:

Here the optimization is CPU dependent. For example, if a jump leads directly to another jump, the two are converted into one:

            -----
        jmp:<addr1>
<addr1> jmp:<addr2>
            -----
            -----

The control then jumps to <addr2> directly.

Then the last phase is linking (which creates an executable or a library). When the executable is run, the libraries it requires are loaded.


  • A compiler reads, analyses and translates code into either an object file or a list of error messages.
  • A linker combines one or more object files and possibly some library code into either some executable, some library or a list of error messages.
  • A loader reads the executable code into memory, does some address translation and tries to run the program resulting in a running program or an error message (or both).

ASCII representation:

[Source Code] ---> Compiler ---> [Object code] --*
                                                 |
[Source Code] ---> Compiler ---> [Object code] --*--> Linker --> [Executable] ---> Loader 
                                                 |                                    |
[Source Code] ---> Compiler ---> [Object code] --*                                    |
                                                 |                                    |
                                 [Library file]--*                                    V
                                                                       [Running Executable in Memory]

A compiler is a special program that processes statements written in a particular programming language and turns them into machine language or "code" that a computer's processor uses.


Hope this helps you a little more.

First, go through this diagram:

(image not shown; source: internet)

You write a piece of code and save the file (source code), then:

Preprocessing :- As the name suggests, it is not part of compilation proper; it happens before it. Preprocessor directives, denoted by #, instruct the compiler to do the required pre-processing before the actual compilation. You can call this phase text substitution, or interpreting special preprocessor directives.

Compilation :- Compilation is a process in which a program written in one language gets translated into another, target language. If there are errors, the compiler will detect and report them.

Assemble :- Assembly code gets translated into machine code. You can call the assembler a special type of compiler.

Linking :- If this piece of code needs other object files or libraries to be linked in, the linker links them to make an executable file.

Many things happen after that. Yes, you guessed it right, here comes the role of the loader:

Loader :- It loads the executable code into memory; the program and data stack are created and the registers get initialized.

Little Extra info :- http://www.geeksforgeeks.org/memory-layout-of-c-program/ , you can see the memory layout over there.


A compiler translates lines of code from the programming language into machine language.

A linker links two or more object programs together into one executable program.

A loader loads the program into main memory so that it can run.


Compiler:

It reads a source file, which may be of type .c or .cpp etc., and translates it into a .o file, called an object file.

Linker:

It combines the several .o files, which may have been generated from multiple source files, into an executable file (ELF format with GCC on Linux). There are two types of linking:

  • static linking
  • dynamic linking

Loader:

A program which loads the executable file to the primary memory of the machine.


For a detailed study of these three stages of program execution in Linux, please read this.

