How does the compilation/linking process work?

Question

How does the compilation and linking process work?

(Note: This is meant to be an entry to Stack Overflow's C++ FAQ (https://stackoverflow.com/questions/tagged/c++-faq). If you want to critique the idea of providing an FAQ in this form, then the posting on meta that started all this (https://meta.stackexchange.com/questions/68647/setting-up-a-faq-for-the-c-tag) would be the place to do that. Answers to that question are monitored in the C++ chatroom (https://chat.stackoverflow.com/rooms/10/c-lounge), where the FAQ idea started out in the first place, so your answer is very likely to get read by those who came up with the idea.)

User · Answer

This topic is discussed at CProgramming.com: https://www.cprogramming.com/compilingandlinking.html

Here is what the author there wrote:

Compiling isn't quite the same as creating an executable file. Instead, creating an executable is a multistage process divided into two components: compilation and linking. In reality, even if a program "compiles fine" it might not actually work because of errors during the linking phase. The total process of going from source code files to an executable might better be referred to as a build.

Compilation

Compilation refers to the processing of source code files (.c, .cc, or .cpp) and the creation of an "object" file. This step doesn't create anything the user can actually run. Instead, the compiler merely produces the machine language instructions that correspond to the source code file that was compiled. For instance, if you compile (but don't link) three separate files, you will have three object files created as output, each with the name <filename>.o or <filename>.obj (the extension will depend on your compiler). Each of these files contains a translation of your source code file into a machine language file, but you can't run them yet. You need to turn them into executables your operating system can use. That's where the linker comes in.

Linking

Linking refers to the creation of a single executable file from multiple object files. In this step, it is common that the linker will complain about undefined functions (commonly, main itself). During compilation, if the compiler could not find the definition for a particular function, it would just assume that the function was defined in another file. If this isn't the case, there's no way the compiler would know, because it doesn't look at the contents of more than one file at a time. The linker, on the other hand, may look at multiple files and try to find references for the functions that weren't mentioned.

You might ask why there are separate compilation and linking steps. First, it's probably easier to implement things that way. The compiler does its thing, and the linker does its thing; by keeping the functions separate, the complexity of the program is reduced. Another (more obvious) advantage is that this allows the creation of large programs without having to redo the compilation step every time a file is changed. Instead, using so-called "conditional compilation", it is necessary to compile only those source files that have changed; for the rest, the object files are sufficient input for the linker. Finally, this makes it simple to implement libraries of pre-compiled code: just create object files and link them just like any other object file. (The fact that each file is compiled separately from information contained in other files, incidentally, is called the "separate compilation model".)

To get the full benefits of conditional compilation, it's probably easier to get a program to help you than to try and remember which files you've changed since you last compiled. (You could, of course, just recompile every file that has a timestamp greater than the timestamp of the corresponding object file.) If you're working with an integrated development environment (IDE) it may already take care of this for you. If you're using command line tools, there's a nifty utility called make that comes with most *nix distributions. Along with conditional compilation, it has several other nice features for programming, such as allowing different compilations of your program, for instance if you have a version producing verbose output for debugging.

Knowing the difference between the compilation phase and the link phase can make it easier to hunt for bugs. Compiler errors are usually syntactic in nature: a missing semicolon, an extra parenthesis. Linking errors usually have to do with missing or multiple definitions. If you get an error from the linker that a function or variable is defined multiple times, that's a good indication that two of your source code files define the same function or variable.
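The timestamp rule quoted above (recompile a source file when it is newer than its object file) can be sketched in a few lines of C++. This is only an illustration of the idea, not how make is implemented: the names BuildEntry, needs_recompile and stale_sources are hypothetical, and timestamps are plain integers rather than real file metadata.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record for one source file and its object file.
// Timestamps are plain integers here; a real tool would read file metadata.
struct BuildEntry {
    std::string source;   // e.g. "main.cpp"
    long source_mtime;    // last modification time of the source file
    long object_mtime;    // last modification time of the object file, -1 if missing
};

// A source needs recompiling if its object file is missing or older than the source.
bool needs_recompile(const BuildEntry& e) {
    assert(e.source_mtime >= 0);  // assumption of this sketch: the source file exists
    return e.object_mtime < 0 || e.source_mtime > e.object_mtime;
}

// Returns the names of the sources that must be recompiled before linking.
std::vector<std::string> stale_sources(const std::vector<BuildEntry>& entries) {
    std::vector<std::string> out;
    for (const BuildEntry& e : entries) {
        if (needs_recompile(e)) out.push_back(e.source);
    }
    return out;
}
```

Everything not returned by stale_sources can be handed to the linker as an existing object file, which is the saving the quoted text describes.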

User · Answer

On the standard front:

A translation unit is the combination of a source file, its included headers and included source files, less any source lines skipped by conditional inclusion preprocessor directives.

The standard defines 9 phases in the translation. The first four correspond to preprocessing, the next three are the compilation, the next one is the instantiation of templates (producing instantiation units) and the last one is the linking.

In practice the eighth phase (the instantiation of templates) is often done during the compilation process, but some compilers delay it to the linking phase and some spread it across the two.
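To make the template instantiation phase a little more concrete, here is a minimal sketch. The names twice, twice_int, twice_text and check_twice are made up for this example; when the instantiations are actually generated depends on the compiler, as described above.

```cpp
#include <cassert>
#include <string>

// A function template: no code is generated for it until it is instantiated.
template <typename T>
T twice(const T& value) {
    return value + value;
}

// Each distinct use below requires an instantiation of twice<T> for that T.
int twice_int(int x) { return twice(x); }                          // needs twice<int>
std::string twice_text(const std::string& s) { return twice(s); }  // needs twice<std::string>

// Small usage check for the two instantiations.
inline void check_twice() {
    assert(twice_int(21) == 42);
    assert(twice_text("ab") == "abab");
}
```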

User · Answer

GCC compiles a C/C++ program into an executable in 4 steps.

For example, gcc -o hello hello.c is carried out as follows:

1. Pre-processing

Preprocessing via the GNU C Preprocessor (cpp.exe), which includes the headers (#include) and expands the macros (#define).

    cpp hello.c > hello.i

The resultant intermediate file "hello.i" contains the expanded source code.

2. Compilation

The compiler compiles the pre-processed source code into assembly code for a specific processor.

    gcc -S hello.i

The -S option specifies to produce assembly code, instead of object code. The resultant assembly file is "hello.s".

3. Assembly

The assembler (as.exe) converts the assembly code into machine code in the object file "hello.o".

    as -o hello.o hello.s

4. Linker

Finally, the linker (ld.exe) links the object code with the library code to produce an executable file "hello".

    ld -o hello hello.o ...libraries...
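As a rough illustration of what the first stages operate on, here is a small C++ source file in the spirit of the hello example. The macro TARGET and the functions greeting and check_greeting are hypothetical names chosen for this sketch. The file deliberately has no main, so a real program would add one that calls greeting() before the link step could produce an executable.

```cpp
#include <cassert>
#include <string>

// A macro: the preprocessing step replaces every use of TARGET with "world".
#define TARGET "world"

// After preprocessing, the return statement reads:
//     return std::string("Hello, ") + "world";
// The compilation step turns it into assembly, and the assembler turns
// that assembly into machine code in an object file.
std::string greeting() {
    return std::string("Hello, ") + TARGET;
}

// Usage check: the macro has already been expanded when this code is compiled.
inline void check_greeting() {
    assert(greeting() == "Hello, world");
}
```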

User · Answer

The compilation of a C++ program involves three steps:

1. Preprocessing: the preprocessor takes a C++ source code file and deals with the #includes, #defines and other preprocessor directives. The output of this step is a "pure" C++ file without pre-processor directives.
2. Compilation: the compiler takes the pre-processor's output and produces an object file from it.
3. Linking: the linker takes the object files produced by the compiler and produces either a library or an executable file.

Preprocessing

The preprocessor handles the preprocessor directives, like #include and #define. It is agnostic of the syntax of C++, which is why it must be used with care.

It works on one C++ source file at a time by replacing #include directives with the content of the respective files (which is usually just declarations), doing replacement of macros (#define), and selecting different portions of text depending on #if, #ifdef and #ifndef directives.

The preprocessor works on a stream of preprocessing tokens. Macro substitution is defined as replacing tokens with other tokens (the operator ## enables merging two tokens when it makes sense).

After all this, the preprocessor produces a single output that is a stream of tokens resulting from the transformations described above. It also adds some special markers that tell the compiler where each line came from so that it can use those to produce sensible error messages.

Some errors can be produced at this stage with clever use of the #if and #error directives.

Compilation

The compilation step is performed on each output of the preprocessor. The compiler parses the pure C++ source code (now without any preprocessor directives) and converts it into assembly code. It then invokes the underlying back-end (the assembler in the toolchain), which assembles that code into machine code, producing an actual binary file in some format (ELF, COFF, a.out, ...). This object file contains the compiled code (in binary form) of the symbols defined in the input. Symbols in object files are referred to by name.

Object files can refer to symbols that are not defined. This is the case when you use a declaration and don't provide a definition for it. The compiler doesn't mind this, and will happily produce the object file as long as the source code is well-formed.

Compilers usually let you stop compilation at this point. This is very useful because with it you can compile each source code file separately. The advantage this provides is that you don't need to recompile everything if you only change a single file.

The produced object files can be put in special archives called static libraries, for easier reuse later on.

It's at this stage that "regular" compiler errors, like syntax errors or failed overload resolution errors, are reported.

Linking

The linker is what produces the final compilation output from the object files the compiler produced. This output can be either a shared (or dynamic) library (and while the name is similar, they haven't got much in common with the static libraries mentioned earlier) or an executable.

It links all the object files by replacing the references to undefined symbols with the correct addresses. Each of these symbols can be defined in other object files or in libraries. If they are defined in libraries other than the standard library, you need to tell the linker about them.

At this stage the most common errors are missing definitions or duplicate definitions. The former means that either the definitions don't exist (i.e. they are not written), or that the object files or libraries where they reside were not given to the linker. The latter is obvious: the same symbol was defined in two different object files or libraries.
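The preprocessing behaviour described above (macro replacement, token pasting with ##, and #if with #error) can be seen in a small sketch. The names BUFFER_SIZE, MAKE_GETTER, width, defined_elsewhere and check_macros are made up for illustration.

```cpp
#include <cassert>

// Object-like macro: every later use of BUFFER_SIZE is replaced by the token 16.
#define BUFFER_SIZE 16

// Function-like macro using ## to paste two tokens into one identifier.
// MAKE_GETTER(width) defines a function named get_width.
#define MAKE_GETTER(name) int get_##name() { return name; }

// Conditional inclusion: if the condition held, #error would stop the build
// during preprocessing, before the compiler proper sees any code.
#if BUFFER_SIZE < 1
#error "BUFFER_SIZE must be positive"
#endif

int width = BUFFER_SIZE;  // after preprocessing: int width = 16;
MAKE_GETTER(width)        // after preprocessing: int get_width() { return width; }

// A declaration without a definition. The compiler accepts it; only a use of
// the function would leave an undefined symbol for the linker to resolve.
int defined_elsewhere(int);

// Usage check for the generated getter.
inline void check_macros() {
    assert(get_width() == 16);
}
```

Because defined_elsewhere is never called here, the object file contains no reference to it and linking would not complain; calling it without supplying a definition is what would produce the "missing definition" error described above.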

User · Answer

The skinny is that a CPU loads data from memory addresses, stores data to memory addresses, and executes instructions sequentially out of memory addresses, with some conditional jumps in the sequence of instructions processed. Each of these three categories of instructions involves computing an address to a memory cell to be used in the machine instruction. Because machine instructions are of a variable length depending on the particular instruction involved, and because we string a variable length of them together as we build our machine code, there is a two step process involved in calculating and building any addresses.

First we lay out the allocation of memory as best we can before we can know what exactly goes in each cell. We figure out the bytes, or words, or whatever that form the instructions and literals and any data. We just start allocating memory and building the values that will create the program as we go, and note down any place where we need to go back and fix an address. In that place we put a dummy to just pad the location so we can continue to calculate memory size. For example, our first machine code might take one cell. The next machine code might take 3 cells, involving one machine code cell and two address cells. Now our address pointer is 4. We know what goes in the machine cell, which is the op code, but we have to wait to calculate what goes in the address cells until we know where that data will be located, i.e. what will be the machine address of that data.

If there were just one source file, a compiler could theoretically produce fully executable machine code without a linker. In a two pass process it could calculate all of the actual addresses of all of the data cells referenced by any machine load or store instructions. And it could calculate all of the absolute addresses referenced by any absolute jump instructions. This is how simpler compilers, like the one in Forth, work, with no linker.

A linker is something that allows blocks of code to be compiled separately. This can speed up the overall process of building code, and allows some flexibility with how the blocks are later used; in other words they can be relocated in memory, for example by adding 1000 to every address to scoot the block up by 1000 address cells.

So what the compiler outputs is rough machine code that is not yet fully built, but is laid out so we know the size of everything, in other words so we can start to calculate where all of the absolute addresses will be located. The compiler also outputs a list of symbols, which are name/address pairs. The symbols relate a memory offset in the machine code in the module to a name. The offset is the distance from the start of the module to the memory location of the symbol.

That's where we get to the linker. The linker first slaps all of these blocks of machine code together end to end and notes down where each one starts. Then it calculates the addresses to be fixed by adding together the relative offset within a module and the absolute position of the module in the bigger layout.

Obviously I've oversimplified this so you can try to grasp it, and I have deliberately not used the jargon of object files, symbol tables, etc., which to me is part of the confusion.
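The address arithmetic described above can be sketched as a toy model. This is not how a real linker is written: Module, link_modules and load_address are hypothetical names, a module is reduced to its size and a table of symbol offsets, and duplicate symbol names (which a real linker would report as an error) are simply overwritten.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of one separately compiled block of code.
struct Module {
    std::size_t size;                            // number of memory cells it occupies
    std::map<std::string, std::size_t> symbols;  // symbol name -> offset within the module
};

// Lay the modules end to end and compute each symbol's absolute address as
// (start of its module) + (offset within the module).
std::map<std::string, std::size_t> link_modules(const std::vector<Module>& modules,
                                                std::size_t load_address) {
    std::map<std::string, std::size_t> absolute;
    std::size_t base = load_address;
    for (const Module& m : modules) {
        for (const auto& entry : m.symbols) {
            assert(entry.second < m.size);  // an offset must lie inside its module
            absolute[entry.first] = base + entry.second;
        }
        base += m.size;  // the next module starts where this one ends
    }
    return absolute;
}
```

Passing a load_address of 1000 corresponds to the "scoot the block up by 1000 address cells" example: a symbol at offset 4 in the first module ends up at address 1004, and symbols in the second module are shifted further by the size of the first.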
