[gcc] What exactly is LLVM?

I keep hearing about LLVM all the time. It's in Perl, then it's in Haskell, then someone uses it in some other language? What is it?

  • What exactly distinguishes it from GCC (perspectives = safety etc.)?

This question is related to gcc llvm

The answer is


The LLVM Compiler Infrastructure is particularly useful for performing optimizations and transformations on code. It also consists of a number of tools serving distinct usages. llvm-prof is a profiling tool that allows you to do profiling of execution in order to identify program hotspots. Opt is an optimization tool that offers various optimization passes (dead code elimination for instance).

Importantly LLVM provides you with the libraries, to write your own Passes. For instance if you require to add a range check on certain arguments that are passed into certain functions of a Program, writing a simple LLVM Pass would suffice.

For more information on writing your own Pass, check this http://llvm.org/docs/WritingAnLLVMPass.html


According to 'Getting Started With LLVM Core Libraries' book (c):

In fact, the name LLVM might refer to any of the following:

  • The LLVM project/infrastructure: This is an umbrella for several projects that, together, form a complete compiler: frontends, backends, optimizers, assemblers, linkers, libc++, compiler-rt, and a JIT engine. The word "LLVM" has this meaning, for example, in the following sentence: "LLVM is comprised of several projects".

  • An LLVM-based compiler: This is a compiler built partially or completely with the LLVM infrastructure. For example, a compiler might use LLVM for the frontend and backend but use GCC and GNU system libraries to perform the final link. LLVM has this meaning in the following sentence, for example: "I used LLVM to compile C programs to a MIPS platform".

  • LLVM libraries: This is the reusable code portion of the LLVM infrastructure. For example, LLVM has this meaning in the sentence: "My project uses LLVM to generate code through its Just-in-Time compilation framework".

  • LLVM core: The optimizations that happen at the intermediate language level and the backend algorithms form the LLVM core where the project started. LLVM has this meaning in the following sentence: "LLVM and Clang are two different projects".

  • The LLVM IR: This is the LLVM compiler intermediate representation. LLVM has this meaning when used in sentences such as "I built a frontend that translates my own language to LLVM".


LLVM is basically a library used to build compilers and/or language oriented software. The basic gist is although you have gcc which is probably the most common suite of compilers, it is not built to be re-usable ie. it is difficult to take components from gcc and use it to build your own application. LLVM addresses this issue well by building a set of "modular and reusable compiler and toolchain technologies" which anyone could use to build compilers and language oriented software.


A good summary of LLVM is this:

enter image description here

At the frontend you have Perl, and many other high level languages. At the backend, you have the natives code that run directly on the machine.

At the centre is your intermediate code representation. If every high level language can be represented in this LLVM IR format, then analysis tools based on this IR can be easily reused - that is the basic rationale.


LLVM (used to mean "Low Level Virtual Machine" but not anymore) is a compiler infrastructure, written in C++, which is designed for compile-time, link-time, run-time, and "idle-time" optimization of programs written in arbitrary programming languages. Originally implemented for C/C++, the language-independent design (and the success) of LLVM has since spawned a wide variety of front-ends, including Objective C, Fortran, Ada, Haskell, Java bytecode, Python, Ruby, ActionScript, GLSL, and others.

Read this for more explanation
Also check out Unladen Swallow