[optimization] Clang vs GCC - which produces faster binaries?

I'm currently using GCC, but I discovered Clang recently and I'm pondering switching. There is one deciding factor though - quality (speed, memory footprint, reliability) of binaries it produces - if gcc -O3can produce a binary that runs 1% faster, or Clang binaries take up more memory or just fail due to compiler bugs, it's a deal-breaker.

Clang boasts better compile speeds and lower compile-time memory footprint than GCC, but I'm really interested in benchmarks/comparisons of resulting compiled software - could you point me to some or describe your experiences?

The answer is


Basically speaking, the answer is: it depends. There are many many benchmarks focusing on different kinds of application.

My benchmark on my app is: gcc > icc > clang.

There are rare IO, but many CPU float and data structure operations.

compile flags is -Wall -g -DNDEBUG -O3.

https://github.com/zhangyafeikimi/ml-pack/blob/master/gbdt/profile/benchmark


There is very little overall difference between GCC 4.8 and clang 3.3 in terms of speed of the resulting binary. In most cases code generated by both compilers performs similarly. Neither of these two compilers dominates the other one.

Benchmarks telling that there is a significant performance gap between GCC and clang are coincidental.

Program performance is affected by the choice of the compiler. If a developer or a group of developers is exclusively using GCC then the program can be expected to run slightly faster with GCC than with clang, and vice versa.

From developer viewpoint, a notable difference between GCC 4.8+ and clang 3.3 is that GCC has the -Og command line option. This option enables optimizations that do not interfere with debugging, so for example it is always possible to get accurate stack traces. The absence of this option in clang makes clang harder to use as an optimizing compiler for some developers.


The fact that Clang compiles code faster may not be as important as the speed of the resulting binary. However, here is a series of benchmarks.


The only way to determine this is to try it. FWIW I have seen some really good improvements using Apple's LLVM gcc 4.2 compared to the regular gcc 4.2 (for x86-64 code with quite a lot of SSE), but YMMV for different code bases. Assuming you're working with x86/x86-64 and that you really do care about the last few percent then you ought to try Intel's ICC too, as this can often beat gcc - you can get a 30 day evaluation license from intel.com and try it.


A peculiar difference I have noted on gcc 5.2.1 and clang 3.6.2 is that if you have a critical loop like:

for (;;) {
    if (!visited) {
        ....
    }
    node++;
    if (!*node) break;
  }

Then gcc will, when compiling with -O3 or -O2, speculatively unroll the loop eight times. Clang will not unroll it at all. Through trial and error I found that in my specific case with my program data, the right amount of unrolling is five so gcc overshot and clang undershot. However, overshooting was more detrimental to performance, so gcc performed much worse here.

I have no idea if the unrolling difference is a general trend or just something that was specific to my scenario.

A while back I wrote a few garbage collectors to teach myself more about performance optimization in C. And the results I got is in my mind enough to slightly favor clang. Especially since garbage collection is mostly about pointer chasing and copying memory.

The results are (numbers in seconds):

+---------------------+-----+-----+
|Type                 |GCC  |Clang|
+---------------------+-----+-----+
|Copying GC           |22.46|22.55|
|Copying GC, optimized|22.01|20.22|
|Mark & Sweep         | 8.72| 8.38|
|Ref Counting/Cycles  |15.14|14.49|
|Ref Counting/Plain   | 9.94| 9.32|
+---------------------+-----+-----+

This is all pure C code, and I make no claim about either compiler's performance when compiling C++ code.

On Ubuntu 15.10, x86.64, and an AMD Phenom(tm) II X6 1090T processor.


Phoronix did some benchmarks about this, but it is about a snapshot version of Clang/LLVM from a few months back. The results being that things were more-or-less a push; neither GCC nor Clang is definitively better in all cases.

Since you'd use the latest Clang, it's maybe a little less relevant. Then again, GCC 4.6 is slated to have some major optimizations for Core 2 and i7, apparently.

I figure Clang's faster compilation speed will be nicer for original developers, and then when you push the code out into the world, Linux distro/BSD/etc. end-users will use GCC for the faster binaries.


Examples related to optimization

Why does C++ code for testing the Collatz conjecture run faster than hand-written assembly? Measuring execution time of a function in C++ GROUP BY having MAX date How to efficiently remove duplicates from an array without using Set Storing JSON in database vs. having a new column for each key Read file As String How to write a large buffer into a binary file in C++, fast? Is optimisation level -O3 dangerous in g++? Why is processing a sorted array faster than processing an unsorted array? MySQL my.cnf performance tuning recommendations

Examples related to gcc

Can't compile C program on a Mac after upgrade to Mojave Compiling an application for use in highly radioactive environments Make Error 127 when running trying to compile code How to Install gcc 5.3 with yum on CentOS 7.2? How does one set up the Visual Studio Code compiler/debugger to GCC? How do I set up CLion to compile and run? CMake error at CMakeLists.txt:30 (project): No CMAKE_C_COMPILER could be found How to printf a 64-bit integer as hex? Differences between arm64 and aarch64 Fatal error: iostream: No such file or directory in compiling C program using GCC

Examples related to compiler-construction

fatal error C1010 - "stdafx.h" in Visual Studio how can this be corrected? Compilation error: stray ‘\302’ in program etc What is difference between sjlj vs dwarf vs seh? What is the difference between a token and a lexeme? How to compile makefile using MinGW? C++ variable has initializer but incomplete type? It is more efficient to use if-return-return or if-else-return? Could not load file or assembly ... The parameter is incorrect How do I compile the asm generated by GCC? Visual Studio: LINK : fatal error LNK1181: cannot open input file

Examples related to clang

Xcode - ld: library not found for -lPods Where is PATH_MAX defined in Linux? How to make clang compile to llvm IR How do I compile C++ with Clang? Clang vs GCC for my Linux Development project Switching between GCC and Clang/LLVM using CMake How do I print a list of "Build Settings" in Xcode project? Compiling problems: cannot find crt1.o Why is this program erroneously rejected by three C++ compilers? Clang vs GCC - which produces faster binaries?

Examples related to benchmarking

Why is 2 * (i * i) faster than 2 * i * i in Java? ab load testing Why is reading lines from stdin much slower in C++ than Python? Execution time of C program How to use clock() in C++ Clang vs GCC - which produces faster binaries? How to Calculate Execution Time of a Code Snippet in C++ Which is faster: multiple single INSERTs or one multiple-row INSERT? What do 'real', 'user' and 'sys' mean in the output of time(1)? How do I write a correct micro-benchmark in Java?