I have heard from various sources (though mostly from a colleague of mine) that compiling with an optimisation level of -O3 in g++ is somehow 'dangerous', and should be avoided in general unless proven to be necessary.
Is this true, and if so, why? Should I just be sticking to -O2?
This question is related to: c++, optimization, g++, compiler-flags
The -O3 option turns on more expensive optimizations, such as function inlining, in addition to all the optimizations of the lower levels ‘-O2’ and ‘-O1’. The ‘-O3’ optimization level may increase the speed of the resulting executable, but can also increase its size. Under some circumstances where these optimizations are not favorable, this option might actually make a program slower.
In my somewhat checkered experience, applying -O3 to an entire program almost always makes it slower (relative to -O2), because it turns on aggressive loop unrolling and inlining that make the program no longer fit in the instruction cache. For larger programs, this can also be true for -O2 relative to -Os!
The intended use pattern for -O3 is that, after profiling your program, you manually apply it to a small handful of files containing critical inner loops that actually benefit from these aggressive space-for-speed tradeoffs. Newer versions of GCC have a profile-guided optimization mode that can (IIUC) selectively apply the -O3 optimizations to hot functions, effectively automating this process.
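As a rough illustration of confining the aggressive optimizations to a hot spot, GCC also provides a function-level `optimize` attribute. The sketch below assumes GCC; the macro `HOT_O3` and the function `dotProduct` are hypothetical names used only for illustration. GCC's documentation describes this attribute as intended mainly for debugging, so compiling the critical files separately with -O3 remains the more conventional route.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper macro: asks GCC to compile one function at -O3.
// It expands to nothing on other compilers, which may not support the attribute.
#if defined(__GNUC__) && !defined(__clang__)
#  define HOT_O3 __attribute__((optimize("O3")))
#else
#  define HOT_O3
#endif

// Hypothetical hot inner loop, assumed to have been identified by profiling.
HOT_O3 double dotProduct(const double* a, const double* b, std::size_t n)
{
    assert(a != nullptr && b != nullptr);
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}
```

The rest of the program would still be built with -O2. For the profile-guided route, GCC's relevant options are -fprofile-generate (instrumented build) and -fprofile-use (rebuild using the collected profile).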
Recently I experienced a problem using optimization with g++. The problem was related to a PCI card, where the registers (for command and data) were represented by a memory address. My driver mapped the physical address to a pointer within the application and gave it to the called process, which worked with it like this:
unsigned int * pciMemory;
askDriverForMapping( & pciMemory );
...
pciMemory[ 0 ] = someCommandIdx;
pciMemory[ 0 ] = someCommandLength;
for ( int i = 0; i < sizeof( someCommand ); i++ )
    pciMemory[ 0 ] = someCommand[ i ];
The card didn't act as expected. When I looked at the assembly, I saw that the compiler wrote only the last element of someCommand into pciMemory, omitting all preceding writes.
In conclusion: be accurate and attentive with optimization.
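The writes disappeared because, without `volatile`, the compiler may treat successive stores to the same address as redundant and keep only the last one. A minimal sketch of the usual remedy follows, assuming the register is accessed through a volatile-qualified pointer. The function `sendCommand` and its parameters are hypothetical, and the command is assumed to be a byte array.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: sends a command through one memory-mapped register.
// `volatile` makes every store an observable side effect, so the compiler
// may not drop or merge any of the writes to reg[0].
void sendCommand(volatile unsigned int* reg,
                 unsigned int commandIdx,
                 const unsigned char* command,  // assumed byte array
                 std::size_t commandLength)
{
    assert(reg != nullptr && command != nullptr);
    reg[0] = commandIdx;
    reg[0] = static_cast<unsigned int>(commandLength);
    for (std::size_t i = 0; i < commandLength; ++i)
        reg[0] = command[i];  // each byte is written, not just the last
}
```

With the pointer declared this way, the same code should behave identically at -O0, -O2, and -O3, which suggests the original failure came from the missing qualifier rather than from the optimization level itself.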
Yes, -O3 is buggier. I'm a compiler developer and I've identified clear and obvious GCC bugs caused by -O3 generating buggy SIMD assembly instructions when building my own software. From what I've seen, most production software ships with -O2, which means -O3 gets less attention with respect to testing and bug fixes.
Think of it this way: -O3 adds more transformations on top of -O2, which adds more transformations on top of -O1. Statistically speaking, more transformations means more bugs. That's true for any compiler.
Source: Stackoverflow.com