[gcc] Why doesn't GCC optimize a*a*a*a*a*a to (a*a*a)*(a*a*a)?

Library functions like "pow" are usually carefully crafted to yield the minimum possible error (in generic case). This is usually achieved approximating functions with splines (according to Pascal's comment the most common implementation seems to be using Remez algorithm)

fundamentally the following operation:

pow(x,y);

has a inherent error of approximately the same magnitude as the error in any single multiplication or division.

While the following operation:

float a=someValue;
float b=a*a*a*a*a*a;

has a inherent error that is greater more than 5 times the error of a single multiplication or division (because you are combining 5 multiplications).

The compiler should be really carefull to the kind of optimization it is doing:

  1. if optimizing pow(a,6) to a*a*a*a*a*a it may improve performance, but drastically reduce the accuracy for floating point numbers.
  2. if optimizing a*a*a*a*a*a to pow(a,6) it may actually reduce the accuracy because "a" was some special value that allows multiplication without error (a power of 2 or some small integer number)
  3. if optimizing pow(a,6) to (a*a*a)*(a*a*a) or (a*a)*(a*a)*(a*a) there still can be a loss of accuracy compared to pow function.

In general you know that for arbitrary floating point values "pow" has better accuracy than any function you could eventually write, but in some special cases multiple multiplications may have better accuracy and performance, it is up to the developer choosing what is more appropriate, eventually commenting the code so that noone else would "optimize" that code.

The only thing that make sense (personal opinion, and apparently a choice in GCC wichout any particular optimization or compiler flag) to optimize should be replacing "pow(a,2)" with "a*a". That would be the only sane thing a compiler vendor should do.

Examples related to gcc

Can't compile C program on a Mac after upgrade to Mojave Compiling an application for use in highly radioactive environments Make Error 127 when running trying to compile code How to Install gcc 5.3 with yum on CentOS 7.2? How does one set up the Visual Studio Code compiler/debugger to GCC? How do I set up CLion to compile and run? CMake error at CMakeLists.txt:30 (project): No CMAKE_C_COMPILER could be found How to printf a 64-bit integer as hex? Differences between arm64 and aarch64 Fatal error: iostream: No such file or directory in compiling C program using GCC

Examples related to assembly

Why does C++ code for testing the Collatz conjecture run faster than hand-written assembly? While, Do While, For loops in Assembly Language (emu8086) Replacing a 32-bit loop counter with 64-bit introduces crazy performance deviations with _mm_popcnt_u64 on Intel CPUs How to run a program without an operating system? Difference between "move" and "li" in MIPS assembly language Carry Flag, Auxiliary Flag and Overflow Flag in Assembly How do AX, AH, AL map onto EAX? JNZ & CMP Assembly Instructions Difference between JE/JNE and JZ/JNZ The point of test %eax %eax

Examples related to floating-point

Convert list or numpy array of single element to float in python Convert float to string with precision & number of decimal digits specified? Float and double datatype in Java C convert floating point to int Convert String to Float in Swift How do I change data-type of pandas data frame to string with a defined format? How to check if a float value is a whole number Convert floats to ints in Pandas? Converting Float to Dollars and Cents Format / Suppress Scientific Notation from Python Pandas Aggregation Results

Examples related to compiler-optimization

How to compile Tensorflow with SSE4.2 and AVX instructions? Replacing a 32-bit loop counter with 64-bit introduces crazy performance deviations with _mm_popcnt_u64 on Intel CPUs Swift Beta performance: sorting arrays Why are elementwise additions much faster in separate loops than in a combined loop? Why doesn't GCC optimize a*a*a*a*a*a to (a*a*a)*(a*a*a)? How to disable compiler optimizations in gcc? How to see which flags -march=native will activate? Why do we use volatile keyword? How to turn off gcc compiler optimization to enable buffer overflow

Examples related to fast-math

Why doesn't GCC optimize a*a*a*a*a*a to (a*a*a)*(a*a*a)?