[c++] What is more efficient? Using pow to square or just multiply it with itself?

What of these two methods is in C more efficient? And how about:

pow(x,3)

vs.

x*x*x // etc?

This question is related to c++ c optimization

The answer is


The most efficient way is to consider the exponential growth of the multiplications. Check this code for p^q:

template <typename T>
T expt(T p, unsigned q){
    T r =1;
    while (q != 0) {
        if (q % 2 == 1) {    // if q is odd
            r *= p;
            q--;
        }
        p *= p;
        q /= 2;
    }
    return r;
}

I was also wondering about the performance issue, and was hoping this would be optimised out by the compiler, based on the answer from @EmileCormier. However, I was worried that the test code he showed would still allow the compiler to optimise away the std::pow() call, since the same values were used in the call every time, which would allow the compiler to store the results and re-use it in the loop - this would explain the almost identical run-times for all cases. So I had a look into it too.

Here's the code I used (test_pow.cpp):

#include <iostream>                                                                                                                                                                                                                       
#include <cmath>
#include <chrono>

class Timer {
  public:
    explicit Timer () : from (std::chrono::high_resolution_clock::now()) { }

    void start () {
      from = std::chrono::high_resolution_clock::now();
    }

    double elapsed() const {
      return std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::high_resolution_clock::now() - from).count() * 1.0e-6;
    }

  private:
    std::chrono::high_resolution_clock::time_point from;
};

int main (int argc, char* argv[])
{
  double total;
  Timer timer;



  total = 0.0;
  timer.start();
  for (double i = 0.0; i < 1.0; i += 1e-8)
    total += std::pow (i,2);
  std::cout << "std::pow(i,2): " << timer.elapsed() << "s (result = " << total << ")\n";

  total = 0.0;
  timer.start();
  for (double i = 0.0; i < 1.0; i += 1e-8)
    total += i*i;
  std::cout << "i*i: " << timer.elapsed() << "s (result = " << total << ")\n";

  std::cout << "\n";

  total = 0.0;
  timer.start();
  for (double i = 0.0; i < 1.0; i += 1e-8)
    total += std::pow (i,3);
  std::cout << "std::pow(i,3): " << timer.elapsed() << "s (result = " << total << ")\n";

  total = 0.0;
  timer.start();
  for (double i = 0.0; i < 1.0; i += 1e-8)
    total += i*i*i;
  std::cout << "i*i*i: " << timer.elapsed() << "s (result = " << total << ")\n";


  return 0;
}

This was compiled using:

g++ -std=c++11 [-O2] test_pow.cpp -o test_pow

Basically, the difference is the argument to std::pow() is the loop counter. As I feared, the difference in performance is pronounced. Without the -O2 flag, the results on my system (Arch Linux 64-bit, g++ 4.9.1, Intel i7-4930) were:

std::pow(i,2): 0.001105s (result = 3.33333e+07)
i*i: 0.000352s (result = 3.33333e+07)

std::pow(i,3): 0.006034s (result = 2.5e+07)
i*i*i: 0.000328s (result = 2.5e+07)

With optimisation, the results were equally striking:

std::pow(i,2): 0.000155s (result = 3.33333e+07)
i*i: 0.000106s (result = 3.33333e+07)

std::pow(i,3): 0.006066s (result = 2.5e+07)
i*i*i: 9.7e-05s (result = 2.5e+07)

So it looks like the compiler does at least try to optimise the std::pow(x,2) case, but not the std::pow(x,3) case (it takes ~40 times longer than the std::pow(x,2) case). In all cases, manual expansion performed better - but particularly for the power 3 case (60 times quicker). This is definitely worth bearing in mind if running std::pow() with integer powers greater than 2 in a tight loop...


That's the wrong kind of question. The right question would be: "Which one is easier to understand for human readers of my code?"

If speed matters (later), don't ask, but measure. (And before that, measure whether optimizing this actually will make any noticeable difference.) Until then, write the code so that it is easiest to read.

Edit
Just to make this clear (although it already should have been): Breakthrough speedups usually come from things like using better algorithms, improving locality of data, reducing the use of dynamic memory, pre-computing results, etc. They rarely ever come from micro-optimizing single function calls, and where they do, they do so in very few places, which would only be found by careful (and time-consuming) profiling, more often than never they can be sped up by doing very non-intuitive things (like inserting noop statements), and what's an optimization for one platform is sometimes a pessimization for another (which is why you need to measure, instead of asking, because we don't fully know/have your environment).

Let me underline this again: Even in the few applications where such things matter, they don't matter in most places they're used, and it is very unlikely that you will find the places where they matter by looking at the code. You really do need to identify the hot spots first, because otherwise optimizing code is just a waste of time.

Even if a single operation (like computing the square of some value) takes up 10% of the application's execution time (which IME is quite rare), and even if optimizing it saves 50% of the time necessary for that operation (which IME is even much, much rarer), you still made the application take only 5% less time.
Your users will need a stopwatch to even notice that. (I guess in most cases anything under 20% speedup goes unnoticed for most users. And that is four such spots you need to find.)


x*x or x*x*x will be faster than pow, since pow must deal with the general case, whereas x*x is specific. Also, you can elide the function call and suchlike.

However, if you find yourself micro-optimizing like this, you need to get a profiler and do some serious profiling. The overwhelming probability is that you would never notice any difference between the two.


I have been busy with a similar problem, and I'm quite puzzled by the results. I was calculating x?³/² for Newtonian gravitation in an n-bodies situation (acceleration undergone from another body of mass M situated at a distance vector d) : a = M G d*(d²)?³/² (where d² is the dot (scalar) product of d by itself) , and I thought calculating M*G*pow(d2, -1.5) would be simpler than M*G/d2/sqrt(d2)

The trick is that it is true for small systems, but as systems grow in size, M*G/d2/sqrt(d2) becomes more efficient and I don't understand why the size of the system impacts this result, because repeating the operation on different data does not. It is as if there were possible optimizations as the system grow, but which are not possible with pow

enter image description here


If the exponent is constant and small, expand it out, minimizing the number of multiplications. (For example, x^4 is not optimally x*x*x*x, but y*y where y=x*x. And x^5 is y*y*x where y=x*x. And so on.) For constant integer exponents, just write out the optimized form already; with small exponents, this is a standard optimization that should be performed whether the code has been profiled or not. The optimized form will be quicker in so large a percentage of cases that it's basically always worth doing.

(If you use Visual C++, std::pow(float,int) performs the optimization I allude to, whereby the sequence of operations is related to the bit pattern of the exponent. I make no guarantee that the compiler will unroll the loop for you, though, so it's still worth doing it by hand.)

[edit] BTW pow has a (un)surprising tendency to crop up on the profiler results. If you don't absolutely need it (i.e., the exponent is large or not a constant), and you're at all concerned about performance, then best to write out the optimal code and wait for the profiler to tell you it's (surprisingly) wasting time before thinking further. (The alternative is to call pow and have the profiler tell you it's (unsurprisingly) wasting time -- you're cutting out this step by doing it intelligently.)


Examples related to c++

Method Call Chaining; returning a pointer vs a reference? How can I tell if an algorithm is efficient? Difference between opening a file in binary vs text How can compare-and-swap be used for a wait-free mutual exclusion for any shared data structure? Install Qt on Ubuntu #include errors detected in vscode Cannot open include file: 'stdio.h' - Visual Studio Community 2017 - C++ Error How to fix the error "Windows SDK version 8.1" was not found? Visual Studio 2017 errors on standard headers How do I check if a Key is pressed on C++

Examples related to c

conflicting types for 'outchar' Can't compile C program on a Mac after upgrade to Mojave Program to find largest and second largest number in array Prime numbers between 1 to 100 in C Programming Language In c, in bool, true == 1 and false == 0? How I can print to stderr in C? Visual Studio Code includePath "error: assignment to expression with array type error" when I assign a struct field (C) Compiling an application for use in highly radioactive environments How can you print multiple variables inside a string using printf?

Examples related to optimization

Why does C++ code for testing the Collatz conjecture run faster than hand-written assembly? Measuring execution time of a function in C++ GROUP BY having MAX date How to efficiently remove duplicates from an array without using Set Storing JSON in database vs. having a new column for each key Read file As String How to write a large buffer into a binary file in C++, fast? Is optimisation level -O3 dangerous in g++? Why is processing a sorted array faster than processing an unsorted array? MySQL my.cnf performance tuning recommendations