What is more efficient? Using pow to square or just multiply it with itself?

Question

Which of these two methods is more efficient in C? And how about `pow(x,3)` vs `x*x*x`, etc.?

User · Answer

That's the wrong kind of question. The right question would be: "Which one is easier to understand for human readers of my code?"

If speed matters (later), don't ask, but measure. (And before that, measure whether optimizing this actually will make any noticeable difference.) Until then, write the code so that it is easiest to read.

Edit: Just to make this clear (although it already should have been): breakthrough speedups usually come from things like using better algorithms, improving locality of data, reducing the use of dynamic memory, pre-computing results, etc. They rarely ever come from micro-optimizing single function calls, and where they do, they do so in very few places, which would only be found by careful (and time-consuming) profiling. More often than not, those places can be sped up by doing very non-intuitive things (like inserting noop statements), and what's an optimization for one platform is sometimes a pessimization for another. That is why you need to measure instead of asking: we don't fully know your environment.

Let me underline this again: even in the few applications where such things matter, they don't matter in most places they're used, and it is very unlikely that you will find the places where they matter by looking at the code. You really do need to identify the hot spots first, because otherwise optimizing code is just a waste of time.

Even if a single operation (like computing the square of some value) takes up 10% of the application's execution time (which IME is quite rare), and even if optimizing it saves 50% of the time necessary for that operation (which IME is even much, much rarer), you still made the application take only 5% less time. Your users will need a stopwatch to even notice that. (I guess in most cases anything under a 20% speedup goes unnoticed by most users. And that is four such spots you need to find.)
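The arithmetic in the last paragraph is an instance of Amdahl's law. A minimal sketch, assuming the hot spot's share of the runtime and the local saving are both known as fractions; the function names `overallSaving` and `overallSpeedup` are illustrative, not from any library:

```cpp
// Fraction of the total runtime saved when an operation that takes `share`
// of the runtime is made `localSaving` cheaper (both given as fractions).
constexpr double overallSaving(double share, double localSaving)
{
    return share * localSaving;
}

// The same numbers expressed as an overall speedup factor (old time / new time).
constexpr double overallSpeedup(double share, double localSaving)
{
    return 1.0 / (1.0 - overallSaving(share, localSaving));
}
```

With the figures above, `overallSaving(0.10, 0.50)` is 0.05 (5% less time) and `overallSpeedup(0.10, 0.50)` is about 1.053, which is why four independent spots of that size are needed to reach roughly 20%.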

User · Answer

`x*x` or `x*x*x` will be faster than `pow`, since `pow` must deal with the general case, whereas `x*x` is specific. Also, you can elide the function call and suchlike.

However, if you find yourself micro-optimizing like this, you need to get a profiler and do some serious profiling. The overwhelming probability is that you would never notice any difference between the two.
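If readability is the concern, a small named helper keeps the specific `x*x` form without scattering repeated factors through the code. A minimal sketch; `square` and `cube` are illustrative names, not standard functions:

```cpp
// Specific-case helpers: plain multiplications, no general-purpose pow call.
constexpr double square(double x) { return x * x; }
constexpr double cube(double x) { return x * x * x; }
```

Being `constexpr` and defined in a header, they are trivially inlinable, so the call adds no overhead in optimized builds while the call site still reads as "square of x".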

User · Answer

The most efficient way is to consider the exponential growth of the multiplications. Check this code for p^q:

```cpp
template <typename T>
T expt(T p, unsigned q)
{
    T r(1);

    while (q != 0) {
        if (q % 2 == 1) {    // q is odd
            r *= p;
            q--;
        }
        p *= p;              // p becomes p^2, p^4, p^8, ...
        q /= 2;
    }

    return r;
}
```
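To see why this is efficient, the sketch below repeats the same square-and-multiply loop as `expt` but also counts the multiplications performed; the counter and the name `exptCounted` are additions for illustration only. The number of multiplications grows with the number of bits in `q` rather than with `q` itself: for `q = 8` it performs 5 (the final squaring of `p` is never used), against 7 for writing `x*x*x*x*x*x*x*x` out.

```cpp
#include <utility>

// Same loop as expt(), but also returns how many multiplications were done.
template <typename T>
std::pair<T, unsigned> exptCounted(T p, unsigned q)
{
    T r(1);
    unsigned mults = 0;
    while (q != 0) {
        if (q % 2 == 1) {   // low bit set: fold the current power of p into r
            r *= p;
            ++mults;
            q--;
        }
        p *= p;             // square: p, p^2, p^4, p^8, ...
        ++mults;
        q /= 2;
    }
    return {r, mults};
}
```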

User · Answer

I have been busy with a similar problem, and I'm quite puzzled by the results. I was calculating x^(-3/2) for Newtonian gravitation in an n-bodies situation (the acceleration undergone from another body of mass M situated at a distance vector d): a = M·G·d·(d²)^(-3/2), where d² is the dot (scalar) product of d by itself. I thought calculating `M*G*pow(d2, -1.5)` would be simpler than `M*G/d2/sqrt(d2)`.

The trick is that it is true for small systems, but as systems grow in size, `M*G/d2/sqrt(d2)` becomes more efficient, and I don't understand why the size of the system impacts this result, because repeating the operation on different data does not. It is as if there were possible optimizations as the system grows, but which are not possible with `pow`.
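Both formulations compute the same quantity, since (d²)^(-3/2) = 1/(d²·√d²). A sketch assuming a minimal 3-vector type; `Vec3`, `accelPow` and `accelSqrt` are hypothetical names, not from the answer. The second form replaces the general `pow` call with one `sqrt` and two divisions:

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };   // hypothetical minimal vector type

// a = M*G*d * (d.d)^(-3/2), using the general-purpose pow
Vec3 accelPow(double M, double G, Vec3 d)
{
    double d2 = d.x * d.x + d.y * d.y + d.z * d.z;   // dot product d.d
    double k = M * G * std::pow(d2, -1.5);
    return {k * d.x, k * d.y, k * d.z};
}

// The same formula written as M*G/d2/sqrt(d2)
Vec3 accelSqrt(double M, double G, Vec3 d)
{
    double d2 = d.x * d.x + d.y * d.y + d.z * d.z;
    double k = M * G / d2 / std::sqrt(d2);
    return {k * d.x, k * d.y, k * d.z};
}
```

The two results can differ in the last bits, since `pow` and `sqrt` round differently, so a comparison should use a tolerance rather than `==`.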

User · Answer

If the exponent is constant and small, expand it out, minimizing the number of multiplications. (For example, x^4 is not optimally `x*x*x*x`, but `y*y` where `y=x*x`. And x^5 is `y*y*x` where `y=x*x`. And so on.) For constant integer exponents, just write out the optimized form already; with small exponents, this is a standard optimization that should be performed whether the code has been profiled or not. The optimized form will be quicker in so large a percentage of cases that it's basically always worth doing.

(If you use Visual C++, `std::pow(float, int)` performs the optimization I allude to, whereby the sequence of operations is related to the bit pattern of the exponent. I make no guarantee that the compiler will unroll the loop for you, though, so it's still worth doing it by hand.)

[edit] BTW `pow` has a (un)surprising tendency to crop up in profiler results. If you don't absolutely need it (i.e., the exponent is large or not a constant), and you're at all concerned about performance, then best to write out the optimal code and wait for the profiler to tell you it's (surprisingly) wasting time before thinking further. (The alternative is to call `pow` and have the profiler tell you it's (unsurprisingly) wasting time; you're cutting out this step by doing it intelligently.)
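The hand-expanded forms described above, plus a C++17 template that derives the same kind of sequence from a compile-time exponent by repeated squaring. This is a sketch; `pow4`, `pow5` and `ipow` are illustrative names, not standard functions. Because the exponent is a template parameter, the sequence of multiplications is fixed at compile time and there is no loop to unroll (whether the calls are inlined is still up to the compiler).

```cpp
// Hand-expanded forms
inline double pow4(double x) { double y = x * x; return y * y; }      // 2 multiplications
inline double pow5(double x) { double y = x * x; return y * y * x; }  // 3 multiplications

// Compile-time exponent (requires C++17 for if constexpr)
template <unsigned N>
constexpr double ipow(double x)
{
    if constexpr (N == 0) {
        return 1.0;
    } else if constexpr (N == 1) {
        return x;
    } else if constexpr (N % 2 == 0) {
        double y = ipow<N / 2>(x);   // x^(N/2), computed once and squared
        return y * y;
    } else {
        return x * ipow<N - 1>(x);   // odd exponent: peel off one factor
    }
}
```

For example, `ipow<5>(x)` expands to `x * (y*y)` with `y = x*x`, the same three multiplications as `pow5`.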

User · Answer

I tested the performance difference between `x*x*...` vs `pow(x,i)` for small i using this code:

```cpp
#include <cstdlib>
#include <cmath>
#include <iostream>
#include <boost/date_time/posix_time/posix_time.hpp>

inline boost::posix_time::ptime now()
{
    return boost::posix_time::microsec_clock::local_time();
}

// Defines test<num>(b, loops): accumulates `expression` (written in terms of b)
// ten times per iteration and prints the elapsed time.
#define TEST(num, expression) \
double test##num(double b, long loops) \
{ \
    double x = 0.0; \
\
    boost::posix_time::ptime startTime = now(); \
    for (long i=0; i<loops; ++i) \
    { \
        x += expression; \
        x += expression; \
        x += expression; \
        x += expression; \
        x += expression; \
        x += expression; \
        x += expression; \
        x += expression; \
        x += expression; \
        x += expression; \
    } \
    boost::posix_time::time_duration elapsed = now() - startTime; \
\
    std::cout << elapsed << " "; \
\
    return x; \
}

TEST(1, b)
TEST(2, b*b)
TEST(3, b*b*b)
TEST(4, b*b*b*b)
TEST(5, b*b*b*b*b)

// Same measurement, but using std::pow with a compile-time exponent.
template <int exponent>
double testpow(double base, long loops)
{
    double x = 0.0;

    boost::posix_time::ptime startTime = now();
    for (long i=0; i<loops; ++i)
    {
        x += std::pow(base, exponent);
        x += std::pow(base, exponent);
        x += std::pow(base, exponent);
        x += std::pow(base, exponent);
        x += std::pow(base, exponent);
        x += std::pow(base, exponent);
        x += std::pow(base, exponent);
        x += std::pow(base, exponent);
        x += std::pow(base, exponent);
        x += std::pow(base, exponent);
    }
    boost::posix_time::time_duration elapsed = now() - startTime;

    std::cout << elapsed << " ";

    return x;
}

int main()
{
    using std::cout;
    long loops = 100000000l;
    double x = 0.0;
    cout << "1 ";
    x += testpow<1>(rand(), loops);
    x += test1(rand(), loops);

    cout << "\n2 ";
    x += testpow<2>(rand(), loops);
    x += test2(rand(), loops);

    cout << "\n3 ";
    x += testpow<3>(rand(), loops);
    x += test3(rand(), loops);

    cout << "\n4 ";
    x += testpow<4>(rand(), loops);
    x += test4(rand(), loops);

    cout << "\n5 ";
    x += testpow<5>(rand(), loops);
    x += test5(rand(), loops);
    cout << "\n" << x << "\n";
}
```

Results are (exponent, then `pow` time, then manual multiplication time):

```
1 00:00:01.126008 00:00:01.128338
2 00:00:01.125832 00:00:01.127227
3 00:00:01.125563 00:00:01.126590
4 00:00:01.126289 00:00:01.126086
5 00:00:01.126570 00:00:01.125930
2.45829e+54
```

Note that I accumulate the result of every pow calculation to make sure the compiler doesn't optimize it away.

If I use the `std::pow(double, double)` version, and `loops = 1000000l`, I get:

```
1 00:00:00.011339 00:00:00.011262
2 00:00:00.011259 00:00:00.011254
3 00:00:00.975658 00:00:00.011254
4 00:00:00.976427 00:00:00.011254
5 00:00:00.973029 00:00:00.011254
2.45829e+52
```

This is on an Intel Core Duo running Ubuntu 9.10 64-bit, compiled using gcc 4.4.1 with -O2 optimization.

So in C, yes, `x*x*x` will be faster than `pow(x, 3)`, because there is no `pow(double, int)` overload. In C++, it will be roughly the same. (Assuming the methodology in my testing is correct.)

This is in response to the comment made by An Markm:

Even if a `using namespace std` directive was issued, if the second parameter to `pow` is an `int`, then the `std::pow(double, int)` overload from `<cmath>` will be called instead of `::pow(double, double)` from `<math.h>`. (This describes C++98/03, which the compiler above used. C++11 removed the `pow(double, int)` overload: an integer exponent is now converted to `double`, so the call ends up computing `pow(double, double)`.)

This test code confirms that behavior:

```cpp
#include <iostream>

namespace foo
{

    double bar(double x, int i)
    {
        std::cout << "foo::bar\n";
        return x*i;
    }

}

double bar(double x, double y)
{
    std::cout << "::bar\n";
    return x*y;
}

using namespace foo;

int main()
{
    double a = bar(1.2, 3); // Prints "foo::bar"
    std::cout << a << "\n";
    return 0;
}
```
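Since C++11 there is no separate `pow(double, int)` overload; an `int` exponent is converted to `double`. A minimal sketch of what that means in code, assuming a C++11 or later compiler; `sameResult` is a hypothetical helper. Whether a particular library then takes a fast path for small integer exponents is not specified by the standard, which is exactly what timings like the ones above measure.

```cpp
#include <cmath>
#include <type_traits>

// With a double base and an int exponent the result type is double, both with
// the old C++98 pow(double, int) overload and with the C++11 promotion rule.
static_assert(std::is_same<decltype(std::pow(2.0, 3)), double>::value,
              "pow(double, int) yields double");

// Under the C++11 rule the int exponent is converted to double first, so these
// two calls perform the same (double, double) computation.
inline bool sameResult(double base, int e)
{
    return std::pow(base, e) == std::pow(base, static_cast<double>(e));
}
```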

User · Answer

I was also wondering about the performance issue, and was hoping this would be optimised out by the compiler, based on the answer from @EmileCormier. However, I was worried that the test code he showed would still allow the compiler to optimise away the `std::pow()` call, since the same values were used in the call every time, which would allow the compiler to store the results and re-use them in the loop; this would explain the almost identical run-times for all cases. So I had a look into it too.

Here's the code I used (test_pow.cpp):

```cpp
#include <iostream>
#include <cmath>
#include <chrono>

class Timer {
  public:
    explicit Timer () : from (std::chrono::high_resolution_clock::now()) { }

    void start () {
      from = std::chrono::high_resolution_clock::now();
    }

    // Elapsed time in seconds (microsecond count scaled by 1e-6).
    double elapsed() const {
      return std::chrono::duration_cast<std::chrono::microseconds>(std::chrono::high_resolution_clock::now() - from).count() * 1.0e-6;
    }

  private:
    std::chrono::high_resolution_clock::time_point from;
};

int main (int argc, char* argv[])
{
  double total;
  Timer timer;

  // The loop counter is the argument, so the result cannot be computed once and reused.
  total = 0.0;
  timer.start();
  for (double i = 0.0; i < 1.0; i += 1e-8)
    total += std::pow (i,2);
  std::cout << "std::pow(i,2): " << timer.elapsed() << "s (result = " << total << ")\n";

  total = 0.0;
  timer.start();
  for (double i = 0.0; i < 1.0; i += 1e-8)
    total += i*i;
  std::cout << "i*i: " << timer.elapsed() << "s (result = " << total << ")\n";

  std::cout << "\n";

  total = 0.0;
  timer.start();
  for (double i = 0.0; i < 1.0; i += 1e-8)
    total += std::pow (i,3);
  std::cout << "std::pow(i,3): " << timer.elapsed() << "s (result = " << total << ")\n";

  total = 0.0;
  timer.start();
  for (double i = 0.0; i < 1.0; i += 1e-8)
    total += i*i*i;
  std::cout << "i*i*i: " << timer.elapsed() << "s (result = " << total << ")\n";

  return 0;
}
```

This was compiled using (with `-O2` optional):

```
g++ -std=c++11 [-O2] test_pow.cpp -o test_pow
```

Basically, the difference is that the argument to `std::pow()` is the loop counter. As I feared, the difference in performance is pronounced. Without the -O2 flag, the results on my system (Arch Linux 64-bit, g++ 4.9.1, Intel i7-4930) were:

```
std::pow(i,2): 0.001105s (result = 3.33333e+07)
i*i: 0.000352s (result = 3.33333e+07)

std::pow(i,3): 0.006034s (result = 2.5e+07)
i*i*i: 0.000328s (result = 2.5e+07)
```

With optimisation, the results were equally striking:

```
std::pow(i,2): 0.000155s (result = 3.33333e+07)
i*i: 0.000106s (result = 3.33333e+07)

std::pow(i,3): 0.006066s (result = 2.5e+07)
i*i*i: 9.7e-05s (result = 2.5e+07)
```

Note that these figures were measured with a version of `elapsed()` that scaled a millisecond count by 1.0e-6 instead of 1.0e-3, so each absolute value is 1000 times too small (0.001105s is really about 1.1 s). The ratios between them, which are what matter here, are unaffected.

So it looks like the compiler does at least try to optimise the `std::pow(x,2)` case, but not the `std::pow(x,3)` case (it takes ~40 times longer than the `std::pow(x,2)` case). In all cases, manual expansion performed better, but particularly for the power 3 case (about 60 times quicker). This is definitely worth bearing in mind if running `std::pow()` with integer powers greater than 2 in a tight loop.
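The four copy-pasted loops above can be folded into one helper that takes the expression as a lambda. A sketch under the same assumptions as the answer (loop counter as the argument, result accumulated and printed so the work is not optimised away); `timeIt` is a hypothetical name, and `std::chrono::steady_clock` is swapped in because it is guaranteed monotonic. Converting to `std::chrono::duration<double>` yields seconds directly, avoiding manual scale factors.

```cpp
#include <chrono>
#include <cmath>
#include <iostream>

// Times f(i) over i = 0, step, 2*step, ... < 1.0 and prints the elapsed seconds.
// Returns the accumulated sum so the computation has an observable effect.
template <typename F>
double timeIt(const char* label, F f, double step)
{
    auto start = std::chrono::steady_clock::now();
    double total = 0.0;
    for (double i = 0.0; i < 1.0; i += step)
        total += f(i);
    std::chrono::duration<double> secs = std::chrono::steady_clock::now() - start;
    std::cout << label << ": " << secs.count() << " s (result = " << total << ")\n";
    return total;
}

// Example usage, cube via pow versus plain multiplication:
//   timeIt("std::pow(i,3)", [](double i) { return std::pow(i, 3); }, 1e-8);
//   timeIt("i*i*i",         [](double i) { return i * i * i; },      1e-8);
```

Passing the lambda as a template parameter lets the compiler inline it into the loop, so the harness itself should not add a call per iteration.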
