Is multiplication and division using shift operators in C actually faster?

Question

Multiplication and division can be achieved using bit operators, for example:

    i*2 = i<<1
    i*3 = (i<<1) + i;
    i*10 = (i<<3) + (i<<1)

and so on.

Is it actually faster to use, say, (i<<3)+(i<<1) to multiply with 10 than using i*10 directly? Is there any sort of input that can't be multiplied or divided in this way?

User · Answer

I agree with the marked answer by Drew Hall. The answer could use some additional notes though.

For the vast majority of software developers the processor and compiler are no longer relevant to the question. Most of us are far beyond the 8088 and MS-DOS. It is perhaps only relevant for those who are still developing for embedded processors...

At my software company, math operators (add/sub/mul/div) should be used for all mathematics, while shifts should be used when converting between data types, e.g. ushort to byte as n>>8 and not n/256.
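As a rough illustration of that convention, here is a minimal C sketch. The helper names are made up for this example and the fixed-width types from stdint.h are my assumption about what "ushort" and "byte" map to.

```c
#include <stdint.h>

/* Hypothetical helpers; the names are illustrative, not from the answer. */

/* High byte of a 16-bit value: a data-layout operation, so a shift reads naturally. */
uint8_t high_byte(uint16_t n)
{
    return (uint8_t)(n >> 8);
}

/* Low byte of a 16-bit value, selected with a mask. */
uint8_t low_byte(uint16_t n)
{
    return (uint8_t)(n & 0xFFu);
}

/* The same high byte written as arithmetic, which hides the intent. */
uint8_t high_byte_arith(uint16_t n)
{
    return (uint8_t)(n / 256);
}
```

Both forms give the same value for unsigned inputs; the point is only which one states the intent more clearly.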

User · Answer

As far as I know, on some machines multiplication can need up to 16 to 32 machine cycles. So yes, depending on the machine type, bitshift operators are faster than multiplication/division.

However, certain machines do have a math processor, which contains special instructions for multiplication/division.

User · Answer

I think that in the one case where you want to multiply or divide by a power of two, you can't go wrong with using bitshift operators, even if the compiler converts them to a MUL/DIV, because some processors microcode (really, a macro) them anyway, so for those cases you will achieve an improvement, especially if the shift is more than 1. Or more explicitly, if the CPU has no bitshift operators, it will be a MUL/DIV anyway, but if the CPU has bitshift operators, you avoid a microcode branch and this is a few instructions less.

I am writing some code right now that requires a lot of doubling/halving operations because it is working on a dense binary tree, and there is one more operation that I suspect might be more optimal than an addition: a left (power of two multiply) shift with an addition. This can be replaced with a left shift and an xor if the shift is wider than the number of bits you want to add. An easy example is (i<<1)^1, which adds one to a doubled value. This does not of course apply to a right shift (power of two divide), because only a left (little endian) shift fills the gap with zeros.

In my code, these multiply/divide by two and powers of two operations are very intensively used, and because the formulae are quite short already, each instruction that can be eliminated can be a substantial gain. If the processor does not support these bitshift operators, no gain will happen but neither will there be a loss.

Also, in the algorithms I am writing, they visually represent the movements that occur, so in that sense they are in fact more clear. The left hand side of a binary tree is bigger, and the right is smaller. As well as that, in my code, odd and even numbers have a special significance, and all left-hand children in the tree are odd and all right-hand children, and the root, are even. In some cases, which I haven't encountered yet, but may (oh, actually, I didn't even think of this), x&1 may be a more optimal operation compared to x%2. x&1 on an even number will produce zero, but will produce 1 for an odd number.

Going a bit further than just odd/even identification, if I get zero for x&3 I know that 4 is a factor of our number, and the same for x&7 for 8, and so on. I know that these cases have probably got limited utility, but it's nice to know that you can avoid a modulus operation and use a bitwise logic operation instead, because bitwise operations are almost always the fastest, and least likely to be ambiguous to the compiler.

I am pretty much inventing the field of dense binary trees, so I expect that people may not grasp the value of this comment, as very rarely do people want to perform factorisations on only powers of two, or only multiply/divide by powers of two.
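The bit tricks mentioned in this answer can be sketched in C roughly as follows. The function names are invented for this example, and I have restricted everything to unsigned values on purpose: for a negative signed x, C's x % 2 yields -1 rather than 1, so x & 1 and x % 2 are not interchangeable there.

```c
#include <stdbool.h>

/* Illustrative helpers; names are hypothetical, inputs are unsigned by design. */

/* Same result as x % 2 for unsigned x: tests the lowest bit. */
unsigned parity_bit(unsigned x)
{
    return x & 1u;
}

/* True when 2^k divides x, i.e. the k lowest bits are all zero.
   Assumes k is smaller than the bit width of unsigned. */
bool divisible_by_pow2(unsigned x, unsigned k)
{
    return (x & ((1u << k) - 1u)) == 0u;
}

/* 2*i + 1: after the left shift the lowest bit is 0,
   so XOR-ing in a 1 is the same as adding 1. */
unsigned double_plus_one(unsigned i)
{
    return (i << 1) ^ 1u;
}
```

Whether any of these is faster than the arithmetic spelling depends on the compiler; an optimising compiler will usually produce the same instructions for x % 2 and x & 1 when x is unsigned.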

User · Answer

Just tried on my machine compiling this:

    int a = ...;
    int b = a * 10;

When disassembling it produces output:

    MOV EAX,DWORD PTR SS:[ESP+1C] ; Move a into EAX
    LEA EAX,DWORD PTR DS:[EAX+EAX*4] ; Multiply by 5 without shift !
    SHL EAX, 1 ; Multiply by 2 using shift

This version is faster than your hand-optimized code with pure shifting and addition.

You really never know what the compiler is going to come up with, so it's better to simply write a normal multiplication and let it optimize the way it wants to, except in very precise cases where you know the compiler cannot optimize.
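For readers who want to see the three formulations side by side, here is a small C sketch. The function names are mine, and the "LEA style" version is only my reading of the disassembly above as (a + a*4) * 2; it is not code from the answer. Unsigned arithmetic is used so that overflow wraps instead of being undefined.

```c
/* Plain multiplication: states the intent and leaves the strategy to the compiler. */
unsigned times10_mul(unsigned x)
{
    return x * 10u;
}

/* The hand-written form from the question: 8x + 2x. */
unsigned times10_shifts(unsigned x)
{
    return (x << 3) + (x << 1);
}

/* Mirrors the LEA + SHL sequence: (x + 4x) doubled. */
unsigned times10_lea_style(unsigned x)
{
    return (x + (x << 2)) << 1;
}

/* Counts inputs below limit on which the three versions disagree. */
unsigned count_mismatches(unsigned limit)
{
    unsigned mismatches = 0;
    for (unsigned x = 0; x < limit; ++x) {
        if (times10_shifts(x) != times10_mul(x) ||
            times10_lea_style(x) != times10_mul(x)) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

All three compute the same value, so the only open question is which instruction sequence the compiler picks, and that is best checked by reading the generated assembly for your own target.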

User · Answer

Whether it is actually faster depends on the hardware and compiler used.

User · Answer

If you compare the output for the x+x, x*2 and x<<1 syntax on a gcc compiler, then you would get the same result in x86 assembly: https://godbolt.org/z/JLpp0j

    push    rbp
    mov     rbp, rsp
    mov     DWORD PTR [rbp-4], edi
    mov     eax, DWORD PTR [rbp-4]
    add     eax, eax
    pop     rbp
    ret

So you can consider gcc smart enough to determine its own best solution independently of what you typed.

User · Answer

Python test performing the same multiplication 10 million times against the same random numbers.

    >>> from timeit import timeit
    >>> setup_str = 'import scipy; from scipy import random; scipy.random.seed(0)'
    >>> N = 10*1000*1000
    >>> timeit('x=random.randint(65536);', setup=setup_str, number=N)
    1.894096851348877 # Time from generating the random #s and no opperati

    >>> timeit('x=random.randint(65536); x*2', setup=setup_str, number=N)
    2.2799630165100098
    >>> timeit('x=random.randint(65536); x << 1', setup=setup_str, number=N)
    2.2616429328918457

    >>> timeit('x=random.randint(65536); x*10', setup=setup_str, number=N)
    2.2799630165100098
    >>> timeit('x=random.randint(65536); (x << 3) + (x<<1)', setup=setup_str, number=N)
    2.9485139846801758

    >>> timeit('x=random.randint(65536); x // 2', setup=setup_str, number=N)
    2.490908145904541
    >>> timeit('x=random.randint(65536); x / 2', setup=setup_str, number=N)
    2.4757170677185059
    >>> timeit('x=random.randint(65536); x >> 1', setup=setup_str, number=N)
    2.2316000461578369

So in doing a shift rather than multiplication/division by a power of two in Python, there's a slight improvement (~10% for division; ~1% for multiplication). If it's a non-power of two, there's likely a considerable slowdown.

Again, these #s will change depending on your processor, your compiler (or interpreter -- did it in Python for simplicity).

As with everyone else, don't prematurely optimize. Write very readable code, profile if it's not fast enough, and then try to optimize the slow parts. Remember, your compiler is much better at optimization than you are.

User · Answer

Just a concrete point of measure: many years back, I benchmarked two versions of my hashing algorithm:

    unsigned
    hash( char const* s )
    {
        unsigned h = 0;
        while ( *s != '\0' ) {
            h = 127 * h + (unsigned char)*s;
            ++ s;
        }
        return h;
    }

and

    unsigned
    hash( char const* s )
    {
        unsigned h = 0;
        while ( *s != '\0' ) {
            h = (h << 7) - h + (unsigned char)*s;
            ++ s;
        }
        return h;
    }

On every machine I benchmarked it on, the first was at least as fast as the second. Somewhat surprisingly, it was sometimes faster (e.g. on a Sun Sparc). When the hardware didn't support fast multiplication (and most didn't back then), the compiler would convert the multiplication into the appropriate combinations of shifts and add/sub. And because it knew the final goal, it could sometimes do so in fewer instructions than when you explicitly wrote the shifts and the add/subs.

Note that this was something like 15 years ago. Hopefully, compilers have only gotten better since then, so you can pretty much count on the compiler doing the right thing, probably better than you could. (Also, the reason the code looks so C'ish is because it was over 15 years ago. I'd obviously use std::string and iterators today.)

User · Answer

There are optimizations the compiler can't do because they only work for a reduced set of inputs.

Below is C++ sample code that can do a faster division by doing a 64-bit "multiplication by the reciprocal". Both numerator and denominator must be below a certain threshold. Note that it must be compiled to use 64-bit instructions to be actually faster than normal division.

    #include <stdio.h>
    #include <chrono>

    static const unsigned s_bc = 32;
    static const unsigned long long s_p = 1ULL << s_bc;
    static const unsigned long long s_hp = s_p / 2;

    static unsigned long long s_f;
    static unsigned long long s_fr;

    static void fastDivInitialize(const unsigned d)
    {
        s_f = s_p / d;
        s_fr = s_f * (s_p - (s_f * d));
    }

    static unsigned fastDiv(const unsigned n)
    {
        return (s_f * n + ((s_fr * n + s_hp) >> s_bc)) >> s_bc;
    }

    static bool fastDivCheck(const unsigned n, const unsigned d)
    {
        // 32 to 64 cycles latency on modern cpus
        const unsigned expected = n / d;

        // At least 10 cycles latency on modern cpus
        const unsigned result = fastDiv(n);

        if (result != expected)
        {
            printf("Failed for: %u/%u != %u\n", n, d, expected);
            return false;
        }

        return true;
    }

    int main()
    {
        unsigned result = 0;

        // Make sure to verify it works for your expected set of inputs
        const unsigned MAX_N = 65535;
        const unsigned MAX_D = 40000;

        const double ONE_SECOND_COUNT = 1000000000.0;

        auto t0 = std::chrono::steady_clock::now();
        unsigned count = 0;
        printf("Verifying...\n");
        for (unsigned d = 1; d <= MAX_D; ++d)
        {
            fastDivInitialize(d);
            for (unsigned n = 0; n <= MAX_N; ++n)
            {
                count += !fastDivCheck(n, d);
            }
        }
        auto t1 = std::chrono::steady_clock::now();
        printf("Errors: %u / %u (%.4fs)\n", count, MAX_D * (MAX_N + 1), (t1 - t0).count() / ONE_SECOND_COUNT);

        t0 = t1;
        for (unsigned d = 1; d <= MAX_D; ++d)
        {
            fastDivInitialize(d);
            for (unsigned n = 0; n <= MAX_N; ++n)
            {
                result += fastDiv(n);
            }
        }
        t1 = std::chrono::steady_clock::now();
        printf("Fast division time: %.4fs\n", (t1 - t0).count() / ONE_SECOND_COUNT);

        t0 = t1;
        count = 0;
        for (unsigned d = 1; d <= MAX_D; ++d)
        {
            for (unsigned n = 0; n <= MAX_N; ++n)
            {
                result += n / d;
            }
        }
        t1 = std::chrono::steady_clock::now();
        printf("Normal division time: %.4fs\n", (t1 - t0).count() / ONE_SECOND_COUNT);

        getchar();
        return result;
    }

User · Answer

I too wanted to see if I could Beat the House. This is a more general bitwise approach for any-number by any-number multiplication. The macros I made are about 25% slower to twice as slow as normal * multiplication. As said by others, if the multiplier is close to a power of 2 or made up of few powers of 2 you might win: X*23, made up of (X<<4)+(X<<2)+(X<<1)+X, is going to be slower than X*65, made up of (X<<6)+X.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define MULTIPLYINTBYMINUS(X,Y) \
        (-((X >> 30) & 1)&(Y<<30))+(-((X >> 29) & 1)&(Y<<29))+ \
        (-((X >> 28) & 1)&(Y<<28))+(-((X >> 27) & 1)&(Y<<27))+ \
        (-((X >> 26) & 1)&(Y<<26))+(-((X >> 25) & 1)&(Y<<25))+ \
        (-((X >> 24) & 1)&(Y<<24))+(-((X >> 23) & 1)&(Y<<23))+ \
        (-((X >> 22) & 1)&(Y<<22))+(-((X >> 21) & 1)&(Y<<21))+ \
        (-((X >> 20) & 1)&(Y<<20))+(-((X >> 19) & 1)&(Y<<19))+ \
        (-((X >> 18) & 1)&(Y<<18))+(-((X >> 17) & 1)&(Y<<17))+ \
        (-((X >> 16) & 1)&(Y<<16))+(-((X >> 15) & 1)&(Y<<15))+ \
        (-((X >> 14) & 1)&(Y<<14))+(-((X >> 13) & 1)&(Y<<13))+ \
        (-((X >> 12) & 1)&(Y<<12))+(-((X >> 11) & 1)&(Y<<11))+ \
        (-((X >> 10) & 1)&(Y<<10))+(-((X >> 9) & 1)&(Y<<9))+ \
        (-((X >> 8) & 1)&(Y<<8))+(-((X >> 7) & 1)&(Y<<7))+ \
        (-((X >> 6) & 1)&(Y<<6))+(-((X >> 5) & 1)&(Y<<5))+ \
        (-((X >> 4) & 1)&(Y<<4))+(-((X >> 3) & 1)&(Y<<3))+ \
        (-((X >> 2) & 1)&(Y<<2))+(-((X >> 1) & 1)&(Y<<1))+ \
        (-((X >> 0) & 1)&(Y<<0))

    #define MULTIPLYINTBYSHIFT(X,Y) \
        (((((X >> 30) & 1)<<31)>>31)&(Y<<30))+(((((X >> 29) & 1)<<31)>>31)&(Y<<29))+ \
        (((((X >> 28) & 1)<<31)>>31)&(Y<<28))+(((((X >> 27) & 1)<<31)>>31)&(Y<<27))+ \
        (((((X >> 26) & 1)<<31)>>31)&(Y<<26))+(((((X >> 25) & 1)<<31)>>31)&(Y<<25))+ \
        (((((X >> 24) & 1)<<31)>>31)&(Y<<24))+(((((X >> 23) & 1)<<31)>>31)&(Y<<23))+ \
        (((((X >> 22) & 1)<<31)>>31)&(Y<<22))+(((((X >> 21) & 1)<<31)>>31)&(Y<<21))+ \
        (((((X >> 20) & 1)<<31)>>31)&(Y<<20))+(((((X >> 19) & 1)<<31)>>31)&(Y<<19))+ \
        (((((X >> 18) & 1)<<31)>>31)&(Y<<18))+(((((X >> 17) & 1)<<31)>>31)&(Y<<17))+ \
        (((((X >> 16) & 1)<<31)>>31)&(Y<<16))+(((((X >> 15) & 1)<<31)>>31)&(Y<<15))+ \
        (((((X >> 14) & 1)<<31)>>31)&(Y<<14))+(((((X >> 13) & 1)<<31)>>31)&(Y<<13))+ \
        (((((X >> 12) & 1)<<31)>>31)&(Y<<12))+(((((X >> 11) & 1)<<31)>>31)&(Y<<11))+ \
        (((((X >> 10) & 1)<<31)>>31)&(Y<<10))+(((((X >> 9) & 1)<<31)>>31)&(Y<<9))+ \
        (((((X >> 8) & 1)<<31)>>31)&(Y<<8))+(((((X >> 7) & 1)<<31)>>31)&(Y<<7))+ \
        (((((X >> 6) & 1)<<31)>>31)&(Y<<6))+(((((X >> 5) & 1)<<31)>>31)&(Y<<5))+ \
        (((((X >> 4) & 1)<<31)>>31)&(Y<<4))+(((((X >> 3) & 1)<<31)>>31)&(Y<<3))+ \
        (((((X >> 2) & 1)<<31)>>31)&(Y<<2))+(((((X >> 1) & 1)<<31)>>31)&(Y<<1))+ \
        (((((X >> 0) & 1)<<31)>>31)&(Y<<0))

    int main()
    {
        int randomnumber=23;
        int randomnumber2=23;
        int checknum=23;
        clock_t start, diff;
        srand(time(0));
        start = clock();
        for(int i=0;i<1000000;i++)
        {
            randomnumber = rand() % 10000;
            randomnumber2 = rand() % 10000;
            checknum=MULTIPLYINTBYMINUS(randomnumber,randomnumber2);
            if (checknum!=randomnumber*randomnumber2)
            {
                printf("s %i and %i and %i",checknum,randomnumber,randomnumber2);
            }
        }
        diff = clock() - start;
        int msec = diff * 1000 / CLOCKS_PER_SEC;
        printf("MULTIPLYINTBYMINUS Time %d milliseconds", msec);
        start = clock();
        for(int i=0;i<1000000;i++)
        {
            randomnumber = rand() % 10000;
            randomnumber2 = rand() % 10000;
            checknum=MULTIPLYINTBYSHIFT(randomnumber,randomnumber2);
            if (checknum!=randomnumber*randomnumber2)
            {
                printf("s %i and %i and %i",checknum,randomnumber,randomnumber2);
            }
        }
        diff = clock() - start;
        msec = diff * 1000 / CLOCKS_PER_SEC;
        printf("MULTIPLYINTBYSHIFT Time %d milliseconds", msec);
        start = clock();
        for(int i=0;i<1000000;i++)
        {
            randomnumber = rand() % 10000;
            randomnumber2 = rand() % 10000;
            checknum= randomnumber*randomnumber2;
            if (checknum!=randomnumber*randomnumber2)
            {
                printf("s %i and %i and %i",checknum,randomnumber,randomnumber2);
            }
        }
        diff = clock() - start;
        msec = diff * 1000 / CLOCKS_PER_SEC;
        printf("normal * Time %d milliseconds", msec);
        return 0;
    }
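The idea behind those macros, adding a shifted copy of one operand for every set bit of the other, may be easier to follow as a loop. This is a minimal sketch with a function name of my own choosing, not a faster replacement; it uses unsigned operands so that shifts and overflow are well defined.

```c
/* Shift-and-add multiplication, written as a loop for readability.
   Returns x * y modulo 2^N, where N is the bit width of unsigned. */
unsigned mul_shift_add(unsigned x, unsigned y)
{
    unsigned result = 0;
    while (x != 0) {
        if (x & 1u) {
            result += y;   /* bit k of the original x is set: add y * 2^k */
        }
        x >>= 1;           /* move on to the next bit of x */
        y <<= 1;           /* y now holds the original y times 2^(k+1) */
    }
    return result;
}
```

A multiplier such as 65 (two set bits) needs two additions here, while 23 (four set bits) needs four, which matches the observation above about which constants are cheap to build from shifts.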

User · Answer

Shifting is generally a lot faster than multiplying at an instruction level, but you may well be wasting your time doing premature optimisations. The compiler may well perform these optimisations at compile time. Doing it yourself will affect readability and possibly have no effect on performance. It's probably only worth it to do things like this if you have profiled and found this to be a bottleneck.

Actually the division trick, known as "magic division", can yield huge payoffs. Again, you should profile first to see if it's needed. But if you do use it, there are useful programs around to help you figure out what instructions are needed for the same division semantics. Here is an example: http://www.masm32.com/board/index.php?topic=12421.0

An example which I have lifted from the OP's thread on MASM32:

    include ConstDiv.inc
    ...
    mov eax,9999999
    ; divide eax by 100000
    cdiv 100000
    ; edx = quotient

Would generate:

    mov eax,9999999
    mov edx,0A7C5AC47h
    add eax,1
    .if !CARRY?
        mul edx
    .endif
    shr edx,16
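To show the same kind of trick in C rather than MASM, here is a sketch of division by the constant 10 for 32-bit unsigned values. The function name is hypothetical; the constant 0xCCCCCCCD is 2^35 / 10 rounded up, and the product is formed in 64 bits so that nothing is lost before the shift. This is the multiply-and-shift pattern that optimising compilers commonly emit for n / 10 themselves, so writing it by hand is rarely needed.

```c
#include <stdint.h>

/* Divide a 32-bit unsigned value by 10 without a division instruction.
   0xCCCCCCCD == ceil(2^35 / 10); the 64-bit product is shifted right by 35. */
uint32_t div10_magic(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * 0xCCCCCCCDu) >> 35);
}
```

Different divisors need different constants and shift counts, and some need an extra correction step like the add/carry sequence in the MASM output above, so any hand-written variant should be verified over the full input range it will see.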

User · Answer

This depends on the processor and the compiler. Some compilers already optimize code this way, others don't. So you need to check each time your code needs to be optimized this way.

Unless you desperately need to optimize, I would not scramble my source code just to save an assembly instruction or processor cycle.

User · Answer

"Is it actually faster to use, say, (i<<3)+(i<<1) to multiply with 10 than using i*10 directly?"

It might or might not be on your machine - if you care, measure in your real-world usage.

A case study - from 486 to core i7

Benchmarking is very difficult to do meaningfully, but we can look at a few facts. From http://www.penguin.cz/~literakl/intel/s.html#SAL and http://www.penguin.cz/~literakl/intel/i.html#IMUL we get an idea of the x86 clock cycles needed for arithmetic shift and multiplication. Say we stick to "486" (the newest one listed), 32 bit registers and immediates: IMUL takes 13-42 cycles and IDIV 44. Each SAL takes 2, and adding 1, so even with a few of those together shifting superficially looks like a winner.

These days, with the core i7:

(from http://software.intel.com/en-us/forums/showthread.php?t=61481)

"The latency is 1 cycle for an integer addition and 3 cycles for an integer multiplication. You can find the latencies and throughput in Appendix C of the "Intel 64 and IA-32 Architectures Optimization Reference Manual", which is located on http://www.intel.com/products/processor/manuals/."

(from some Intel blurb)

"Using SSE, the Core i7 can issue simultaneous add and multiply instructions, resulting in a peak rate of 8 floating-point operations (FLOP) per clock cycle."

That gives you an idea of how far things have come. The optimisation trivia - like bit shifting versus * - that was taken seriously even into the 90s is just obsolete now. Bit-shifting is still faster, but for non-power-of-two mul/div, by the time you do all your shifts and add the results it's slower again. Then, more instructions means more cache faults, more potential issues in pipelining, and more use of temporary registers may mean more saving and restoring of register content from the stack... it quickly gets too complicated to quantify all the impacts definitively, but they're predominantly negative.

Functionality in source code vs implementation

More generally, your question is tagged C and C++. As 3rd generation languages, they're specifically designed to hide the details of the underlying CPU instruction set. To satisfy their language Standards, they must support multiplication and shifting operations (and many others) even if the underlying hardware doesn't. In such cases, they must synthesize the required result using many other instructions. Similarly, they must provide software support for floating point operations if the CPU lacks it and there's no FPU. Modern CPUs all support * and <<, so this might seem absurdly theoretical and historical, but the significant thing is that the freedom to choose implementation goes both ways: even if the CPU has an instruction that implements the operation requested in the source code in the general case, the compiler's free to choose something else that it prefers because it's better for the specific case the compiler's faced with.

Examples (with a hypothetical assembly language)

    source           literal approach         optimised approach
    #define N 0
    int x;           .word x                xor registerA, registerA
    x *= N;          move x -> registerA
                     move x -> registerB
                     A = B * immediate(0)
                     store registerA -> x
      ...............do something more with x...............

Instructions like exclusive or (xor) have no relationship to the source code, but xor-ing anything with itself clears all the bits, so it can be used to set something to 0. Source code that implies memory addresses may not entail any being used.

These kinds of hacks have been used for as long as computers have been around. In the early days of 3GLs, to secure developer uptake the compiler output had to satisfy the existing hardcore hand-optimising assembly-language dev community that the produced code wasn't slower, more verbose or otherwise worse. Compilers quickly adopted lots of great optimisations - they became a better centralised store of it than any individual assembly language programmer could possibly be, though there's always the chance that they miss a specific optimisation that happens to be crucial in a specific case - humans can sometimes nut it out and grope for something better while compilers just do as they've been told until someone feeds that experience back into them.

So, even if shifting and adding is still faster on some particular hardware, the compiler writer's likely to have worked out exactly when it's both safe and beneficial.

Maintainability

If your hardware changes you can recompile and it'll look at the target CPU and make another best choice, whereas you're unlikely to ever want to revisit your "optimisations" or list which compilation environments should use multiplication and which should shift. Think of all the non-power-of-two bit-shifted "optimisations" written 10+ years ago that are now slowing down the code they're in as it runs on modern processors...!

Thankfully, good compilers like GCC can typically replace a series of bitshifts and arithmetic with a direct multiplication when any optimisation is enabled (i.e. ...main(...) { return (argc << 4) + (argc << 2) + argc; } -> imull $21, 8(%ebp), %eax) so a recompilation may help even without fixing the code, but that's not guaranteed.

Strange bitshifting code implementing multiplication or division is far less expressive of what you were conceptually trying to achieve, so other developers will be confused by that, and a confused programmer's more likely to introduce bugs or remove something essential in an effort to restore seeming sanity. If you only do non-obvious things when they're really tangibly beneficial, and then document them well (but don't document other stuff that's intuitive anyway), everyone will be happier.

General solutions versus partial solutions

If you have some extra knowledge, such as that your int will really only be storing values x, y and z, then you may be able to work out some instructions that work for those values and get you your result more quickly than when the compiler doesn't have that insight and needs an implementation that works for all int values. For example, consider your question:

"Multiplication and division can be achieved using bit operators..."

You illustrate multiplication, but how about division?

    int x;
    x >> 1;   // divide by 2?

According to the C++ Standard 5.8:

"-3- The value of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 divided by the quantity 2 raised to the power E2. If E1 has a signed type and a negative value, the resulting value is implementation-defined."

So, your bit shift has an implementation-defined result when x is negative: it may not work the same way on different machines. But / works far more predictably. (It may not be perfectly consistent either, as different machines may have different representations of negative numbers, and hence different ranges even when there are the same number of bits making up the representation.)

You may say "I don't care... that int is storing the age of the employee, it can never be negative". If you have that kind of special insight, then yes - your >> safe optimisation might be passed over by the compiler unless you explicitly do it in your code. But it's risky and rarely useful, as much of the time you won't have this kind of insight, and other programmers working on the same code won't know that you've bet the house on some unusual expectations of the data you'll be handling... what seems a totally safe change to them might backfire because of your "optimisation".

"Is there any sort of input that can't be multiplied or divided in this way?"

Yes... as mentioned above, negative numbers have implementation-defined behaviour when "divided" by bit-shifting.
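If you do decide to rely on that kind of special insight, one way to reduce the risk is to state the assumption in the code itself. The following is only a sketch: the function name and the "age" scenario are borrowed from the example above for illustration, and assert is just one possible way to document the precondition.

```c
#include <assert.h>

/* Halves a value that the caller promises is non-negative.
   For a non-negative signed operand, >> 1 is guaranteed to equal / 2;
   the assert records the assumption for whoever edits this later. */
int half_of_age(int age)
{
    assert(age >= 0);   /* assumption: ages are never negative */
    return age >> 1;
}
```

With the precondition spelled out, a later change that starts passing negative values fails loudly in debug builds instead of silently producing a platform-dependent result.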

User · Answer

It completely depends on target device, language, purpose, etc.

Pixel crunching in a video card driver? Very likely, yes!

.NET business application for your department? Absolutely no reason to even look into it.

For a high performance game for a mobile device it might be worth looking into, but only after easier optimizations have been performed.

User · Answer

Don't do it unless you absolutely need to and your code's intent requires shifting rather than multiplication/division.

On a typical day, you could potentially save a few machine cycles (or lose some, since the compiler knows better what to optimize), but the cost isn't worth it: you spend time on minor details rather than the actual job, maintaining the code becomes harder and your co-workers will curse you.

You might need to do it for high-load computations, where each saved cycle means minutes of runtime. But you should optimize one place at a time and do performance tests each time to see if you really made it faster or broke the compiler's logic.

User · Answer

Short answer: Not likely.

Long answer: Your compiler has an optimizer in it that knows how to multiply as quickly as your target processor architecture is capable. Your best bet is to tell the compiler your intent clearly (i.e. i*2 rather than i << 1) and let it decide what the fastest assembly/machine code sequence is. It's even possible that the processor itself has implemented the multiply instruction as a sequence of shifts & adds in microcode.

Bottom line--don't spend a lot of time worrying about this. If you mean to shift, shift. If you mean to multiply, multiply. Do what is semantically clearest--your coworkers will thank you later. Or, more likely, curse you later if you do otherwise.

User · Answer

In addition to all the other good answers here, let me point out another reason to not use shift when you mean divide or multiply. I have never once seen someone introduce a bug by forgetting the relative precedence of multiplication and addition. I have seen bugs introduced when maintenance programmers forgot that "multiplying" via a shift is logically a multiplication but not syntactically of the same precedence as multiplication. x * 2 + z and x << 1 + z are very different!

If you're working on numbers then use arithmetic operators like + - * / %. If you're working on arrays of bits, use bit twiddling operators like & ^ | >>. Don't mix them; an expression that has both bit twiddling and arithmetic is a bug waiting to happen.
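The precedence trap is easy to demonstrate. In the sketch below (function names are mine), the second function spells out with explicit parentheses what the compiler actually does with x << 1 + z, because + binds tighter than << in C.

```c
/* What the author meant: double x, then add z. */
int scaled_mul(int x, int z)
{
    return x * 2 + z;        /* parsed as (x * 2) + z */
}

/* What "x << 1 + z" really means: + binds tighter than <<. */
int what_shift_means(int x, int z)
{
    return x << (1 + z);     /* same parse as x << 1 + z */
}

/* The shift version only matches once it is parenthesised by hand. */
int scaled_shift_fixed(int x, int z)
{
    return (x << 1) + z;
}
```

For x = 3 and z = 1 the arithmetic version gives 7, while the unparenthesised shift gives 3 << 2, which is 12.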

User · Answer

In the case of signed integers and right shift vs division, it can make a difference. For negative numbers, the shift rounds towards negative infinity whereas division rounds towards zero. Of course the compiler will change the division to something cheaper, but it will usually change it to something that has the same rounding behavior as division, because it is either unable to prove that the variable won't be negative or it simply doesn't care. So if you can prove that a number won't be negative, or if you don't care which way it will round, you can do that optimization in a way that is more likely to make a difference.
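The two rounding behaviours can be compared without depending on how a particular compiler implements >> for negative values. This sketch (names are hypothetical) computes the round-towards-zero result with / and the round-towards-negative-infinity result portably with / and %; the latter is what an arithmetic right shift by one produces on typical two's-complement targets.

```c
/* x / 2 in C99 and later: rounds towards zero. */
int half_trunc(int x)
{
    return x / 2;
}

/* floor(x / 2): rounds towards negative infinity.
   Computed with / and % so it does not rely on >> for negative x. */
int half_floor(int x)
{
    int q = x / 2;
    if (x < 0 && x % 2 != 0) {
        q -= 1;              /* negative odd values round one step further down */
    }
    return q;
}
```

The two functions agree for every non-negative x and for negative even x; they differ by one only for negative odd x, which is exactly the case the compiler has to patch up when it turns a signed division into a shift.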

User · Answer

Shift and integer multiply instructions have similar performance on most modern CPUs - integer multiply instructions were relatively slow back in the 1980s but in general this is no longer true. Integer multiply instructions may have higher latency, so there may still be cases where a shift is preferable. Ditto for cases where you can keep more execution units busy (although this can cut both ways).

Integer division is still relatively slow though, so using a shift instead of division by a power of 2 is still a win, and most compilers will implement this as an optimisation. Note however that for this optimisation to be valid the dividend needs to be either unsigned or known to be non-negative. For a negative dividend the shift and divide are not equivalent!

    #include <stdio.h>

    int main(void)
    {
        int i;

        for (i = 5; i >= -5; --i)
        {
            printf("%d / 2 = %d, %d >> 1 = %d\n", i, i / 2, i, i >> 1);
        }
        return 0;
    }

Output:

    5 / 2 = 2, 5 >> 1 = 2
    4 / 2 = 2, 4 >> 1 = 2
    3 / 2 = 1, 3 >> 1 = 1
    2 / 2 = 1, 2 >> 1 = 1
    1 / 2 = 0, 1 >> 1 = 0
    0 / 2 = 0, 0 >> 1 = 0
    -1 / 2 = 0, -1 >> 1 = -1
    -2 / 2 = -1, -2 >> 1 = -1
    -3 / 2 = -1, -3 >> 1 = -2
    -4 / 2 = -2, -4 >> 1 = -2
    -5 / 2 = -2, -5 >> 1 = -3

So if you want to help the compiler, make sure the variable or expression in the dividend is explicitly unsigned.
