What is the fastest/most efficient way to find the highest set bit (msb) in an integer in C?

Question

If I have some integer n, and I want to know the position of the most significant bit (that is, if the least significant bit is on the right, I want to know the position of the furthest left bit that is a 1), what is the quickest/most efficient method of finding out?

I know that POSIX supports a ffs() method in strings.h to find the first set bit, but there doesn't seem to be a corresponding fls() method.

Is there some really obvious way of doing this that I'm missing?

What about in cases where you can't use POSIX functions for portability?

Edit: What about a solution that works on both 32 and 64 bit architectures? Many of the code listings seem like they'd only work on 32 bit ints.

User · Answer

A version in C using successive approximation:

unsigned int getMsb(unsigned int n)
{
  unsigned int msb  = sizeof(n) * 4;   /* start at the middle bit */
  unsigned int step = msb;
  while (step > 1)
  {
    step /= 2;
    if (n >> msb)
      msb += step;
    else
      msb -= step;
  }
  if (n >> msb)
    msb++;
  return (msb - 1);
}

Advantage: the running time is constant regardless of the provided number, because the number of loop iterations is always the same (4 iterations when using a 32-bit "unsigned int").
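Since the question also asks about 64-bit values, here is a minimal sketch of the same successive-approximation idea widened to uint64_t. The name get_msb64 is hypothetical and not from the answer above; the logic is unchanged, only the type differs, so the loop runs 5 times instead of 4.

```c
#include <stdint.h>

/* Hypothetical 64-bit variant of getMsb(); same binary search on the shift amount. */
unsigned int get_msb64(uint64_t n)
{
    unsigned int msb  = sizeof(n) * 4;   /* start at the middle bit (32) */
    unsigned int step = msb;
    while (step > 1) {
        step /= 2;
        if (n >> msb)
            msb += step;                 /* something is set at or above msb */
        else
            msb -= step;                 /* everything set is below msb */
    }
    if (n >> msb)
        msb++;
    return msb - 1;                      /* note: 0 and 1 both yield 0 */
}
```

The largest shift ever performed is 63, so the shift never reaches the width of the type. As with the original, an input of 0 is indistinguishable from an input of 1 and needs a separate check if that matters.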

User · Answer

The code:

// x >= 1
unsigned func(unsigned x) {
    double d = x;
    int p = (*reinterpret_cast<long long*>(&d) >> 52) - 1023;
    printf("The left-most non zero bit of %u is bit %d\n", x, p);
    return p;
}

Or get the integer part of the FPU instruction FYL2X (Y*Log2 X) by setting Y=1.

User · Answer

Although I would probably only use this method if I absolutely required the best possible performance (e.g. for writing some sort of board game AI involving bitboards), the most efficient solution is to use inline ASM. See the Optimisations section of this blog post for code with an explanation:

[...], the bsrl assembly instruction computes the position of the most significant bit. Thus, we could use this asm statement:

asm ("bsrl %1, %0"
     : "=r" (position)
     : "r" (number));

User · Answer

Another poster provided a lookup-table using a byte-wide lookup. In case you want to eke out a bit more performance (at the cost of 32K of memory instead of just 256 lookup entries), here is a solution using a 15-bit lookup table, in C# 7 for .NET.

The interesting part is initializing the table. Since it's a relatively small block that we want for the lifetime of the process, I allocate unmanaged memory for this by using Marshal.AllocHGlobal. As you can see, for maximum performance, the whole example is written as native:

readonly static byte[] msb_tab_15;

// Initialize a table of 32768 bytes with the bit position (counting from LSB=0)
// of the highest 'set' (non-zero) bit of its corresponding 16-bit index value.
// The table is compressed by half, so use (value >> 1) for indexing.
static MyStaticInit()
{
    var p = new byte[0x8000];

    for (byte n = 0; n < 16; n++)
        for (int c = (1 << n) >> 1, i = 0; i < c; i++)
            p[c + i] = n;

    msb_tab_15 = p;
}

The table requires one-time initialization via the code above. It is read-only, so a single global copy can be shared for concurrent access. With this table you can quickly look up the integer log2, which is what we're looking for here, for all the various integer widths (8, 16, 32, and 64 bits).

Notice that the table entry for 0, the sole integer for which the notion of 'highest set bit' is undefined, is given the value -1. This distinction is necessary for proper handling of 0-valued upper words in the code below. Without further ado, here is the code for each of the various integer primitives:

ulong (64-bit) Version

/// <summary> Index of the highest set bit in 'v', or -1 for value '0' </summary>
public static int HighestOne(this ulong v)
{
    if ((long)v <= 0)
        return (int)((v >> 57) & 0x40) - 1;      // handles cases v==0 and MSB==63

    int j = (int)((0xFFFFFFFFU - v) >> 58) & 0x20;
    j |=    (int)((0x0000FFFFU - (v >> j)) >> 59) & 0x10;
    return j + msb_tab_15[v >> (j + 1)];
}

uint (32-bit) Version

/// <summary> Index of the highest set bit in 'v', or -1 for value '0' </summary>
public static int HighestOne(uint v)
{
    if ((int)v <= 0)
        return (int)((v >> 26) & 0x20) - 1;     // handles cases v==0 and MSB==31

    int j = (int)((0x0000FFFFU - v) >> 27) & 0x10;
    return j + msb_tab_15[v >> (j + 1)];
}

Various overloads for the above

public static int HighestOne(long v) => HighestOne((ulong)v);
public static int HighestOne(int v) => HighestOne((uint)v);
public static int HighestOne(ushort v) => msb_tab_15[v >> 1];
public static int HighestOne(short v) => msb_tab_15[(ushort)v >> 1];
public static int HighestOne(char ch) => msb_tab_15[ch >> 1];
public static int HighestOne(sbyte v) => msb_tab_15[(byte)v >> 1];
public static int HighestOne(byte v) => msb_tab_15[v >> 1];

This is a complete, working solution which represents the best performance on .NET 4.7.2 for numerous alternatives that I compared with a specialized performance test harness. Some of these are mentioned below. The test parameters were a uniform density of all 65 bit positions, i.e., 0 ... 31/63 plus value 0 (which produces result -1). The bits below the target index position were filled randomly. The tests were x64 only, release mode, with JIT-optimizations enabled.

That's the end of my formal answer here; what follows are some casual notes and links to source code for alternative test candidates associated with the testing I ran to validate the performance and correctness of the above code.

The version provided above, coded as Tab16A, was a consistent winner over many runs. These various candidates, in active working/scratch form, can be found here, here, and here.

 1  candidates.HighestOne_Tab16A               622,496
 2  candidates.HighestOne_Tab16C               628,234
 3  candidates.HighestOne_Tab8A                649,146
 4  candidates.HighestOne_Tab8B                656,847
 5  candidates.HighestOne_Tab16B               657,147
 6  candidates.HighestOne_Tab16D               659,650
 7  _highest_one_bit_UNMANAGED.HighestOne_U    702,900
 8  de_Bruijn.IndexOfMSB                       709,672
 9  _old_2.HighestOne_Old2                     715,810
10  _test_A.HighestOne8                        757,188
11  _old_1.HighestOne_Old1                     757,925
12  _test_A.HighestOne5  (unsafe)              760,387
13  _test_B.HighestOne8  (unsafe)              763,904
14  _test_A.HighestOne3  (unsafe)              766,433
15  _test_A.HighestOne1  (unsafe)              767,321
16  _test_A.HighestOne4  (unsafe)              771,702
17  _test_B.HighestOne2  (unsafe)              772,136
18  _test_B.HighestOne1  (unsafe)              772,527
19  _test_B.HighestOne3  (unsafe)              774,140
20  _test_A.HighestOne7  (unsafe)              774,581
21  _test_B.HighestOne7  (unsafe)              775,463
22  _test_A.HighestOne2  (unsafe)              776,865
23  candidates.HighestOne_NoTab                777,698
24  _test_B.HighestOne6  (unsafe)              779,481
25  _test_A.HighestOne6  (unsafe)              781,553
26  _test_B.HighestOne4  (unsafe)              785,504
27  _test_B.HighestOne5  (unsafe)              789,797
28  _test_A.HighestOne0  (unsafe)              809,566
29  _test_B.HighestOne0  (unsafe)              814,990
30  _highest_one_bit.HighestOne                824,345
30  _bitarray_ext.RtlFindMostSignificantBit    894,069
31  candidates.HighestOne_Naive                898,865

Notable is the terrible performance of ntdll.dll!RtlFindMostSignificantBit via P/Invoke:

[DllImport("ntdll.dll"), SuppressUnmanagedCodeSecurity, SecuritySafeCritical]
public static extern int RtlFindMostSignificantBit(ulong ul);

It's really too bad, because here's the entire actual function:

RtlFindMostSignificantBit:
    bsr     rdx, rcx
    mov     eax, 0FFFFFFFFh
    movzx   ecx, dl
    cmovne  eax, ecx
    ret

I can't imagine the poor performance originating with these five lines, so the managed/native transition penalties must be to blame. I was also surprised that the testing really favored the 32KB (and 64KB) short (16-bit) direct-lookup tables over the 128-byte (and 256-byte) byte (8-bit) lookup tables. I thought the following would be more competitive with the 16-bit lookups, but the latter consistently outperformed this:

public static int HighestOne_Tab8A(ulong v)
{
    if ((long)v <= 0)
        return (int)((v >> 57) & 64) - 1;

    int j;
    j =  (int)((0xFFFFFFFFU - v) >> 58) & 32;
    j += (int)((0x0000FFFFU - (v >> j)) >> 59) & 16;
    j += (int)((0x000000FFU - (v >> j)) >> 60) & 8;
    return j + msb_tab_8[v >> j];
}

The last thing I'll point out is that I was quite shocked that my deBruijn method didn't fare better. This is the method that I had previously been using pervasively:

const ulong N_bsf64 = 0x07EDD5E59A4E28C2,
            N_bsr64 = 0x03F79D71B4CB0A89;

readonly public static sbyte[]
bsf64 =
{
    63,  0, 58,  1, 59, 47, 53,  2, 60, 39, 48, 27, 54, 33, 42,  3,
    61, 51, 37, 40, 49, 18, 28, 20, 55, 30, 34, 11, 43, 14, 22,  4,
    62, 57, 46, 52, 38, 26, 32, 41, 50, 36, 17, 19, 29, 10, 13, 21,
    56, 45, 25, 31, 35, 16,  9, 12, 44, 24, 15,  8, 23,  7,  6,  5,
},
bsr64 =
{
     0, 47,  1, 56, 48, 27,  2, 60, 57, 49, 41, 37, 28, 16,  3, 61,
    54, 58, 35, 52, 50, 42, 21, 44, 38, 32, 29, 23, 17, 11,  4, 62,
    46, 55, 26, 59, 40, 36, 15, 53, 34, 51, 20, 43, 31, 22, 10, 45,
    25, 39, 14, 33, 19, 30,  9, 24, 13, 18,  8, 12,  7,  6,  5, 63,
};

public static int IndexOfLSB(ulong v) =>
    v != 0 ? bsf64[((v & (ulong)-(long)v) * N_bsf64) >> 58] : -1;

public static int IndexOfMSB(ulong v)
{
    if ((long)v <= 0)
        return (int)((v >> 57) & 64) - 1;

    v |= v >> 1; v |= v >> 2;  v |= v >> 4;   // does anybody know a better
    v |= v >> 8; v |= v >> 16; v |= v >> 32;  // way than these 12 ops?
    return bsr64[(v * N_bsr64) >> 58];
}

There's much discussion of how superior and great deBruijn methods are at this SO question, and I had tended to agree. My speculation is that, while both the deBruijn and direct lookup table methods (that I found to be fastest) have to do a table lookup, and both have very minimal branching, only the deBruijn has a 64-bit multiply operation. I only tested the IndexOfMSB functions here--not the deBruijn IndexOfLSB--but I expect the latter to fare much better since it has so many fewer operations (see above), and I'll likely continue to use it for LSB.
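For readers working in C rather than C#, the de Bruijn MSB lookup above can be sketched as follows. This is an illustrative port, not the answer author's code: the function name index_of_msb64 is hypothetical, the table and multiplier are copied from the bsr64/N_bsr64 values quoted above, and the zero case is handled with a plain test instead of the sign trick.

```c
#include <stdint.h>

/* Lookup table and multiplier as quoted in the answer (bsr64 / N_bsr64). */
static const int8_t bsr64_tab[64] = {
     0, 47,  1, 56, 48, 27,  2, 60, 57, 49, 41, 37, 28, 16,  3, 61,
    54, 58, 35, 52, 50, 42, 21, 44, 38, 32, 29, 23, 17, 11,  4, 62,
    46, 55, 26, 59, 40, 36, 15, 53, 34, 51, 20, 43, 31, 22, 10, 45,
    25, 39, 14, 33, 19, 30,  9, 24, 13, 18,  8, 12,  7,  6,  5, 63,
};

/* Hypothetical name. Returns the index of the highest set bit, or -1 for 0. */
int index_of_msb64(uint64_t v)
{
    if (v == 0)
        return -1;

    /* Smear the highest set bit into every lower position. */
    v |= v >> 1;  v |= v >> 2;  v |= v >> 4;
    v |= v >> 8;  v |= v >> 16; v |= v >> 32;

    /* The top 6 bits of the wrapped 64-bit product select the table entry. */
    return bsr64_tab[(v * UINT64_C(0x03F79D71B4CB0A89)) >> 58];
}
```

The multiply relies on unsigned 64-bit wraparound, which is well defined in C for uint64_t.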

User · Answer

Note that what you are trying to do is calculate the integer log2 of an integer.

#include <stdio.h>
#include <stdlib.h>

unsigned int
Log2(unsigned long x)
{
    unsigned long n = x;
    int bits = sizeof(x)*8;
    int step = 1; int k = 0;
    for( step = 1; step < bits; ) {
        n |= (n >> step);
        step *= 2; ++k;
    }
    //printf("%lu %lu\n", x, (x - (n >> 1)) );
    // NOTE: with k the index of the highest set bit, this evaluates to
    // (x - 2^k) + 1, not to k itself.
    return(x - (n >> 1));
}

Observe that you can attempt to search more than 1 bit at a time.

unsigned int
Log2_a(unsigned long x)
{
    unsigned long n = x;
    int bits = sizeof(x)*8;
    int step = 1;
    int step2 = 0;
    //observe that you can move 8 bits at a time, and there is a pattern...
    //if( x>1<<step2+8 ) { step2+=8;
        //if( x>1<<step2+8 ) { step2+=8;
            //if( x>1<<step2+8 ) { step2+=8;
            //}
        //}
    //}
    for( step2=0; x>1L<<(step2+8); ) {
        step2+=8;
    }
    //printf("step2 %d\n",step2);
    for( step = 0; x>1L<<(step+step2); ) {
        step+=1;
        //printf("step %d\n",step+step2);
    }
    //printf("log2(%lu) %d\n",x,step+step2);
    return(step+step2);
}

This approach uses a binary search.

unsigned int
Log2_b(unsigned long x)
{
    unsigned long n = x;
    unsigned int bits = sizeof(x)*8;
    unsigned int hbit = bits-1;
    unsigned int lbit = 0;
    unsigned long guess = bits/2;
    int found = 0;

    while ( hbit-lbit>1 ) {
        //printf("log2(%lu) %u<%lu<%u\n",x,lbit,guess,hbit);
        //when value between guess..lbit
        if( (x<=(1L<<guess)) ) {
            //printf("%lu < 1<<%lu %ld\n",x,guess,1L<<guess);
            hbit=guess;
            guess=(hbit+lbit)/2;
            //printf("log2(%lu) %u<%lu<%u\n",x,lbit,guess,hbit);
        }
        //when value between hbit..guess
        //else
        if( (x>(1L<<guess)) ) {
            //printf("%lu > 1<<%lu %ld\n",x,guess,1L<<guess);
            lbit=guess;
            guess=(hbit+lbit)/2;
            //printf("log2(%lu) %u<%lu<%u\n",x,lbit,guess,hbit);
        }
    }
    if( (x>(1L<<guess)) ) ++guess;
    printf("log2(x%lu)=r%lu\n",x,guess);
    return(guess);
}

Another binary search method, perhaps more readable.

unsigned int
Log2_c(unsigned long x)
{
    unsigned long v = x;
    unsigned int bits = sizeof(x)*8;
    unsigned int step = bits;
    unsigned int res = 0;
    for( step = bits/2; step>0; )
    {
        //printf("log2(%lu) v %lu >> step %u = %lu\n",x,v,step,v>>step);
        while ( v>>step ) {
            v>>=step;
            res+=step;
            //printf("log2(%lu) step %u res %u v>>step %lu\n",x,step,res,v);
        }
        step /= 2;
    }
    if( (x>(1L<<res)) ) ++res;
    printf("log2(x%lu)=r%u\n",x,res);
    return(res);
}

And because you will want to test these:

int main()
{
    unsigned long int x = 3;
    for( x=2; x<1000000000; x*=2 ) {
        printf("x %lu, x+1 %lu, log2(x+1) %u\n",x,x+1,Log2(x+1));
        printf("x %lu, x+1 %lu, log2_a(x+1) %u\n",x,x+1,Log2_a(x+1));
        printf("x %lu, x+1 %lu, log2_b(x+1) %u\n",x,x+1,Log2_b(x+1));
        printf("x %lu, x+1 %lu, log2_c(x+1) %u\n",x,x+1,Log2_c(x+1));
    }
    return(0);
}

User · Answer

Use a combination of VPTEST(D, W, B) and PSRLDQ instructions to focus in on the byte containing the most significant bit, as shown below using an emulation of these instructions in Perl found at:

https://github.com/philiprbrenan/SimdAvx512

if (1) {                                             #TpositionOfMostSignificantBitIn64
  my @m = (                                          # Test strings
#B0       1       2       3       4       5       6       7
#b0123456701234567012345670123456701234567012345670123456701234567
 '0000000000000000000000000000000000000000000000000000000000000000',
 '0000000000000000000000000000000000000000000000000000000000000001',
 '0000000000000000000000000000000000000000000000000000000000000010',
 '0000000000000000000000000000000000000000000000000000000000000111',
 '0000000000000000000000000000000000000000000000000000001010010000',
 '0000000000000000000000000000000000001000000001100100001010010000',
 '0000000000000000000001001000010000000000000001100100001010010000',
 '0000000000000000100000000000000100000000000001100100001010010000',
 '1000000000000000100000000000000100000000000001100100001010010000',
);
  my @n = (0, 1, 2, 3, 10, 28, 43, 48, 64);          # Expected positions of msb

  sub positionOfMostSignificantBitIn64($)            # Find the position of the most significant bit in a string of 64 bits starting from 1 for the least significant bit or return 0 if the input field is all zeros
   {my ($s64) = @_;                                  # String of 64 bits

    my $N = 128;                                     # 128 bit operations
    my $f = 0;                                       # Position of first bit set
    my $x = '0'x$N;                                  # Double Quad Word set to 0
    my $s = substr $x.$s64, -$N;                     # 128 bit area needed

    substr(VPTESTMD($s, $s), -2, 1) eq '1' ? ($s = PSRLDQ $s, 4) : ($f += 32);  # Test 2 dwords
    substr(VPTESTMW($s, $s), -2, 1) eq '1' ? ($s = PSRLDQ $s, 2) : ($f += 16);  # Test 2 words
    substr(VPTESTMB($s, $s), -2, 1) eq '1' ? ($s = PSRLDQ $s, 1) : ($f +=  8);  # Test 2 bytes

    $s = substr($s, -8);                             # Last byte remaining

    $s < $_ ? ++$f : last for                        # Search remaining byte
     (qw(10000000 01000000 00100000 00010000
         00001000 00000100 00000010 00000001));

    64 - $f                                          # Position of first bit set
   }

  ok $n[$_] eq positionOfMostSignificantBitIn64 $m[$_] for keys @m  # Test
 }

User · Answer

Expanding on Josh's benchmark, one can improve the clz as follows:

/***************** clz2 ********************/

#define NUM_OF_HIGHESTBITclz2(a) ((a)                                          \
                  ? (((1U) << (sizeof(unsigned)*8-1)) >> __builtin_clz(a))  \
                  : 0)

Regarding the asm: note that there are bsr and bsrl (this is the "long" version). The normal one might be a bit faster.

User · Answer

GCC has:

 -- Built-in Function: int __builtin_clz (unsigned int x)
     Returns the number of leading 0-bits in X, starting at the most
     significant bit position.  If X is 0, the result is undefined.

 -- Built-in Function: int __builtin_clzl (unsigned long)
     Similar to __builtin_clz, except the argument type is unsigned
     long.

 -- Built-in Function: int __builtin_clzll (unsigned long long)
     Similar to __builtin_clz, except the argument type is unsigned
     long long.

I'd expect them to be translated into something reasonably efficient for your current platform, whether it be one of those fancy bit-twiddling algorithms, or a single instruction.

A useful trick if your input can be zero is __builtin_clz(x | 1): unconditionally setting the low bit without modifying any others makes the output 31 for x=0, without changing the output for any other input.

To avoid needing to do that, your other option is platform-specific intrinsics like ARM GCC's __clz (no header needed), or x86's _lzcnt_u32 on CPUs that support the lzcnt instruction. (Beware that lzcnt decodes as bsr on older CPUs instead of faulting, which gives 31-lzcnt for non-zero inputs.)

There's unfortunately no way to portably take advantage of the various CLZ instructions on non-x86 platforms that do define the result for input=0 as 32 or 64 (according to the operand width). x86's lzcnt does that, too, while bsr produces a bit-index that the compiler has to flip unless you use 31-__builtin_clz(x).

(The "undefined result" is not C Undefined Behavior, just a value that isn't defined. It's actually whatever was in the destination register when the instruction ran. AMD documents this, Intel doesn't, but Intel's CPUs do implement that behaviour. But it's not whatever was previously in the C variable you're assigning to; that's not usually how things work when gcc turns C into asm. See also: Why does breaking the "output dependency" of LZCNT matter?)
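As a rough sketch of how the builtins and the x | 1 trick combine into a bit-index function: the helper names msb_index_uint and msb_index_ull below are made up for illustration, and the code assumes GCC or Clang.

```c
#include <limits.h>

/* Hypothetical helpers; require GCC or Clang for __builtin_clz / __builtin_clzll. */

/* Index of the highest set bit of x; x == 0 yields 0 because of the "| 1". */
int msb_index_uint(unsigned int x)
{
    return (int)(sizeof(unsigned int) * CHAR_BIT) - 1 - __builtin_clz(x | 1u);
}

/* Same idea for unsigned long long, which covers 64-bit values. */
int msb_index_ull(unsigned long long x)
{
    return (int)(sizeof(unsigned long long) * CHAR_BIT) - 1 - __builtin_clzll(x | 1ull);
}
```

Because the width comes from sizeof and CHAR_BIT, the same source works whether unsigned int is 32 bits or something else. If 0 must be distinguishable from 1, test for it before calling.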

User · Answer

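This should be lightning fast:

int msb(unsigned int v) {
  static const int pos[32] = {0, 1, 28, 2, 29, 14, 24, 3,
    30, 22, 20, 15, 25, 17, 4, 8, 31, 27, 13, 23, 21, 19,
    16, 7, 26, 12, 18, 6, 11, 5, 10, 9};
  v |= v >> 1;
  v |= v >> 2;
  v |= v >> 4;
  v |= v >> 8;
  v |= v >> 16;
  v = (v >> 1) + 1;
  // The product must wrap at 32 bits: use a U suffix (not UL, which is
  // 64 bits wide on LP64 platforms and would index past the table).
  return pos[(v * 0x077CB531U) >> 27];
}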

User · Answer

c99 has given us log2. This removes the need for all the special sauce log2 implementations you see on this page. You can use the standard's log2 implementation like this:

const auto n = 13UL;
const auto Index = (unsigned long)log2(n);

printf("MSB is: %lu\n", Index); // Prints 3 (zero offset)

An n of 0UL needs to be guarded against as well, because:

    -∞ is returned and FE_DIVBYZERO is raised

I have written an example with that check that arbitrarily sets Index to ULONG_MAX here: https://ideone.com/u26vsi

The visual-studio corollary to ephemient's gcc only answer is:

const auto n = 13UL;
unsigned long Index;

_BitScanReverse(&Index, n);
printf("MSB is: %lu\n", Index); // Prints 3 (zero offset)

The documentation for _BitScanReverse states that Index is:

    Loaded with the bit position of the first set bit (1) found

In practice I've found that if n is 0UL, Index is set to 0UL, just as it would be for an n of 1UL. But the only thing guaranteed in the documentation in the case of an n of 0UL is that the return is:

    0 if no set bits were found

Thus, similarly to the preferable log2 implementation above, the return value should be checked, setting Index to a flagged value in this case. I've again written an example of using ULONG_MAX for this flag value here: http://rextester.com/GCU61409

User · Answer

Some overly complex answers here. The De Bruijn technique should only be used when the input is already a power of two, otherwise there's a better way. For a power of 2 input, De Bruijn is the absolute fastest, even faster than _BitScanReverse on any processor I've tested. However, in the general case, _BitScanReverse (or whatever the intrinsic is called in your compiler) is the fastest (on certain CPUs it can be microcoded though).

If the intrinsic function is not an option, here is an optimal software solution for processing general inputs.

u8  inline log2 (u32 val)  {
    u8  k = 0;
    if (val > 0x0000FFFFu) { val >>= 16; k  = 16; }
    if (val > 0x000000FFu) { val >>= 8;  k |= 8;  }
    if (val > 0x0000000Fu) { val >>= 4;  k |= 4;  }
    if (val > 0x00000003u) { val >>= 2;  k |= 2;  }
    k |= (val & 2) >> 1;
    return k;
}

Note that this version does not require a De Bruijn lookup at the end, unlike most of the other answers. It computes the position in place.

Tables can be preferable though: if you call it repeatedly enough times, the risk of a cache miss becomes eclipsed by the speedup of a table.

u8 kTableLog2[256] = {
0,0,1,1,2,2,2,2,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,
5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,
6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7
};

u8 log2_table(u32 val)  {
    u8  k = 0;
    if (val > 0x0000FFFFuL) { val >>= 16; k  = 16; }
    if (val > 0x000000FFuL) { val >>=  8; k |=  8; }
    k |= kTableLog2[val]; // precompute the Log2 of the low byte

    return k;
}

This should produce the highest throughput of any of the software answers given here, but if you only call it occasionally, prefer a table-free solution like my first snippet.
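The table-free cascade extends naturally to 64-bit inputs by adding one more comparison at the top. The following is a sketch of that extension, with standard stdint.h types standing in for the answer's u8/u32 typedefs; the name log2_u64 is hypothetical.

```c
#include <stdint.h>

/* Hypothetical 64-bit extension of the table-free cascade.
   Returns floor(log2(val)); an input of 0 yields 0, the same as 1. */
uint8_t log2_u64(uint64_t val)
{
    uint8_t k = 0;
    if (val > 0xFFFFFFFFu) { val >>= 32; k  = 32; }
    if (val > 0x0000FFFFu) { val >>= 16; k |= 16; }
    if (val > 0x000000FFu) { val >>= 8;  k |= 8;  }
    if (val > 0x0000000Fu) { val >>= 4;  k |= 4;  }
    if (val > 0x00000003u) { val >>= 2;  k |= 2;  }
    k |= (val & 2) >> 1;   /* val is now 0..3; bit 1 decides the last step */
    return k;
}
```

Each comparison halves the remaining search range, so 64 bits need six steps where 32 bits needed five.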

User · Answer

unsigned int
msb32(register unsigned int x)
{
        x |= (x >> 1);
        x |= (x >> 2);
        x |= (x >> 4);
        x |= (x >> 8);
        x |= (x >> 16);
        return(x & ~(x >> 1));   /* the value with only the highest set bit kept, not its index */
}

1 register, 13 instructions. Believe it or not, this is usually faster than the BSR instruction mentioned above, which operates in linear time. This is logarithmic time.

From http://aggregate.org/MAGIC/#Most%20Significant%201%20Bit

User · Answer

What about

int highest_bit(unsigned int a) {
    int count;
    std::frexp(a, &count);
    return count - 1;
}

?

User · Answer

Putting this in since it's 'yet another' approach, and seems to be different from others already given.

Returns -1 if x==0, otherwise floor(log2(x)) (max result 31).

Reduce from a 32-bit to a 4-bit problem, then use a table. Perhaps inelegant, but pragmatic.

This is what I use when I don't want to use __builtin_clz because of portability issues.

To make it more compact, one could instead use a loop to reduce, adding 4 to r each time, max 7 iterations. Or some hybrid, such as (for 64 bits): loop to reduce to 8, test to reduce to 4.

int log2floor( unsigned x ){
   static const signed char wtab[16] = {-1,0,1,1, 2,2,2,2, 3,3,3,3,3,3,3,3};
   int r = 0;
   unsigned xk = x >> 16;
   if( xk != 0 ){
       r = 16;
       x = xk;
   }
   // x is 0 .. 0xFFFF
   xk = x >> 8;
   if( xk != 0 ){
       r += 8;
       x = xk;
   }
   // x is 0 .. 0xFF
   xk = x >> 4;
   if( xk != 0 ){
       r += 4;
       x = xk;
   }
   // now x is 0..15; x=0 only if originally zero.
   return r + wtab[x];
}
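The more compact loop variant mentioned above (reduce by 4 bits per iteration, then finish with the 16-entry table) might look like this. The name log2floor_loop is hypothetical; the table is the same wtab as in the answer.

```c
/* Hypothetical loop variant: strip 4 bits per iteration, then use the table.
   Returns -1 for x == 0, otherwise floor(log2(x)). */
int log2floor_loop(unsigned x)
{
    static const signed char wtab[16] = {-1,0,1,1, 2,2,2,2, 3,3,3,3,3,3,3,3};
    int r = 0;
    while (x > 15u) {      /* at most 7 iterations for a 32-bit unsigned */
        x >>= 4;
        r += 4;
    }
    /* now x is 0..15; x == 0 only if the input was zero */
    return r + wtab[x];
}
```

The trade-off is a data-dependent iteration count in exchange for shorter code, and the loop adapts on its own if unsigned is wider than 32 bits.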

User · Answer

Here is a fast solution for C that works in GCC and Clang; ready to be copied and pasted.

#include <limits.h>

unsigned int fls(const unsigned int value)
{
    return (unsigned int)1 << ((sizeof(unsigned int) * CHAR_BIT) - __builtin_clz(value) - 1);
}

unsigned long flsl(const unsigned long value)
{
    return (unsigned long)1 << ((sizeof(unsigned long) * CHAR_BIT) - __builtin_clzl(value) - 1);
}

unsigned long long flsll(const unsigned long long value)
{
    return (unsigned long long)1 << ((sizeof(unsigned long long) * CHAR_BIT) - __builtin_clzll(value) - 1);
}

And a little improved version for C++.

#include <climits>

constexpr unsigned int fls(const unsigned int value)
{
    return (unsigned int)1 << ((sizeof(unsigned int) * CHAR_BIT) - __builtin_clz(value) - 1);
}

constexpr unsigned long fls(const unsigned long value)
{
    return (unsigned long)1 << ((sizeof(unsigned long) * CHAR_BIT) - __builtin_clzl(value) - 1);
}

constexpr unsigned long long fls(const unsigned long long value)
{
    return (unsigned long long)1 << ((sizeof(unsigned long long) * CHAR_BIT) - __builtin_clzll(value) - 1);
}

The code assumes that value won't be 0. If you want to allow 0, you need to modify it.
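One possible modification that allows 0 is an explicit early return. This is only a sketch: the name fls_or_zero and the choice of returning 0 for a zero input are assumptions, not part of the answer. Like the functions above, it returns the value of the highest set bit rather than its index.

```c
#include <limits.h>

/* Hypothetical zero-tolerant variant (GCC/Clang). Returns the value of the
   highest set bit, or 0 when no bit is set. */
unsigned int fls_or_zero(const unsigned int value)
{
    if (value == 0)
        return 0;   /* __builtin_clz(0) has an undefined result, so avoid it */
    return (unsigned int)1 << ((sizeof(unsigned int) * CHAR_BIT) - __builtin_clz(value) - 1);
}
```

The long and long long versions can be guarded the same way.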

User · Answer

I assume your question is for an integer (called v below) and not an unsigned integer.

int v = 612635685; // whatever value you wish

int get_msb(int v)
{
    unsigned int u = (unsigned int)v;   // shift as unsigned: left-shifting a signed value into the sign bit is undefined
    int r = 31;                         // maximum number of iterations until the integer has been totally left shifted out, considering that the first bit is index 0. We could also use ((sizeof(int) << 3) - 1) instead of 31 to make it work on any platform.

    while (!(u & 0x80000000u) && r--) { // mask of the highest bit
        u <<= 1;                        // multiply integer by 2
    }
    return r;                           // will even return -1 if no bit was set, allowing error catch
}

If you want to make it work without taking into account the sign, you can add an extra 'u <<= 1;' before the loop (and change the value of r to 30 accordingly). Please let me know if I forgot anything. I haven't tested it but it should work just fine.

User · Answer

That's some kind of binary search; it works with all kinds of (unsigned!) integer types.

#include <climits>
#define UINT unsigned int
#define UINT_BIT (CHAR_BIT*sizeof(UINT))

int msb(UINT x)
{
    if(0 == x)
        return -1;

    int c = 0;

    for(UINT i=UINT_BIT>>1; 0<i; i>>=1)
    if(static_cast<UINT>(x >> i))
    {
        x >>= i;
        c |= i;
    }

    return c;
}

To make it complete:

#include <climits>
#define UINT unsigned int
#define UINT_BIT (CHAR_BIT*sizeof(UINT))

int lsb(UINT x)
{
    if(0 == x)
        return -1;

    int c = UINT_BIT-1;

    for(UINT i=UINT_BIT>>1; 0<i; i>>=1)
    if(static_cast<UINT>(x << i))
    {
        x <<= i;
        c ^= i;
    }

    return c;
}

User · Answer

As the answers above point out, there are a number of ways to determine the most significant bit. However, as was also pointed out, the methods are likely to be unique to either 32bit or 64bit registers. The stanford.edu bithacks page provides solutions that work for both 32bit and 64bit computing. With a little work, they can be combined to provide a solid cross-architecture approach to obtaining the MSB. The solution I arrived at that compiled/worked across 64 & 32 bit computers was:

#if defined(__LP64__) || defined(_LP64)
# define BUILD_64   1
#endif

#include <stdio.h>
#include <stdint.h>  /* for uint32_t */

/* CHAR_BIT  (or include limits.h) */
#ifndef CHAR_BIT
#define CHAR_BIT  8
#endif  /* CHAR_BIT */

/*
 * Find the log base 2 of an integer with the MSB N set in O(N)
 * operations. (on 64bit & 32bit architectures)
 */
int
getmsb (uint32_t word)
{
    int r = 0;
    if (word < 1)
        return 0;
#ifdef BUILD_64
    union { uint32_t u[2]; double d; } t;  // temp
    t.u[__FLOAT_WORD_ORDER==LITTLE_ENDIAN] = 0x43300000;
    t.u[__FLOAT_WORD_ORDER!=LITTLE_ENDIAN] = word;
    t.d -= 4503599627370496.0;
    r = (t.u[__FLOAT_WORD_ORDER==LITTLE_ENDIAN] >> 20) - 0x3FF;
#else
    while (word >>= 1)
    {
        r++;
    }
#endif  /* BUILD_64 */
    return r;
}

User · Answer

My humble method is very simple:

MSB(x) = INT[Log(x) / Log(2)]

Translation: the MSB of x is the integer part of (the logarithm of x divided by the logarithm of 2), using the same base for both logarithms.

This can easily and quickly be adapted to any programming language. Try it on your calculator to see for yourself that it works.

User · Answer

Here are some (simple) benchmarks of algorithms currently given on this page.

The algorithms have not been tested over all inputs of unsigned int, so check that first before blindly using something ;)

On my machine clz (__builtin_clz) and asm work best. asm seems even faster than clz, but it might be due to the simple benchmark.

//////// go.c ///////////////////////////////
// compile with:  gcc go.c -o go -lm
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/***************** math ********************/

#define POS_OF_HIGHESTBITmath(a) /* 0th position is the Least-Signif-Bit */ \
  ((unsigned) log2(a))           /* thus: do not use if a <= 0 */

#define NUM_OF_HIGHESTBITmath(a) ((a)                       \
                  ? (1U << POS_OF_HIGHESTBITmath(a))        \
                  : 0)

/***************** clz ********************/

unsigned NUM_BITS_U = ((sizeof(unsigned) << 3) - 1);
#define POS_OF_HIGHESTBITclz(a) (NUM_BITS_U - __builtin_clz(a)) /* only works for a != 0 */

#define NUM_OF_HIGHESTBITclz(a) ((a)                        \
                 ? (1U << POS_OF_HIGHESTBITclz(a))          \
                 : 0)

/***************** i2f ********************/

double FF;
/* reads the high word of the double; assumes x86 endianness */
#define POS_OF_HIGHESTBITi2f(a) (FF = (double)((a)|1), ((*(1+(unsigned*)&FF))>>20)-1023)

#define NUM_OF_HIGHESTBITi2f(a) ((a)                        \
                 ? (1U << POS_OF_HIGHESTBITi2f(a))          \
                 : 0)

/***************** asm ********************/

unsigned OUT;
#define POS_OF_HIGHESTBITasm(a) (({asm("bsrl %1,%0" : "=r"(OUT) : "r"(a));}), OUT)

#define NUM_OF_HIGHESTBITasm(a) ((a)                        \
                 ? (1U << POS_OF_HIGHESTBITasm(a))          \
                 : 0)

/***************** bitshift1 ********************/

#define NUM_OF_HIGHESTBITbitshift1(a) (({   \
  OUT = a;                                  \
  OUT |= (OUT >> 1);                        \
  OUT |= (OUT >> 2);                        \
  OUT |= (OUT >> 4);                        \
  OUT |= (OUT >> 8);                        \
  OUT |= (OUT >> 16);                       \
      }), (OUT & ~(OUT >> 1)))

/***************** bitshift2 ********************/

int POS[32] = {0, 1, 28, 2, 29, 14, 24, 3,
             30, 22, 20, 15, 25, 17, 4, 8, 31, 27, 13, 23, 21, 19,
             16, 7, 26, 12, 18, 6, 11, 5, 10, 9};

/* the multiplier uses a U suffix so the product wraps at 32 bits */
#define POS_OF_HIGHESTBITbitshift2(a) (({   \
  OUT = a;                                  \
  OUT |= OUT >> 1;                          \
  OUT |= OUT >> 2;                          \
  OUT |= OUT >> 4;                          \
  OUT |= OUT >> 8;                          \
  OUT |= OUT >> 16;                         \
  OUT = (OUT >> 1) + 1;                     \
      }), POS[(OUT * 0x077CB531U) >> 27])

#define NUM_OF_HIGHESTBITbitshift2(a) ((a)                      \
                 ? (1U << POS_OF_HIGHESTBITbitshift2(a))        \
                 : 0)

#define LOOPS 100000000U

int main()
{
  clock_t start, end;
  unsigned ui;
  unsigned n;

  /********* Checking the first few unsigned values (you'll need to check all if you want to use an algorithm here) **************/
  printf("math\n");
  for (ui = 0U; ui < 18; ++ui)
    printf("%u\t%u\n", ui, NUM_OF_HIGHESTBITmath(ui));

  printf("\n\n");

  printf("clz\n");
  for (ui = 0U; ui < 18U; ++ui)
    printf("%u\t%u\n", ui, NUM_OF_HIGHESTBITclz(ui));

  printf("\n\n");

  printf("i2f\n");
  for (ui = 0U; ui < 18U; ++ui)
    printf("%u\t%u\n", ui, NUM_OF_HIGHESTBITi2f(ui));

  printf("\n\n");

  printf("asm\n");
  for (ui = 0U; ui < 18U; ++ui) {
    printf("%u\t%u\n", ui, NUM_OF_HIGHESTBITasm(ui));
  }

  printf("\n\n");

  printf("bitshift1\n");
  for (ui = 0U; ui < 18U; ++ui) {
    printf("%u\t%u\n", ui, NUM_OF_HIGHESTBITbitshift1(ui));
  }

  printf("\n\n");

  printf("bitshift2\n");
  for (ui = 0U; ui < 18U; ++ui) {
    printf("%u\t%u\n", ui, NUM_OF_HIGHESTBITbitshift2(ui));
  }

  printf("\n\nPlease wait...\n\n");

  /************************* Simple clock() benchmark ******************/
  start = clock();
  for (ui = 0; ui < LOOPS; ++ui)
    n = NUM_OF_HIGHESTBITmath(ui);
  end = clock();
  printf("math:\t%e\n", (double)(end-start)/CLOCKS_PER_SEC);

  start = clock();
  for (ui = 0; ui < LOOPS; ++ui)
    n = NUM_OF_HIGHESTBITclz(ui);
  end = clock();
  printf("clz:\t%e\n", (double)(end-start)/CLOCKS_PER_SEC);

  start = clock();
  for (ui = 0; ui < LOOPS; ++ui)
    n = NUM_OF_HIGHESTBITi2f(ui);
  end = clock();
  printf("i2f:\t%e\n", (double)(end-start)/CLOCKS_PER_SEC);

  start = clock();
  for (ui = 0; ui < LOOPS; ++ui)
    n = NUM_OF_HIGHESTBITasm(ui);
  end = clock();
  printf("asm:\t%e\n", (double)(end-start)/CLOCKS_PER_SEC);

  start = clock();
  for (ui = 0; ui < LOOPS; ++ui)
    n = NUM_OF_HIGHESTBITbitshift1(ui);
  end = clock();
  printf("bitshift1:\t%e\n", (double)(end-start)/CLOCKS_PER_SEC);

  start = clock();
  for (ui = 0; ui < LOOPS; ++ui)
    n = NUM_OF_HIGHESTBITbitshift2(ui);
  end = clock();
  printf("bitshift2\t%e\n", (double)(end-start)/CLOCKS_PER_SEC);

  printf("\nThe lower, the better. Take note that a negative exponent is good! ;)\n");

  return EXIT_SUCCESS;
}

User · Answer

This is sort of like finding a kind of integer log. There are bit-twiddling tricks, but I've made my own tool for this. The goal of course is for speed.

My realization is that the CPU has an automatic bit-detector already, used for integer to float conversion! So use that.

double ff=(double)(v|1);
return ((*(1+(uint32_t *)&ff))>>20)-1023;  // assumes x86 endianness

This version casts the value to a double, then reads off the exponent, which tells you where the bit was. The fancy shift and subtract is to extract the proper parts from the IEEE value.

It's slightly faster to use floats, but a float can only give you the first 24 bit positions because of its smaller precision.

To do this safely, without undefined behaviour in C++ or C, use memcpy instead of pointer casting for type-punning. Compilers know how to inline it efficiently.

// static_assert(sizeof(double) == 2 * sizeof(uint32_t), "double isn't 8-byte IEEE binary64");
// and also static_assert something about FLT_ENDIAN?

double ff=(double)(v|1);

uint32_t tmp;
memcpy(&tmp, ((const char*)&ff)+sizeof(uint32_t), sizeof(uint32_t));
return (tmp>>20)-1023;

Or in C99 and later, use a union {double d; uint32_t u[2];};. But note that in C++, union type punning is only supported on some compilers as an extension, not in ISO C++.

This will usually be slower than a platform-specific intrinsic for a leading-zeros counting instruction, but portable ISO C has no such function. Some CPUs also lack a leading-zero counting instruction, but some of those can efficiently convert integers to double. Type-punning an FP bit pattern back to integer can be slow, though (e.g. on PowerPC it requires a store/reload and usually causes a load-hit-store stall).

This algorithm could potentially be useful for SIMD implementations, because fewer CPUs have SIMD lzcnt. x86 only got such an instruction with AVX512CD.
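As a rough illustration of the memcpy variant described above, here is a minimal sketch that copies the whole double into a 64-bit integer instead of picking out one 32-bit half, which sidesteps the question of which half holds the exponent. The function name msb_via_double is made up for this example, and the code assumes that double is IEEE 754 binary64 and that doubles and 64-bit integers share the same byte order on the target.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: double is an 8-byte IEEE 754 binary64 value. */
_Static_assert(sizeof(double) == sizeof(uint64_t),
               "double is not 8 bytes wide");

/* Hypothetical helper: 0-based index of the highest set bit of v.
   v|1 maps an input of 0 to 1, so the result for 0 is 0. */
static int msb_via_double(uint32_t v)
{
    double d = (double)(v | 1u);      /* every 32-bit value is exactly representable */
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);   /* well-defined type punning */
    /* The sign bit is 0 here, so bits >> 52 is the biased exponent. */
    return (int)(bits >> 52) - 1023;
}
```

Calling msb_via_double(0x80000000u) should give 31 under these assumptions. Whether this beats a leading-zero-count intrinsic depends on the target, as the answer notes.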

User · Answer

Kaz Kylheku here! I benchmarked two approaches for this over 63 bit numbers (the long long type on gcc x86_64), staying away from the sign bit.

(I happen to need this "find highest bit" for something, you see.)

I implemented the data-driven binary search (closely based on one of the above answers). I also implemented a completely unrolled decision tree by hand, which is just code with immediate operands. No loops, no tables.

The decision tree (highest_bit_unrolled) benchmarked to be 69% faster, except for the n = 0 case for which the binary search has an explicit test.

The binary-search's special test for 0 case is only 48% faster than the decision tree, which does not have a special test.

Compiler, machine: (GCC 4.5.2, -O3, x86-64, 2867 MHz Intel Core i5).

int highest_bit_unrolled(long long n)
{
  if (n & 0x7FFFFFFF00000000) {
    if (n & 0x7FFF000000000000) {
      if (n & 0x7F00000000000000) {
        if (n & 0x7000000000000000) {
          if (n & 0x4000000000000000)
            return 63;
          else
            return (n & 0x2000000000000000) ? 62 : 61;
        } else {
          if (n & 0x0C00000000000000)
            return (n & 0x0800000000000000) ? 60 : 59;
          else
            return (n & 0x0200000000000000) ? 58 : 57;
        }
      } else {
        if (n & 0x00F0000000000000) {
          if (n & 0x00C0000000000000)
            return (n & 0x0080000000000000) ? 56 : 55;
          else
            return (n & 0x0020000000000000) ? 54 : 53;
        } else {
          if (n & 0x000C000000000000)
            return (n & 0x0008000000000000) ? 52 : 51;
          else
            return (n & 0x0002000000000000) ? 50 : 49;
        }
      }
    } else {
      if (n & 0x0000FF0000000000) {
        if (n & 0x0000F00000000000) {
          if (n & 0x0000C00000000000)
            return (n & 0x0000800000000000) ? 48 : 47;
          else
            return (n & 0x0000200000000000) ? 46 : 45;
        } else {
          if (n & 0x00000C0000000000)
            return (n & 0x0000080000000000) ? 44 : 43;
          else
            return (n & 0x0000020000000000) ? 42 : 41;
        }
      } else {
        if (n & 0x000000F000000000) {
          if (n & 0x000000C000000000)
            return (n & 0x0000008000000000) ? 40 : 39;
          else
            return (n & 0x0000002000000000) ? 38 : 37;
        } else {
          if (n & 0x0000000C00000000)
            return (n & 0x0000000800000000) ? 36 : 35;
          else
            return (n & 0x0000000200000000) ? 34 : 33;
        }
      }
    }
  } else {
    if (n & 0x00000000FFFF0000) {
      if (n & 0x00000000FF000000) {
        if (n & 0x00000000F0000000) {
          if (n & 0x00000000C0000000)
            return (n & 0x0000000080000000) ? 32 : 31;
          else
            return (n & 0x0000000020000000) ? 30 : 29;
        } else {
          if (n & 0x000000000C000000)
            return (n & 0x0000000008000000) ? 28 : 27;
          else
            return (n & 0x0000000002000000) ? 26 : 25;
        }
      } else {
        if (n & 0x0000000000F00000) {
          if (n & 0x0000000000C00000)
            return (n & 0x0000000000800000) ? 24 : 23;
          else
            return (n & 0x0000000000200000) ? 22 : 21;
        } else {
          if (n & 0x00000000000C0000)
            return (n & 0x0000000000080000) ? 20 : 19;
          else
            return (n & 0x0000000000020000) ? 18 : 17;
        }
      }
    } else {
      if (n & 0x000000000000FF00) {
        if (n & 0x000000000000F000) {
          if (n & 0x000000000000C000)
            return (n & 0x0000000000008000) ? 16 : 15;
          else
            return (n & 0x0000000000002000) ? 14 : 13;
        } else {
          if (n & 0x0000000000000C00)
            return (n & 0x0000000000000800) ? 12 : 11;
          else
            return (n & 0x0000000000000200) ? 10 : 9;
        }
      } else {
        if (n & 0x00000000000000F0) {
          if (n & 0x00000000000000C0)
            return (n & 0x0000000000000080) ? 8 : 7;
          else
            return (n & 0x0000000000000020) ? 6 : 5;
        } else {
          if (n & 0x000000000000000C)
            return (n & 0x0000000000000008) ? 4 : 3;
          else
            return (n & 0x0000000000000002) ? 2 : (n ? 1 : 0);
        }
      }
    }
  }
}

int highest_bit(long long n)
{
  const long long mask[] = {
    0x000000007FFFFFFF,
    0x000000000000FFFF,
    0x00000000000000FF,
    0x000000000000000F,
    0x0000000000000003,
    0x0000000000000001
  };
  int hi = 64;
  int lo = 0;
  int i = 0;

  if (n == 0)
    return 0;

  for (i = 0; i < sizeof mask / sizeof mask[0]; i++) {
    int mi = lo + (hi - lo) / 2;

    if ((n >> mi) != 0)
      lo = mi;
    else if ((n & (mask[i] << lo)) != 0)
      hi = mi;
  }

  return lo + 1;
}

Quick and dirty test program:

#include <stdio.h>
#include <time.h>
#include <stdlib.h>

int highest_bit_unrolled(long long n);
int highest_bit(long long n);

int main(int argc, char **argv)
{
  long long n = strtoull(argv[1], NULL, 0);
  int b1, b2;
  long i;
  clock_t start = clock(), mid, end;

  for (i = 0; i < 1000000000; i++)
    b1 = highest_bit_unrolled(n);

  mid = clock();

  for (i = 0; i < 1000000000; i++)
    b2 = highest_bit(n);

  end = clock();

  printf("highest bit of 0x%llx/%lld = %d, %d\n", n, n, b1, b2);

  printf("time1 = %d\n", (int) (mid - start));
  printf("time2 = %d\n", (int) (end - mid));
  return 0;
}

Using only -O2, the difference becomes greater. The decision tree is almost four times faster.

I also benchmarked against the naive bit shifting code:

int highest_bit_shift(long long n)
{
  int i = 0;
  for (; n; n >>= 1, i++)
    ; /* empty */
  return i;
}

This is only fast for small numbers, as one would expect. In determining that the highest bit is 1 for n == 1, it benchmarked more than 80% faster. However, half of randomly chosen numbers in the 63 bit space have the 63rd bit set!

On the input 0x3FFFFFFFFFFFFFFF, the decision tree version is quite a bit faster than it is on 1, and shows to be 1120% faster (12.2 times) than the bit shifter.

I will also benchmark the decision tree against the GCC builtins, and also try a mixture of inputs rather than repeating against the same number. There may be some sticking branch prediction going on and perhaps some unrealistic caching scenarios which makes it artificially faster on repetitions.

User · Answer

I know this question is very old, but just having implemented an msb() function myself, I found that most solutions presented here and on other websites are not necessarily the most efficient, at least for my personal definition of efficiency (see also Update below). Here's why:

Most solutions (especially those which employ some sort of binary search scheme or the naïve approach which does a linear scan from right to left) seem to neglect the fact that for arbitrary binary numbers, there are not many which start with a very long sequence of zeros. In fact, for any bit-width, half of all integers start with a 1 and a quarter of them start with 01. See where I'm getting at? My argument is that a linear scan starting from the most significant bit position to the least significant (left to right) is not so "linear" as it might look like at first glance.

It can be shown [1] that for any bit-width, the average number of bits that need to be tested is at most 2. This translates to an amortized time complexity of O(1) with respect to the number of bits (!).

Of course, the worst case is still O(n), worse than the O(log(n)) you get with binary-search-like approaches, but since there are so few worst cases, they are negligible for most applications (Update: not quite. There may be few, but they might occur with high probability - see Update below).

Here is the "naïve" approach I've come up with, which at least on my machine beats most other approaches (binary search schemes for 32-bit ints always require log2(32) = 5 steps, whereas this silly algorithm requires less than 2 on average). Sorry for this being C++ and not pure C:

#include <limits>
#include <type_traits>

template <typename T>
auto msb(T n) -> int
{
    static_assert(std::is_integral<T>::value && !std::is_signed<T>::value,
        "msb<T>(): T must be an unsigned integral type.");

    // i is a signed int so that the loop ends after bit 0, including for n == 0;
    // the mask is built from T(1) so that the shift happens in T's width.
    T mask = T(1) << (std::numeric_limits<T>::digits - 1);
    for (int i = std::numeric_limits<T>::digits - 1; i >= 0; --i, mask >>= 1)
    {
        if ((n & mask) != 0)
            return i;
    }

    return 0;
}

Update: While what I wrote here is perfectly true for arbitrary integers, where every combination of bits is equally probable (my speed test simply measured how long it took to determine the MSB for all 32-bit integers), real-life integers, for which such a function will be called, usually follow a different pattern: In my code, for example, this function is used to determine whether an object size is a power of 2, or to find the next power of 2 greater or equal than an object size. My guess is that most applications using the MSB involve numbers which are much smaller than the maximum number an integer can represent (object sizes rarely utilize all the bits in a size_t). In this case, my solution will actually perform worse than a binary search approach, so the latter should probably be preferred, even though my solution will be faster looping through all integers.

TL;DR: Real-life integers will probably have a bias towards the worst case of this simple algorithm, which will make it perform worse in the end, despite the fact that it's amortized O(1) for truly arbitrary integers.

[1] The argument goes like this (rough draft): Let n be the number of bits (bit-width). There are a total of 2^n integers which can be represented with n bits. There are 2^(n - 1) integers starting with a 1 (the first 1 is fixed, the remaining n - 1 bits can be anything). Those integers require only one iteration of the loop to determine the MSB. Further, there are 2^(n - 2) integers starting with 01, requiring 2 iterations, 2^(n - 3) integers starting with 001, requiring 3 iterations, and so on.

If we sum up all the required iterations for all possible integers and divide them by 2^n, the total number of integers, we get the average number of iterations needed for determining the MSB for n-bit integers:

(1 * 2^(n - 1) + 2 * 2^(n - 2) + 3 * 2^(n - 3) + ... + n) / 2^n

This series of average iterations is actually convergent and has a limit of 2 for n towards infinity.

Thus, the naïve left-to-right algorithm has actually an amortized constant time complexity of O(1) for any number of bits.
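To make the footnote's average concrete, here is a small sketch in C that evaluates the same sum numerically. The function name average_iterations is invented for this example. Dividing each term by 2^n turns the sum into 1/2 + 2/4 + 3/8 + ... + n/2^n, which has the closed form 2 - (n + 2)/2^n, so it stays below 2 for every bit-width and approaches 2 as n grows.

```c
/* Average number of loop iterations of a left-to-right scan over all
   n-bit values: (1*2^(n-1) + 2*2^(n-2) + ... + n*2^0) / 2^n,
   which is the same as the sum of k / 2^k for k = 1..n. */
static double average_iterations(int n)
{
    double sum = 0.0;
    double weight = 0.5;              /* 2^-k, starting at k = 1 */
    for (int k = 1; k <= n; ++k) {
        sum += k * weight;
        weight *= 0.5;
    }
    return sum;
}
```

For n = 8 this gives 1.9609375, and for n = 32 the value differs from 2 by less than one part in a hundred million. Note that this only describes uniformly distributed inputs, which is exactly the caveat raised in the Update.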

User · Answer

Woaw, that was many answers. I am not sorry for answering on an old question.

int result = 0;//could be a char or int8_t instead
if(value){//this assumes the value is 64bit
    if(0xFFFFFFFF00000000&value){  value>>=(1<<5); result|=(1<<5);  }//if it is 32bit then remove this line
    if(0x00000000FFFF0000&value){  value>>=(1<<4); result|=(1<<4);  }//and remove the 32msb
    if(0x000000000000FF00&value){  value>>=(1<<3); result|=(1<<3);  }
    if(0x00000000000000F0&value){  value>>=(1<<2); result|=(1<<2);  }
    if(0x000000000000000C&value){  value>>=(1<<1); result|=(1<<1);  }
    if(0x0000000000000002&value){  result|=(1<<0);  }
}else{
  result=-1;
}

This answer is pretty similar to another answer... oh well.
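The fragment above can be wrapped in a function for easier testing. The following is only a sketch: the name msb_index64 and the choice of uint64_t for the argument are assumptions made here, not part of the answer. The shift amounts are written out as 32, 16, 8, 4 and 2, which are the values of (1<<5) down to (1<<1).

```c
#include <stdint.h>

/* Hypothetical wrapper: 0-based index of the highest set bit,
   or -1 when value is 0. Each step tests the upper half of the
   remaining range and, if it is non-empty, shifts it down. */
static int msb_index64(uint64_t value)
{
    int result = 0;
    if (value == 0)
        return -1;
    if (value & 0xFFFFFFFF00000000ULL) { value >>= 32; result |= 32; }
    if (value & 0x00000000FFFF0000ULL) { value >>= 16; result |= 16; }
    if (value & 0x000000000000FF00ULL) { value >>= 8;  result |= 8;  }
    if (value & 0x00000000000000F0ULL) { value >>= 4;  result |= 4;  }
    if (value & 0x000000000000000CULL) { value >>= 2;  result |= 2;  }
    if (value & 0x0000000000000002ULL) {               result |= 1;  }
    return result;
}
```

The ULL suffixes make the constants 64 bits wide regardless of the platform's int and long sizes.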

User · Answer

Since 2^N is an integer with only the Nth bit set (1 << N), finding the position (N) of the highest set bit is the integer log base 2 of that integer.

http://graphics.stanford.edu/~seander/bithacks.html#IntegerLogObvious

unsigned int v;
unsigned r = 0;

while (v >>= 1) {
    r++;
}

This "obvious" algorithm may not be transparent to everyone, but when you realize that the code shifts right by one bit repeatedly until the leftmost bit has been shifted off (note that C treats any non-zero value as true) and returns the number of shifts, it makes perfect sense. It also means that it works even when more than one bit is set; the result is always for the most significant bit.

If you scroll down on that page, there are faster, more complex variations. However, if you know you're dealing with numbers with a lot of leading zeroes, the naive approach may provide acceptable speed, since bit shifting is rather fast in C, and the simple algorithm doesn't require indexing an array.

NOTE: When using 64-bit values, be extremely cautious about using extra-clever algorithms; many of them only work correctly for 32-bit values.
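Given the note about 64-bit values, one point in favour of the shift loop is that it carries over to wider types unchanged. A minimal sketch, assuming a uint64_t argument and a made-up function name ilog2_u64:

```c
#include <stdint.h>

/* Hypothetical helper: floor(log2(v)) for v > 0, i.e. the 0-based
   index of the highest set bit. Returns 0 for v == 0 as well, so
   callers that care must check for zero themselves. */
static unsigned ilog2_u64(uint64_t v)
{
    unsigned r = 0;
    while (v >>= 1) {   /* stops once the highest bit has been shifted out */
        r++;
    }
    return r;
}
```

The loop runs once per bit below the highest set bit, so it takes up to 63 iterations for large inputs.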

User · Answer

I had a need for a routine to do this and before searching the web (and finding this page) I came up with my own solution based on a binary search. Although I'm sure someone has done this before! It runs in constant time and can be faster than the "obvious" solution posted, although I'm not making any great claims, just posting it for interest.

int highest_bit(unsigned int a) {
  static const unsigned int maskv[] = { 0xffff, 0xff, 0xf, 0x3, 0x1 };
  const unsigned int *mask = maskv;
  int l, h;

  if (a == 0) return -1;

  l = 0;
  h = 32;

  do {
    int m = l + (h - l) / 2;

    if ((a >> m) != 0) l = m;
    else if ((a & (*mask << l)) != 0) h = m;

    mask++;
  } while (l < h - 1);

  return l;
}

User · Answer

Assuming you're on x86 and game for a bit of inline assembler, Intel provides a BSR instruction ("bit scan reverse"). It's fast on some x86s (microcoded on others). From the manual:

  Searches the source operand for the most significant set bit (1 bit). If a most significant 1 bit is found, its bit index is stored in the destination operand. The source operand can be a register or a memory location; the destination operand is a register. The bit index is an unsigned offset from bit 0 of the source operand. If the content source operand is 0, the content of the destination operand is undefined.

(If you're on PowerPC there's a similar cntlz ("count leading zeros") instruction.)

Example code for gcc:

#include <iostream>

int main (int,char**)
{
  int n=1;
  for (;;++n) {
    int msb;
    asm("bsrl %1,%0" : "=r"(msb) : "r"(n));
    std::cout << n << " : " << msb << std::endl;
  }
  return 0;
}

See also this inline assembler tutorial, which shows (section 9.4) it being considerably faster than looping code.
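If inline assembler is not an option, GCC and Clang expose count-leading-zeros builtins that typically compile down to an instruction such as BSR where one exists. The sketch below assumes one of those two compilers. The wrapper names are invented here, while __builtin_clz and __builtin_clzll are the compilers' own builtins. Like BSR, the builtins give an undefined result for an input of 0, so the wrappers test for that case explicitly.

```c
#include <limits.h>

/* Hypothetical wrappers (GCC/Clang only): 0-based index of the
   highest set bit, or -1 for an input of 0. */
static int msb_index_uint(unsigned int n)
{
    const int width = (int)(sizeof n * CHAR_BIT);
    return n ? width - 1 - __builtin_clz(n) : -1;
}

static int msb_index_ull(unsigned long long n)
{
    const int width = (int)(sizeof n * CHAR_BIT);
    return n ? width - 1 - __builtin_clzll(n) : -1;
}
```

Deriving the width from sizeof rather than hard-coding 32 or 64 keeps the same source usable on both 32-bit and 64-bit targets.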

User · Answer

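Think bitwise operators.

I misunderstood the question the first time. You should produce an int with the leftmost bit set (the others zero). Assuming cmp is set to that value:

position = sizeof(int)*8;
while(!(n & cmp)){
   n <<= 1;
   position--;
}

Note that position is 1-based here, and that the loop never terminates when n is 0, so that case has to be checked beforehand.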
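A complete version of the loop above might look like the following sketch. The function name highest_bit_position, the use of unsigned int instead of int (to avoid shifting into the sign bit), and the return value of 0 for a zero input are all choices made for this example, not part of the answer.

```c
#include <limits.h>

/* Hypothetical helper: 1-based position of the highest set bit,
   counted from the right; 0 if no bit is set. */
static int highest_bit_position(unsigned int n)
{
    const int width = (int)(sizeof(unsigned int) * CHAR_BIT);
    const unsigned int cmp = 1u << (width - 1);  /* only the leftmost bit set */
    int position = width;

    if (n == 0)
        return 0;            /* without this check the loop would never end */

    while (!(n & cmp)) {     /* shift left until the top bit is reached */
        n <<= 1;
        position--;
    }
    return position;
}
```

With a 32-bit unsigned int, an input of 1 needs 31 shifts and yields 1, while an input with the top bit set returns 32 immediately.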
