How to count the number of set bits in a 32-bit integer

Question

8 bits representing the number 7 look like this: 00000111. Three bits are set. What are the algorithms to determine the number of set bits in a 32-bit integer?

User · Answer

If you happen to be using Java, the built-in method Integer.bitCount will do that.

User · Answer

The function you are looking for is often called the "sideways sum" or "population count" of a binary number. Knuth discusses it in pre-Fascicle 1A, pp. 11-12 (although there was a brief reference in Volume 2, 4.6.3-(7)).

The locus classicus is Peter Wegner's article "A Technique for Counting Ones in a Binary Computer", from the Communications of the ACM, Volume 3 (1960), Number 5, page 322. He gives two different algorithms there, one optimized for numbers expected to be "sparse" (i.e., have a small number of ones) and one for the opposite case.
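To make the sparse/dense distinction concrete, here is a minimal C sketch. It is not taken from Wegner's article, and the names popcount_sparse and popcount_dense are made up for this example. The first loop runs once per set bit, the second once per clear bit.

```c
#include <stdint.h>

/* Hypothetical helper: loops once per SET bit, so it suits sparse inputs. */
int popcount_sparse(uint32_t x)
{
    int count = 0;
    while (x != 0) {
        x &= x - 1;   /* clear the lowest set bit */
        count++;
    }
    return count;
}

/* Hypothetical helper: loops once per CLEAR bit, so it suits dense inputs. */
int popcount_dense(uint32_t x)
{
    int count = 32;   /* start from "all set" and subtract the zeros */
    while (x != UINT32_MAX) {
        x |= x + 1;   /* set the lowest clear bit */
        count--;
    }
    return count;
}
```

Both functions return the same value for any input; only the number of loop iterations differs.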

User · Answer

This will also work fine:

    /* num should be unsigned; a negative signed value may never shift down to zero */
    int ans = 0;
    while (num) {
        ans += (num & 1);
        num = num >> 1;
    }
    return ans;

User · Answer

Here is a solution that has not been mentioned so far, using bitfields. The following program counts the set bits in an array of 100000000 16-bit integers using 4 different methods. Timing results are given in parentheses (on MacOSX, with gcc -O3).

    #include <stdio.h>
    #include <stdlib.h>

    #define LENGTH 100000000

    typedef struct {
        unsigned char bit0 : 1;
        unsigned char bit1 : 1;
        unsigned char bit2 : 1;
        unsigned char bit3 : 1;
        unsigned char bit4 : 1;
        unsigned char bit5 : 1;
        unsigned char bit6 : 1;
        unsigned char bit7 : 1;
    } bits;

    unsigned char sum_bits(const unsigned char x) {
        const bits *b = (const bits*) &x;
        return b->bit0 + b->bit1 + b->bit2 + b->bit3 \
             + b->bit4 + b->bit5 + b->bit6 + b->bit7;
    }

    int NumberOfSetBits(int i) {
        i = i - ((i >> 1) & 0x55555555);
        i = (i & 0x33333333) + ((i >> 2) & 0x33333333);
        return (((i + (i >> 4)) & 0x0F0F0F0F) * 0x01010101) >> 24;
    }

    #define out(s) \
        printf("bits set: %lu\nbits counted: %lu\n", 8*LENGTH*sizeof(short)*3/4, s);

    int main(int argc, char **argv) {
        unsigned long i, s;
        unsigned short *x = malloc(LENGTH*sizeof(short));
        unsigned char lut[65536], *p;
        unsigned short *ps;
        int *pi;

        /* set 3/4 of the bits */
        for (i=0; i<LENGTH; ++i)
            x[i] = 0xFFF0;

        /* sum_bits (1.772s) */
        for (i=LENGTH*sizeof(short), p=(unsigned char*) x, s=0; i--; s+=sum_bits(*p++));
        out(s);

        /* NumberOfSetBits (0.404s) */
        for (i=LENGTH*sizeof(short)/sizeof(int), pi=(int*)x, s=0; i--; s+=NumberOfSetBits(*pi++));
        out(s);

        /* populate lookup table */
        for (i=0, p=(unsigned char*) &i; i<sizeof(lut); ++i)
            lut[i] = sum_bits(p[0]) + sum_bits(p[1]);

        /* 256-bytes lookup table (0.317s) */
        for (i=LENGTH*sizeof(short), p=(unsigned char*) x, s=0; i--; s+=lut[*p++]);
        out(s);

        /* 65536-bytes lookup table (0.250s) */
        for (i=LENGTH, ps=x, s=0; i--; s+=lut[*ps++]);
        out(s);

        free(x);
        return 0;
    }

While the bitfield version is very readable, the timing results show that it is over 4x slower than NumberOfSetBits(). The lookup-table based implementations are still quite a bit faster, in particular with a 65 kB table.

User · Answer

A simple way which should work nicely for a small number of bits is something like this (for 4 bits in this example):

    (i & 1) + (i & 2)/2 + (i & 4)/4 + (i & 8)/8

Would others recommend this for a small number of bits as a simple solution?

User · Answer

Simple algorithm to count the number of set bits:

    int countbits(unsigned int n) {
        int count = 0;
        while (n != 0) {
            n = n & (n-1);   /* clears the lowest set bit */
            count++;
        }
        return count;
    }

Take the example of 11 (1011) and try manually running through the algorithm. It should help you a lot!
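The manual run-through suggested above can also be done in code. The sketch below is only an illustration: countbits_trace and its steps array are hypothetical names, and the function simply records the value of n after each pass through the loop.

```c
#include <stdint.h>

/* Hypothetical helper: runs the n &= n - 1 loop and stores the value of n
   after each iteration in steps[]; returns the number of set bits. */
int countbits_trace(uint32_t n, uint32_t steps[], int max_steps)
{
    int count = 0;
    while (n != 0) {
        n &= n - 1;                 /* drop the lowest set bit */
        if (count < max_steps)
            steps[count] = n;       /* remember the intermediate value */
        count++;
    }
    return count;
}
```

For 11 (1011) the recorded values are 1010, 1000 and 0000, so the loop body runs three times, once per set bit.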

User · Answer

How about converting the integer to a binary string and counting the ones? PHP solution:

    substr_count(decbin($integer), '1');

User · Answer

I think the fastest way, without using lookup tables or a hardware popcount, is the following. It counts the set bits with just 12 operations.

    int popcount(unsigned int v) {   // unsigned, so the shifts and the multiply are well defined
        v = v - ((v >> 1) & 0x55555555);                // put count of each 2 bits into those 2 bits
        v = (v & 0x33333333) + ((v >> 2) & 0x33333333); // put count of each 4 bits into those 4 bits
        return (((v + (v >> 4)) & 0xF0F0F0F) * 0x1010101) >> 24;
    }

It works because you can count the total number of set bits by dividing the word into two halves, counting the number of set bits in both halves and then adding them up. This is also known as the divide and conquer paradigm. Let's get into detail.

    v = v - ((v >> 1) & 0x55555555);

The number of set bits in two bits can be 0b00, 0b01 or 0b10. Let's try to work this out on 2 bits.

     ---------------------------------------------
     |   v    |   (v >> 1) & 0b01   |   v - x    |
     ---------------------------------------------
       0b00           0b00               0b00
       0b01           0b00               0b01
       0b10           0b01               0b01
       0b11           0b01               0b10

This is what was required: the last column shows the count of set bits in every two-bit pair. If the two-bit number is >= 2 (0b10) then the AND produces 0b01, else it produces 0b00.

    v = (v & 0x33333333) + ((v >> 2) & 0x33333333);

This statement should be easy to understand. After the first operation we have the count of set bits in every two bits; now we sum up those counts in every 4 bits.

    v & 0b00110011          // masks out even two bits
    (v >> 2) & 0b00110011   // masks out odd two bits

We then sum up the above results, giving us the total count of set bits in every 4 bits. The last statement is the trickiest.

    return (((v + (v >> 4)) & 0xF0F0F0F) * 0x1010101) >> 24;

Let's break it down further.

    v + (v >> 4)

It's similar to the second statement; we are adding the counts of adjacent groups of 4 bits instead. We know, because of our previous operations, that every nibble holds the count of set bits in it. Let's look at an example. Suppose we have the byte 0b01000010. It means the high nibble has a count of 4 and the low nibble has a count of 2. Now we add those nibbles together.

    0b01000010 + 0b00000100 = 0b01000110

This gives us the count of set bits in the byte in its low nibble (0b0110 = 6), so we mask off the high four bits of every byte in the number, discarding them.

    0b01000110 & 0x0F = 0b00000110

Now every byte holds the count of set bits in it. We need to add them all up together. The trick is to multiply the result by 0x01010101, which has an interesting property. If our number has four bytes, A B C D, the multiplication results in a new number with these bytes: A+B+C+D, B+C+D, C+D, D. A 4-byte number can have a maximum of 32 bits set, which can be represented as 0b00100000, so the sum fits in one byte.

All we need now is the first (most significant) byte, which has the sum of all set bits in all the bytes, and we get it with >> 24. This algorithm was designed for 32-bit words but can easily be modified for 64-bit words.
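As a sketch of the 64-bit modification mentioned above: the steps stay the same, the constants are widened to 64 bits, and the final shift becomes 56 so that the top byte is extracted. The function name popcount64_swar is an assumption made for this example.

```c
#include <stdint.h>

/* 64-bit variant of the same steps; only the constants and the final
   shift change. The name popcount64_swar is hypothetical. */
int popcount64_swar(uint64_t v)
{
    v = v - ((v >> 1) & 0x5555555555555555ULL);                           /* 2-bit counts */
    v = (v & 0x3333333333333333ULL) + ((v >> 2) & 0x3333333333333333ULL); /* 4-bit counts */
    v = (v + (v >> 4)) & 0x0F0F0F0F0F0F0F0FULL;                           /* 8-bit counts */
    return (int)((v * 0x0101010101010101ULL) >> 56);                      /* sum of the 8 bytes */
}
```

The maximum result is 64, which still fits in the top byte, so the multiply cannot lose a carry.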

User · Answer

You can use the built-in function named __builtin_popcount(). There is no __builtin_popcount in C++, but it is a built-in function of the GCC compiler. This function returns the number of set bits in an integer.

    int __builtin_popcount (unsigned int x);

Reference: Bit Twiddling Hacks
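Because the builtin is a compiler extension, code that must also build elsewhere needs a fallback. The following is a minimal sketch under that assumption; count_set_bits32 is a hypothetical wrapper name, and the fallback is the plain clear-lowest-bit loop rather than anything compiler-specific.

```c
#include <stdint.h>

/* Hypothetical wrapper: uses the GCC/Clang builtin when it is available and
   a portable loop otherwise. */
int count_set_bits32(uint32_t x)
{
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_popcount(x);   /* builtin takes an unsigned int */
#else
    int count = 0;
    while (x != 0) {
        x &= x - 1;                 /* clear the lowest set bit */
        count++;
    }
    return count;
#endif
}
```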

User · Answer

You can do something like:

    #include <stdio.h>

    int countSetBits(unsigned int n)
    {
        n = ((n & 0xAAAAAAAA) >> 1) + (n & 0x55555555);
        n = ((n & 0xCCCCCCCC) >> 2) + (n & 0x33333333);
        n = ((n & 0xF0F0F0F0) >> 4) + (n & 0x0F0F0F0F);
        n = ((n & 0xFF00FF00) >> 8) + (n & 0x00FF00FF);
        n = ((n & 0xFFFF0000) >> 16) + (n & 0x0000FFFF);  /* final merge, needed for values above 16 bits */
        return n;
    }

    int main()
    {
        int n = 10;
        printf("Number of set bits: %d", countSetBits(n));
        return 0;
    }

See here: http://ideone.com/JhwcX

The working can be explained as follows: first, all the even bits are shifted towards the right and added to the odd bits to count the number of bits in groups of two. Then we work in groups of two, then four, and so on.

User · Answer

In my opinion, the "best" solution is the one that can be read by another programmer (or the original programmer two years later) without copious comments. You may well want the fastest or cleverest solution, which some have already provided, but I prefer readability over cleverness any time.

    unsigned int bitCount (unsigned int value) {
        unsigned int count = 0;
        while (value > 0) {           // until all bits are zero
            if ((value & 1) == 1)     // check lower bit
                count++;
            value >>= 1;              // shift bits, removing lower bit
        }
        return count;
    }

If you want more speed (and assuming you document it well to help out your successors), you could use a table lookup:

    // Lookup table for fast calculation of bits set in 8-bit unsigned char.

    static unsigned char oneBitsInUChar[] = {
    //  0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F (<- n)
    //  =====================================================
        0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4, // 0n
        1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, // 1n
        : : :
        4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8, // Fn
    };

    // Function for fast calculation of bits set in 16-bit unsigned short.

    unsigned char oneBitsInUShort (unsigned short x) {
        return oneBitsInUChar [x >>    8]
             + oneBitsInUChar [x &  0xff];
    }

    // Function for fast calculation of bits set in 32-bit unsigned int.

    unsigned char oneBitsInUInt (unsigned int x) {
        return oneBitsInUShort (x >>     16)
             + oneBitsInUShort (x &  0xffff);
    }

These rely on specific data type sizes, though, so they're not that portable. But since many performance optimisations aren't portable anyway, that may not be an issue. If you want portability, I'd stick to the readable solution.
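The 256-entry table above is shown with its middle rows elided. One way to avoid typing it out is to fill it once at startup. The sketch below does that under the assumption of a 32-bit unsigned int; bits_in_byte, init_bits_in_byte and bit_count_table32 are hypothetical names, not part of the answer's code.

```c
#include <stddef.h>

static unsigned char bits_in_byte[256];   /* hypothetical table name */

/* Fill the table: the count for i is its lowest bit plus the count for i / 2,
   which has already been computed because i / 2 < i. */
void init_bits_in_byte(void)
{
    bits_in_byte[0] = 0;
    for (size_t i = 1; i < 256; i++)
        bits_in_byte[i] = (unsigned char)((i & 1) + bits_in_byte[i >> 1]);
}

/* Look up each of the four bytes; assumes unsigned int is 32 bits wide and
   that init_bits_in_byte() has been called first. */
unsigned int bit_count_table32(unsigned int x)
{
    return bits_in_byte[x & 0xff]
         + bits_in_byte[(x >> 8) & 0xff]
         + bits_in_byte[(x >> 16) & 0xff]
         + bits_in_byte[(x >> 24) & 0xff];
}
```

Looking up one byte at a time keeps the table at 256 bytes, at the cost of four lookups per 32-bit value.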

User · Answer

There are many algorithms to count the set bits, but I think the best one is the fastest one. You can see the details on this page: Bit Twiddling Hacks.

I suggest this one: counting bits set in 14, 24, or 32-bit words using 64-bit instructions.

    unsigned int v; // count the number of bits set in v
    unsigned int c; // c accumulates the total bits set in v

    // option 1, for at most 14-bit values in v:
    c = (v * 0x200040008001ULL & 0x111111111111111ULL) % 0xf;

    // option 2, for at most 24-bit values in v:
    c =  ((v & 0xfff) * 0x1001001001001ULL & 0x84210842108421ULL) % 0x1f;
    c += (((v & 0xfff000) >> 12) * 0x1001001001001ULL & 0x84210842108421ULL)
         % 0x1f;

    // option 3, for at most 32-bit values in v:
    c =  ((v & 0xfff) * 0x1001001001001ULL & 0x84210842108421ULL) % 0x1f;
    c += (((v & 0xfff000) >> 12) * 0x1001001001001ULL & 0x84210842108421ULL)
         % 0x1f;
    c += ((v >> 24) * 0x1001001001001ULL & 0x84210842108421ULL) % 0x1f;

This method requires a 64-bit CPU with fast modulus division to be efficient. The first option takes only 3 operations; the second option takes 10; and the third option takes 15.

User · Answer

I use the code below, which is more intuitive.

    int countSetBits(int n) {
        return !n ? 0 : 1 + countSetBits(n & (n-1));
    }

Logic: n & (n-1) resets the last set bit of n.

P.S.: I know this is not an O(1) solution, albeit an interesting one.

User · Answer

This is known as the "Hamming Weight", "popcount" or "sideways addition".

The "best" algorithm really depends on which CPU you are on and what your usage pattern is.

Some CPUs have a single built-in instruction to do it and others have parallel instructions which act on bit vectors. The parallel instructions (like x86's popcnt, on CPUs where it's supported) will almost certainly be fastest. Some other architectures may have a slow instruction implemented with a microcoded loop that tests a bit per cycle (citation needed).

A pre-populated table lookup method can be very fast if your CPU has a large cache and/or you are doing lots of these instructions in a tight loop. However it can suffer because of the expense of a "cache miss", where the CPU has to fetch some of the table from main memory. (Look up each byte separately to keep the table small.)

If you know that your bytes will be mostly 0's or mostly 1's then there are very efficient algorithms for these scenarios.

I believe a very good general purpose algorithm is the following, known as the "parallel" or "variable-precision SWAR algorithm". I have expressed this in a C-like pseudo language; you may need to adjust it to work for a particular language (e.g. using uint32_t for C++ and >>> in Java):

    int numberOfSetBits(uint32_t i)
    {
         // Java: use int, and use >>> instead of >>
         // C or C++: use uint32_t
         i = i - ((i >> 1) & 0x55555555);
         i = (i & 0x33333333) + ((i >> 2) & 0x33333333);
         return (((i + (i >> 4)) & 0x0F0F0F0F) * 0x01010101) >> 24;
    }

For JavaScript: coerce to integer with |0 for performance; change the first line to i = (i|0) - ((i >> 1) & 0x55555555);

This has the best worst-case behaviour of any of the algorithms discussed, so will efficiently deal with any usage pattern or values you throw at it.

References:

    https://graphics.stanford.edu/~seander/bithacks.html
    https://en.wikipedia.org/wiki/Hamming_weight
    http://gurmeet.net/puzzles/fast-bit-counting-routines/
    http://aggregate.ee.engr.uky.edu/MAGIC/#Population%20Count%20(Ones%20Count)

How this SWAR bithack works:

    i = i - ((i >> 1) & 0x55555555);

The first step is an optimized version of masking to isolate the odd / even bits, shifting to line them up, and adding. This effectively does 16 separate additions in 2-bit accumulators (SWAR = SIMD Within A Register). Like (i & 0x55555555) + ((i>>1) & 0x55555555).

The next step takes the odd/even eight of those 16x 2-bit accumulators and adds again, producing 8x 4-bit sums. The i - ... optimization isn't possible this time so it does just mask before / after shifting. Using the same 0x33... constant both times instead of 0xccc... before shifting is a good thing when compiling for ISAs that need to construct 32-bit constants in registers separately.

The final shift-and-add step of (i + (i >> 4)) & 0x0F0F0F0F widens to 4x 8-bit accumulators. It masks after adding instead of before, because the maximum value in any 4-bit accumulator is 4, if all 4 bits of the corresponding input bits were set. 4+4 = 8 which still fits in 4 bits, so carry between nibble elements is impossible in i + (i >> 4).

So far this is just fairly normal SIMD using SWAR techniques with a few clever optimizations. Continuing on with the same pattern for 2 more steps can widen to 2x 16-bit then 1x 32-bit counts. But there is a more efficient way on machines with fast hardware multiply:

Once we have few enough "elements", a multiply with a magic constant can sum all the elements into the top element. In this case byte elements. Multiply is done by left-shifting and adding, so a multiply of x * 0x01010101 results in x + (x<<8) + (x<<16) + (x<<24). Our 8-bit elements are wide enough (and holding small enough counts) that this doesn't produce carry into that top 8 bits.

A 64-bit version of this can do 8x 8-bit elements in a 64-bit integer with a 0x0101010101010101 multiplier, and extract the high byte with >>56. So it doesn't take any extra steps, just wider constants. This is what GCC uses for __builtin_popcountll on x86 systems when the hardware popcnt instruction isn't enabled. If you can use builtins or intrinsics for this, do so to give the compiler a chance to do target-specific optimizations.

With full SIMD for wider vectors (e.g. counting a whole array)

This bitwise-SWAR algorithm could parallelize to be done in multiple vector elements at once, instead of in a single integer register, for a speedup on CPUs with SIMD but no usable popcount instruction (e.g. x86-64 code that has to run on any CPU, not just Nehalem or later).

However, the best way to use vector instructions for popcount is usually by using a variable-shuffle to do a table-lookup for 4 bits at a time of each byte in parallel. (The 4 bits index a 16-entry table held in a vector register.)

On Intel CPUs, the hardware 64-bit popcnt instruction can outperform an SSSE3 PSHUFB bit-parallel implementation by about a factor of 2, but only if your compiler gets it just right. Otherwise SSE can come out significantly ahead. Newer compiler versions are aware of the popcnt false dependency problem on Intel.

    https://github.com/WojciechMula/sse-popcount - state-of-the-art x86 SIMD popcount for SSSE3, AVX2, AVX512BW, AVX512VBMI, or AVX512 VPOPCNT. Using Harley-Seal across vectors to defer popcount within an element. (Also ARM NEON)
    Counting 1 bits (population count) on large data using AVX-512 or AVX-2
    related: https://github.com/mklarqvist/positional-popcount - separate counts for each bit-position of multiple 8, 16, 32, or 64-bit integers. (Again, x86 SIMD including AVX-512 which is really good at this, with vpternlogd making Harley-Seal very good.)
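The multiply step described in this answer can be checked in isolation. The sketch below is an illustration only, with hypothetical function names: it sums the four bytes of a word once with the magic-constant multiply and once with the explicit shifts and adds that the multiply stands for.

```c
#include <stdint.h>

/* Sum of the four bytes of x, computed with one multiply. Valid only while
   the total fits in 8 bits, which holds for per-byte bit counts (max 32). */
uint32_t hsum_bytes_mul(uint32_t x)
{
    return (x * 0x01010101u) >> 24;
}

/* The same sum written as the shifts and adds that the multiply expands to. */
uint32_t hsum_bytes_shift(uint32_t x)
{
    return (x + (x << 8) + (x << 16) + (x << 24)) >> 24;
}
```

For example, the bytes 01 02 03 04 sum to 10, and four bytes each holding 8 sum to 32; both functions agree on these inputs.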

User · Answer

I am giving two algorithms to answer the question:

    package countSetBitsInAnInteger;

    import java.util.Scanner;

    public class UsingLoop {

        public static void main(String[] args) {
            Scanner in = new Scanner(System.in);
            try {
                System.out.println("Enter an integer number to check for set bits in it");
                int n = in.nextInt();
                System.out.println("Using while loop, we get the number of set bits as: " + usingLoop(n));
                System.out.println("Using Brian Kernighan's Algorithm, we get the number of set bits as: " + usingBrianKernighan(n));
            }
            finally {
                in.close();
            }
        }

        private static int usingBrianKernighan(int n) {
            int count = 0;
            while (n != 0) {   // != 0 rather than > 0, so negative inputs are counted too
                n &= (n-1);
                count++;
            }
            return count;
        }

        /*
         * Analysis:
         *   Time complexity = O(lgn)
         *   Space complexity = O(1)
         */

        private static int usingLoop(int n) {
            int count = 0;
            for (int i = 0; i < 32; i++) {
                if ((n & (1 << i)) != 0)
                    count++;
            }
            return count;
        }

        /*
         * Analysis:
         *   Time Complexity = O(32) // Maybe the complexity is O(lgn)
         *   Space Complexity = O(1)
         */
    }

User · Answer

From Python 3.10 onwards, you will be able to use the int.bit_count() function, but for the time being, you can define this function yourself.

    def bit_count(integer):
        return bin(integer).count("1")

User · Answer

What do you mean by "best algorithm"? The shortest code or the fastest code? Your code looks very elegant and it has a constant execution time. The code is also very short.

But if speed is the major factor and not code size, then I think the following can be faster:

    static final int[] BIT_COUNT = { 0, 1, 1, ... 256 values with a bitsize of a byte ... };

    static int bitCountOfByte( int value ){
        return BIT_COUNT[ value & 0xFF ];
    }

    static int bitCountOfInt( int value ){
        return bitCountOfByte( value )
             + bitCountOfByte( value >> 8 )
             + bitCountOfByte( value >> 16 )
             + bitCountOfByte( value >> 24 );
    }

I think that this will not be faster for a 64-bit value, but for a 32-bit value it can be faster.

User · Answer

Personally I use this:

    public static int myBitCount(long L) {
        int count = 0;
        while (L != 0) {
            count++;
            L ^= L & -L;   // L & -L isolates the lowest set bit; the XOR clears it
        }
        return count;
    }

User · Answer

I think Brian Kernighan's method will be useful too. It goes through as many iterations as there are set bits. So if we have a 32-bit word with only the high bit set, then it will only go once through the loop.

    int countSetBits(unsigned int n) {
        unsigned int c; // c accumulates the total bits set in n
        for (c = 0; n > 0; n = n & (n-1)) c++;
        return c;
    }

    Published in 1988, the C Programming Language 2nd Ed. (by Brian W. Kernighan and Dennis M. Ritchie) mentions this in exercise 2-9. On April 19, 2006 Don Knuth pointed out to me that this method "was first published by Peter Wegner in CACM 3 (1960), 322. (Also discovered independently by Derrick Lehmer and published in 1964 in a book edited by Beckenbach.)"

User · Answer

I wrote a fast bitcount macro for RISC machines in about 1990. It does not use advanced arithmetic (multiplication, division, %), memory fetches (way too slow), or branches (way too slow), but it does assume the CPU has a 32-bit barrel shifter (in other words, >> 1 and >> 32 take the same amount of cycles). It assumes that small constants (such as 6, 12, 24) cost nothing to load into the registers, or are stored in temporaries and reused over and over again.

With these assumptions, it counts 32 bits in about 16 cycles/instructions on most RISC machines. Note that 15 instructions/cycles is close to a lower bound on the number of cycles or instructions, because it seems to take at least 3 instructions (mask, shift, operator) to cut the number of addends in half, so log_2(32) = 5, 5 x 3 = 15 instructions is a quasi-lowerbound.

    #define BitCount(X,Y)           \
                    Y = X - ((X >> 1) & 033333333333) - ((X >> 2) & 011111111111); \
                    Y = ((Y + (Y >> 3)) & 030707070707); \
                    Y =  (Y + (Y >> 6)); \
                    Y = (Y + (Y >> 12) + (Y >> 24)) & 077;

Here is a secret to the first and most complex step:

    input output
    AB    CD             Note
    00    00             = AB
    01    01             = AB
    10    01             = AB - (A >> 1) & 0x1
    11    10             = AB - (A >> 1) & 0x1

So if I take the 1st column (A) above, shift it right 1 bit, and subtract it from AB, I get the output (CD). The extension to 3 bits is similar; you can check it with an 8-row boolean table like mine above if you wish.

    Don Gillies

User · Answer

    unsigned int count_bit(unsigned int x)
    {
      x = (x & 0x55555555) + ((x >> 1) & 0x55555555);
      x = (x & 0x33333333) + ((x >> 2) & 0x33333333);
      x = (x & 0x0F0F0F0F) + ((x >> 4) & 0x0F0F0F0F);
      x = (x & 0x00FF00FF) + ((x >> 8) & 0x00FF00FF);
      x = (x & 0x0000FFFF) + ((x >> 16) & 0x0000FFFF);
      return x;
    }

Let me explain this algorithm.

This algorithm is based on the divide and conquer approach. Suppose there is an 8-bit integer 213 (11010101 in binary). The algorithm works like this, each time merging two neighbouring blocks:

    +-------------------------------+
    | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |  <- x
    |  1 0  |  0 1  |  0 1  |  0 1  |  <- first time merge
    |    0 0 1 1    |    0 0 1 0    |  <- second time merge
    |        0 0 0 0 0 1 0 1        |  <- third time (answer = 00000101 = 5)
    +-------------------------------+
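The 8-bit walk-through for 213 can be reproduced with a cut-down version of the function. This is a sketch for illustration only; count_bit8_trace and the stage array are hypothetical names, and the masks are simply the 8-bit versions of the ones used above.

```c
#include <stdint.h>

/* Hypothetical helper: 8-bit version of the merge steps. stage[] receives
   the value after each of the three merges; the final count is returned. */
unsigned count_bit8_trace(uint8_t x, uint8_t stage[3])
{
    x = (uint8_t)((x & 0x55) + ((x >> 1) & 0x55));  /* four 2-bit sums */
    stage[0] = x;
    x = (uint8_t)((x & 0x33) + ((x >> 2) & 0x33));  /* two 4-bit sums  */
    stage[1] = x;
    x = (uint8_t)((x & 0x0F) + ((x >> 4) & 0x0F));  /* one 8-bit sum   */
    stage[2] = x;
    return x;
}
```

For 213 the three stages are 10010101, 00110010 and 00000101, matching the rows of the diagram.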

User · Answer

Why not iteratively divide by 2?

    count = 0
    while n > 0
      if (n % 2) == 1
        count += 1
      n /= 2

I agree that this isn't the fastest, but "best" is somewhat ambiguous. I'd argue though that "best" should have an element of clarity.

User · Answer

I'm particularly fond of this example from the fortune file:

    #define BITCOUNT(x)    (((BX_(x)+(BX_(x)>>4)) & 0x0F0F0F0F) % 255)
    #define BX_(x)         ((x) - (((x)>>1)&0x77777777) \
                                 - (((x)>>2)&0x33333333) \
                                 - (((x)>>3)&0x11111111))

I like it best because it's so pretty!

User · Answer

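    public class BinaryCounter {

        private int N;

        public BinaryCounter(int N) {
            this.N = N;
        }

        public static void main(String[] args) {
            BinaryCounter counter = new BinaryCounter(7);
            System.out.println("Number of ones is " + counter.count());
        }

        // Repeatedly subtracts the largest power of two that fits in N;
        // each subtraction removes exactly one set bit.
        public int count() {
            if (N <= 0) return 0;
            int counter = 0;
            int K = 0;
            do {
                K = biggestPowerOfTwoSmallerThan(N);
                N = N - K;
                counter++;
            } while (N != 0);
            return counter;
        }

        private int biggestPowerOfTwoSmallerThan(int N) {
            if (N == 1) return 1;
            // i <= N (not i < N): with i < N the loop returned 0 for N == 2,
            // which made count() loop forever
            for (int i = 0; i <= N; i++) {
                if (Math.pow(2, i) > N) {
                    int power = i - 1;
                    return (int) Math.pow(2, power);
                }
            }
            return 0;
        }
    }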

User · Answer

From Hacker's Delight, p. 66, Figure 5-2:

    int pop(unsigned x)
    {
        x = x - ((x >> 1) & 0x55555555);
        x = (x & 0x33333333) + ((x >> 2) & 0x33333333);
        x = (x + (x >> 4)) & 0x0F0F0F0F;
        x = x + (x >> 8);
        x = x + (x >> 16);
        return x & 0x0000003F;
    }

Executes in ~20-ish instructions (arch dependent), no branching.

Hacker's Delight is delightful! Highly recommended.

User · Answer

If you're using C++, another option is to use template metaprogramming:

    // recursive template to sum bits in an int
    template <int BITS>
    int countBits(int val) {
            // return the least significant bit plus the result of calling ourselves with
            // the shifted value
            return (val & 0x1) + countBits<BITS-1>(val >> 1);
    }

    // template specialisation to terminate the recursion when there's only one bit left
    template<>
    int countBits<1>(int val) {
            return val & 0x1;
    }

Usage would be:

    // to count bits in a byte/char (this returns 8)
    countBits<8>( 255 )

    // another byte (this returns 7)
    countBits<8>( 254 )

    // counting bits in a word/short (this returns 1)
    countBits<16>( 256 )

You could of course further expand this template to use different types (even auto-detecting bit size), but I've kept it simple for clarity.

Edit: forgot to mention this is good because it should work in any C++ compiler, and it basically just unrolls your loop for you if a constant value is used for the bit count (in other words, I'm pretty sure it's the fastest general method you'll find).

User · Answer

For a happy medium between a 2^32 lookup table and iterating through each bit individually:

    int bitcount(unsigned int num) {
        int count = 0;
        static int nibblebits[] =
            {0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4};
        for (; num != 0; num >>= 4)
            count += nibblebits[num & 0x0f];
        return count;
    }

From http://ctips.pbwiki.com/CountBits

User · Answer

Also consider the built-in functions of your compilers.

On the GNU compiler, for example, you can just use:

    int __builtin_popcount (unsigned int x);
    int __builtin_popcountll (unsigned long long x);

In the worst case the compiler will generate a call to a function (which in current GCC uses a shift-and-bit-hack, at least for x86). In the best case the compiler will emit a CPU instruction to do the job. (Just like a * or / operator - GCC will use a hardware multiply or divide instruction if available, otherwise it will call a libgcc helper function.) The GCC builtins even work across multiple platforms. Popcount will become mainstream in the x86 architecture, so it makes sense to start using the builtin now so you can recompile to let it inline a hardware instruction. Other architectures have had popcount for years.

On x86, you can tell the compiler that it can assume support for the popcnt instruction with -mpopcnt (also implied by -msse4.2). See GCC x86 options. -march=nehalem (or -march= whatever CPU you want your code to assume and to tune for) could be a good choice. Running the resulting binary on an older CPU will result in an illegal-instruction fault.

To make binaries optimized for the machine you build them on, use -march=native (with gcc, clang, or ICC).

MSVC provides an intrinsic for the x86 popcnt instruction, but unlike gcc it's really an intrinsic for the hardware instruction and requires hardware support.

Using std::bitset<>::count() instead of a built-in

In theory, any compiler that knows how to popcount efficiently for the target CPU should expose that functionality through ISO C++ std::bitset<>. In practice, you might be better off with the bit-hack AND/shift/ADD in some cases for some target CPUs.

For target architectures where hardware popcount is an optional extension (like x86), not all compilers have a std::bitset that takes advantage of it when available. For example, MSVC has no way to enable popcnt support at compile time, and always uses a table lookup, even with /Ox /arch:AVX (which implies SSE4.2, although technically there is a separate feature bit for popcnt).

But at least you get something portable that works everywhere, and with gcc/clang with the right target options, you get hardware popcount for architectures that support it.

    #include <bitset>
    #include <limits>
    #include <type_traits>

    template<typename T>
    //static inline  // static if you want to compile with -mpopcnt in one compilation unit but not others
    typename std::enable_if<std::is_integral<T>::value,  unsigned >::type
    popcount(T x)
    {
        static_assert(std::numeric_limits<T>::radix == 2, "non-binary type");

        // sizeof(x)*CHAR_BIT
        constexpr int bitwidth = std::numeric_limits<T>::digits + std::numeric_limits<T>::is_signed;
        // std::bitset constructor was only unsigned long before C++11.  Beware if porting to C++03
        static_assert(bitwidth <= std::numeric_limits<unsigned long long>::digits, "arg too wide for std::bitset() constructor");

        typedef typename std::make_unsigned<T>::type UT;        // probably not needed, bitset width chops after sign-extension

        std::bitset<bitwidth> bs( static_cast<UT>(x) );
        return bs.count();
    }

See asm from gcc, clang, icc, and MSVC on the Godbolt compiler explorer.

x86-64 gcc -O3 -std=gnu++11 -mpopcnt emits this:

    unsigned test_short(short a) { return popcount(a); }
        movzx   eax, di      # note zero-extension, not sign-extension
        popcnt  rax, rax
        ret
    unsigned test_int(int a) { return popcount(a); }
        mov     eax, edi
        popcnt  rax, rax
        ret
    unsigned test_u64(unsigned long long a) { return popcount(a); }
        xor     eax, eax     # gcc avoids false dependencies for Intel CPUs
        popcnt  rax, rdi
        ret

PowerPC64 gcc -O3 -std=gnu++11 emits (for the int arg version):

        rldicl 3,3,0,32     # zero-extend from 32 to 64-bit
        popcntd 3,3         # popcount
        blr

This source isn't x86-specific or GNU-specific at all, but only compiles well for x86 with gcc/clang/icc.

Also note that gcc's fallback for architectures without single-instruction popcount is a byte-at-a-time table lookup. This isn't wonderful for ARM, for example.

User · Answer

    #!/usr/local/bin/perl

    $c = 0x11BBBBAB;
    $count = 0;
    $m = 0x00000001;
    for ($i = 0; $i < 32; $i++)
    {
        $f = $c & $m;
        if ($f == 1)
        {
            $count++;
        }
        $c = $c >> 1;
    }
    printf("%d", $count);

I've done it through a Perl script. The number taken is $c = 0x11BBBBAB.

    B = 3 1s
    A = 2 1s

So in total: 1+1+3+3+3+3+2+3 = 19

User · Answer

    private int get_bits_set(int v)
    {
        int c; // c accumulates the total bits set in v
        for (c = 0; v != 0; c++)   // != 0 rather than > 0, so negative values are counted too
        {
            v &= v - 1; // clear the least significant bit set
        }
        return c;
    }

User · Answer

Another Hamming weight algorithm, if you're on a BMI2-capable CPU:

    the_weight = __tzcnt_u64(~_pext_u64(data[i], data[i]));

Have fun!
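To see why this trick gives the population count, here is a portable C sketch that emulates the two instructions in software. It assumes the intended expression complements the PEXT result before counting trailing zeros. The helper names pext64_soft, tzcnt64_soft and popcount_via_pext are hypothetical, and the loops are far slower than the real instructions; the point is only to show the logic. Extracting the bits of x under the mask x packs its k set bits into the low end, giving 2^k - 1, and the complement of that value has exactly k trailing zeros.

```c
#include <stdint.h>

/* Software stand-in for the BMI2 PEXT instruction: gathers the bits of src
   selected by mask into the low bits of the result. */
uint64_t pext64_soft(uint64_t src, uint64_t mask)
{
    uint64_t result = 0;
    uint64_t out_bit = 1;
    while (mask != 0) {
        uint64_t lowest = mask & (~mask + 1);  /* isolate the lowest mask bit */
        if (src & lowest)
            result |= out_bit;
        out_bit <<= 1;
        mask &= mask - 1;                      /* drop that mask bit */
    }
    return result;
}

/* Software stand-in for TZCNT: trailing zero count, 64 for an input of 0. */
int tzcnt64_soft(uint64_t x)
{
    int n = 0;
    while (n < 64 && ((x >> n) & 1) == 0)
        n++;
    return n;
}

/* pext(x, x) yields 2^k - 1 for k set bits; its complement has k trailing zeros. */
int popcount_via_pext(uint64_t x)
{
    return tzcnt64_soft(~pext64_soft(x, x));
}
```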

User · Answer

Java JDK1.5

    Integer.bitCount(n);

where n is the number whose 1's are to be counted.

Check also:

    Integer.highestOneBit(n);
    Integer.lowestOneBit(n);
    Integer.numberOfLeadingZeros(n);
    Integer.numberOfTrailingZeros(n);

    // Beginning with the value 1, rotate left 16 times
    n = 1;
    for (int i = 0; i < 16; i++) {
        n = Integer.rotateLeft(n, 1);
        System.out.println(n);
    }

User · Answer

Here is a portable module (ANSI-C) which can benchmark each of your algorithms on any architecture. Your CPU has 9 bit bytes? No problem :-) At the moment it implements 2 algorithms, the K&R algorithm and a byte-wise lookup table. The lookup table is on average 3 times faster than the K&R algorithm. If someone can figure out a way to make the "Hacker's Delight" algorithm portable, feel free to add it in.

    #ifndef _BITCOUNT_H_
    #define _BITCOUNT_H_

    /* Return the Hamming Weight of val, i.e. the number of 'on' bits. */
    int bitcount( unsigned int );

    /* List of available bitcount algorithms.
     * onTheFly:    Calculate the bitcount on demand.
     *
     * lookupTable: Uses a small lookup table to determine the bitcount.  This
     * method is on average 3 times as fast as onTheFly, but incurs a small
     * upfront cost to initialize the lookup table on the first call.
     *
     * strategyCount is just a placeholder.
     */
    enum strategy { onTheFly, lookupTable, strategyCount };

    /* String representations of the algorithm names */
    extern const char *strategyNames[];

    /* Choose which bitcount algorithm to use. */
    void setStrategy( enum strategy );

    #endif

    #include <limits.h>

    #include "bitcount.h"

    /* The number of entries needed in the table is equal to the number of unique
     * values a char can represent which is always UCHAR_MAX + 1 */
    static unsigned char _bitCountTable[UCHAR_MAX + 1];
    static unsigned int _lookupTableInitialized = 0;

    static int _defaultBitCount( unsigned int val ) {
        int count;

        /* Starting with:
         * 1100 - 1 == 1011,  1100 & 1011 == 1000
         * 1000 - 1 == 0111,  1000 & 0111 == 0000
         */
        for ( count = 0; val; ++count )
            val &= val - 1;

        return count;
    }

    /* Looks up each byte of the integer in a lookup table.
     *
     * The first time the function is called it initializes the lookup table.
     */
    static int _tableBitCount( unsigned int val ) {
        int bCount = 0;

        if ( !_lookupTableInitialized ) {
            unsigned int i;
            for ( i = 0; i != UCHAR_MAX + 1; ++i )
                _bitCountTable[i] =
                    ( unsigned char )_defaultBitCount( i );

            _lookupTableInitialized = 1;
        }

        for ( ; val; val >>= CHAR_BIT )
            bCount += _bitCountTable[val & UCHAR_MAX];

        return bCount;
    }

    static int ( *_bitcount ) ( unsigned int ) = _defaultBitCount;

    const char *strategyNames[] = { "onTheFly", "lookupTable" };

    void setStrategy( enum strategy s ) {
        switch ( s ) {
        case onTheFly:
            _bitcount = _defaultBitCount;
            break;
        case lookupTable:
            _bitcount = _tableBitCount;
            break;
        case strategyCount:
            break;
        }
    }

    /* Just a forwarding function which will call whichever version of the
     * algorithm has been selected by the client
     */
    int bitcount( unsigned int val ) {
        return _bitcount( val );
    }

    #ifdef _BITCOUNT_EXE_

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Use the same sequence of pseudo random numbers to benchmark each Hamming
     * Weight algorithm.
     */
    void benchmark( int reps ) {
        clock_t start, stop;
        int i, j;
        static const int iterations = 1000000;

        for ( j = 0; j != strategyCount; ++j ) {
            setStrategy( j );

            srand( 257 );

            start = clock(  );

            for ( i = 0; i != reps * iterations; ++i )
                bitcount( rand(  ) );

            stop = clock(  );

            printf
                ( "\n\t%d pseudo-random integers using %s: %f seconds\n\n",
                  reps * iterations, strategyNames[j],
                  ( double )( stop - start ) / CLOCKS_PER_SEC );
        }
    }

    int main( void ) {
        int option;

        while ( 1 ) {
            printf( "Menu Options\n"
                    "\t1.\tPrint the Hamming Weight of an Integer\n"
                    "\t2.\tBenchmark Hamming Weight implementations\n"
                    "\t3.\tExit ( or cntl-d )\n\n\t" );

            if ( scanf( "%d", &option ) == EOF )
                break;

            switch ( option ) {
            case 1:
                printf( "Please enter the integer: " );
                if ( scanf( "%d", &option ) != EOF )
                    printf
                        ( "The Hamming Weight of %d ( 0x%X ) is %d\n\n",
                          option, option, bitcount( option ) );
                break;
            case 2:
                printf
                    ( "Please select number of reps ( in millions ): " );
                if ( scanf( "%d", &option ) != EOF )
                    benchmark( option );
                break;
            case 3:
                goto EXIT;
                break;
            default:
                printf( "Invalid option\n" );
            }
        }

     EXIT:
        printf( "\n" );

        return 0;
    }

    #endif

User · Answer

A few open questions: What if the number is negative? And if the number is 1024, the "iteratively divide by 2" method will iterate 11 times even though only one bit is set.

We can modify the algorithm to support negative numbers as follows:

    count = 0
    while n != 0
        if ((n % 2) == 1 || (n % 2) == -1)
            count += 1
        n /= 2
    return count

Note that for a negative n this counts the bits of its magnitude, not of its two's-complement pattern. To overcome the second problem we can write the algorithm like this, so that it loops only once per set bit:

    int bit_count(int num)
    {
        int count = 0;
        while (num)
        {
            num = num & (num - 1);
            count++;
        }
        return count;
    }

For a complete reference see: http://goursaha.freeoda.com/Miscellaneous/IntegerBitCount.html
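If the goal for negative inputs is to count the bits of the stored two's-complement pattern, one way to sidestep the sign question is to convert to unsigned first, as in this minimal sketch. The name bit_count_signed is hypothetical; the conversion from int to unsigned int is well defined in C (it wraps modulo 2^N), and the loop runs once per set bit.

```c
/* Counts the 1 bits in the two's-complement pattern of n.
   (Hypothetical helper name, for illustration only.) */
int bit_count_signed(int n)
{
    unsigned int u = (unsigned int)n;  /* well-defined modular conversion */
    int count = 0;
    while (u != 0) {
        u &= u - 1;                    /* clear the lowest set bit */
        ++count;
    }
    return count;
}
```

With a 32-bit int, bit_count_signed(-1) yields 32 and bit_count_signed(1024) yields 1 after a single iteration.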

User · Answer

This is one of those questions where it helps to know your micro-architecture. I just timed two variants under gcc 4.3.3 compiled with -O3 using C++ inlines to eliminate function call overhead, one billion iterations, keeping the running sum of all counts to ensure the compiler doesn't remove anything important, using rdtsc for timing (clock cycle precise).

    inline int pop2(unsigned x, unsigned y)
    {
        x = x - ((x >> 1) & 0x55555555);
        y = y - ((y >> 1) & 0x55555555);
        x = (x & 0x33333333) + ((x >> 2) & 0x33333333);
        y = (y & 0x33333333) + ((y >> 2) & 0x33333333);
        x = (x + (x >> 4)) & 0x0F0F0F0F;
        y = (y + (y >> 4)) & 0x0F0F0F0F;
        x = x + (x >> 8);
        y = y + (y >> 8);
        x = x + (x >> 16);
        y = y + (y >> 16);
        return (x+y) & 0x000000FF;
    }

The unmodified Hacker's Delight took 12.2 gigacycles. My parallel version (counting twice as many bits) runs in 13.0 gigacycles. 10.5s total elapsed for both together on a 2.4GHz Core Duo. 25 gigacycles = just over 10 seconds at this clock frequency, so I'm confident my timings are right.

This has to do with instruction dependency chains, which are very bad for this algorithm. I could nearly double the speed again by using a pair of 64-bit registers. In fact, if I was clever and added x+y a little sooner I could shave off some shifts. The 64-bit version with some small tweaks would come out about even, but count twice as many bits again.

With 128-bit SIMD registers, yet another factor of two, and the SSE instruction sets often have clever short-cuts, too.

There's no reason for the code to be especially transparent. The interface is simple, the algorithm can be referenced on-line in many places, and it's amenable to comprehensive unit test. The programmer who stumbles upon it might even learn something. These bit operations are extremely natural at the machine level.

OK, I decided to bench the tweaked 64-bit version. For this one, sizeof(unsigned long) == 8.

    inline int pop2(unsigned long x, unsigned long y)
    {
        x = x - ((x >> 1) & 0x5555555555555555);
        y = y - ((y >> 1) & 0x5555555555555555);
        x = (x & 0x3333333333333333) + ((x >> 2) & 0x3333333333333333);
        y = (y & 0x3333333333333333) + ((y >> 2) & 0x3333333333333333);
        x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0F;
        y = (y + (y >> 4)) & 0x0F0F0F0F0F0F0F0F;
        x = x + y;
        x = x + (x >> 8);
        x = x + (x >> 16);
        x = x + (x >> 32);
        return x & 0xFF;
    }

That looks about right (I'm not testing carefully, though). Now the timings come out at 10.70 gigacycles / 14.1 gigacycles. That latter number summed 128 billion bits and corresponds to 5.9s elapsed on this machine. The non-parallel version speeds up a tiny bit because I'm running in 64-bit mode and it likes 64-bit registers slightly better than 32-bit registers.

Let's see if there's a bit more OOO pipelining to be had here. This was a bit more involved, so I actually tested a bit. Each term alone sums to 64, all combined sum to 256.

    inline int pop4(unsigned long x, unsigned long y,
                    unsigned long u, unsigned long v)
    {
        enum { m1 = 0x5555555555555555,
               m2 = 0x3333333333333333,
               m3 = 0x0F0F0F0F0F0F0F0F,
               m4 = 0x000000FF000000FF };

        x = x - ((x >> 1) & m1);
        y = y - ((y >> 1) & m1);
        u = u - ((u >> 1) & m1);
        v = v - ((v >> 1) & m1);
        x = (x & m2) + ((x >> 2) & m2);
        y = (y & m2) + ((y >> 2) & m2);
        u = (u & m2) + ((u >> 2) & m2);
        v = (v & m2) + ((v >> 2) & m2);
        x = x + y;
        u = u + v;
        x = (x & m3) + ((x >> 4) & m3);
        u = (u & m3) + ((u >> 4) & m3);
        x = x + u;
        x = x + (x >> 8);
        x = x + (x >> 16);
        x = x & m4;
        x = x + (x >> 32);
        return x & 0x000001FF;
    }

I was excited for a moment, but it turns out gcc is playing inline tricks with -O3 even though I'm not using the inline keyword in some tests. When I let gcc play tricks, a billion calls to pop4() takes 12.56 gigacycles, but I determined it was folding arguments as constant expressions. A more realistic number appears to be 19.6gc for another 30% speed-up. My test loop now looks like this, making sure each argument is different enough to stop gcc from playing tricks.

    hitime b4 = rdtsc();
    for (unsigned long i = 10L * 1000*1000*1000; i < 11L * 1000*1000*1000; ++i)
        sum += pop4 (i,  i^1, ~i, i|1);
    hitime e4 = rdtsc();

256 billion bits summed in 8.17s elapsed. Works out to 1.02s for 32 billion bits as benchmarked in the 16-bit table lookup. Can't compare directly, because the other bench doesn't give a clock speed, but looks like I've slapped the snot out of the 64KB table edition, which is a tragic use of L1 cache in the first place.

Update: decided to do the obvious and create pop6() by adding four more duplicated lines. Came out to 22.8gc, 384 billion bits summed in 9.5s elapsed. So there's another 20%. Now at 800ms for 32 billion bits.

User · Answer

The Hacker's Delight bit-twiddling becomes so much clearer when you write out the bit patterns.

    unsigned int bitCount(unsigned int x)
    {
      x = ((x >> 1) & 0b01010101010101010101010101010101)
         + (x       & 0b01010101010101010101010101010101);
      x = ((x >> 2) & 0b00110011001100110011001100110011)
         + (x       & 0b00110011001100110011001100110011);
      x = ((x >> 4) & 0b00001111000011110000111100001111)
         + (x       & 0b00001111000011110000111100001111);
      x = ((x >> 8) & 0b00000000111111110000000011111111)
         + (x       & 0b00000000111111110000000011111111);
      x = ((x >> 16)& 0b00000000000000001111111111111111)
         + (x       & 0b00000000000000001111111111111111);
      return x;
    }

The first step adds the even bits to the odd bits, producing a sum of bits in each pair. The other steps add high-order chunks to low-order chunks, doubling the chunk size all the way up, until we have the final count taking up the entire int.
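Binary literals are a compiler extension before C23, so the same five steps can be sketched with hexadecimal masks instead. This is only a restatement of the function above under a hypothetical name (bit_count_by_pairs), with a one-byte trace in the comments to show what each step leaves behind.

```c
#include <stdint.h>

/* Same pairwise-sum steps with hex masks.
   Trace for the low byte 0xD7 = 1101 0111 (six bits set):
     after step 1: 10 01 01 10  -> per-pair counts 2,1,1,2  (0x96)
     after step 2: 0011 0011    -> per-nibble counts 3,3    (0x33)
     after step 3: 00000110     -> per-byte count 6         (0x06) */
uint32_t bit_count_by_pairs(uint32_t x)
{
    x = ((x >> 1)  & 0x55555555u) + (x & 0x55555555u);  /* 2-bit sums  */
    x = ((x >> 2)  & 0x33333333u) + (x & 0x33333333u);  /* 4-bit sums  */
    x = ((x >> 4)  & 0x0F0F0F0Fu) + (x & 0x0F0F0F0Fu);  /* 8-bit sums  */
    x = ((x >> 8)  & 0x00FF00FFu) + (x & 0x00FF00FFu);  /* 16-bit sums */
    x = ((x >> 16) & 0x0000FFFFu) + (x & 0x0000FFFFu);  /* 32-bit sum  */
    return x;
}
```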

User · Answer

For Java, there is java.util.BitSet: https://docs.oracle.com/javase/8/docs/api/java/util/BitSet.html

cardinality(): Returns the number of bits set to true in this BitSet.

The BitSet is memory efficient since its bits are packed into long words.

User · Answer

In Java 8 or 9, just invoke Integer.bitCount.

User · Answer

I always use this in competitive programming; it's easy to write and efficient.

    #include <bits/stdc++.h>
    using namespace std;

    int countOnes(int n) {
        bitset<32> b(n);
        return b.count();
    }

User · Answer

It's not the fastest or best solution, but I ran into the same question myself, and after thinking about it for a while I realized it can be done like this. If you look at the problem from the mathematical side and draw a graph, you find that the function has a periodic part, and then you see the difference between the periods. So here you go:

    unsigned int f(unsigned int x)
    {
        switch (x) {
            case 0:
                return 0;
            case 1:
                return 1;
            case 2:
                return 1;
            case 3:
                return 2;
            default:
                return f(x/4) + f(x%4);
        }
    }
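The recursion above peels off two bits at a time, so it can also be written as a loop over a four-entry table. The sketch below is one possible iterative reading of it, not the author's code; count_by_pairs and pair_counts are hypothetical names.

```c
/* Iterative form of f(x/4) + f(x%4): look up the count for the low two
   bits, then drop them. (Hypothetical names, for illustration only.) */
unsigned int count_by_pairs(unsigned int x)
{
    /* Set-bit counts for the four possible 2-bit values 0..3. */
    static const unsigned char pair_counts[4] = { 0, 1, 1, 2 };
    unsigned int total = 0;

    while (x != 0) {
        total += pair_counts[x % 4];  /* count of the low two bits */
        x /= 4;                       /* discard them */
    }
    return total;
}
```

For a 32-bit input this takes at most 16 iterations, against a recursion depth of up to 16 for the original.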

User · Answer

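How about the following?

    public int CountBits(int value)
    {
        uint bits = unchecked((uint)value); // treat negatives as their 32-bit pattern
        int count = 0;
        while (bits != 0)
        {
            if ((bits & 1) != 0)
                count++;
            bits >>= 1; // shift right so every bit passes through position 0
        }
        return count;
    }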

User · Answer

Fast C# solution using a pre-calculated table of byte bit counts, with branching on input size.

    public static class BitCount
    {
        public static uint GetSetBitsCount(uint n)
        {
            var counts = BYTE_BIT_COUNTS;
            return n <= 0xff ? counts[n]
                 : n <= 0xffff ? counts[n & 0xff] + counts[n >> 8]
                 : n <= 0xffffff ? counts[n & 0xff] + counts[(n >> 8) & 0xff] + counts[(n >> 16) & 0xff]
                 : counts[n & 0xff] + counts[(n >> 8) & 0xff] + counts[(n >> 16) & 0xff] + counts[(n >> 24) & 0xff];
        }

        public static readonly uint[] BYTE_BIT_COUNTS =
        {
            0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4,
            1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
            1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
            2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
            1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
            2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
            2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
            3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
            1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
            2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
            2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
            3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
            2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
            3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
            3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
            4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8
        };
    }

User · Answer

Python solution:

    def hammingWeight(n: int) -> int:
        sums = 0
        while (n != 0):
            sums += 1
            n = n & (n-1)
        return sums

In the binary representation, the least significant 1-bit in n always corresponds to a 0-bit in n - 1. Therefore, ANDing the two numbers n and n - 1 always flips the least significant 1-bit in n to 0, and keeps all other bits the same.
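The same observation can be sketched in C with the single step pulled out into its own function, which makes the n & (n - 1) identity easy to check in isolation. Both function names are hypothetical; the unsigned type avoids any question about negative inputs.

```c
#include <stdint.h>

/* Returns x with its least significant 1-bit cleared; 0 stays 0.
   Example: 12 is 1100, 12 - 1 is 1011, and 1100 & 1011 is 1000 (8). */
uint32_t clear_lowest_set_bit(uint32_t x)
{
    return x & (x - 1);
}

/* Counts set bits by clearing one per iteration. */
int hamming_weight(uint32_t n)
{
    int sums = 0;
    while (n != 0) {
        n = clear_lowest_set_bit(n);
        ++sums;
    }
    return sums;
}
```

The loop body runs exactly once per set bit, so a sparse value such as 0x80000001 finishes in two iterations.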

User · Answer

    int countBits(int x)
    {
        int n = 0;
        if (x) do n++;
               while (x = x & (x-1));
        return n;
    }

Or also:

    int countBits(int x) { return (x) ? 1 + countBits(x & (x-1)) : 0; }

User · Answer

    int bitcount(unsigned int n)
    {
        int count = 0;
        while (n)
        {
            count += n & 0x1u;
            n >>= 1;
        }
        return count;
    }

Iterated count runs in time proportional to the total number of bits. It simply loops through all the bits, terminating slightly earlier because of the while condition. Useful if the 1s (the set bits) are sparse and among the least significant bits.

User · Answer

Here is the sample code, which might be useful.

    private static final int[] bitCountArr = new int[] {
        0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4,
        1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
        1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
        2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
        1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
        2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
        2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
        3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
        1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
        2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
        2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
        3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
        2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
        3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
        3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
        4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8
    };
    private static final int firstByteFF = 255;

    public static final int getCountOfSetBits(int value) {
        int count = 0;
        for (int i = 0; i < 4; i++) {
            if (value == 0) break;
            count += bitCountArr[value & firstByteFF];
            value >>>= 8;
        }
        return count;
    }

User · Answer

    def hammingWeight(n):
        count = 0
        while n:
            if n & 1:
                count += 1
            n >>= 1
        return count

User · Answer

C++20 std::popcount

The following proposal has been merged: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p0553r4.html and should add it to the <bit> header.

I expect the usage to be like this (the argument has to be unsigned, per the constraints quoted below):

    #include <bit>
    #include <iostream>

    int main() {
        std::cout << std::popcount(0x55u) << std::endl;
    }

I'll give it a try when support arrives in GCC; GCC 9.1.0 with g++-9 -std=c++2a still doesn't support it.

The proposal says:

    // Header <bit>
    namespace std {
      // 25.5.6, counting
      template<class T>
        constexpr int popcount(T x) noexcept;
    }

and:

    template<class T>
      constexpr int popcount(T x) noexcept;

    Constraints: T is an unsigned integer type (3.9.1 [basic.fundamental]).
    Returns: The number of 1 bits in the value of x.

std::rotl and std::rotr were also added to do circular bit rotations: Best practices for circular shift (rotate) operations in C++

User · Answer

This is the implementation in golang:

    func CountBitSet(n int) int {
        count := 0
        for n > 0 {
            count += n & 1
            n >>= 1
        }
        return count
    }

User · Answer

I found an implementation of bit counting in an array using SIMD instructions (SSSE3 and AVX2). It performs 2 to 2.5 times better than using the __popcnt64 intrinsic function. Both versions assume size is a multiple of the vector width.

SSSE3 version:

    #include <smmintrin.h>
    #include <stddef.h>
    #include <stdint.h>

    const __m128i Z = _mm_set1_epi8(0x0);
    const __m128i F = _mm_set1_epi8(0xF);
    //Vector with pre-calculated bit count:
    const __m128i T = _mm_setr_epi8(0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4);

    uint64_t BitCount(const uint8_t * src, size_t size)
    {
        __m128i _sum = _mm_setzero_si128();
        for (size_t i = 0; i < size; i += 16)
        {
            //load 16-byte vector
            __m128i _src = _mm_loadu_si128((const __m128i*)(src + i));
            //get low 4 bit for every byte in vector
            __m128i lo = _mm_and_si128(_src, F);
            //sum precalculated value from T
            _sum = _mm_add_epi64(_sum, _mm_sad_epu8(Z, _mm_shuffle_epi8(T, lo)));
            //get high 4 bit for every byte in vector
            __m128i hi = _mm_and_si128(_mm_srli_epi16(_src, 4), F);
            //sum precalculated value from T
            _sum = _mm_add_epi64(_sum, _mm_sad_epu8(Z, _mm_shuffle_epi8(T, hi)));
        }
        uint64_t sum[2];
        _mm_storeu_si128((__m128i*)sum, _sum);
        return sum[0] + sum[1];
    }

AVX2 version:

    #include <immintrin.h>
    #include <stddef.h>
    #include <stdint.h>

    const __m256i Z = _mm256_set1_epi8(0x0);
    const __m256i F = _mm256_set1_epi8(0xF);
    //Vector with pre-calculated bit count:
    const __m256i T = _mm256_setr_epi8(0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4,
                                       0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4);

    uint64_t BitCount(const uint8_t * src, size_t size)
    {
        __m256i _sum = _mm256_setzero_si256();
        for (size_t i = 0; i < size; i += 32)
        {
            //load 32-byte vector
            __m256i _src = _mm256_loadu_si256((const __m256i*)(src + i));
            //get low 4 bit for every byte in vector
            __m256i lo = _mm256_and_si256(_src, F);
            //sum precalculated value from T
            _sum = _mm256_add_epi64(_sum, _mm256_sad_epu8(Z, _mm256_shuffle_epi8(T, lo)));
            //get high 4 bit for every byte in vector
            __m256i hi = _mm256_and_si256(_mm256_srli_epi16(_src, 4), F);
            //sum precalculated value from T
            _sum = _mm256_add_epi64(_sum, _mm256_sad_epu8(Z, _mm256_shuffle_epi8(T, hi)));
        }
        uint64_t sum[4];
        _mm256_storeu_si256((__m256i*)sum, _sum);
        return sum[0] + sum[1] + sum[2] + sum[3];
    }
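What the shuffle-based loop computes is easier to see in scalar form: each byte is split into its low and high nibble, and each nibble indexes the same 16-entry table T. The sketch below is a plain C reference for that idea (bit_count_nibbles is a hypothetical name); it is not a substitute for the vectorized code, but it can serve as a check against it and handles any size.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar reference for the nibble-table technique used by the SIMD code:
   one table lookup for the low 4 bits and one for the high 4 bits. */
uint64_t bit_count_nibbles(const uint8_t *src, size_t size)
{
    /* Bit counts for the 16 possible nibble values. */
    static const uint8_t T[16] = { 0, 1, 1, 2, 1, 2, 2, 3,
                                   1, 2, 2, 3, 2, 3, 3, 4 };
    uint64_t sum = 0;

    for (size_t i = 0; i < size; ++i) {
        sum += T[src[i] & 0x0F];  /* low nibble  */
        sum += T[src[i] >> 4];    /* high nibble */
    }
    return sum;
}
```

In the SIMD versions, the shuffle instruction performs 16 (or 32) of these table lookups at once, and the sum-of-absolute-differences against zero adds the per-byte results horizontally.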

User · Answer

I got bored, and timed a billion iterations of three approaches. Compiler is gcc -O3. CPU is whatever they put in the 1st gen Macbook Pro.

Fastest is the following, at 3.7 seconds:

    static unsigned char wordbits[65536] = { bitcounts of ints between 0 and 65535 };
    static int popcount( unsigned int i )
    {
        return( wordbits[i&0xFFFF] + wordbits[i>>16] );
    }

Second place goes to the same code but looking up 4 bytes instead of 2 halfwords. That took around 5.5 seconds.

Third place goes to the bit-twiddling "sideways addition" approach, which took 8.6 seconds.

Fourth place goes to GCC's __builtin_popcount(), at a shameful 11 seconds.

The counting one-bit-at-a-time approach was waaaay slower, and I got bored of waiting for it to complete.

So if you care about performance above all else then use the first approach. If you care, but not enough to spend 64 KB of RAM on it, use the second approach. Otherwise use the readable (but slow) one-bit-at-a-time approach.

It's hard to think of a situation where you'd want to use the bit-twiddling approach.

Edit: Similar results here.
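The answer leaves the contents of the 65536-entry table as a placeholder. One common way to fill it at startup, sketched here under hypothetical names (init_wordbits, popcount_table), uses the recurrence that the count for i is the count for i >> 1 plus the lowest bit of i. The extra mask on the high half is an addition so that only the low 32 bits are indexed even where unsigned int is wider.

```c
/* 64 KB table: wordbits[i] holds the number of set bits in i. */
static unsigned char wordbits[65536];

/* Fills the table once; must be called before popcount_table().
   (Hypothetical helper, not part of the original answer.) */
void init_wordbits(void)
{
    wordbits[0] = 0;
    for (unsigned i = 1; i < 65536u; ++i)
        wordbits[i] = (unsigned char)(wordbits[i >> 1] + (i & 1u));
}

/* Two halfword lookups, as in the fastest variant above. */
int popcount_table(unsigned int i)
{
    return wordbits[i & 0xFFFF] + wordbits[(i >> 16) & 0xFFFF];
}
```

Each entry depends only on a smaller index that has already been filled, so a single forward pass is enough.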

User · Answer

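I use the following function. I haven't checked benchmarks, but it works. (Note that it returns the index of the highest set bit rather than the number of set bits.)

    int msb(int num)
    {
        int m = 0;
        for (int i = 16; i > 0; i = i>>1)
        {
            // debug(i, num, m);
            if (num>>i)
            {
                m += i;
                num >>= i;
            }
        }
        return m;
    }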

User · Answer

I have not seen this approach anywhere:

    int nbits(unsigned char v) {
        return ((((v - ((v >> 1) & 0x55)) * 0x1010101u) & 0x30c00c03u) * 0x10040041u) >> 0x1c;
    }

It works per byte, so it would have to be called 4 times for a 32-bit integer. It is derived from the sideways addition but uses two 32-bit multiplications to reduce the number of instructions to only 7. The constants are unsigned so that the multiplications wrap modulo 2^32 instead of overflowing a signed int.

Most current C compilers will optimize this function using SIMD (SSE2) instructions when it is clear that the number of requests is a multiple of 4, and it becomes quite competitive. It is portable, can be defined as a macro or inline function and does not need data tables.

This approach can be extended to work on 16 bits at a time, using 64-bit multiplications. However, it fails when all 16 bits are set, returning zero, so it can be used only when the 0xffff input value is not present. It is also slower due to the 64-bit operations and does not optimize well.
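Because the trick takes only 256 possible inputs, it can be verified exhaustively. The sketch below restates it with explicit 32-bit unsigned arithmetic and compares every byte value against a naive count; nbits_byte, nbits_byte_matches_reference and nbits32 are hypothetical names for this check, and the four-call wrapper is just the "call it 4 times" usage described above.

```c
#include <stdint.h>

/* The per-byte multiply trick, written with uint32_t so every
   multiplication wraps modulo 2^32. */
unsigned nbits_byte(uint8_t v)
{
    uint32_t x = v;
    x = x - ((x >> 1) & 0x55u);            /* four 2-bit counts in one byte */
    x = (x * 0x01010101u) & 0x30C00C03u;   /* one 2-bit field kept per byte */
    return (x * 0x10040041u) >> 28;        /* all four fields summed at bit 28 */
}

/* Returns 1 if nbits_byte agrees with a naive count for all 256 bytes. */
int nbits_byte_matches_reference(void)
{
    for (unsigned v = 0; v < 256; ++v) {
        unsigned naive = 0;
        for (unsigned t = v; t != 0; t >>= 1)
            naive += t & 1u;
        if (nbits_byte((uint8_t)v) != naive)
            return 0;
    }
    return 1;
}

/* 32-bit count built from four per-byte calls. */
unsigned nbits32(uint32_t x)
{
    return nbits_byte((uint8_t)x)
         + nbits_byte((uint8_t)(x >> 8))
         + nbits_byte((uint8_t)(x >> 16))
         + nbits_byte((uint8_t)(x >> 24));
}
```

The final sum fits because a byte has at most 8 set bits, which needs the four bits 28 to 31, and none of the lower partial products can carry into bit 28.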

User · Answer

What you can do is:

    while (n) {
        n = n & (n-1);
        count++;
    }

The logic behind this is that the bits of n-1 are inverted from the rightmost set bit of n onward. If n = 6, i.e. 110, then 5 is 101: the bits are inverted from the rightmost set bit of n. So if we & these two, we make the rightmost set bit 0 in every iteration and always move on to the next rightmost set bit, hence counting the set bits. The worst-case time complexity will be O(log n), when every bit is set.

User · Answer

I'm very disappointed to see that nobody has responded with the functional master race recursive solution, by far the purest one (and it can be used with any bit length!):

    template<typename T>
    int popcnt(T n)
    {
      if (n>0)
        return (n&1) + popcnt(n>>1);   // parentheses needed: + binds tighter than &
      return 0;
    }

User · Answer

Here's something that works in PHP (on 32-bit builds PHP integers are 32-bit signed, thus 31 usable bits):

    function bits_population($nInteger)
    {
        $nPop = 0;
        while ($nInteger)
        {
            $nInteger ^= (1 << (floor(1 + log($nInteger) / log(2)) - 1));
            $nPop++;
        }
        return $nPop;
    }

User · Answer

This can be done in O(k), where k is the number of bits set.

    int NumberOfSetBits(int n)
    {
        int count = 0;

        while (n) {
            ++count;
            n = (n - 1) & n;
        }

        return count;
    }

User · Answer

32-bit or not? I just came up with this method in Java after reading "Cracking the Coding Interview", 4th edition, exercise 5.5 (chap. 5: Bit Manipulation). If the least significant bit is 1, increment count, then right-shift the integer.

    public static int bitCount(int n) {
        int count = 0;
        for (int i = n; i != 0; i = i >>> 1) {   // unsigned shift, so negative n terminates
            count += i & 1;
        }
        return count;
    }

I think this one is more intuitive than the solutions with the constant 0x33333333, no matter how fast they are. It depends on your definition of "best algorithm".
