[c] Rounding up to next power of 2

I want to write a function that returns the nearest next power of 2 number. For example if my input is 789, the output should be 1024. Is there any way of achieving this without using any loops but just using some bitwise operators?

This question is related to c optimization bit-manipulation



If you need it for OpenGL related stuff:

/* Compute the power of 2 nearest to the value passed in.
 * Values of at least 1.5 times a power of 2 round up (6 -> 8, 5 -> 4).
 */
static GLuint 
nearestPower( GLuint value )
{
    int i = 1;

    if (value == 0) return -1;      /* Error! */
    for (;;) {
         if (value == 1) return i;
         else if (value == 3) return i*4;
         value >>= 1; i *= 2;
    }
}

next = pow(2, ceil(log(x)/log(2)));

This works by finding the exponent you'd have to raise 2 to in order to get x (take the log of the number and divide by the log of the desired base; see Wikipedia for more). Then round that up with ceil to get the nearest whole-number power.

This is a more general purpose (i.e. slower!) method than the bitwise methods linked elsewhere, but good to know the maths, eh? Because it goes through floating point, the result can be off by one power for inputs at or very near an exact power of 2.



For any unsigned type, building on the Bit Twiddling Hacks:

#include <climits>
#include <cstddef>
#include <type_traits>

template <typename UnsignedType>
UnsignedType round_up_to_power_of_2(UnsignedType v) {
  static_assert(std::is_unsigned<UnsignedType>::value, "Only works for unsigned types");
  v--;
  for (size_t i = 1; i < sizeof(v) * CHAR_BIT; i *= 2) //Prefer size_t "Warning comparison between signed and unsigned integer"
  {
    v |= v >> i;
  }
  return ++v;
}

There isn't really a loop there as the compiler knows at compile time the number of iterations.
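
Since the question is tagged c, the same idea can be written in plain C for a fixed-width type. This is only a minimal sketch assuming a 64-bit input; the name round_up_pow2_u64 is made up for illustration. The loop always runs six iterations (shifts of 1, 2, 4, 8, 16 and 32), so an optimizing compiler is free to unroll it.

```c
#include <limits.h>
#include <stdint.h>

/* Illustrative name: rounds v up to a power of 2 (v itself if it already is one).
 * Returns 0 for v == 0 and for v > 2^63, where the result does not fit. */
uint64_t round_up_pow2_u64(uint64_t v)
{
    v--;                                  /* so exact powers of 2 map to themselves */
    for (unsigned shift = 1; shift < sizeof v * CHAR_BIT; shift *= 2)
        v |= v >> shift;                  /* smear the highest set bit downwards */
    return v + 1;                         /* all ones below the top bit, plus one */
}
```

The two wrap-around cases (0 and anything above 2^63) both come out as 0 because unsigned arithmetic wraps, so a caller can treat 0 as "no representable answer".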



Although the question is tagged c, here are my five cents. Luckily, C++20 includes these operations: the functions proposed as std::ceil2 and std::floor2 were standardized as std::bit_ceil and std::bit_floor in the <bit> header (see here). They are constexpr function templates; the current GCC implementation uses bit shifting and works with any unsigned integer type.


One more: although it uses a loop, this is much faster than the math functions.

power of two "floor" option:

int power = 1;
while (x >>= 1) power <<= 1;

power of two "ceil" option:

int power = 2;
x--;    // <<-- UPDATED
while (x >>= 1) power <<= 1;

UPDATE

As mentioned in the comments, there was a mistake in the ceil option that made its result wrong.

Here are full functions:

unsigned power_floor(unsigned x) {
    unsigned power = 1;
    while (x >>= 1) power <<= 1;
    return power;
}

unsigned power_ceil(unsigned x) {
    if (x <= 1) return 1;
    unsigned power = 2;
    x--;
    while (x >>= 1) power <<= 1;
    return power;
}

Here's my solution in C. Hope this helps!

int next_power_of_two(int n) {
    int i = 0;
    for (--n; n > 0; n >>= 1) {
        i++;
    }
    return 1 << i;
}

I was trying to get the nearest lower power of 2 and made this function; it may help you. It simply multiplies the nearest lower power of 2 by 2 to get the next higher power of 2.

int nearest_upper_power(int number){
    //Assumes number > 0. Clear the lowest set bit until only the highest one remains
    while((number&(number-1))!=0){
        number&=number-1;
    }
    //Here number is the closest lower power of 2
    //(an input that is already a power of 2 is doubled too, e.g. 8 -> 16)
    number*=2;
    return number;
}


Here is what I'm using to have this be a constant expression, if the input is a constant expression.

#define uptopow2_0(v) ((v) - 1)
#define uptopow2_1(v) (uptopow2_0(v) | uptopow2_0(v) >> 1)
#define uptopow2_2(v) (uptopow2_1(v) | uptopow2_1(v) >> 2)
#define uptopow2_3(v) (uptopow2_2(v) | uptopow2_2(v) >> 4)
#define uptopow2_4(v) (uptopow2_3(v) | uptopow2_3(v) >> 8)
#define uptopow2_5(v) (uptopow2_4(v) | uptopow2_4(v) >> 16)

#define uptopow2(v) (uptopow2_5(v) + 1)  /* this is the one the programmer uses */

So for instance, an expression like:

uptopow2(sizeof (struct foo))

will nicely reduce to a constant.
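
To illustrate, here is a small sketch that uses the macro in places where C requires a constant expression. The macro definitions are repeated so the snippet stands alone; struct foo's layout, pool and pool_size are placeholders invented for this example, and _Static_assert assumes a C11 compiler.

```c
#include <stddef.h>

#define uptopow2_0(v) ((v) - 1)
#define uptopow2_1(v) (uptopow2_0(v) | uptopow2_0(v) >> 1)
#define uptopow2_2(v) (uptopow2_1(v) | uptopow2_1(v) >> 2)
#define uptopow2_3(v) (uptopow2_2(v) | uptopow2_2(v) >> 4)
#define uptopow2_4(v) (uptopow2_3(v) | uptopow2_3(v) >> 8)
#define uptopow2_5(v) (uptopow2_4(v) | uptopow2_4(v) >> 16)
#define uptopow2(v) (uptopow2_5(v) + 1)

/* Placeholder type for the example: 12 bytes of payload. */
struct foo { char payload[12]; };

/* A file-scope array size must be a constant expression, so this declaration
 * only compiles because uptopow2() folds to a constant. */
static unsigned char pool[uptopow2(sizeof (struct foo))];

/* Compile-time checks of the rounding itself. */
_Static_assert(uptopow2(789) == 1024, "789 rounds up to 1024");
_Static_assert(uptopow2(1024) == 1024, "exact powers of 2 are unchanged");

/* Hypothetical accessor so the array size can be inspected at run time. */
size_t pool_size(void) { return sizeof pool; }
```

Since the shifts stop at 16, this only covers values that fit in 32 bits, which is plenty for typical sizeof results.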


/*
** http://graphics.stanford.edu/~seander/bithacks.html#IntegerLog
*/
#define __LOG2A(s) ((s &0xffffffff00000000) ? (32 +__LOG2B(s >>32)): (__LOG2B(s)))
#define __LOG2B(s) ((s &0xffff0000)         ? (16 +__LOG2C(s >>16)): (__LOG2C(s)))
#define __LOG2C(s) ((s &0xff00)             ? (8  +__LOG2D(s >>8)) : (__LOG2D(s)))
#define __LOG2D(s) ((s &0xf0)               ? (4  +__LOG2E(s >>4)) : (__LOG2E(s)))
#define __LOG2E(s) ((s &0xc)                ? (2  +__LOG2F(s >>2)) : (__LOG2F(s)))
#define __LOG2F(s) ((s &0x2)                ? (1)                  : (0))

#define LOG2_UINT64 __LOG2A
#define LOG2_UINT32 __LOG2B
#define LOG2_UINT16 __LOG2C
#define LOG2_UINT8  __LOG2D

static inline uint64_t
next_power_of_2(uint64_t i)
{
#if defined(__GNUC__)
    return 1ULL <<(1 +(63 -__builtin_clzll(i -1)));
#else
    i =i -1;
    i =LOG2_UINT64(i);
    return 1ULL <<(1 +i);
#endif
}

If you do not want to venture into the realm of undefined behaviour, the input value must be between 2 and 2^63: an input of 1 passes 0 to the count-leading-zeros builtin, whose result is undefined for 0, and an input of 0 or anything above 2^63 shifts by 64 bits. The macros are also useful for setting constants at compile time.
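
One way to stay clear of those edge cases is to check the input before shifting. The following is only a sketch under assumptions of my own: the name next_pow2_checked and the convention of returning 0 when the result cannot be represented are made up for this example, and the non-GCC branch uses the plain shift-and-OR method instead of the LOG2 macros.

```c
#include <stdint.h>

/* Hypothetical wrapper: smallest power of 2 that is >= x.
 * Returns 1 for x == 0 and 0 when the answer (2^64) does not fit. */
uint64_t next_pow2_checked(uint64_t x)
{
    if (x <= 1)
        return 1;                          /* 2^0 covers both 0 and 1 */
    if (x > (UINT64_C(1) << 63))
        return 0;                          /* 2^64 is not representable */
#if defined(__GNUC__) || defined(__clang__)
    /* x - 1 is non-zero here, so __builtin_clzll is well defined,
     * and the shift count is between 1 and 63. */
    return UINT64_C(1) << (64 - __builtin_clzll(x - 1));
#else
    x--;
    x |= x >> 1;  x |= x >> 2;  x |= x >> 4;
    x |= x >> 8;  x |= x >> 16; x |= x >> 32;
    return x + 1;
#endif
}
```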



import sys


def is_power2(x):
    return x > 0 and ((x & (x - 1)) == 0)


def find_nearest_power2(x):
    if x <= 0:
        raise ValueError("invalid input")
    if is_power2(x):
        return x
    else:
        bits = get_bits(x)
        upper = 1 << (bits)
        lower = 1 << (bits - 1)
        mid = (upper + lower) // 2
        if (x - mid) > 0:
            return upper
        else:
            return lower


def get_bits(x):
    """return number of bits in binary representation"""
    if x < 0:
        raise ValueError("invalid input: input should be positive integer")
    count = 0
    while (x != 0):
        try:
            x = x >> 1
        except TypeError as error:
            print(error, "input should be of type integer")
            sys.exit(1)
        count += 1
    return count



Many processor architectures support log base 2 or very similar operation – count leading zeros. Many compilers have intrinsics for it. See https://en.wikipedia.org/wiki/Find_first_set
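
As a rough illustration of how such an intrinsic maps to log base 2, here is a sketch for GCC and Clang with a portable fallback. The name ilog2_u32 and the choice to return 0 for an input of 0 are assumptions for this example, and it assumes unsigned int is 32 bits wide.

```c
#include <stdint.h>

/* Illustrative helper: floor(log2(x)) for x > 0.
 * Returns 0 for x == 0 purely as a convenience; log2(0) has no real answer. */
unsigned ilog2_u32(uint32_t x)
{
    if (x == 0)
        return 0;                               /* the builtin is undefined for 0 */
#if defined(__GNUC__) || defined(__clang__)
    return 31u - (unsigned)__builtin_clz(x);    /* index of the highest set bit */
#else
    unsigned n = 0;
    while (x >>= 1)                             /* portable fallback: count shifts */
        n++;
    return n;
#endif
}
```

Rounding x > 1 up to a power of 2 is then 1u << (ilog2_u32(x - 1) + 1), as long as x does not exceed 2^31.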


Assuming you have a good compiler that can do the bit twiddling beforehand, at compile time (how it does so is above me at this point), this works:

    // http://graphics.stanford.edu/~seander/bithacks.html#IntegerLogObvious
    #define SH1(v)  ((v-1) | ((v-1) >> 1))            // accidentally came up w/ this...
    #define SH2(v)  ((v) | ((v) >> 2))
    #define SH4(v)  ((v) | ((v) >> 4))
    #define SH8(v)  ((v) | ((v) >> 8))
    #define SH16(v) ((v) | ((v) >> 16))
    #define OP(v) (SH16(SH8(SH4(SH2(SH1(v))))))         

    #define CB0(v)   ((v) - (((v) >> 1) & 0x55555555))
    #define CB1(v)   (((v) & 0x33333333) + (((v) >> 2) & 0x33333333))
    #define CB2(v)   ((((v) + ((v) >> 4) & 0xF0F0F0F) * 0x1010101) >> 24)
    #define CBSET(v) (CB2(CB1(CB0((v)))))
    #define FLOG2(v) (CBSET(OP(v)))

Test code below:

#include <cstdint>
#include <cstdio>

using namespace std;

// http://graphics.stanford.edu/~seander/bithacks.html#IntegerLogObvious
#define SH1(v)  ((v-1) | ((v-1) >> 1))  // accidently guess this...
#define SH2(v)  ((v) | ((v) >> 2))
#define SH4(v)  ((v) | ((v) >> 4))
#define SH8(v)  ((v) | ((v) >> 8))
#define SH16(v) ((v) | ((v) >> 16))
#define OP(v) (SH16(SH8(SH4(SH2(SH1(v))))))         

#define CB0(v)   ((v) - (((v) >> 1) & 0x55555555))
#define CB1(v)   (((v) & 0x33333333) + (((v) >> 2) & 0x33333333))
#define CB2(v)   ((((v) + ((v) >> 4) & 0xF0F0F0F) * 0x1010101) >> 24)
#define CBSET(v) (CB2(CB1(CB0((v)))))
#define FLOG2(v) (CBSET(OP(v))) 

#define SZ4         FLOG2(4)
#define SZ6         FLOG2(6)
#define SZ7         FLOG2(7)
#define SZ8         FLOG2(8) 
#define SZ9         FLOG2(9)
#define SZ16        FLOG2(16)
#define SZ17        FLOG2(17)
#define SZ127       FLOG2(127)
#define SZ1023      FLOG2(1023)
#define SZ1024      FLOG2(1024)
#define SZ2_17      FLOG2((1ul << 17))  // 
#define SZ_LOG2     FLOG2(SZ)

#define DBG_PRINT(x) do { std::printf("Line:%-4d" "  %10s = %-10d\n", __LINE__, #x, x); } while(0);

uint32_t arrTble[FLOG2(63)];

int main(){
    int8_t n;

    DBG_PRINT(SZ4);    
    DBG_PRINT(SZ6);    
    DBG_PRINT(SZ7);    
    DBG_PRINT(SZ8);    
    DBG_PRINT(SZ9); 
    DBG_PRINT(SZ16);
    DBG_PRINT(SZ17);
    DBG_PRINT(SZ127);
    DBG_PRINT(SZ1023);
    DBG_PRINT(SZ1024);
    DBG_PRINT(SZ2_17);

    return(0);
}

Outputs:

Line:39           SZ4 = 2
Line:40           SZ6 = 3
Line:41           SZ7 = 3
Line:42           SZ8 = 3
Line:43           SZ9 = 4
Line:44          SZ16 = 4
Line:45          SZ17 = 5
Line:46         SZ127 = 7
Line:47        SZ1023 = 10
Line:48        SZ1024 = 10
Line:49        SZ2_17 = 17

If you're using GCC, you might want to have a look at Optimizing the next_pow2() function by Lockless Inc. The page describes a way to use the built-in function __builtin_clz() (count leading zeros) and, later, the x86 (ia32) assembler instruction bsr (bit scan reverse) directly, as described in another answer's link to the gamedev site. This code might be faster than the code in the previous answers.

By the way, if you're not going to use the assembler instruction or a 64-bit data type, you can use this:

/**
 * return the smallest power of two value
 * greater than or equal to x
 *
 * Input range:  [2..2147483648]
 * Output range: [2..2147483648]
 *
 */
__attribute__ ((const))
static inline uint32_t p2(uint32_t x)
{
#if 0
    assert(x > 1);
    assert(x <= ((UINT32_MAX/2) + 1));
#endif

    /* 1U rather than 1: shifting a signed 1 into bit 31 would overflow int */
    return 1U << (32 - __builtin_clz (x - 1));
}

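#include <limits.h>

unsigned long upper_power_of_two(unsigned long v)
{
    v--;
    v |= v >> 1;
    v |= v >> 2;
    v |= v >> 4;
    v |= v >> 8;
    v |= v >> 16;
#if ULONG_MAX > 0xFFFFFFFFUL
    v |= v >> 32;   /* needed where unsigned long is 64 bits wide */
#endif
    v++;
    return v;
}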

In x86 you can use the sse4 bit manipulation instructions to make it fast.

//assume input is in eax
popcnt edx,eax
bsr ecx,eax     //index of the highest set bit (lzcnt would give 31 minus this)
cmp edx,1
jle @done       //popcnt says it's 0 or a power of 2, return input unchanged
mov eax,2
shl eax,cl
@done: rep ret

In c you can use the matching intrinsics.
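
For example, with GCC or Clang the same logic might look roughly like the sketch below. The function name next_pow2_popcnt is invented here, it assumes unsigned int is 32 bits wide, and whether the builtins compile down to single instructions depends on the target flags (for instance -mpopcnt).

```c
#include <stdint.h>

/* Sketch using GCC/Clang builtins; mirrors a popcount test followed by a
 * shift based on the position of the highest set bit. */
uint32_t next_pow2_popcnt(uint32_t x)
{
    /* Zero or one bit set: x is 0 or already a power of 2, return it as is. */
    if (__builtin_popcount(x) <= 1)
        return x;
    /* 31 - clz(x) is the index of the highest set bit (what BSR reports);
     * the next power of 2 sits one position above it. For x > 2^31 the
     * unsigned shift wraps and the result is 0. */
    return UINT32_C(2) << (31 - __builtin_clz(x));
}
```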


For IEEE floats you'd be able to do something like this.

#include <stdint.h>
#include <string.h>

// Assumes a_F is a positive, normal float of at least 1.0
int next_power_of_two(float a_F){
    uint32_t f;
    memcpy(&f, &a_F, sizeof f); // read the float's bits without breaking aliasing rules
    int b = (f << 9) != 0; // If we're a power of two this is 0, otherwise this is 1

    f >>= 23; // remove fractional part of floating point number
    f -= 127; // subtract 127 (the bias) from the exponent

    // adds one to the exponent if we're not a power of two,
    // then raises two to our new exponent.
    return (1 << (f + b));
}

If you need an integer solution and you're able to use inline assembly, BSR will give you the log2 of an integer on x86. It returns the index of the most significant set bit, which is exactly the floor of log2 of that number. Other processors often have similar instructions, such as CLZ, and depending on your compiler there might be an intrinsic available to do the work for you.


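If you want a one-line version, here it is. Be aware that it modifies n several times within a single expression without sequencing, which is undefined behaviour in C, so treat it as a curiosity: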

int nxt_po2(int n) { return 1 + (n|=(n|=(n|=(n|=(n|=(n-=1)>>1)>>2)>>4)>>8)>>16); }

or

int nxt_po2(int n) { return 1 + (n|=(n|=(n|=(n|=(n|=(n-=1)>>(1<<0))>>(1<<1))>>(1<<2))>>(1<<3))>>(1<<4)); }

An efficient Microsoft (e.g., Visual Studio 2017) specific solution in C / C++ for integer input. Handles the case of the input exactly matching a power of two value by decrementing before checking the location of the most significant 1 bit.

inline unsigned int ExpandToPowerOf2(unsigned int Value)
{
    unsigned long Index;
    _BitScanReverse(&Index, Value - 1);
    return (1U << (Index + 1));
}

// - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

#if defined(_WIN64) // The _BitScanReverse64 intrinsic is only available for 64 bit builds because it depends on x64

inline unsigned long long ExpandToPowerOf2(unsigned long long Value)
{
    unsigned long Index;
    _BitScanReverse64(&Index, Value - 1);
    return (1ULL << (Index + 1));
}

#endif

This generates 5 or so inlined instructions for an Intel processor similar to the following:

dec eax
bsr rcx, rax
inc ecx
mov eax, 1
shl rax, cl

Apparently the Visual Studio C++ compiler isn't coded to optimize this for compile-time values, but it's not like there are a whole lot of instructions there.

Edit:

If you want an input value of 1 to yield 1 (2 to the zeroth power), a small modification to the above code still generates straight through instructions with no branch.

inline unsigned int ExpandToPowerOf2(unsigned int Value)
{
    unsigned long Index;
    _BitScanReverse(&Index, --Value);
    if (Value == 0)
        Index = (unsigned long) -1;
    return (1U << (Index + 1));
}

Generates just a few more instructions. The trick is that Index can be replaced by a test followed by a cmove instruction.


For completeness here is a floating-point implementation in bog-standard C.

double next_power_of_two(double value) {
    int exp;
    if(frexp(value, &exp) == 0.5) {
        // Omit this case to round precise powers of two up to the *next* power
        return value;
    }
    return ldexp(1.0, exp);
}

Portable solution in C#:

long value = 27;
long nextPowerOfTwo = 1L << (int)Math.Ceiling(Math.Log2(value));

nextPowerOfTwo is 32.

Math.Ceiling(Math.Log2(value)) calculates the exponent of the next power of two, and the 1L << calculates the real value through bit shifting (1L keeps the shift in 64 bits, so results above 2^30 do not overflow an int).


Convert it to a float and then use .hex(), which shows the normalized IEEE representation.

>>> float(789).hex()
'0x1.8a80000000000p+9'

Then just extract the exponent and add 1.

>>> int(float(789).hex().split('p+')[1]) + 1
10

And raise 2 to this power.

>>> 2 ** (int(float(789).hex().split('p+')[1]) + 1)
1024

Note that an input that is already a power of 2 is rounded up to the next one (1024 gives 2048).


A variant of @YannDroneaud's answer that is also valid for x==1, only for x86 platforms and the gcc or clang compilers:

__attribute__ ((const))
static inline uint32_t p2(uint32_t x)
{
#if 0
    assert(x > 0);
    assert(x <= ((UINT32_MAX/2) + 1));
#endif
    int clz;
    uint32_t xm1 = x-1;
    /* lzcnt returns 32 for an input of 0, unlike bsr; it needs a CPU with LZCNT support */
    asm(
        "lzcnt %1,%0"
        :"=r" (clz)
        :"rm" (xm1)
        :"cc"
    );
    return 1U << (32 - clz);
}

I think this works, too:

int power = 1;
while(power < x)
    power*=2;

And the answer is power.
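
One caveat with the snippet as written: if x is larger than the biggest power of 2 the type can hold, power overflows (undefined behaviour for int; for an unsigned type it wraps to 0 and the loop never ends). A possible guarded version is sketched below; the function name next_pow2_loop and the "return 0 on overflow" convention are assumptions for illustration.

```c
#include <limits.h>

/* Illustrative name. Returns the smallest power of 2 that is >= x,
 * or 0 if that power does not fit in an unsigned int. */
unsigned next_pow2_loop(unsigned x)
{
    unsigned power = 1;
    while (power < x) {
        if (power > UINT_MAX / 2)
            return 0;              /* doubling again would wrap around */
        power *= 2;
    }
    return power;
}
```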


Paul Dixon's answer adapted to Excel; this works well:

 =POWER(2,CEILING.MATH(LOG(A1)/LOG(2)))
