Best practices for circular shift (rotate) operations in C++

Question

Left and right shift operators (<< and >>) are already available in C++. However, I couldn't find out how I could perform circular shift or rotate operations.

How can operations like "Rotate Left" and "Rotate Right" be performed?

Rotating right twice here:

    Initial --> 1000 0011 0100 0010

should result in:

    Final   --> 1010 0000 1101 0000

An example would be helpful.

(Editor's note: Many common ways of expressing rotates in C suffer from undefined behaviour if the rotate count is zero, or compile to more than just a single rotate machine instruction. This question's answer should document best practices.)

User · Answer

C++20 std::rotl and std::rotr

It has arrived! http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p0553r4.html and should add it to the <bit> header.

cppreference says that the usage will be like:

    #include <bit>
    #include <bitset>
    #include <cstdint>
    #include <iostream>

    int main()
    {
        std::uint8_t i = 0b00011101;
        std::cout << "i          = " << std::bitset<8>(i) << '\n';
        std::cout << "rotl(i,0)  = " << std::bitset<8>(std::rotl(i,0)) << '\n';
        std::cout << "rotl(i,1)  = " << std::bitset<8>(std::rotl(i,1)) << '\n';
        std::cout << "rotl(i,4)  = " << std::bitset<8>(std::rotl(i,4)) << '\n';
        std::cout << "rotl(i,9)  = " << std::bitset<8>(std::rotl(i,9)) << '\n';
        std::cout << "rotl(i,-1) = " << std::bitset<8>(std::rotl(i,-1)) << '\n';
    }

giving output:

    i          = 00011101
    rotl(i,0)  = 00011101
    rotl(i,1)  = 00111010
    rotl(i,4)  = 11010001
    rotl(i,9)  = 00111010
    rotl(i,-1) = 10001110

I'll give it a try when support arrives to GCC; GCC 9.1.0 with g++-9 -std=c++2a still doesn't support it.

The proposal says:

    namespace std {
      // 25.5.5, rotating
      template<class T>
        [[nodiscard]] constexpr T rotl(T x, int s) noexcept;
      template<class T>
        [[nodiscard]] constexpr T rotr(T x, int s) noexcept;

and:

    25.5.5 Rotating [bitops.rot]

    In the following descriptions, let N denote std::numeric_limits<T>::digits.

    template<class T>
      [[nodiscard]] constexpr T rotl(T x, int s) noexcept;

    Constraints: T is an unsigned integer type (3.9.1 [basic.fundamental]).
    Let r be s % N.
    Returns: If r is 0, x; if r is positive, (x << r) | (x >> (N - r)); if r is negative, rotr(x, -r).

    template<class T>
      [[nodiscard]] constexpr T rotr(T x, int s) noexcept;

    Constraints: T is an unsigned integer type (3.9.1 [basic.fundamental]).
    Let r be s % N.
    Returns: If r is 0, x; if r is positive, (x >> r) | (x << (N - r)); if r is negative, rotl(x, -r).

A std::popcount was also added to count the number of 1 bits: How to count the number of set bits in a 32-bit integer?

User · Answer

In detail, you can apply the following logic.

If the bit pattern is 33602 as an integer:

    1000 0011 0100 0010

and you need to roll over with 2 right shifts, then first make a copy of the bit pattern and left shift it by (Length - RightShift). Here the length is 16 and the right shift value is 2, so 16 - 2 = 14.

After left shifting 14 times (and keeping only the low 16 bits) you get:

    1000 0000 0000 0000

Now right shift the value 33602 2 times, as required. You get:

    0010 0000 1101 0000

Now take an OR between the value left shifted 14 times and the value right shifted 2 times:

    1000 0000 0000 0000
    0010 0000 1101 0000
    ===================
    1010 0000 1101 0000
    ===================

And you get your shifted rollover value. Remember that bitwise operations are fast, and this doesn't even require a loop.
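The copy / left-shift / right-shift / OR steps above can be sketched in C++ roughly as follows. This is a minimal sketch, not the answerer's code: the helper name `rotr16` is hypothetical, and the early return for a zero count is an added guard so that no shift by the full width is ever performed.

```cpp
#include <cstdint>

// Hypothetical helper: rotate a 16-bit value right by `count` bits.
constexpr std::uint16_t rotr16(std::uint16_t value, unsigned count)
{
    const unsigned width = 16;
    count %= width;                    // rotating by 16 is the same as rotating by 0
    if (count == 0) return value;      // avoids a shift by the full width below
    const unsigned wide = value;       // work in unsigned to sidestep promotion to signed int
    const unsigned wrapped = wide << (width - count); // the copy, left shifted by (Length - RightShift)
    const unsigned shifted = wide >> count;           // the original, right shifted
    return static_cast<std::uint16_t>(wrapped | shifted); // OR, then keep only the low 16 bits
}

// 33602 == 0x8342 == 1000 0011 0100 0010; rotating right twice gives 1010 0000 1101 0000.
static_assert(rotr16(0x8342, 2) == 0xA0D0, "matches the worked example");
```

The cast back to `std::uint16_t` is what discards the bits that the left shift pushed above bit 15.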

User · Answer

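Source Code:

    #include <stdio.h>

    int main()
    {
        int x = 8;               // x-bit number
        unsigned char data = 15; // input
        unsigned char tmp;
        for (int i = 0; i < x; i++)
        {
            printf("Data & 1    %d\n", data & 1);
            printf("Data Shifted value %d\n", data >> 1 | (data & 1) << (x - 1));
            // rotate right by one: the low bit moves to bit x-1
            tmp = data >> 1 | (data & 1) << (x - 1);
            data = tmp;
        }
        return 0;
    }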

User · Answer

Definitively:

    template<class T>
    T ror(T x, unsigned int moves)
    {
      // moves must be greater than 0 and less than the bit width of T
      return (x >> moves) | (x << (sizeof(T)*8 - moves));
    }

User · Answer

Since it's C++, use an inline function:

    #include <climits>      // for CHAR_BIT

    template <typename INT>
    INT rol(INT val) {
        return (val << 1) | (val >> (sizeof(INT)*CHAR_BIT-1));
    }

C++11 variant:

    #include <climits>      // for CHAR_BIT
    #include <type_traits>  // for std::is_unsigned

    template <typename INT>
    constexpr INT rol(INT val) {
        static_assert(std::is_unsigned<INT>::value,
                      "Rotate Left only makes sense for unsigned types");
        return (val << 1) | (val >> (sizeof(INT)*CHAR_BIT-1));
    }

User · Answer

The correct answer is the following:

    #define BitsCount( val ) ( sizeof( val ) * CHAR_BIT )
    #define Shift( val, steps ) ( steps % BitsCount( val ) )
    #define ROL( val, steps ) ( ( val << Shift( val, steps ) ) | ( val >> ( BitsCount( val ) - Shift( val, steps ) ) ) )
    #define ROR( val, steps ) ( ( val >> Shift( val, steps ) ) | ( val << ( BitsCount( val ) - Shift( val, steps ) ) ) )

User · Answer

--- Substituting RLC in 8051 C for speed --- Rotate left carry

Here is an example using RLC to update a serial 8-bit DAC, MSB first (r = DACVAL, P1.4 = SDO, P1.5 = SCLK):

    MOV     A, r
    MOV     B, #8
    ?1:
    RLC     A
    MOV     P1.4, C
    CLR     P1.5
    SETB    P1.5
    DJNZ    B, ?1

Here is the code in 8051 C at its fastest:

    sbit ACC_7 = ACC ^ 7;   // define this at the top to access bit 7 of ACC
    ACC     =   r;
    B       =   8;
    do {
        P1_4    =   ACC_7;  // this assembles into mov c, acc.7 / mov P1.4, c
        ACC     <<= 1;
        P1_5    =   0;
        P1_5    =   1;
        B       --;
    } while (B != 0);

The Keil compiler will use DJNZ when a loop is written this way. I am cheating here by using registers ACC and B in C code. If you cannot cheat, then substitute with:

    P1_4    =   (r & 128) ? 1 : 0;
    r     <<=   1;

This only takes a few extra instructions. Also, changing B for a local variable (char n) is the same. Keil does rotate ACC left by ADD A, ACC, which is the same as multiplying by 2. It only takes one extra opcode, I think. Keeping code entirely in C keeps things simpler sometimes.

User · Answer

Assuming you want to shift right by L bits, and the input x is a number with N bits:

    unsigned ror(unsigned x, int L, int N)
    {
        unsigned lsbs = x & ((1 << L) - 1);
        return (x >> L) | (lsbs << (N-L));
    }
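As an illustration of rotating an N-bit field that is narrower than its containing type, here is a hedged variant of the function above. The name `ror_n`, the unsigned parameter types, and the reduction of L modulo N are assumptions added for the sketch; it also assumes that any bits of x above bit N-1 are zero on input.

```cpp
// Hypothetical helper: rotate the low N bits of x right by L positions.
// Assumes 0 < N <= width of unsigned and that x has no bits set above bit N-1.
constexpr unsigned ror_n(unsigned x, unsigned L, unsigned N)
{
    return (L % N == 0)
        ? x                                           // nothing to rotate; avoids a shift by N
        : ((x >> (L % N))                             // the high part moves down
           | ((x & ((1u << (L % N)) - 1u))            // the low L bits ...
              << (N - L % N)));                       // ... move to the top of the N-bit field
}

// 4-bit example: 1001 rotated right by 1 is 1100.
static_assert(ror_n(0x9, 1, 4) == 0xC, "4-bit rotate");
// The 16-bit example from the question.
static_assert(ror_n(0x8342, 2, 16) == 0xA0D0, "16-bit rotate");
```

Because the low bits are masked off before being moved, the result never spills above bit N-1 even though the arithmetic is done in a wider `unsigned`.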

User · Answer

Most compilers have intrinsics for that. Visual Studio, for example, has _rotr8 and _rotr16.

User · Answer

If x is an 8-bit value, you can use this:

    x = (x >> 1 | x << 7);
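One detail worth spelling out for the 8-bit case: in C++ the operands are promoted to int before shifting, so `x << 7` can produce bits above bit 7, and they are only discarded when the result is converted back to an 8-bit type. A minimal sketch, assuming an unsigned 8-bit input (the function name `rotr8_by1` is hypothetical):

```cpp
#include <cstdint>

// Hypothetical helper: rotate an 8-bit value right by exactly one bit.
constexpr std::uint8_t rotr8_by1(std::uint8_t x)
{
    // x is promoted to int for both shifts; the cast truncates back to 8 bits.
    return static_cast<std::uint8_t>((x >> 1) | (x << 7));
}

static_assert(rotr8_by1(0x01) == 0x80, "the low bit wraps around to bit 7");
static_assert(rotr8_by1(0x1D) == 0x8E, "00011101 becomes 10001110");
```

With a signed 8-bit x the right shift would bring in copies of the sign bit instead, which is why an unsigned type is assumed here.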

User · Answer

See also an earlier version of this answer on another rotate question with some more details about what asm gcc/clang produce for x86.

The most compiler-friendly way to express a rotate in C and C++ that avoids any Undefined Behaviour seems to be John Regehr's implementation. I've adapted it to rotate by the width of the type (using fixed-width types like uint32_t).

    #include <stdint.h>   // for uint32_t
    #include <limits.h>   // for CHAR_BIT
    // #define NDEBUG
    #include <assert.h>

    static inline uint32_t rotl32 (uint32_t n, unsigned int c)
    {
      const unsigned int mask = (CHAR_BIT*sizeof(n) - 1);  // assumes width is a power of 2.

      // assert ( (c<=mask) &&"rotate by type width or more");
      c &= mask;
      return (n<<c) | (n>>( (-c)&mask ));
    }

    static inline uint32_t rotr32 (uint32_t n, unsigned int c)
    {
      const unsigned int mask = (CHAR_BIT*sizeof(n) - 1);

      // assert ( (c<=mask) &&"rotate by type width or more");
      c &= mask;
      return (n>>c) | (n<<( (-c)&mask ));
    }

Works for any unsigned integer type, not just uint32_t, so you could make versions for other sizes.

See also a C++11 template version with lots of safety checks (including a static_assert that the type width is a power of 2), which isn't the case on some 24-bit DSPs or 36-bit mainframes, for example.

I'd recommend only using the template as a back-end for wrappers with names that include the rotate width explicitly. Integer-promotion rules mean that rotl_template(u16 & 0x11UL, 7) would do a 32 or 64-bit rotate, not 16 (depending on the width of unsigned long). Even uint16_t & uint16_t is promoted to signed int by C++'s integer-promotion rules, except on platforms where int is no wider than uint16_t.

On x86, this version inlines to a single rol r32, cl (or rol r32, imm8) with compilers that grok it, because the compiler knows that x86 rotate and shift instructions mask the shift-count the same way the C source does.

Compiler support for this UB-avoiding idiom on x86, for uint32_t x and unsigned int n for variable-count shifts:

- clang: recognized for variable-count rotates since clang3.5, multiple shift+or insns before that.
- gcc: recognized for variable-count rotates since gcc4.9, multiple shift+or insns before that. gcc5 and later optimize away the branch and mask in the wikipedia version, too, using just a ror or rol instruction for variable counts.
- icc: supported for variable-count rotates since ICC13 or earlier. Constant-count rotates use shld edi,edi,7 which is slower and takes more bytes than rol edi,7 on some CPUs (especially AMD, but also some Intel), when BMI2 isn't available for rorx eax,edi,25 to save a MOV.
- MSVC: x86-64 CL19: Only recognized for constant-count rotates. (The wikipedia idiom is recognized, but the branch and AND aren't optimized away.) Use the _rotl / _rotr intrinsics from <intrin.h> on x86 (including x86-64).

gcc for ARM uses an and r1, r1, #31 for variable-count rotates, but still does the actual rotate with a single instruction: ror r0, r0, r1. So gcc doesn't realize that rotate-counts are inherently modular. As the ARM docs say, "ROR with shift length, n, more than 32 is the same as ROR with shift length n-32". I think gcc gets confused here because left/right shifts on ARM saturate the count, so a shift by 32 or more will clear the register. (Unlike x86, where shifts mask the count the same as rotates.) It probably decides it needs an AND instruction before recognizing the rotate idiom, because of how non-circular shifts work on that target.

Current x86 compilers still use an extra instruction to mask a variable count for 8 and 16-bit rotates, probably for the same reason they don't avoid the AND on ARM. This is a missed optimization, because performance doesn't depend on the rotate count on any x86-64 CPU. (Masking of counts was introduced with 286 for performance reasons because it handled shifts iteratively, not with constant-latency like modern CPUs.)

BTW, prefer rotate-right for variable-count rotates, to avoid making the compiler do 32-n to implement a left rotate on architectures like ARM and MIPS that only provide a rotate-right. (This optimizes away with compile-time-constant counts.)

Fun fact: ARM doesn't really have dedicated shift/rotate instructions, it's just MOV with the source operand going through the barrel-shifter in ROR mode: mov r0, r0, ror r1. So a rotate can fold into a register-source operand for an EOR instruction or something.

Make sure you use unsigned types for n and the return value, or else it won't be a rotate. (gcc for x86 targets does arithmetic right shifts, shifting in copies of the sign-bit rather than zeroes, leading to a problem when you OR the two shifted values together. Right-shifts of negative signed integers is implementation-defined behaviour in C.)

Also, make sure the shift count is an unsigned type, because (-n)&31 with a signed type could be one's complement or sign/magnitude, and not the same as the modular 2^n you get with unsigned or two's complement. (See comments on Regehr's blog post.) unsigned int does well on every compiler I've looked at, for every width of x. Some other types actually defeat the idiom-recognition for some compilers, so don't just use the same type as x.

Some compilers provide intrinsics for rotates, which is far better than inline-asm if the portable version doesn't generate good code on the compiler you're targeting. There aren't cross-platform intrinsics for any compilers that I know of. These are some of the x86 options:

- Intel documents that <immintrin.h> provides _rotl and _rotl64 intrinsics, and same for right shift. MSVC requires <intrin.h>, while gcc requires <x86intrin.h>. An #ifdef takes care of gcc vs. icc, but clang doesn't seem to provide them anywhere, except in MSVC compatibility mode with -fms-extensions -fms-compatibility -fms-compatibility-version=17.00. And the asm it emits for them sucks (extra masking and a CMOV).
- MSVC: _rotr8 and _rotr16.
- gcc and icc (not clang): <x86intrin.h> also provides __rolb/__rorb for 8-bit rotate left/right, __rolw/__rorw (16-bit), __rold/__rord (32-bit), __rolq/__rorq (64-bit, only defined for 64-bit targets). For narrow rotates, the implementation uses __builtin_ia32_rolhi or ...qi, but the 32 and 64-bit rotates are defined using shift/or (with no protection against UB, because the code in ia32intrin.h only has to work on gcc for x86). GNU C appears not to have any cross-platform __builtin_rotate functions the way it does for __builtin_popcount (which expands to whatever's optimal on the target platform, even if it's not a single instruction). Most of the time you get good code from idiom-recognition.

    // For real use, probably use a rotate intrinsic for MSVC, or this idiom for other compilers.  This pattern of #ifdefs may be helpful
    #if defined(__x86_64__) || defined(__i386__)

    #ifdef _MSC_VER
    #include <intrin.h>
    #else
    #include <x86intrin.h>  // Not just <immintrin.h> for compilers other than icc
    #endif

    uint32_t rotl32_x86_intrinsic(rotwidth_t x, unsigned n)
    {
      //return __builtin_ia32_rorhi(x, 7);  // 16-bit rotate, GNU C
      return _rotl(x, n);  // gcc, icc, msvc.  Intel-defined.
      //return __rold(x, n);  // gcc, icc.
      // can't find anything for clang
    }
    #endif

Presumably some non-x86 compilers have intrinsics, too, but let's not expand this community-wiki answer to include them all. (Maybe do that in the existing answer about intrinsics.)

(The old version of this answer suggested MSVC-specific inline asm (which only works for 32bit x86 code), or http://www.devx.com/tips/Tip/14043 for a C version. The comments are replying to that.)

Inline asm defeats many optimizations, especially MSVC-style because it forces inputs to be stored/reloaded. A carefully-written GNU C inline-asm rotate would allow the count to be an immediate operand for compile-time-constant shift counts, but it still couldn't optimize away entirely if the value to be shifted is also a compile-time constant after inlining. https://gcc.gnu.org/wiki/DontUseInlineAsm
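To make the "template as a back-end, width-named wrappers in front" recommendation concrete, here is a minimal sketch. It is not the linked C++11 template version: the names `rotl_checked`, `rotl16` and `rotl32` are hypothetical, and the static_asserts shown are only the two checks discussed above (unsigned type, power-of-2 width).

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical back-end template using the masked-count idiom from this answer.
template <typename T>
constexpr T rotl_checked(T n, unsigned int c)
{
    static_assert(std::is_unsigned<T>::value, "rotate only makes sense for unsigned types");
    static_assert((std::numeric_limits<T>::digits & (std::numeric_limits<T>::digits - 1)) == 0,
                  "the count-masking trick assumes a power-of-2 width");
    // Both counts are masked to 0..width-1, so neither shift can reach the type width.
    // The cast truncates results that were computed in a promoted (wider) type.
    return static_cast<T>(
        (n << (c & (std::numeric_limits<T>::digits - 1))) |
        (n >> ((-c) & (std::numeric_limits<T>::digits - 1))));
}

// Wrappers whose names state the rotate width, so integer promotion of the
// argument expression cannot silently change which rotate is performed.
constexpr std::uint16_t rotl16(std::uint16_t n, unsigned int c) { return rotl_checked<std::uint16_t>(n, c); }
constexpr std::uint32_t rotl32(std::uint32_t n, unsigned int c) { return rotl_checked<std::uint32_t>(n, c); }

static_assert(rotl32(0x80000001u, 1) == 0x00000003u, "top bit wraps to bit 0");
static_assert(rotl32(0x12345678u, 32) == 0x12345678u, "count is taken modulo the width");
static_assert(rotl16(0xA0D0, 2) == 0x8342, "undoes the question's rotate-right by 2");
```

Calling `rotl16(a & b, 7)` with two uint16_t operands converts the promoted int result back to 16 bits at the call boundary, which is the point of naming the width in the wrapper.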

User · Answer

Overload a function:

    unsigned int rotate_right(unsigned int x)
    {
        return (x >> 1 | (x & 1 ? 0x80000000 : 0));
    }

    unsigned short rotate_right(unsigned short x) { /* etc. */ }

User · Answer

Below is a slightly improved version of Dídac Pérez's answer, with both directions implemented, along with a demo of these functions' usages using unsigned char and unsigned long long values. Several notes:

- The functions are inlined for compiler optimizations.
- I used a cout << +value trick for tersely outputting an unsigned char numerically that I found here: https://stackoverflow.com/a/28414758/1599699
- I recommend using the explicit <put the type here> syntax for clarity and safety.
- I used unsigned char for the shiftNum parameter because of what I found in the Additional Details section here:

    The result of a shift operation is undefined if additive-expression is
    negative or if additive-expression is greater than or equal to the
    number of bits in the (promoted) shift-expression.

Here's the code I'm using:

    #include <cstdlib>   // for system
    #include <iostream>

    using namespace std;

    // Note: for types at least as wide as int, shiftNum must be between 1 and
    // TBitCount - 1; otherwise one of the shifts is by the full type width,
    // which is undefined behaviour.
    template <typename T>
    inline T rotateAndCarryLeft(T rotateMe, unsigned char shiftNum)
    {
        static const unsigned char TBitCount = sizeof(T) * 8U;

        return (rotateMe << shiftNum) | (rotateMe >> (TBitCount - shiftNum));
    }

    template <typename T>
    inline T rotateAndCarryRight(T rotateMe, unsigned char shiftNum)
    {
        static const unsigned char TBitCount = sizeof(T) * 8U;

        return (rotateMe >> shiftNum) | (rotateMe << (TBitCount - shiftNum));
    }

    int main()
    {
        //00010100 == (unsigned char)20U
        //00000101 == (unsigned char)5U == rotateAndCarryLeft(20U, 6U)
        //01010000 == (unsigned char)80U == rotateAndCarryRight(20U, 6U)

        cout << "unsigned char " << 20U << " rotated left by 6 bits == " << +rotateAndCarryLeft<unsigned char>(20U, 6U) << "\n";
        cout << "unsigned char " << 20U << " rotated right by 6 bits == " << +rotateAndCarryRight<unsigned char>(20U, 6U) << "\n";

        cout << "\n";

        for (unsigned char shiftNum = 0U; shiftNum <= sizeof(unsigned char) * 8U; ++shiftNum)
        {
            cout << "unsigned char " << 21U << " rotated left by " << +shiftNum << " bit(s) == " << +rotateAndCarryLeft<unsigned char>(21U, shiftNum) << "\n";
        }

        cout << "\n";

        for (unsigned char shiftNum = 0U; shiftNum <= sizeof(unsigned char) * 8U; ++shiftNum)
        {
            cout << "unsigned char " << 21U << " rotated right by " << +shiftNum << " bit(s) == " << +rotateAndCarryRight<unsigned char>(21U, shiftNum) << "\n";
        }

        cout << "\n";

        // Counts 0 and 64 are skipped for the 64-bit type to avoid a full-width shift.
        for (unsigned char shiftNum = 1U; shiftNum < sizeof(unsigned long long) * 8U; ++shiftNum)
        {
            cout << "unsigned long long " << 3457347ULL << " rotated left by " << +shiftNum << " bit(s) == " << rotateAndCarryLeft<unsigned long long>(3457347ULL, shiftNum) << "\n";
        }

        cout << "\n";

        for (unsigned char shiftNum = 1U; shiftNum < sizeof(unsigned long long) * 8U; ++shiftNum)
        {
            cout << "unsigned long long " << 3457347ULL << " rotated right by " << +shiftNum << " bit(s) == " << rotateAndCarryRight<unsigned long long>(3457347ULL, shiftNum) << "\n";
        }

        cout << "\n\n";
        system("pause");
    }

User · Answer

How about something like this, using the standard bitset:

    #include <bitset>
    #include <iostream>

    template <std::size_t N>
    inline void
    rotate(std::bitset<N>& b, unsigned m)
    {
       b = b << m | b >> (N-m);
    }

    int main()
    {
       std::bitset<8> b(15);
       std::cout << b << '\n';
       rotate(b, 2);
       std::cout << b << '\n';

       return 0;
    }

HTH,

User · Answer

Another suggestion:

    template<class T>
    inline T rotl(T x, unsigned char moves){
        unsigned char temp;
        __asm{
            mov temp, CL
            mov CL, moves
            rol x, CL
            mov CL, temp
        };
        return x;
    }

User · Answer

    #define ROTATE_RIGHT(x) ( (x>>1) | (x&1?0x8000:0) )
