What is the strict aliasing rule?

Question

When asking about common undefined behavior in C, people sometimes refer to the strict aliasing rule. What are they talking about?

User · Accepted Answer

A typical situation where you encounter strict aliasing problems is when overlaying a struct (like a device/network msg) onto a buffer of the word size of your system (like a pointer to uint32_ts or uint16_ts). When you overlay a struct onto such a buffer, or a buffer onto such a struct through pointer casting you can easily violate strict aliasing rules.

So in this kind of setup, if I want to send a message to something I'd have to have two incompatible pointers pointing to the same chunk of memory. I might then naively code something like this:

#include <stdint.h>
#include <stdlib.h>

typedef struct Msg
{
    unsigned int a;
    unsigned int b;
} Msg;

void SendWord(uint32_t);

int main(void)
{
    // Get a 32-bit buffer from the system
    uint32_t* buff = malloc(sizeof(Msg));
    
    // Alias that buffer through message
    Msg* msg = (Msg*)(buff);
    
    // Send a bunch of messages    
    for (int i = 0; i < 10; ++i)
    {
        msg->a = i;
        msg->b = i+1;
        SendWord(buff[0]);
        SendWord(buff[1]);   
    }
}

The strict aliasing rule makes this setup illegal: dereferencing a pointer that aliases an object that is not of a compatible type or one of the other types allowed by C 2011 6.5 paragraph 7¹ is undefined behavior. Unfortunately, you can still code this way, maybe get some warnings, have it compile fine, only to have weird unexpected behavior when you run the code.

(GCC appears somewhat inconsistent in its ability to give aliasing warnings, sometimes giving us a friendly warning and sometimes not.)

To see why this behavior is undefined, we have to think about what the strict aliasing rule buys the compiler. Basically, with this rule, it doesn't have to think about inserting instructions to refresh the contents of buff every run of the loop. Instead, when optimizing, with some annoyingly unenforced assumptions about aliasing, it can omit those instructions, load buff[0] and buff[1] into CPU registers once before the loop is run, and speed up the body of the loop. Before strict aliasing was introduced, the compiler had to live in a state of paranoia that the contents of buff could change by any preceding memory stores. So to get an extra performance edge, and assuming most people don't type-pun pointers, the strict aliasing rule was introduced.

If you think the example is contrived, keep in mind that this can happen even when you pass the buffer to another function that does the sending for you. Suppose you instead had:

void SendMessage(uint32_t* buff, size_t size32)
{
    for (size_t i = 0; i < size32; ++i)
    {
        SendWord(buff[i]);
    }
}

and rewrote the earlier loop to take advantage of this convenient function:

for (int i = 0; i < 10; ++i)
{
    msg->a = i;
    msg->b = i+1;
    SendMessage(buff, 2);
}

The compiler may or may not be able, or smart enough, to inline SendMessage, and it may or may not decide to load buff again. If SendMessage is part of another API that's compiled separately, it probably has instructions to load buff's contents. Then again, maybe you're in C++ and this is some templated header-only implementation that the compiler thinks it can inline. Or maybe it's just something you wrote in your .c file for your own convenience. Either way, undefined behavior might still ensue. Even when we know some of what's happening under the hood, it's still a violation of the rule, so no well-defined behavior is guaranteed. Merely wrapping the send in a function that takes our word-delimited buffer doesn't necessarily help.

So how do I get around this?

Use a union. Most compilers support this without complaining about strict aliasing. This is allowed in C99 and explicitly allowed in C11.
```
  union {
      Msg msg;
      unsigned int asBuffer[sizeof(Msg)/sizeof(unsigned int)];
  };
```
You can disable strict aliasing in your compiler (-fno-strict-aliasing in gcc).
You can use char* for aliasing instead of your system's word. The rules allow an exception for char* (including signed char and unsigned char). It's always assumed that char* aliases other types. However this won't work the other way: there's no assumption that your struct aliases a buffer of chars.
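As a rough illustration of the union option, the sketch below rebuilds the message example so that both views live in one union object. It assumes uint32_t members (the original struct uses unsigned int) so each member is exactly one word; MsgBuffer, MSG_WORDS and build_words are hypothetical names introduced only for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: uint32_t members, so each member is exactly one 32-bit word. */
typedef struct Msg {
    uint32_t a;
    uint32_t b;
} Msg;

#define MSG_WORDS (sizeof(Msg) / sizeof(uint32_t))

/* Hypothetical overlay type: both views share one object, so no pointer
   to an incompatible type is ever dereferenced. */
typedef union MsgBuffer {
    Msg msg;
    uint32_t words[MSG_WORDS];
} MsgBuffer;

/* Fill the message view, then copy the word view out for sending. */
void build_words(uint32_t a, uint32_t b, uint32_t out[MSG_WORDS])
{
    MsgBuffer buf;
    buf.msg.a = a;
    buf.msg.b = b;
    for (size_t i = 0; i < MSG_WORDS; ++i) {
        out[i] = buf.words[i]; /* read through the union member */
    }
}
```

In the sending loop, each word of out could then be handed to SendWord without ever casting a uint32_t* to a Msg*.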

Beginner beware

This is only one potential minefield when overlaying two types onto each other. You should also learn about endianness, word alignment, and how to deal with alignment issues through packing structs correctly.

Footnote

¹ The types that C 2011 6.5 7 allows an lvalue to access are:

a type compatible with the effective type of the object,
a qualified version of a type compatible with the effective type of the object,
a type that is the signed or unsigned type corresponding to the effective type of the object,
a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
a character type.

User · Answer

Strict aliasing is not allowing different pointer types to the same data. This article should help you understand the issue in full detail.

User · Answer

Type punning via pointer casts (as opposed to using a union) is a major example of breaking strict aliasing.
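A minimal C sketch of the contrast, assuming float is a 32-bit IEEE 754 type (true on most current platforms but not guaranteed by the standard). The function name float_bits_via_union is made up for this example. The pointer-cast form appears only in a comment because executing it would be undefined behavior.

```c
#include <stdint.h>

/* Assumption: float and uint32_t have the same size. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* The pointer-cast version would be
 *     uint32_t bits = *(uint32_t *)&value;
 * which reads a float object through a uint32_t lvalue and breaks the rule.
 */

/* Union version: write one member, read the other. This is valid in C. */
uint32_t float_bits_via_union(float value)
{
    union {
        float f;
        uint32_t u;
    } pun;
    pun.f = value;
    return pun.u;
}
```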

User · Answer

The best explanation I have found is by Mike Acton, Understanding Strict Aliasing. It's focused a little on PS3 development, but that's basically just GCC.

From the article:

"Strict aliasing is an assumption, made by the C (or C++) compiler, that dereferencing pointers to objects of different types will never refer to the same memory location (i.e. alias each other.)"

So basically if you have an int* pointing to some memory containing an int and then you point a float* to that memory and use it as a float you break the rule. If your code does not respect this, then the compiler's optimizer will most likely break your code.

The exception to the rule is a char*, which is allowed to point to any type.
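The character-type exception can be sketched as follows: any object may be inspected byte by byte through an unsigned char pointer. byte_sum is a hypothetical helper written for this illustration, not something from the article.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum the bytes of any object. Reading through unsigned char* is allowed
   no matter what the object's effective type is. */
unsigned long byte_sum(const void *object, size_t size)
{
    const unsigned char *bytes = object;
    unsigned long sum = 0;
    for (size_t i = 0; i < size; ++i) {
        sum += bytes[i];
    }
    return sum;
}
```

Calling byte_sum(&value, sizeof value) on a uint32_t holding 0x01020304 adds up the bytes 1, 2, 3 and 4 in whatever order the platform stores them.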

User · Answer

This is the strict aliasing rule, found in section 3.10 of the C++03 standard (other answers provide good explanation, but none provided the rule itself):

If a program attempts to access the stored value of an object through an lvalue of other than one of the following types the behavior is undefined:

the dynamic type of the object,
a cv-qualified version of the dynamic type of the object,
a type that is the signed or unsigned type corresponding to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union),
a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
a char or unsigned char type.

C++11 and C++14 wording (changes emphasized):

If a program attempts to access the stored value of an object through a *glvalue* of other than one of the following types the behavior is undefined:

the dynamic type of the object,
a cv-qualified version of the dynamic type of the object,
*a type similar (as defined in 4.4) to the dynamic type of the object,*
a type that is the signed or unsigned type corresponding to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
an aggregate or union type that includes one of the aforementioned types among its *elements or non-static data members* (including, recursively, an *element or non-static data member* of a subaggregate or contained union),
a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
a char or unsigned char type.

Two changes were small: glvalue instead of lvalue, and clarification of the aggregate/union case.

The third change makes a stronger guarantee (relaxes the strong aliasing rule): the new concept of similar types that are now safe to alias.

Also the C wording (C99; ISO/IEC 9899:1999 6.5/7; the exact same wording is used in ISO/IEC 9899:2011 §6.5 ¶7):

An object shall have its stored value accessed only by an lvalue expression that has one of the following types 73) or 88):

a type compatible with the effective type of the object,
a qualified version of a type compatible with the effective type of the object,
a type that is the signed or unsigned type corresponding to the effective type of the object,
a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
a character type.

73) or 88) The intent of this list is to specify those circumstances in which an object may or may not be aliased.
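To make one entry of the C list concrete, here is a small sketch of an access the rule permits: reading an int object through a const unsigned int lvalue, which is the unsigned type corresponding to a qualified version of the effective type. The function name read_as_unsigned is an assumption made for this example.

```c
/* Allowed access: const unsigned int is the unsigned type corresponding to
   a qualified version of int, the effective type of the object. */
unsigned int read_as_unsigned(const int *p)
{
    const unsigned int *up = (const unsigned int *)p;
    return *up;
}
```

For a non-negative int the value read back is the same number; swapping the pointer type for float or short in the same pattern would fall outside the list and be undefined.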

User · Answer

(Note: This is excerpted from my "What is the Strict Aliasing Rule and Why do we care?" write-up.)

What is strict aliasing?

In C and C++ aliasing has to do with what expression types we are allowed to access stored values through. In both C and C++ the standard specifies which expression types are allowed to alias which types. The compiler and optimizer are allowed to assume we follow the aliasing rules strictly, hence the term strict aliasing rule. If we attempt to access a value using a type not allowed it is classified as undefined behavior (UB). Once we have undefined behavior all bets are off, the results of our program are no longer reliable.

Unfortunately with strict aliasing violations, we will often obtain the results we expect, leaving the possibility that a future version of a compiler with a new optimization will break code we thought was valid. This is undesirable and it is a worthwhile goal to understand the strict aliasing rules and how to avoid violating them.

To understand more about why we care, we will discuss issues that come up when violating strict aliasing rules, type punning (since common techniques used in type punning often violate strict aliasing rules), and how to type pun correctly.

Preliminary examples

Let's look at some examples, then we can talk about exactly what the standard(s) say, examine some further examples and then see how to avoid strict aliasing violations and catch the ones we missed. Here is an example that should not be surprising (live example):

int x = 10;
int *ip = &x;

std::cout << *ip << "\n";
*ip = 12;
std::cout << x << "\n";

We have an int* pointing to memory occupied by an int and this is a valid aliasing. The optimizer must assume that assignments through ip could update the value occupied by x.

The next example shows aliasing that leads to undefined behavior (live example):

int foo( float *f, int *i ) {
    *i = 1;
    *f = 0.f;

    return *i;
}

int main() {
    int x = 0;

    std::cout << x << "\n";   // Expect 0
    x = foo(reinterpret_cast<float*>(&x), &x);
    std::cout << x << "\n";   // Expect 0
}

In the function foo we take an int* and a float*. In this example we call foo and set both parameters to point to the same memory location, which in this example contains an int. Note, the reinterpret_cast is telling the compiler to treat the expression as if it had the type specified by its template parameter. In this case we are telling it to treat the expression &x as if it had type float*. We may naively expect the result of the second cout to be 0, but with optimization enabled using -O2 both gcc and clang produce the following result:

0
1

This may not be expected but is perfectly valid since we have invoked undefined behavior. A float cannot validly alias an int object. Therefore the optimizer can assume the constant 1 stored when dereferencing i will be the return value, since a store through f could not validly affect an int object. Plugging the code into Compiler Explorer shows this is exactly what is happening (live example):

foo(float*, int*): # @foo(float*, int*)
mov dword ptr [rsi], 1
mov dword ptr [rdi], 0
mov eax, 1
ret

The optimizer, using Type-Based Alias Analysis (TBAA), assumes 1 will be returned and directly moves the constant value into register eax, which carries the return value. TBAA uses the language's rules about what types are allowed to alias to optimize loads and stores. In this case TBAA knows that a float cannot alias an int and optimizes away the load of i.

Now, to the Rule-Book

What exactly does the standard say we are allowed and not allowed to do? The standard language is not straightforward, so for each item I will try to provide code examples that demonstrate the meaning.

What does the C11 standard say?

The C11 standard says the following in section 6.5 Expressions paragraph 7:

An object shall have its stored value accessed only by an lvalue expression that has one of the following types: 88)

a type compatible with the effective type of the object,

int x = 1;
int *p = &x;
printf("%d\n", *p); // *p gives us an lvalue expression of type int which is compatible with int

a qualified version of a type compatible with the effective type of the object,

int x = 1;
const int *p = &x;
printf("%d\n", *p); // *p gives us an lvalue expression of type const int which is compatible with int

a type that is the signed or unsigned type corresponding to the effective type of the object,

int x = 1;
unsigned int *p = (unsigned int*)&x;
printf("%u\n", *p); // *p gives us an lvalue expression of type unsigned int which corresponds to
                    // the effective type of the object

gcc/clang has an extension that also allows assigning unsigned int* to int* even though they are not compatible types.

a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,

int x = 1;
const unsigned int *p = (const unsigned int*)&x;
printf("%u\n", *p); // *p gives us an lvalue expression of type const unsigned int which is an unsigned type
                    // that corresponds to a qualified version of the effective type of the object

an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or

struct foo {
    int x;
};

void foobar( struct foo *fp, int *ip ); // struct foo is an aggregate that includes int among its members so it
                                        // can alias with *ip

struct foo f;
foobar( &f, &f.x );

a character type.

int x = 65;
char *p = (char *)&x;
printf("%c\n", *p); // *p gives us an lvalue expression of type char which is a character type.
                    // The results are not portable due to endianness issues.

What does the C++17 draft standard say?

The C++17 draft standard in section [basic.lval] paragraph 11 says:

If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined: 63)

(11.1) the dynamic type of the object,

void *p = malloc( sizeof(int) ); // We have allocated storage but not started the lifetime of an object
int *ip = new (p) int{0};        // Placement new changes the dynamic type of the object to int
std::cout << *ip << "\n";        // *ip gives us a glvalue expression of type int which matches the dynamic type
                                 // of the allocated object

(11.2) a cv-qualified version of the dynamic type of the object,

int x = 1;
const int *cip = &x;
std::cout << *cip << "\n"; // *cip gives us a glvalue expression of type const int which is a cv-qualified
                           // version of the dynamic type of x

(11.3) a type similar (as defined in 7.5) to the dynamic type of the object,

(11.4) a type that is the signed or unsigned type corresponding to the dynamic type of the object,

// Both si and ui are signed or unsigned types corresponding to each other's dynamic types
// We can see from this godbolt (https://godbolt.org/g/KowGXB) the optimizer assumes aliasing.
signed int foo( signed int &si, unsigned int &ui ) {
    si = 1;
    ui = 2;

    return si;
}

(11.5) a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,

signed int foo( const signed int &si1, int &si2 ); // Hard to show this one assumes aliasing

(11.6) an aggregate or union type that includes one of the aforementioned types among its elements or nonstatic data members (including, recursively, an element or non-static data member of a subaggregate or contained union),

struct foo {
    int x;
};

// Compiler Explorer example (https://godbolt.org/g/z2wJTC) shows aliasing assumption
int foobar( foo &fp, int &ip ) {
    fp.x = 1;
    ip = 2;

    return fp.x;
}

foo f;
foobar( f, f.x );

(11.7) a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,

struct foo { int x; };

struct bar : public foo {};

int foobar( foo &f, bar &b ) {
    f.x = 1;
    b.x = 2;

    return f.x;
}

(11.8) a char, unsigned char, or std::byte type.

int foo( std::byte &b, uint32_t &ui ) {
    b = static_cast<std::byte>('a');
    ui = 0xFFFFFFFF;

    return std::to_integer<int>( b ); // b gives us a glvalue expression of type std::byte which can alias
                                      // an object of type uint32_t
}

Worth noting: signed char is not included in the list above. This is a notable difference from C, which says a character type.

What is Type Punning?

We have gotten to this point and we may be wondering, why would we want to alias? The answer typically is to type pun; often the methods used violate strict aliasing rules.

Sometimes we want to circumvent the type system and interpret an object as a different type. This is called type punning: reinterpreting a segment of memory as another type. Type punning is useful for tasks that want access to the underlying representation of an object to view, transport or manipulate it. Typical areas where we find type punning being used are compilers, serialization, networking code, etc.

Traditionally this has been accomplished by taking the address of the object, casting it to a pointer of the type we want to reinterpret it as and then accessing the value, or in other words by aliasing. For example:

int x = 1;

// In C
float *fp = (float*)&x; // Not a valid aliasing

// In C++
float *fp = reinterpret_cast<float*>(&x); // Not a valid aliasing

printf( "%f\n", *fp );

As we have seen earlier this is not a valid aliasing, so we are invoking undefined behavior. But traditionally compilers did not take advantage of strict aliasing rules and this type of code usually just worked; developers have unfortunately gotten used to doing things this way. A common alternate method for type punning is through unions, which is valid in C but undefined behavior in C++ (see live example):

union u1
{
    int n;
    float f;
};

union u1 u;
u.f = 1.0f;

printf( "%d\n", u.n ); // UB in C++: n is not the active member

This is not valid in C++, and some consider the purpose of unions to be solely for implementing variant types and feel using unions for type punning is an abuse.

How do we Type Pun correctly?

The standard method for type punning in both C and C++ is memcpy. This may seem a little heavy handed, but the optimizer should recognize the use of memcpy for type punning, optimize it away and generate a register-to-register move. For example, if we know int64_t is the same size as double:

static_assert( sizeof( double ) == sizeof( int64_t ) ); // C++17 does not require a message

we can use memcpy:

void func1( double d ) {
    std::int64_t n;
    std::memcpy(&n, &d, sizeof d);
    // ...
}

At a sufficient optimization level any decent modern compiler generates identical code to the previously mentioned reinterpret_cast method or union method for type punning. Examining the generated code we see it uses just a register mov (live Compiler Explorer example).

C++20 and bit_cast

In C++20 we may gain bit_cast (implementation available in link from proposal), which gives a simple and safe way to type-pun as well as being usable in a constexpr context.

The following is an example of how to use bit_cast to type pun an unsigned int to float (see it live):

std::cout << bit_cast<float>(0x447a0000) << "\n"; // assuming sizeof(float) == sizeof(unsigned int)

In the case where the To and From types don't have the same size, it requires us to use an intermediate struct. We will use a struct containing a sizeof( unsigned int ) character array (assumes 4 byte unsigned int) to be the From type and unsigned int as the To type:

struct uint_chars {
    unsigned char arr[sizeof( unsigned int )] = {}; // Assume sizeof( unsigned int ) == 4
};

// Assume len is a multiple of 4
int bar( unsigned char *p, size_t len ) {
    int result = 0;

    for( size_t index = 0; index < len; index += sizeof(unsigned int) ) {
        uint_chars f;
        std::memcpy( f.arr, &p[index], sizeof(unsigned int) );
        unsigned int value = bit_cast<unsigned int>(f); // renamed so it does not shadow the outer result

        result += foo( value );
    }

    return result;
}

It is unfortunate that we need this intermediate type but that is the current constraint of bit_cast.

Catching Strict Aliasing Violations

We don't have a lot of good tools for catching strict aliasing violations in C++. The tools we have will catch some cases of strict aliasing violations and some cases of misaligned loads and stores.

gcc using the flags -fstrict-aliasing and -Wstrict-aliasing can catch some cases, although not without false positives/negatives. For example the following cases will generate a warning in gcc (see it live):

int a = 1;
short j;
float f = 1.f; // Originally not initialized but tis-kernel caught
               // it was being accessed w/ an indeterminate value below

printf("%i\n", j = *(reinterpret_cast<short*>(&a)));
printf("%i\n", j = *(reinterpret_cast<int*>(&f)));

although it will not catch this additional case (see it live):

int *p;

p = &a;
printf("%i\n", j = *(reinterpret_cast<short*>(p)));

Although clang allows these flags it apparently does not actually implement the warnings.

Another tool we have available to us is ASan, which can catch misaligned loads and stores. Although these are not directly strict aliasing violations they are a common result of strict aliasing violations. For example the following cases will generate runtime errors when built with clang using -fsanitize=address:

int *x = new int[2];           // 8 bytes: [0,7].
int *u = (int*)((char*)x + 6); // regardless of alignment of x this will not be an aligned address
*u = 1;                        // Access to range [6-9]
printf( "%d\n", *u );          // Access to range [6-9]

The last tool I will recommend is C++ specific and not strictly a tool but a coding practice: don't allow C-style casts. Both gcc and clang will produce a diagnostic for C-style casts using -Wold-style-cast. This will force any undefined type puns to use reinterpret_cast; in general reinterpret_cast should be a flag for closer code review. It is also easier to search your code base for reinterpret_cast to perform an audit.

For C we have all the tools already covered and we also have tis-interpreter, a static analyzer that exhaustively analyzes a program for a large subset of the C language. Given a C version of the earlier example where using -fstrict-aliasing misses one case (see it live):

int a = 1;
short j;
float f = 1.0;

printf("%i\n", j = *((short*)&a));
printf("%i\n", j = *((int*)&f));

int *p;

p = &a;
printf("%i\n", j = *((short*)p));

tis-interpreter is able to catch all three. The following example invokes tis-kernel as tis-interpreter (output is edited for brevity):

./bin/tis-kernel -sa example1.c

example1.c:9:[sa] warning: The pointer (short *)(& a) has type short *. It violates strict aliasing
              rules by accessing a cell with effective type int.

example1.c:10:[sa] warning: The pointer (int *)(& f) has type int *. It violates strict aliasing rules by
              accessing a cell with effective type float.
              Callstack: main

example1.c:15:[sa] warning: The pointer (short *)p has type short *. It violates strict aliasing rules by
              accessing a cell with effective type int.

Finally there is TySan, which is currently in development. This sanitizer adds type checking information in a shadow memory segment and checks accesses to see if they violate aliasing rules. The tool potentially should be able to catch all aliasing violations but may have a large run-time overhead.
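The memcpy approach described in this answer can also be sketched in plain C. This is only an illustration: double_bits and bits_to_double are hypothetical names, and the code assumes double is a 64-bit IEEE 754 type, which the static assertion only partly checks.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: double and int64_t have the same size. */
_Static_assert(sizeof(double) == sizeof(int64_t), "double must be 64 bits");

/* Copy the object representation of a double into an integer. */
int64_t double_bits(double d)
{
    int64_t n;
    memcpy(&n, &d, sizeof d);
    return n;
}

/* Inverse direction: rebuild a double from its bit pattern. */
double bits_to_double(int64_t n)
{
    double d;
    memcpy(&d, &n, sizeof n);
    return d;
}
```

Neither function dereferences a pointer of the wrong type, so the aliasing rule is not involved, and optimizing compilers typically reduce each memcpy to a single move.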

User · Answer

Technically in C++, the strict aliasing rule is probably never applicable.

Note the definition of indirection (* operator):

"The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points."

Also from the definition of glvalue:

"A glvalue is an expression whose evaluation determines the identity of an object, (...snip)"

So in any well defined program trace, a glvalue refers to an object. So the so called strict aliasing rule doesn't apply, ever. This may not be what the designers wanted.

User · Answer

After reading many of the answers, I feel the need to add something.

Strict aliasing (which I'll describe in a bit) is important because:

1. Memory access can be expensive (performance-wise), which is why data is manipulated in CPU registers before being written back to physical memory.
2. If data in two different CPU registers will be written to the same memory location, we can't predict which data will "survive" when we code in C.

In assembly, where we code the loading and unloading of CPU registers manually, we know which data remains intact. But C (thankfully) abstracts this detail away.

Since two pointers can point to the same location in memory, the compiler may have to generate complex code that handles possible collisions. This extra code is slow and hurts performance, since it performs extra memory reads and writes that are both slower and (possibly) unnecessary.

The strict aliasing rule allows the compiler to avoid redundant machine code in cases where it should be safe to assume that two pointers don't point to the same memory block (see also the restrict keyword).

The strict aliasing rule states that it's safe to assume that pointers to different types point to different locations in memory. If a compiler notices that two pointers point to different types (for example, an int * and a float *), it will assume the memory addresses are different and will not protect against memory address collisions, resulting in faster machine code.

For example, let's assume the following function:

void merge_two_ints(int *a, int *b) {
    *b += *a;
    *a += *b;
}

In order to handle the case in which a == b (both pointers point to the same memory), the compiler needs to order the loads and stores carefully, so the code might end up like this:

1. Load a and b from memory.
2. Add a to b.
3. Save b and reload a (save from the CPU register to memory, then load from memory back into a CPU register).
4. Add b to a.
5. Save a (from the CPU register) to memory.

Step 3 is very slow because it needs to access physical memory. However, it's required to protect against instances where a and b point to the same memory address.

Strict aliasing allows us to prevent this by telling the compiler that these memory addresses are distinct (which, in this case, allows even further optimization that can't be performed if the pointers share a memory address).

This can be told to the compiler in two ways:

1. By using different types to point to, i.e.:

void merge_two_numbers(int *a, long *b) {...}

2. By using the restrict keyword, i.e.:

void merge_two_ints(int * restrict a, int * restrict b) {...}

Now, because the compiler may assume the pointers don't alias (through the strict aliasing rule in the first case, or through the explicit restrict promise in the second), step 3 can be avoided and the code can run significantly faster.

In fact, with the restrict keyword, the whole function could be optimized to:

1. Load a and b from memory.
2. Add a to b, then add the new b to a, keeping both values in registers.
3. Save both results to memory.

This optimization couldn't have been done before, because of the possible collision: if a and b pointed to the same int, the shared value would end up tripled instead of quadrupled.
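
As a rough illustration of the two variants discussed above, here is a minimal C sketch. The names merge_two_ints_restrict and merge_demo are made up for this example, and whether a given compiler really drops the reload depends on the compiler and optimization level.

```c
#include <assert.h>

/* Plain version: a and b may point to the same int, so the compiler
 * has to assume that the store to *b can change *a. */
void merge_two_ints(int *a, int *b)
{
    *b += *a;
    *a += *b;
}

/* restrict version (hypothetical name): the caller promises that a and b
 * never overlap, so both values may stay in registers.
 * Calling this with a == b would be undefined behavior. */
void merge_two_ints_restrict(int *restrict a, int *restrict b)
{
    *b += *a;
    *a += *b;
}

/* Small self-check (hypothetical helper) documenting the results. */
void merge_demo(void)
{
    int a = 1, b = 2;
    merge_two_ints(&a, &b);            /* b = 1 + 2 = 3, a = 1 + 3 = 4 */
    assert(a == 4 && b == 3);

    int c = 1, d = 2;
    merge_two_ints_restrict(&c, &d);   /* same result for distinct objects */
    assert(c == 4 && d == 3);

    int x = 5;
    merge_two_ints(&x, &x);            /* aliased call: 5 -> 10 -> 20 */
    assert(x == 20);
}
```

Calling merge_demo() from main exercises both functions. Only the plain version may legally be called with two pointers to the same int.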

User · Answer

Type punning via pointer casts (as opposed to using a union) is a major example of breaking strict aliasing.
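
To make the contrast concrete, here is a minimal C sketch. The names float_bits_union and union_pun_example are hypothetical, and the example assumes a 32-bit IEEE 754 float. Reading a union member other than the one last written is accepted in C, but other answers here point out that it is not valid in C++.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: float and uint32_t have the same size. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes 32-bit float");

/* The kind of pointer cast that breaks strict aliasing (shown only as a
 * comment, because executing it is undefined behavior):
 *
 *     uint32_t bits = *(uint32_t *)&f;
 */

/* Union-based pun (hypothetical helper): write one member, read the other. */
uint32_t float_bits_union(float f)
{
    union { float f; uint32_t u; } pun;
    pun.f = f;
    return pun.u;
}

void union_pun_example(void)
{
    /* Assumes IEEE 754 single precision, where 1.0f is 0x3F800000. */
    assert(float_bits_union(1.0f) == 0x3F800000u);
}
```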

User · Answer

As an addendum to what Doug T. already wrote, here is a simple test case which probably triggers it with gcc:

check.c

#include <stdio.h>

void check(short *h, long *k)
{
    *h = 5;
    *k = 6;
    if (*h == 5)
        printf("strict aliasing problem\n");
}

int main(void)
{
    long k[1];
    check((short *)k, k);
    return 0;
}

Compile with gcc -O2 -o check check.c. Usually (with most gcc versions I tried) this outputs "strict aliasing problem", because the compiler assumes that h cannot be the same address as k in the check function. Because of that, the compiler optimizes the if (*h == 5) away and always calls the printf.

For those who are interested, here is the x64 assembler code, produced by gcc 4.6.3, running on Ubuntu 12.04.2 for x64:

movw    $5, (%rdi)
movq    $6, (%rsi)
movl    $.LC0, %edi
jmp     puts

So the if condition is completely gone from the assembler code.
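
For comparison, one possible way to express the same experiment with defined behavior is to touch the bytes of k through memcpy instead of through a short * alias. This is only a sketch: check_defined and check_example are hypothetical names, not part of the original test case.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if the leading bytes of *k still read back as the short 5
 * after *k has been set to 6, otherwise 0. */
int check_defined(long *k)
{
    short h = 5;
    memcpy(k, &h, sizeof h);                /* was: *h = 5 through a short* alias */
    *k = 6;

    short readback;
    memcpy(&readback, k, sizeof readback);  /* was: if (*h == 5) */
    return readback == 5;
}

void check_example(void)
{
    long k[1] = {0};
    /* The store of 6 overwrites those bytes, so 5 can never be read back:
     * the leading bytes hold 6 on little-endian and 0 on big-endian. */
    assert(check_defined(k) == 0);
}
```

Because no object is accessed through an lvalue of the wrong type, the compiler has to honor the store of 6 at any optimization level.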

User · Answer

Strict aliasing means not allowing pointers of different types to access the same data.

This article should help you understand the issue in full detail.

User · Answer

Strict aliasing doesn't refer only to pointers; it affects references as well. I wrote a paper about it for the boost developer wiki, and it was so well received that I turned it into a page on my consulting web site. It explains completely what it is, why it confuses people so much and what to do about it: Strict Aliasing White Paper. In particular, it explains why unions are risky behavior for C++, and why using memcpy is the only fix portable across both C and C++. Hope this is helpful.
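
The white paper itself is not reproduced here, but the memcpy approach it recommends can be sketched in C roughly as follows. The function names are hypothetical, and the sketch assumes float and uint32_t have the same size.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption for this sketch: float and uint32_t have the same size. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes 32-bit float");

/* Copy the bytes of a float into an integer of the same size. */
uint32_t float_to_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Copy the bytes back into a float. */
float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

void memcpy_pun_example(void)
{
    /* The round trip preserves the value, whatever the float format is. */
    assert(bits_to_float(float_to_bits(3.5f)) == 3.5f);
}
```

Neither function ever accesses an object through an lvalue of another type, so no aliasing rule is involved. Optimizing compilers typically turn such small fixed-size copies into plain register moves.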

User · Answer

The best explanation I have found is by Mike Acton, Understanding Strict Aliasing. It's focused a little on PS3 development, but that's basically just GCC.

From the article:

"Strict aliasing is an assumption, made by the C (or C++) compiler, that dereferencing pointers to objects of different types will never refer to the same memory location (i.e. alias each other.)"

So basically, if you have an int* pointing to some memory containing an int, and then you point a float* to that memory and use it as a float, you break the rule. If your code does not respect this, then the compiler's optimizer will most likely break your code.

The exception to the rule is a char*, which is allowed to point to any type.
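
A small C sketch of the character-type exception follows. The names sum_object_bytes and char_alias_example are hypothetical, and the expected value assumes 8-bit bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sums the bytes of any object by reading it through unsigned char*,
 * which is allowed to alias every object type. */
unsigned sum_object_bytes(const void *obj, size_t size)
{
    const unsigned char *bytes = (const unsigned char *)obj;
    unsigned sum = 0;
    for (size_t i = 0; i < size; ++i)
        sum += bytes[i];
    return sum;
}

void char_alias_example(void)
{
    uint32_t x = 0x01020304u;
    /* Byte order differs between platforms, but the byte sum does not. */
    assert(sum_object_bytes(&x, sizeof x) == 10u);
}
```

The exception only goes one way: inspecting an int through a character pointer is fine, but reading a char array through an int* is still a violation.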

User · Answer

According to the C89 rationale, the authors of the Standard did not want to require that compilers given code like:

int x;
int test(double *p)
{
  x = 5;
  *p = 1.0;
  return x;
}

should be required to reload the value of x between the assignment and return statement so as to allow for the possibility that p might point to x, and the assignment to *p might consequently alter the value of x. The notion that a compiler should be entitled to presume that there won't be aliasing in situations like the above was non-controversial.

Unfortunately, the authors of C89 wrote their rule in a way that, if read literally, would make even the following function invoke Undefined Behavior:

void test(void)
{
  struct S { int x; } s;
  s.x = 1;
}

because it uses an lvalue of type int to access an object of type struct S, and int is not among the types that may be used for accessing a struct S. Because it would be absurd to treat all use of non-character-type members of structs and unions as Undefined Behavior, almost everyone recognizes that there are at least some circumstances where an lvalue of one type may be used to access an object of another type. Unfortunately, the C Standards Committee has failed to define what those circumstances are.

Much of the problem is a result of Defect Report #028, which asked about the behavior of a program like:

int test(int *ip, double *dp)
{
  *ip = 1;
  *dp = 1.23;
  return *ip;
}
int test2(void)
{
  union U { int i; double d; } u;
  return test(&u.i, &u.d);
}

Defect Report #028 states that the program invokes Undefined Behavior because the action of writing a union member of type "double" and reading one of type "int" invokes Implementation-Defined behavior. Such reasoning is nonsensical, but forms the basis for the Effective Type rules, which needlessly complicate the language while doing nothing to address the original problem.

The best way to resolve the original problem would probably be to treat the footnote about the purpose of the rule as though it were normative, and to make the rule unenforceable except in cases which actually involve conflicting accesses using aliases. Given something like:

void inc_int(int *p) { *p = 3; }
int test(void)
{
  int *p;
  struct S { int x; } s;
  s.x = 1;
  p = &s.x;
  inc_int(p);
  return s.x;
}

There's no conflict within inc_int because all accesses to the storage accessed through *p are done with an lvalue of type int, and there's no conflict in test because p is visibly derived from a struct S, and by the next time s is used, all accesses to that storage that will ever be made through p will have already happened.

If the code were changed slightly:

void inc_int(int *p) { *p = 3; }
int test(void)
{
  int *p;
  struct S { int x; } s;
  p = &s.x;
  s.x = 1;  // marked line
  *p += 1;
  return s.x;
}

Here, there is an aliasing conflict between p and the access to s.x on the marked line, because at that point in execution another reference exists that will be used to access the same storage.

Had Defect Report #028 said the original example invoked UB because of the overlap between the creation and use of the two pointers, that would have made things a lot clearer without having to add "Effective Types" or other such complexity.

User · Answer

Note: This is excerpted from my "What is the Strict Aliasing Rule and Why do we care?" write-up.

What is strict aliasing?

In C and C++, aliasing has to do with what expression types we are allowed to access stored values through. In both C and C++ the standard specifies which expression types are allowed to alias which types. The compiler and optimizer are allowed to assume we follow the aliasing rules strictly, hence the term strict aliasing rule. If we attempt to access a value using a type that is not allowed, it is classified as undefined behavior (UB). Once we have undefined behavior all bets are off: the results of our program are no longer reliable.

Unfortunately, with strict aliasing violations we will often obtain the results we expect, leaving the possibility that a future version of a compiler with a new optimization will break code we thought was valid. This is undesirable, and it is a worthwhile goal to understand the strict aliasing rules and how to avoid violating them.

To understand more about why we care, we will discuss the issues that come up when violating strict aliasing rules, type punning (since common techniques used in type punning often violate strict aliasing rules) and how to type pun correctly.

Preliminary examples

Let's look at some examples, then we can talk about exactly what the standard(s) say, examine some further examples and then see how to avoid strict aliasing violations and catch the ones we missed. Here is an example that should not be surprising:

int x = 10;
int *ip = &x;

std::cout << *ip << "\n";
*ip = 12;
std::cout << x << "\n";

We have an int* pointing to memory occupied by an int, and this is a valid aliasing. The optimizer must assume that assignments through ip could update the value occupied by x.

The next example shows aliasing that leads to undefined behavior:

int foo( float *f, int *i ) {
    *i = 1;
    *f = 0.f;

    return *i;
}

int main() {
    int x = 0;

    std::cout << x << "\n";   // Expect 0
    x = foo(reinterpret_cast<float*>(&x), &x);
    std::cout << x << "\n";   // Expect 0?
}

In the function foo we take an int* and a float*. In this example we call foo and set both parameters to point to the same memory location, which in this example contains an int. Note that the reinterpret_cast is telling the compiler to treat the expression as if it had the type specified by its template parameter. In this case we are telling it to treat the expression &x as if it had type float*. We may naively expect the result of the second cout to be 0, but with optimization enabled using -O2 both gcc and clang produce the following result:

0
1

This may not be expected but is perfectly valid, since we have invoked undefined behavior. A float cannot validly alias an int object. Therefore the optimizer can assume the constant 1 stored when dereferencing i will be the return value, since a store through f could not validly affect an int object. Plugging the code into Compiler Explorer shows this is exactly what is happening:

foo(float*, int*): # @foo(float*, int*)
    mov dword ptr [rsi], 1
    mov dword ptr [rdi], 0
    mov eax, 1
    ret

The optimizer, using Type-Based Alias Analysis (TBAA), assumes 1 will be returned and directly moves the constant value into register eax, which carries the return value. TBAA uses the language's rules about what types are allowed to alias to optimize loads and stores. In this case TBAA knows that a float cannot alias an int and optimizes away the load of i.

Now, to the Rule-Book

What exactly does the standard say we are allowed and not allowed to do? The standard language is not straightforward, so for each item I will try to provide code examples that demonstrate the meaning.

What does the C11 standard say?

The C11 standard says the following in section 6.5 Expressions paragraph 7:

An object shall have its stored value accessed only by an lvalue expression that has one of the following types (footnote 88):

- a type compatible with the effective type of the object,

int x = 1;
int *p = &x;
printf("%d\n", *p); // *p gives us an lvalue expression of type int which is compatible with int

- a qualified version of a type compatible with the effective type of the object,

int x = 1;
const int *p = &x;
printf("%d\n", *p); // *p gives us an lvalue expression of type const int which is compatible with int

- a type that is the signed or unsigned type corresponding to the effective type of the object,

int x = 1;
unsigned int *p = (unsigned int*)&x;
printf("%u\n", *p ); // *p gives us an lvalue expression of type unsigned int which corresponds to
                     // the effective type of the object

gcc and clang also have an extension that allows assigning unsigned int* to int* even though they are not compatible types.

- a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,

int x = 1;
const unsigned int *p = (const unsigned int*)&x;
printf("%u\n", *p ); // *p gives us an lvalue expression of type const unsigned int which is an unsigned type
                     // that corresponds to a qualified version of the effective type of the object

- an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or

struct foo {
  int x;
};

void foobar( struct foo *fp, int *ip );  // struct foo is an aggregate that includes int among its members
                                         // so it can alias with *ip

struct foo f;
foobar( &f, &f.x );

- a character type.

int x = 65;
char *p = (char *)&x;
printf("%c\n", *p );  // *p gives us an lvalue expression of type char which is a character type.
                      // The results are not portable due to endianness issues.

What does the C++17 draft standard say?

The C++17 draft standard in section [basic.lval] paragraph 11 says:

If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined (footnote 63):

(11.1) the dynamic type of the object,

void *p = malloc( sizeof(int) ); // We have allocated storage but not started the lifetime of an object
int *ip = new (p) int{0};        // Placement new changes the dynamic type of the object to int
std::cout << *ip << "\n";        // *ip gives us a glvalue expression of type int which matches the dynamic type
                                 // of the allocated object

(11.2) a cv-qualified version of the dynamic type of the object,

int x = 1;
const int *cip = &x;
std::cout << *cip << "\n";  // *cip gives us a glvalue expression of type const int which is a cv-qualified
                            // version of the dynamic type of x

(11.3) a type similar (as defined in 7.5) to the dynamic type of the object,

(11.4) a type that is the signed or unsigned type corresponding to the dynamic type of the object,

// Both si and ui are signed or unsigned types corresponding to each other's dynamic types.
// We can see from this godbolt (https://godbolt.org/g/KowGXB) that the optimizer assumes aliasing.
signed int foo( signed int &si, unsigned int &ui ) {
  si = 1;
  ui = 2;

  return si;
}

(11.5) a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,

signed int foo( const signed int &si1, int &si2 ); // Hard to show that this one assumes aliasing

(11.6) an aggregate or union type that includes one of the aforementioned types among its elements or non-static data members (including, recursively, an element or non-static data member of a subaggregate or contained union),

struct foo {
  int x;
};

// Compiler Explorer example (https://godbolt.org/g/z2wJTC) shows the aliasing assumption
int foobar( foo &fp, int &ip ) {
  fp.x = 1;
  ip = 2;

  return fp.x;
}

foo f;
foobar( f, f.x );

(11.7) a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,

struct foo { int x; };

struct bar : public foo {};

int foobar( foo &f, bar &b ) {
  f.x = 1;
  b.x = 2;

  return f.x;
}

(11.8) a char, unsigned char, or std::byte type.

int foo( std::byte &b, uint32_t &ui ) {
  b = static_cast<std::byte>('a');
  ui = 0xFFFFFFFF;

  return std::to_integer<int>( b );  // b gives us a glvalue expression of type std::byte which can alias
                                     // an object of type uint32_t
}

It is worth noting that signed char is not included in the list above. This is a notable difference from C, which says "a character type".

What is Type Punning

We have gotten to this point and we may be wondering: why would we want to alias at all? The answer typically is to type pun, and often the methods used violate strict aliasing rules.

Sometimes we want to circumvent the type system and interpret an object as a different type. This is called type punning: reinterpreting a segment of memory as another type. Type punning is useful for tasks that want access to the underlying representation of an object in order to view, transport or manipulate it. Typical areas where we find type punning being used are compilers, serialization, networking code, etc.

Traditionally this has been accomplished by taking the address of the object, casting it to a pointer of the type we want to reinterpret it as and then accessing the value, or in other words by aliasing. For example:

int x = 1;

// In C
float *fp = (float*)&x;  // Not a valid aliasing

// In C++
float *fp = reinterpret_cast<float*>(&x);  // Not a valid aliasing

printf( "%f\n", *fp );

As we have seen earlier, this is not a valid aliasing, so we are invoking undefined behavior. But traditionally compilers did not take advantage of strict aliasing rules and this type of code usually just worked, so developers have unfortunately gotten used to doing things this way. A common alternate method for type punning is through unions, which is valid in C but undefined behavior in C++:

union u1
{
  int n;
  float f;
};

union u1 u;
u.f = 1.0f;

printf( "%d\n", u.n );  // UB in C++: n is not the active member

This is not valid in C++, and some consider the purpose of unions to be solely for implementing variant types and feel that using unions for type punning is an abuse.

How do we Type Pun correctly?

The standard method for type punning in both C and C++ is memcpy. This may seem a little heavy-handed, but the optimizer should recognize the use of memcpy for type punning, optimize it away and generate a register-to-register move. For example, if we know int64_t is the same size as double:

static_assert( sizeof( double ) == sizeof( int64_t ) );  // C++17 does not require a message

we can use memcpy:

void func1( double d ) {
  std::int64_t n;
  std::memcpy(&n, &d, sizeof d);
  // ...
}

At a sufficient optimization level any decent modern compiler generates code identical to the previously mentioned reinterpret_cast method or union method for type punning. Examining the generated code, we see it uses just a register mov.

C++20 and bit_cast

C++20 adds bit_cast (std::bit_cast, declared in <bit>; an implementation was also available from a link in the proposal), which gives a simple and safe way to type-pun and is usable in a constexpr context.

The following is an example of how to use bit_cast to type pun an unsigned int to a float:

std::cout << bit_cast<float>(0x447a0000) << "\n"; // assuming sizeof(float) == sizeof(unsigned int)

In the case where the To and From types don't have the same size, it requires us to use an intermediate struct [15]. We will use a struct containing a sizeof( unsigned int ) character array (assuming a 4-byte unsigned int) as the From type and unsigned int as the To type:

struct uint_chars {
  unsigned char arr[sizeof( unsigned int )] = {};  // Assume sizeof( unsigned int ) == 4
};

// Assume len is a multiple of 4
int bar( unsigned char *p, size_t len ) {
  int result = 0;

  for( size_t index = 0; index < len; index += sizeof(unsigned int) ) {
    uint_chars f;
    std::memcpy( f.arr, &p[index], sizeof(unsigned int) );
    unsigned int value = bit_cast<unsigned int>(f);  // renamed so it no longer shadows result

    result += foo( value );  // foo stands for some processing function (not shown)
  }

  return result;
}

It is unfortunate that we need this intermediate type, but that is the current constraint of bit_cast.

Catching Strict Aliasing Violations

We don't have a lot of good tools for catching strict aliasing violations in C++. The tools we have will catch some cases of strict aliasing violations and some cases of misaligned loads and stores.

gcc, using the flags -fstrict-aliasing and -Wstrict-aliasing, can catch some cases, although not without false positives and false negatives. For example, the following cases will generate a warning in gcc:

int a = 1;
short j;
float f = 1.f; // Originally not initialized but tis-kernel caught
               // it was being accessed w/ an indeterminate value below

printf("%i\n", j = *(reinterpret_cast<short*>(&a)));
printf("%i\n", j = *(reinterpret_cast<int*>(&f)));

although it will not catch this additional case:

int *p;

p = &a;
printf("%i\n", j = *(reinterpret_cast<short*>(p)));

Although clang allows these flags, it apparently does not actually implement the warnings.

Another tool we have available to us is ASan, which can catch misaligned loads and stores. Although these are not directly strict aliasing violations, they are a common result of strict aliasing violations. For example, the following cases will generate runtime errors when built with clang using -fsanitize=address:

int *x = new int[2];               // 8 bytes: [0,7].
int *u = (int*)((char*)x + 6);     // regardless of alignment of x this will not be an aligned address
*u = 1;                            // Access to range [6-9]
printf( "%d\n", *u );              // Access to range [6-9]

The last tool I will recommend is C++ specific and not strictly a tool but a coding practice: don't allow C-style casts. Both gcc and clang will produce a diagnostic for C-style casts using -Wold-style-cast. This will force any undefined type puns to use reinterpret_cast, and in general reinterpret_cast should be a flag for closer code review. It is also easier to search your code base for reinterpret_cast to perform an audit.

For C we have all the tools already covered and we also have tis-interpreter, a static analyzer that exhaustively analyzes a program for a large subset of the C language. Given a C version of the earlier example, where using -fstrict-aliasing misses one case:

int a = 1;
short j;
float f = 1.0;

printf("%i\n", j = *((short*)&a));
printf("%i\n", j = *((int*)&f));

int *p;

p = &a;
printf("%i\n", j = *((short*)p));

tis-interpreter is able to catch all three. The following example invokes tis-kernel as tis-interpreter (output is edited for brevity):

./bin/tis-kernel -sa example1.c
...
example1.c:9:[sa] warning: The pointer (short *)(& a) has type short *. It violates strict aliasing
              rules by accessing a cell with effective type int.
...
example1.c:10:[sa] warning: The pointer (int *)(& f) has type int *. It violates strict aliasing rules by
              accessing a cell with effective type float.
              Callstack: main
...
example1.c:15:[sa] warning: The pointer (short *)p has type short *. It violates strict aliasing rules by
              accessing a cell with effective type int.

Finally there is TySan, which is currently in development. This sanitizer adds type checking information in a shadow memory segment and checks accesses to see if they violate aliasing rules. The tool potentially should be able to catch all aliasing violations but may have a large run-time overhead.
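
Returning to the memcpy approach recommended above, here is a C sketch of reading 32-bit words out of a byte buffer, the kind of task where a uint32_t* cast is tempting. The names sum_words and sum_words_example are hypothetical, and the buffer length is assumed to be a multiple of sizeof(uint32_t).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sums the 32-bit words stored in a byte buffer.
 * Assumes len is a multiple of sizeof(uint32_t). */
uint32_t sum_words(const unsigned char *p, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i += sizeof(uint32_t)) {
        uint32_t word;
        memcpy(&word, p + i, sizeof word);  /* no uint32_t* cast, no alignment requirement */
        sum += word;
    }
    return sum;
}

void sum_words_example(void)
{
    const uint32_t values[3] = {1u, 20u, 300u};
    unsigned char buffer[sizeof values];
    memcpy(buffer, values, sizeof values);  /* stand-in for bytes received from elsewhere */
    assert(sum_words(buffer, sizeof buffer) == 321u);
}
```

Because each word is copied out of the buffer rather than read in place, the code also avoids the misaligned loads that ASan reports in the earlier example.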

User · Answer

Technically in C++, the strict aliasing rule is probably never applicable.

Note the definition of indirection (* operator):

"The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points."

Also from the definition of glvalue:

"A glvalue is an expression whose evaluation determines the identity of an object, (... snip)"

So in any well-defined program trace, a glvalue refers to an object. So the so-called strict aliasing rule doesn't apply, ever. This may not be what the designers wanted.
