What exactly is std::atomic?

Question

I understand that std::atomic<> is an atomic object. But atomic to what extent? To my understanding, an operation can be atomic. What exactly is meant by making an object atomic? For example, if there are two threads concurrently executing the following code:

    a = a + 12;

Then is the entire operation (say, add_twelve_to_int()) atomic? Or are changes made to the variable atomic (so operator=())?

User · Answer

std::atomic exists because many ISAs have direct hardware support for it

What the C++ standard says about std::atomic has been analyzed in other answers.

So now let's see what std::atomic compiles to, to get a different kind of insight.

The main takeaway from this experiment is that modern CPUs have direct support for atomic integer operations, for example the LOCK prefix in x86, and std::atomic basically exists as a portable interface to those instructions (see: "What does the "lock" instruction mean in x86 assembly?"). In aarch64, LDADD would be used.

This support allows for faster alternatives to more general methods such as std::mutex. A mutex can make more complex multi-instruction sections atomic, but it is slower than std::atomic: on Linux a contended std::mutex makes futex system calls, which are far slower than the userland instructions emitted by std::atomic (see also: "Does std::mutex create a fence?").

Let's consider the following multi-threaded program, which increments a global variable across multiple threads, with different synchronization mechanisms depending on which preprocessor define is used.

main.cpp

    #include <atomic>
    #include <cstdint>  // uint64_t
    #include <iostream>
    #include <string>   // std::stoull
    #include <thread>
    #include <vector>

    size_t niters;

    #if STD_ATOMIC
    std::atomic_ulong global(0);
    #else
    uint64_t global = 0;
    #endif

    void threadMain() {
        for (size_t i = 0; i < niters; ++i) {
    #if LOCK
            __asm__ __volatile__ (
                "lock incq %0;"
                : "+m" (global),
                  "+g" (i) // to prevent loop unrolling
                :
                :
            );
    #else
            __asm__ __volatile__ (
                ""
                : "+g" (i) // to prevent the loop from being optimized to a single add
                : "g" (global)
                :
            );
            global++;
    #endif
        }
    }

    int main(int argc, char **argv) {
        size_t nthreads;
        if (argc > 1) {
            nthreads = std::stoull(argv[1], NULL, 0);
        } else {
            nthreads = 2;
        }
        if (argc > 2) {
            niters = std::stoull(argv[2], NULL, 0);
        } else {
            niters = 10;
        }
        std::vector<std::thread> threads(nthreads);
        for (size_t i = 0; i < nthreads; ++i)
            threads[i] = std::thread(threadMain);
        for (size_t i = 0; i < nthreads; ++i)
            threads[i].join();
        uint64_t expect = nthreads * niters;
        std::cout << "expect " << expect << std::endl;
        std::cout << "global " << global << std::endl;
    }

GitHub upstream.

Compile, run and disassemble:

    common="-ggdb3 -O3 -std=c++11 -Wall -Wextra -pedantic main.cpp -pthread"
    g++ -o main_fail.out                    $common
    g++ -o main_std_atomic.out -DSTD_ATOMIC $common
    g++ -o main_lock.out       -DLOCK       $common

    ./main_fail.out       4 100000
    ./main_std_atomic.out 4 100000
    ./main_lock.out       4 100000

    gdb -batch -ex "disassemble threadMain" main_fail.out
    gdb -batch -ex "disassemble threadMain" main_std_atomic.out
    gdb -batch -ex "disassemble threadMain" main_lock.out

Extremely likely "wrong" race condition output for main_fail.out:

    expect 400000
    global 100000

and deterministic "right" output of the others:

    expect 400000
    global 400000

Disassembly of main_fail.out:

       0x0000000000002780 <+0>:     endbr64
       0x0000000000002784 <+4>:     mov    0x29b5(%rip),%rcx        # 0x5140 <niters>
       0x000000000000278b <+11>:    test   %rcx,%rcx
       0x000000000000278e <+14>:    je     0x27b4 <threadMain()+52>
       0x0000000000002790 <+16>:    mov    0x29a1(%rip),%rdx        # 0x5138 <global>
       0x0000000000002797 <+23>:    xor    %eax,%eax
       0x0000000000002799 <+25>:    nopl   0x0(%rax)
       0x00000000000027a0 <+32>:    add    $0x1,%rax
       0x00000000000027a4 <+36>:    add    $0x1,%rdx
       0x00000000000027a8 <+40>:    cmp    %rcx,%rax
       0x00000000000027ab <+43>:    jb     0x27a0 <threadMain()+32>
       0x00000000000027ad <+45>:    mov    %rdx,0x2984(%rip)        # 0x5138 <global>
       0x00000000000027b4 <+52>:    retq

Disassembly of main_std_atomic.out:

       0x0000000000002780 <+0>:     endbr64
       0x0000000000002784 <+4>:     cmpq   $0x0,0x29b4(%rip)        # 0x5140 <niters>
       0x000000000000278c <+12>:    je     0x27a6 <threadMain()+38>
       0x000000000000278e <+14>:    xor    %eax,%eax
       0x0000000000002790 <+16>:    lock addq $0x1,0x299f(%rip)        # 0x5138 <global>
       0x0000000000002799 <+25>:    add    $0x1,%rax
       0x000000000000279d <+29>:    cmp    %rax,0x299c(%rip)        # 0x5140 <niters>
       0x00000000000027a4 <+36>:    ja     0x2790 <threadMain()+16>
       0x00000000000027a6 <+38>:    retq

Disassembly of main_lock.out:

    Dump of assembler code for function threadMain():
       0x0000000000002780 <+0>:     endbr64
       0x0000000000002784 <+4>:     cmpq   $0x0,0x29b4(%rip)        # 0x5140 <niters>
       0x000000000000278c <+12>:    je     0x27a5 <threadMain()+37>
       0x000000000000278e <+14>:    xor    %eax,%eax
       0x0000000000002790 <+16>:    lock incq 0x29a0(%rip)        # 0x5138 <global>
       0x0000000000002798 <+24>:    add    $0x1,%rax
       0x000000000000279c <+28>:    cmp    %rax,0x299d(%rip)        # 0x5140 <niters>
       0x00000000000027a3 <+35>:    ja     0x2790 <threadMain()+16>
       0x00000000000027a5 <+37>:    retq

Conclusions:

- The non-atomic version loads the global into a register once, and increments the register inside the loop. Therefore, at the end, very likely four writes happen back to global with the same "wrong" value of 100000.
- std::atomic compiles to lock addq. The LOCK prefix makes the instruction it is attached to (here the add) fetch, modify and update memory atomically.
- Our explicit inline assembly LOCK prefix compiles to almost the same thing as std::atomic, except that our inc is used instead of add. Not sure why GCC chose add, considering that our INC generated an encoding 1 byte smaller.

ARMv8 could use either LDAXR + STLXR, or LDADD in newer CPUs (see: "How do I start threads in plain C?").

Tested in Ubuntu 19.10 AMD64, GCC 9.2.1, Lenovo ThinkPad P51.
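For readers who want to reproduce the "right" results without inline assembly, here is a minimal portable sketch of the same counting experiment. It is not the code from the answer: the helper names run_threads, count_with_atomic and count_with_mutex are made up for illustration. It contrasts the std::atomic counter with the more general std::mutex approach mentioned above; both should always reach nthreads * niters, with the mutex version typically being the slower of the two.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: start `nthreads` threads that each run `work`, then join them.
template <class F>
void run_threads(std::size_t nthreads, F work) {
    std::vector<std::thread> threads;
    threads.reserve(nthreads);
    for (std::size_t t = 0; t < nthreads; ++t)
        threads.emplace_back(work);
    for (auto &th : threads)
        th.join();
}

// Every increment is a single atomic read-modify-write on the shared counter.
std::uint64_t count_with_atomic(std::size_t nthreads, std::size_t niters) {
    std::atomic<std::uint64_t> global(0);
    run_threads(nthreads, [&] {
        for (std::size_t i = 0; i < niters; ++i)
            global.fetch_add(1);  // same effect as global++
    });
    return global.load();
}

// Same result, but each increment takes and releases a lock around a plain integer.
std::uint64_t count_with_mutex(std::size_t nthreads, std::size_t niters) {
    std::uint64_t global = 0;
    std::mutex m;
    run_threads(nthreads, [&] {
        for (std::size_t i = 0; i < niters; ++i) {
            std::lock_guard<std::mutex> lock(m);
            ++global;
        }
    });
    return global;  // safe to read: all threads have been joined
}
```

Calling count_with_atomic(4, 100000) corresponds to running ./main_std_atomic.out 4 100000 above. Compile with -pthread on Linux.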

User · Answer

    "I understand that std::atomic<> makes an object atomic."

That's a matter of perspective: you can't apply it to arbitrary objects and have their operations become atomic, but the provided specialisations for (most) integral types and pointers can be used.

    a = a + 12;

std::atomic<> does not (use template expressions to) simplify this to a single atomic operation. Instead, the operator T() const volatile noexcept member does an atomic load() of a, then twelve is added, and operator=(T t) noexcept does a store(t).
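To make the load/add/store split concrete, here is a small sketch that replays one unlucky interleaving of two threads by hand inside a single thread, so the outcome is deterministic. The function names (add_twelve_split, add_twelve_rmw, lost_update_demo, rmw_demo) are hypothetical and only serve the illustration; "T1" and "T2" in the comments stand for the two imagined threads.

```cpp
#include <atomic>

// a = a + 12 spelled out: two separate atomic operations with a gap between them.
void add_twelve_split(std::atomic<int> &a) {
    int tmp = a.load();   // what operator T() does: an atomic load
    a.store(tmp + 12);    // what operator=(T) does: an atomic store
}

// a += 12: one indivisible atomic read-modify-write.
void add_twelve_rmw(std::atomic<int> &a) {
    a.fetch_add(12);
}

// Replays the interleaving "T1 loads, T2 runs completely, T1 stores".
int lost_update_demo() {
    std::atomic<int> a(0);
    int t1_seen = a.load();   // T1: load half of a = a + 12 (sees 0)
    add_twelve_split(a);      // T2: complete a = a + 12 (a becomes 12)
    a.store(t1_seen + 12);    // T1: store half, overwrites T2's result with 12
    return a.load();          // 12: one of the two additions was lost
}

// With fetch_add there is no gap for another thread to slip into.
int rmw_demo() {
    std::atomic<int> a(0);
    add_twelve_rmw(a);
    add_twelve_rmw(a);
    return a.load();          // 24
}
```

Each individual load and store in lost_update_demo is atomic, yet the combined update is not, which is exactly the distinction the question asks about.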

User · Answer

Each instantiation and full specialization of std::atomic<> represents a type whose instances different threads can simultaneously operate on without raising undefined behavior:

    "Objects of atomic types are the only C++ objects that are free from data races; that is, if one thread writes to an atomic object while another thread reads from it, the behavior is well-defined.

    In addition, accesses to atomic objects may establish inter-thread synchronization and order non-atomic memory accesses as specified by std::memory_order."

std::atomic<> wraps operations that, in pre-C++11 times, had to be performed using (for example) interlocked functions with MSVC or atomic builtins in case of GCC.

Also, std::atomic<> gives you more control by allowing various memory orders that specify synchronization and ordering constraints. If you want to read more about C++11 atomics and the memory model, these links may be useful:

- C++ atomics and memory ordering
- Comparison: Lockless programming with atomics in C++11 vs. mutex and RW-locks
- C++11 introduced a standardized memory model. What does it mean? And how is it going to affect C++ programming?
- Concurrency in C++11

Note that, for typical use cases, you would probably use the overloaded arithmetic operators or another set of them:

    std::atomic<long> value(0);
    value++;    // This is an atomic op
    value += 5; // And so is this

Because operator syntax does not allow you to specify the memory order, these operations will be performed with std::memory_order_seq_cst, as this is the default order for all atomic operations in C++11. It guarantees sequential consistency (total global ordering) between all atomic operations.

In some cases, however, this may not be required (and nothing comes for free), so you may want to use the more explicit form:

    std::atomic<long> value {0};
    value.fetch_add(1, std::memory_order_relaxed); // Atomic, but there are no synchronization or ordering constraints
    value.fetch_add(5, std::memory_order_release); // Atomic, performs 'release' operation

Now, your example:

    a = a + 12;

will not evaluate to a single atomic op: it will result in a.load() (which is atomic itself), then an addition between this value and 12, and an a.store() (also atomic) of the final result. As I noted earlier, std::memory_order_seq_cst will be used here.

However, if you write a += 12, it will be an atomic operation (as I noted before) and is roughly equivalent to a.fetch_add(12, std::memory_order_seq_cst).

As for your comment:

    "A regular int has atomic loads and stores. What's the point of wrapping it with atomic<>?"

Your statement is only true for architectures that provide such a guarantee of atomicity for stores and/or loads. There are architectures that do not do this. Also, it is usually required that operations be performed on a word-/dword-aligned address to be atomic. std::atomic<> is something that is guaranteed to be atomic on every platform, without additional requirements. Moreover, it allows you to write code like this:

    void* sharedData = nullptr;
    std::atomic<int> ready_flag{0}; // brace-init: "= 0" is ill-formed before C++17

    // Thread 1
    void produce()
    {
        sharedData = generateData();
        ready_flag.store(1, std::memory_order_release);
    }

    // Thread 2
    void consume()
    {
        while (ready_flag.load(std::memory_order_acquire) == 0)
        {
            std::this_thread::yield();
        }

        assert(sharedData != nullptr); // will never trigger
        processData(sharedData);
    }

Note that the assertion condition will always be true (and thus will never trigger), so you can always be sure that the data is ready after the while loop exits. That is because:

- The store() to the flag is performed after sharedData is set (we assume that generateData() always returns something useful, in particular, never returns NULL) and uses std::memory_order_release order:

    "memory_order_release

    A store operation with this memory order performs the release operation: no reads or writes in the current thread can be reordered after this store. All writes in the current thread are visible in other threads that acquire the same atomic variable."

- sharedData is used after the while loop exits, and thus after the load() from the flag has returned a non-zero value. load() uses std::memory_order_acquire order:

    "std::memory_order_acquire

    A load operation with this memory order performs the acquire operation on the affected memory location: no reads or writes in the current thread can be reordered before this load. All writes in other threads that release the same atomic variable are visible in the current thread."

This gives you precise control over the synchronization and allows you to explicitly specify how your code may, may not, will, or will not behave. This would not be possible if the only guarantee was the atomicity itself, especially when it comes to very interesting sync models like the release-consume ordering.
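Since generateData() and processData() are left undefined in the snippet above, here is a self-contained sketch of the same release/acquire handoff that can actually be compiled and run. The names run_handoff, payload, ready and observed are hypothetical stand-ins (payload plays the role of sharedData, ready the role of ready_flag), and the value 42 is arbitrary.

```cpp
#include <atomic>
#include <thread>

// Publishes a plain (non-atomic) int from one thread to another through an atomic flag.
int run_handoff() {
    int payload = 0;                  // ordinary data, not atomic
    std::atomic<bool> ready(false);   // the synchronization flag
    int observed = -1;

    std::thread producer([&] {
        payload = 42;                                   // 1. write the data
        ready.store(true, std::memory_order_release);   // 2. publish it
    });

    std::thread consumer([&] {
        while (!ready.load(std::memory_order_acquire))  // 3. wait for the flag
            std::this_thread::yield();
        observed = payload;  // 4. the acquire load that read `true` makes step 1 visible
    });

    producer.join();
    consumer.join();
    return observed;
}
```

Under these assumptions run_handoff() returns 42 on every run. If both memory orders were replaced with std::memory_order_relaxed, the flag itself would still be accessed atomically, but the read of payload would no longer be ordered after the write and would formally be a data race.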
