The above answers get at the most fundamental aspects of the C++ memory model. In practice, most uses of std::atomic<>
"just work", at least until the programmer over-optimizes (e.g., by trying to relax too many things).
There is one place where mistakes are still common: sequence locks. There is an excellent and easy-to-read discussion of the challenges at https://www.hpl.hp.com/techreports/2012/HPL-2012-68.pdf. Sequence locks are appealing because the reader avoids writing to the lock word. The following code is based on Figure 1 of the above technical report, and it highlights the challenges when implementing sequence locks in C++:
    atomic<uint64_t> seq; // seqlock representation
    int data1, data2;     // this data will be protected by seq

    void reader() {
        int r1, r2;
        uint64_t seq0, seq1; // same width as seq, so no truncation
        while (true) {
            seq0 = seq;
            r1 = data1; // INCORRECT! Data Race!
            r2 = data2; // INCORRECT!
            seq1 = seq;
            // if the lock didn't change while I was reading, and
            // the lock wasn't held while I was reading, then my
            // reads should be valid
            if (seq0 == seq1 && !(seq0 & 1))
                break;
        }
        use(r1, r2);
    }

    void writer(int new_data1, int new_data2) {
        uint64_t seq0 = seq; // must be uint64_t: compare_exchange_weak takes it by reference
        while (true) {
            // atomically moving the lock from even to odd is an acquire
            if (!(seq0 & 1) && seq.compare_exchange_weak(seq0, seq0 + 1))
                break;
            seq0 = seq; // lock was held or the exchange failed: re-read and retry
        }
        data1 = new_data1;
        data2 = new_data2;
        seq = seq0 + 2; // release the lock by increasing its value to even
    }
As unintuitive as it seems at first, data1 and data2 need to be atomic<>. If they are not atomic, then they could be read (in reader()) at the exact same time as they are written (in writer()). According to the C++ memory model, this is a data race, and therefore undefined behavior, even if reader() never actually uses the data. In addition, if they are not atomic, then the compiler can cache the first read of each value in a register. Obviously you wouldn't want that: you want to re-read in each iteration of the while loop in reader().
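To make that concrete, here is a minimal sketch in which the data is atomic and every access is left at the default memory_order_seq_cst. The std::pair return type, the initial values, and the retry loop in writer() are my own choices for illustration, not code from the paper. It is meant as the simple, conservative baseline rather than a tuned implementation.

```cpp
#include <atomic>
#include <cstdint>
#include <utility>

std::atomic<std::uint64_t> seq{0};    // even = unlocked, odd = writer active
std::atomic<int> data1{0}, data2{0};  // atomic, so concurrent access is not a data race

std::pair<int, int> reader() {
    while (true) {
        std::uint64_t seq0 = seq.load();  // seq_cst by default
        int r1 = data1.load();
        int r2 = data2.load();
        std::uint64_t seq1 = seq.load();
        // Accept the snapshot only if no writer was active and none intervened.
        if (seq0 == seq1 && !(seq0 & 1))
            return {r1, r2};
    }
}

void writer(int new_data1, int new_data2) {
    std::uint64_t seq0 = seq.load();
    while (true) {
        // Take the lock by moving the counter from even to odd.
        if (!(seq0 & 1) && seq.compare_exchange_weak(seq0, seq0 + 1))
            break;
        seq0 = seq.load();  // lock held or exchange failed: re-read and retry
    }
    data1.store(new_data1);
    data2.store(new_data2);
    seq.store(seq0 + 2);  // back to even: release the lock
}
```

Because everything here is sequentially consistent, the reader pays for full ordering on every load, which is exactly the cost that tempts people to relax things.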
It is also not sufficient to make them atomic<> and access them with memory_order_relaxed. The reason for this is that the reads of seq (in reader()) only have acquire semantics. In simple terms, if X and Y are memory accesses, X precedes Y, X is not an acquire or release, and Y is an acquire, then the compiler can reorder Y before X. If Y was the second read of seq, and X was a read of data, such a reordering would break the lock implementation.
The paper gives a few solutions. The one with the best performance today is probably the one that reads the data with memory_order_relaxed and places an atomic_thread_fence with memory_order_acquire before the second read of the seqlock. (A fence with memory_order_relaxed would have no effect.) In the paper, it's Figure 6. I'm not reproducing the paper's code here, because anyone who has read this far really ought to read the paper. It is more precise and complete than this post.
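For orientation only, and not as a substitute for the paper's figure, here is a rough sketch of the fence-based reader: relaxed loads of the data, followed by an acquire fence (atomic_thread_fence(memory_order_acquire)) before the second read of seq. The writer is my own simplification and keeps the default (seq_cst) ordering on all of its operations. The std::pair return type is also just an illustration choice.

```cpp
#include <atomic>
#include <cstdint>
#include <utility>

std::atomic<std::uint64_t> seq{0};    // even = unlocked, odd = writer active
std::atomic<int> data1{0}, data2{0};

std::pair<int, int> reader() {
    while (true) {
        std::uint64_t seq0 = seq.load(std::memory_order_acquire);
        int r1 = data1.load(std::memory_order_relaxed);
        int r2 = data2.load(std::memory_order_relaxed);
        // Acquire fence: the data loads above cannot be reordered
        // past the second read of seq below.
        std::atomic_thread_fence(std::memory_order_acquire);
        std::uint64_t seq1 = seq.load(std::memory_order_relaxed);
        if (seq0 == seq1 && !(seq0 & 1))
            return {r1, r2};
    }
}

// Assumption: the writer is left at default orderings for simplicity.
void writer(int new_data1, int new_data2) {
    std::uint64_t seq0 = seq.load();
    while (true) {
        if (!(seq0 & 1) && seq.compare_exchange_weak(seq0, seq0 + 1))
            break;          // counter is now odd: we hold the lock
        seq0 = seq.load();  // lock held or exchange failed: re-read and retry
    }
    data1.store(new_data1);
    data2.store(new_data2);
    seq.store(seq0 + 2);    // back to even: release the lock
}
```

The reader never writes to seq, which is the whole appeal of the technique; the fence is what makes the relaxed data loads safe to validate against the second read of the counter.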
The last issue is that it might be unnatural to make the data variables atomic. If you can't in your code, then you need to be very careful, because casting from non-atomic to atomic is not guaranteed by the standard and, in practice, is only workable for primitive types. C++20 added std::atomic_ref<>, which makes this problem easier to resolve.
To summarize: even if you think you understand the C++ memory model, you should be very careful before rolling your own sequence locks.