Here are the notes I made (from Brian Goetz's book) that might be of help to you:
AtomicXXX classes
They provide a non-blocking compare-and-swap (CAS) implementation.
They take advantage of the support provided by the hardware (the CMPXCHG instruction on Intel). When lots of threads are running through code that uses these atomic concurrency APIs, it scales much better than code that uses object-level monitors/synchronization. Because Java's synchronization mechanisms make code wait, when lots of threads are running through your critical sections a substantial amount of CPU time is spent managing the synchronization mechanism itself (waiting, notifying, etc.). Since the new API uses hardware-level constructs (atomic variables) and wait-free and lock-free algorithms to implement thread safety, a lot more CPU time is spent "doing stuff" rather than managing synchronization.
They not only offer better throughput, but also provide greater resistance to liveness problems such as deadlock and priority inversion.
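
To make the compare-and-swap idea concrete, here is a minimal sketch of a counter that increments through an explicit CAS retry loop instead of a synchronized block. The class name CasCounterSketch and the helper runDemo are my own, made up for illustration; only AtomicInteger and its get/compareAndSet methods come from java.util.concurrent.atomic. In real code you would normally just call incrementAndGet(), which does roughly the same thing for you.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration class, not taken from the book.
final class CasCounterSketch {
    private final AtomicInteger value = new AtomicInteger();

    // Lock-free increment: read, compute, then try to publish with CAS.
    // If another thread changed the value in between, the CAS fails and we retry.
    int increment() {
        while (true) {
            int current = value.get();
            int next = current + 1;
            if (value.compareAndSet(current, next)) {
                return next;
            }
        }
    }

    int get() {
        return value.get();
    }

    // Hypothetical helper: hammer one counter from several threads
    // and return the final count.
    static int runDemo(int threads, int incrementsPerThread) {
        CasCounterSketch counter = new CasCounterSketch();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        // No increments are lost even though no thread ever blocks on a lock.
        System.out.println(runDemo(4, 10_000));
    }
}
```

No thread is ever parked waiting for a monitor here: a thread that loses the race simply re-reads the value and tries again, which is why a stalled or descheduled thread cannot hold up the others the way a lock holder can.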