Previous | Next --- Slide 33 of 43
Back to Lecture Thumbnails

The implementation of atomic_min is based on reading a value from memory and using CAS to atomically check if any other processor has written to it in between checking and doing CAS by comparing it with the value it has just read. If the CAS succeeds and we read the old value, then we (think we) know that no other processor has written to the address, and so we can swap with the min value. atomic_increment can be implemented in exactly the same way, replacing min(old,x) with the increment instead.

However, this implementation suffers from the ABA problem. Imagine the scenario where processor 1 reads 1 from addr, processor 2 atomically adds 1 to addr, and processor 3 atomically subtracts 1 from addr. Then in processor 1, the old value and the value read by the CAS are both still 1, yet there indeed have been other processors that wrote to addr. In the case of addition, this is the correct behavior still, but in general this is still something to look out for.


What does atomicCAS return? CUDA docs online say that it always returns old (, but then doesn't old always equal old?


@totofufu That's the point. atomicCAS will return what you read in addr, and if addr has been written to then the return of atomicCAS will be a different old value than the one you recorded.


@TomoA yup, I misunderstood. atomicCAS returns what is stored in addr, but that's not the same value as old necessarily.



What are some instances when the ABA problem can cause incorrect behavior? Since the new value remains the same as old value, it doesn't seem to matter?


Are these additional primitives "more expensive" than the original operations (add, sub, CAS)? In this implementation, it looks like atomic_min could be making several calls to atomicCAS, and so I would think atomic_min would incur possibly more synchronization problems.

However, there are probably better ways of implementing atomic_min?


@patrickbot, I remember someone in class has suggested a better solution: if the value stored at addr is smaller than val, the function can exit directly.


You can also implement atomic_min using an atomic exchange operation, which is the solution I came up with for Assignment 3. A thread swaps its value with the value in memory, and then keeps swapping as long as the value the thread is holding is less than the value it swapped into memory. This maintains the invariant that a thread only exits the atomic_min function while holding onto a value that is greater than the value it swapped into memory, so when all threads have exited the value remaining in memory must be the min of all values. In code this would look like:

void atomic_min(int *addr, int x) {
    do {
        int old_x = x;
        x = atomic_exchange(addr, old_x);
    } while (x < old_x);

@Updog, thank you for providing us with a new version of atomic_min. Could you talk about the implementation of atomic_exchange? It seems to be an expensive operation.


void atomic_increment(int *addr, int x) {
    int old = *addr;
    int new = old + x;
    while (atomicCAS(addr, old, new) != old) {
        old = *addr;
        new = old + x;

void lock(*addr) {
    while(atomicCAS(addr, 0, 1) == 1) {

void unlock(*addr) {
    *addr = 0;

The spec of the atomicMin is a bit misleading since it claims to return an int, but instead it is simply storing the min of *addr and x to addr without returning anything.


int atomic_min(int *addr, int val)


int stored_val = *addr;

while (val < stored_val) {

 stored_val = atomicCAS(*addr, stored_val, val);


return stored_val;



The CUDA programing guide says "Atomic operations are guaranteed to be performed without interference from other threads. In other words, no other thread can access this address until the operation is complete." But here other threads can also read or write the address when atomicCAS or atomicMin is performed, does it violate the concept of atomic operation?


@eknight7: imagine if you were writing some sort of lock-free linked list and were performing CAS on the address of a node rather than its contents or whatever. You could easily end up with a node being reinserted with a different next pointer, so same address but putting the list in a different state from what the CAS user expected.