Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2016

Previous | Next --- Slide 30 of 56

KnightsLanding

How does atomic compare and swap instruction work? Does it also involve load, then compare, and store? How does it guarantee atomicity?

bpr

This is a fun one, @KnightsLanding. :) First, yes, an atomic compare and swap (CMPXCHG) involves a load, compare, and store. In fact, an x86 processor may internally decode the CMPXCHG into these operations (c.f., micro-ops).

How are multiple memory operations atomic? In a bus-based x86 system, there was an additional pin (LOCK#) as part of the bus that locked the bus. When the bus is locked, no other processor can use the bus. Therefore any atomic operation would set LOCK# during the entire operation (such as load, compare, and store).

More recently, the x86 processors rely on cache coherence to guarantee atomicity (Intel calls this cache locking in the processor manual). What the cache (likely) does is either delay the reply until the atomic operation is complete or reply with a NACK while in the atomic region which forces the requestor to retry its operation.

IntergalacticPeanutMaker

When P0 loads Y I thought P0 would go to the E state from the S state, but there's no edge on the state diagram for such a transition.

Does every cache line essentially have its own state diagram? So when P0 loads Y it goes from state I to state E; however, with respect to the cache line X is mapped to P0 is still in state S (because P1 also has X in its cache)?

jerryzh

@IntergalacticPeanutMaker, yes, as mentioned in the lecture, each cache line has a state, that's why operations on different cache lines don't influence each other.

KnightsLanding

@bpr Thank you very much! =P

Before I thought CAS is almost free, now I know more about the actual cost. One interesting cost analysis: http://stackoverflow.com/questions/2538070/atomic-operation-cost