Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2016

sidwad

So now we're busy waiting for access for lock, however with read requests (not write/read-exclusive requests). This way, the lock variable does not need to be cache invalidated in all other processor caches at every read at each processor.

However, whenever the lock is released and claimed by one of the processors, the test_and_set instruction is executed on that processor requiring cache invalidation of lock on all other processors. (but this happens much less often than the read requests, thereby bringing the total number of cache invalidations down)

stl

The test and test and set lock is useful because test_and_set can be thought of as a write operation. That means if we just execute test_and_set, we always have to invalidate the cache line even if we don't actually end up setting the value. With the initial test that is just a read, we don't incur the cost of a write until we know we actually have to update the value.

BensonQiu

An additional benefit of test-and-test-and-set is that if the initial read fails (*lock != 0), subsequent failed reads will be cache hits, so there is no interconnect traffic.

teamG

I may be missing something here, so if test and set successfully acquired the lock, in other words, wrote 1 to the address, it will return 0 to indicate success? that seems like what the bnz instruction in the previous slides is trying to say. Can anyone confirm if my understanding is correct?

CaptainBlueBear

@teamG The test-and-set instruction writes to a memory location and return the location's old value.

So if we call test_and_srt(*lock) and we get a return value of 0 then we know that the value at the lock address was 0 (lock was free) and we've now acquired it (by writing 1 to the lock address).

cmusam

Using test-test-set, a waiting thread is still continuously checking whether the lock is available. However, instead of always attempting to write, the first test is always reading. Only when the lock is observed to be free, the thread will call test_and_set.

yimmyz

The key difference between the vanilla test-and-set lock and the test-and-test-and-set lock is that we are fronting the test-and-set instruction with a spin (which only reads from the address). Therefore, when some thread is accessing the critical section, the other threads would not trigger RdX requests which bombard the whole bus.

However, when one thread releases the lock, all others are free to grab the lock by calling test-and-set, and we can expect there to be a surge of RdX requests -- fortunately, that's the only time when RdX requests are made.

cyl

The important thing here is the read instruction won't cause invalidation. So the read will hit the cache as long as no other processor invalidate the shared lock, and it is the benefit of having the cache coherence protocol.

leis1

In short, test-and-test-and-set has low traffic compared to test-and-test due to the reduced write operations.

RX

This is what we have seen in Assignment 3, e.g. in the bfs.cpp file, we first do a test and then do a compare and swap

enuitt

Test-and-test-and-set allows the program issue a write operation (test-and-set) only when the lock is free, and hence prevents unnecessary cache invalidation