mattlkf

So the assumption here is that the test-and-set instruction triggers an invalidation of the cache line even when the test fails and nothing needs to be written?

yeq

A processor will invalidate its line whenever another processor tries to acquire the lock, because test-and-set is treated as a write operation.

crabcake

There are two sources of overhead here: 1. the cache miss when thread 1 tries to release the lock; 2. the furious bus communication generated by threads 2 & 3 while thread 1 is executing the critical section.

Cake

Note that performance is not hampered merely by the fact that the other processors are invalidating their cache lines multiple times; those processors would have been idle anyway, since the first processor holds the lock.

locked

@mattlkf Every call to test-and-set acquires the line with a BusRdX before the instruction tests whether it can take the lock. So even if the test fails, it triggers an invalidation of the cache line.
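To make this concrete, here is a minimal sketch of the kind of test-and-set spin lock being discussed, written with C11 atomics (the ts_lock name and the use of atomic_exchange are illustrative assumptions, not the slide's code). The point is that every acquisition attempt executes a write, so under an invalidation-based protocol each attempt issues a BusRdX whether or not it actually gets the lock:

```c
#include <stdatomic.h>

typedef struct { atomic_int flag; } ts_lock;   /* 0 = free, 1 = held */

static void ts_lock_acquire(ts_lock *l) {
    /* Every iteration performs a write to the line, so the coherence
     * protocol issues a BusRdX and invalidates the other caches' copies,
     * even when the exchange returns 1 and the lock is not acquired. */
    while (atomic_exchange_explicit(&l->flag, 1, memory_order_acquire) == 1)
        ;  /* busy-wait */
}

static void ts_lock_release(ts_lock *l) {
    atomic_store_explicit(&l->flag, 0, memory_order_release);
}
```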

chenh1

Performance is hampered, I think, because the invalidation messages occupy the bus and the invalidations take time to complete. If the processor could spend that time doing other work, it would get more done.

unparalleled

@mattlkf: what @locked says is right. If this assumption did not hold, it would mean the BusRdX is issued conditionally. That might minimize the number of BusRdX transactions, but it would increase the duration of the atomic instruction, which we do not want. This is how I have understood it; it would be great if someone could confirm.
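A common way to get this "conditional BusRdX" effect in software rather than hardware is test-and-test-and-set: spin with plain reads, which only need a BusRd and let the line sit in the Shared state, and attempt the atomic exchange only when the lock looks free. A sketch under the same assumptions as the lock above (C11 atomics, hypothetical ts_lock type):

```c
#include <stdatomic.h>

typedef struct { atomic_int flag; } ts_lock;   /* 0 = free, 1 = held */

static void tts_lock_acquire(ts_lock *l) {
    for (;;) {
        /* Inner loop only reads: the line can stay Shared in this cache,
         * so spinning here generates no invalidation traffic. */
        while (atomic_load_explicit(&l->flag, memory_order_relaxed) == 1)
            ;  /* spin locally */
        /* The lock looks free: only now pay for the write (BusRdX). */
        if (atomic_exchange_explicit(&l->flag, 1, memory_order_acquire) == 0)
            return;  /* acquired */
    }
}
```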

khans

Performance is hampered by the other processors invalidating their cache lines because of the communication overhead: if there is a single bus handling invalidations, that is a resource Processor 1 cannot use because it is being consumed by Processors 2 and 3, which aren't even doing useful work. The other processors grabbing the cache line also force Processor 1 to take a cache miss when it wants to write results back at the end.

life

If there are resources required by processor 1 in the same cache line as the lock, access to those resources will also be hampered by the constant invalidations coming from processors 2 & 3.
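A sketch of one way to avoid that problem (assuming a 64-byte cache line; the struct and field names are hypothetical): align the lock so it does not share a line with data the lock holder is actively using:

```c
#include <stdatomic.h>

struct shared_state {
    _Alignas(64) atomic_int lock;   /* the lock gets a cache line to itself */
    _Alignas(64) int counter;       /* data processor 1 touches while holding the lock */
};
```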

anonymous

From another view, a test-and-set lock is usually used when the overhead of scheduling (blocking) would be larger than the cost of spinning. The cache invalidations, however, make the overhead of busy-waiting much more significant. Suppose the critical section just protects a variable that is frequently accessed by different threads: a thread that wants the lock will keep checking it with this test-and-set mechanism, which incurs significant overhead, since every attempt introduces a cache invalidation and forces the other threads to refetch the line over the bus.

fxffx

In this case, processors 2 and 3 generate unnecessary memory traffic: because test-and-set includes a write operation, each attempt asks the other processors to invalidate the cache line even when the test-and-set fails.