Here we see that adding a read-only while loop lets each waiting processor spin on its own cached copy of the lock, drastically reducing the coherence traffic that existed before.
Yet invalidation still occurs when P1 releases the lock. This is because both P2 and P3 see the lock released and attempt to acquire it at the same time.
@bochet but that doesn't hurt, since P1 is no longer accessing the lock. If it wants the lock again later, it has to re-read the line anyway, because its cached copy has been invalidated.
P3 issues a BusRdX even though P2 has already acquired the lock: both pass the first test after P1 releases the lock, but P2 completes its test-and-set before P3 does.
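To make that interleaving concrete, here is a minimal single-threaded replay of it in C11. It is only a sketch: the names `race_result_t` and `replay_race` are made up for illustration, the ordering is fixed by hand rather than produced by real threads, and the "BusRdX" count simply assumes that every atomic exchange needs exclusive ownership of the line.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical record of what happened in the replayed interleaving. */
typedef struct {
    bool p2_acquired;
    bool p3_acquired;
    int  rmw_attempts;  /* each exchange is assumed to cost one BusRdX */
} race_result_t;

/* Replays: P1 releases, P2 and P3 both pass the test, P2 wins the test-and-set. */
static race_result_t replay_race(void) {
    atomic_int lock;
    atomic_init(&lock, 0);              /* P1 has just released: 0 = free */
    race_result_t r = { false, false, 0 };

    /* First test (read-only): both waiters observe the lock as free. */
    bool p2_saw_free = (atomic_load(&lock) == 0);
    bool p3_saw_free = (atomic_load(&lock) == 0);

    /* P2's test-and-set goes first and succeeds. */
    if (p2_saw_free) {
        r.rmw_attempts++;
        r.p2_acquired = (atomic_exchange(&lock, 1) == 0);
    }
    /* P3 already passed the test, so it still performs the exchange
       (and the write traffic that goes with it), but it sees 1 and fails. */
    if (p3_saw_free) {
        r.rmw_attempts++;
        r.p3_acquired = (atomic_exchange(&lock, 1) == 0);
    }
    return r;
}
```

In this replay P2 ends up holding the lock, P3 does not, and two exchanges were issued even though only one could succeed. That second, failed exchange corresponds to P3's BusRdX.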
When you compare this slide to the one on the test-and-set lock, you see that much less time is taken to switch which processor holds the lock. There is less contention for the bus, because the waiting processors just read the lock value from their own local caches instead of going to memory.
Key takeaway: test-and-test-and-set reduces coherence traffic, since waiters spin by reading from their local caches.
The test-and-test-and-set lock reduces traffic because each processor only needs to check its local cache to see whether the value has changed, and only sends out traffic when it observes that the lock has been released.
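For reference, a test-and-test-and-set lock could look roughly like this with C11 atomics. This is a sketch, not the code from the lecture: the type and function names (`ttas_lock_t`, `ttas_init`, `ttas_lock`, `ttas_unlock`) and the choice of memory orderings are assumptions made for the example.

```c
#include <stdatomic.h>

/* Hypothetical lock type: 0 = free, 1 = held. */
typedef struct {
    atomic_int held;
} ttas_lock_t;

static void ttas_init(ttas_lock_t *l) {
    atomic_init(&l->held, 0);
}

static void ttas_lock(ttas_lock_t *l) {
    for (;;) {
        /* Test: spin with plain loads. While the line stays in the
           local cache, these reads generate no bus traffic. */
        while (atomic_load_explicit(&l->held, memory_order_relaxed) != 0) {
            /* spin */
        }
        /* Test-and-set: only attempted once the lock looks free.
           This is the step that needs exclusive ownership of the line. */
        if (atomic_exchange_explicit(&l->held, 1, memory_order_acquire) == 0) {
            return;  /* previous value was 0, so we now hold the lock */
        }
        /* Someone else won the race; go back to read-only spinning. */
    }
}

static void ttas_unlock(ttas_lock_t *l) {
    /* The release store invalidates the waiters' cached copies,
       which is how they notice that the value has changed. */
    atomic_store_explicit(&l->held, 0, memory_order_release);
}
```

The only difference from a plain test-and-set lock is the inner read-only loop: waiters issue the exchange only after a load has seen the lock as free, so the steady-state spinning stays in the local cache.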