Slide 13 of 29

I remember a situation where an unfair lock made my program run faster:

I was trying to measure the performance of a FIFO-locked resource allocator by making lots of small allocations. The results were awful: each thread would acquire the lock, finish a tiny fraction of its work, and then attempt to take the lock again. But since the lock was FIFO, that thread had to go to the back of the queue of threads waiting on the lock, and the CPU had to context switch to the thread at the head of the queue. Since the context switch time was greater than the length of the tight critical section, the CPU wound up spending most of its time context switching. (Note that a nonblocking fair lock on a multicore system would have been even worse in this particular case.)
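For concreteness, a FIFO-fair lock like this is commonly implemented as a ticket lock. Here is a minimal sketch in Go (the ticket-lock design and all the names here are my own illustration, not the allocator in question); it shows why a thread that releases and immediately retries must wait behind every other waiter:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// ticketLock is a FIFO-fair spinlock: threads are served strictly in the
// order they asked for the lock, like the queue described above.
type ticketLock struct {
	next    uint32 // next ticket to hand out
	serving uint32 // ticket currently allowed into the critical section
}

func (l *ticketLock) lock() {
	my := atomic.AddUint32(&l.next, 1) - 1 // take a ticket; retrying means a new ticket at the back
	for atomic.LoadUint32(&l.serving) != my {
		runtime.Gosched() // yield; stands in for the context switch to the head of the queue
	}
}

func (l *ticketLock) unlock() {
	atomic.AddUint32(&l.serving, 1) // hand off to the next ticket holder
}

// demo has several goroutines each perform many tiny critical sections,
// mimicking the small-allocation benchmark described above.
func demo() int {
	var l ticketLock
	counter := 0
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				l.lock()
				counter++ // tiny critical section
				l.unlock()
			}
		}()
	}
	wg.Wait()
	return counter
}

func main() {
	fmt.Println(demo()) // 4000: mutual exclusion holds, and handoff is strictly FIFO
}
```

Because `unlock` advances `serving` by exactly one, the releasing thread cannot jump the queue: its next `lock` call draws a fresh ticket behind every current waiter, which is exactly the pathology described above when the critical section is shorter than a context switch.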

I thought about changing the allocator so that callers could batch requests while holding the lock, but I don't think it was practical because there was a lot of non-critical work to do between requests (writing zeros). Instead, I just switched to an unfair test-and-set lock. That let me approximate batching, because a thread that won the lock could keep retaking it until it got context switched out. A thread would also get an unfair extra scheduler slot if it got context switched out while inside the critical section, but I found that no thread ever actually failed to win the lock more than a handful of times. I wound up with much better latency and throughput, and a lot less contention on the lock.
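A test-and-set lock of the kind described can be sketched in a few lines of Go (again my own illustrative names, not the actual code). There is no queue at all: whichever thread's atomic swap lands first wins, so the thread that just released the lock, still running on its core, usually wins again immediately:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// tasLock is an unfair test-and-set spinlock. There is no ordering among
// waiters, so a releasing thread can retake the lock before anyone else runs.
type tasLock struct{ state uint32 }

func (l *tasLock) lock() {
	for atomic.SwapUint32(&l.state, 1) != 0 {
		// spin: retry the atomic test-and-set until we observe 0 -> 1
	}
}

func (l *tasLock) unlock() {
	atomic.StoreUint32(&l.state, 0)
}

// demo drives the lock with tiny critical sections, like the allocator
// benchmark; the counter checks that mutual exclusion still holds.
func demo() int {
	var l tasLock
	total := 0
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				l.lock()
				total++ // tiny critical section
				l.unlock()
				// non-critical work between requests (e.g. writing zeros) would go here
			}
		}()
	}
	wg.Wait()
	return total
}

func main() {
	fmt.Println(demo()) // 4000
}
```

The unfairness is the feature here: because `unlock` followed by `lock` on the same core needs no handoff to another thread, one thread can run many critical sections back to back, which is the approximate batching described above.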


The reason test-and-set generates high traffic and scales poorly is that every acquisition attempt forces the processor to issue an atomic read-modify-write (a memory read and a memory write performed as one atomic operation). The write half requires exclusive ownership of the lock's cache line, so spinning cores continually invalidate each other's copies and the line ping-pongs across the interconnect even while nobody can actually acquire the lock.
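The standard way to cut that traffic is test-and-test-and-set: spin on a plain read, which can be satisfied from the local cache without invalidating anyone, and only issue the atomic read-modify-write when the lock looks free. A hedged sketch in Go (names are mine; Go's atomics don't map one-to-one onto cache-coherence behavior, so this only illustrates the structure):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// ttasLock is a test-and-test-and-set spinlock: waiters spin on a read and
// attempt the expensive atomic write only when the lock appears free.
type ttasLock struct{ state uint32 }

func (l *ttasLock) lock() {
	for {
		for atomic.LoadUint32(&l.state) != 0 {
			// spin on a read; a shared cached copy suffices, so no
			// exclusive cache-line ownership (and no write traffic) is needed
		}
		if atomic.CompareAndSwapUint32(&l.state, 0, 1) {
			return // the atomic read-modify-write fires only when we might win
		}
	}
}

func (l *ttasLock) unlock() {
	atomic.StoreUint32(&l.state, 0)
}

// demo verifies the lock still provides mutual exclusion.
func demo() int {
	var l ttasLock
	total := 0
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				l.lock()
				total++ // tiny critical section
				l.unlock()
			}
		}()
	}
	wg.Wait()
	return total
}

func main() {
	fmt.Println(demo()) // 4000
}
```

It keeps the unfairness (and therefore the approximate-batching benefit described above) while waiters generate far fewer coherence invalidations than a lock that retries the atomic write in a tight loop.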