Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2015

Previous | Next --- Slide 25 of 30

Berry

Am I correct in saying that there is no issue here even if there were more than two consecutive barriers since no thread would be able to enter the 3rd barrier until all threads have left the 1st barrier meaning that alternating between 2 flag values is enough?

kayvonf

@berry. Yes, that is correct.

rbandlam

I felt amount of code decreased with sense reversal when compared to earlier slide for implementing centralized barrier. But sense reversal implementation requires more storage space because each processor has a separate local variable where as in earlier slide each barrier has just one extra int to keep track of leaving counter.

Kapteyn

Another advantage of this implementation over the previous slide's implementation is that we don't have threads spinning on the leave_counter which gets updated P times if we have P processors causing potentially P^2 cache misses as waiting threads spin on the leave_counter.

Here, each processor experiences one cache miss caused by reading the counter variable only once in the barrier and then another cache miss when it reads the flag after it's been updated, which is 2P cache misses instead of P^2.

dumbo_

The local_sense variable here acts like a local flag indicating this processor is waiting for other processors to finish, which elegantly interact with the global flag variable and play the role of tracking # of processors currently staying in the barrier. It also avoids spinning on the leave_counter.