Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2017

Performance Optimization II: Locality, Communication, & Contention

Previous | Next --- Slide 68 of 72

kayvonf

Question: What is the optimization performed on this slide? And what is the intuition behind why it is helpful. (I'd like to see the word contention in your answer.)

eourcs

Since we are appending to an inner list in a 2D array, there is no need to lock the entire array when modifying an inner list. We can employ finer grain control by locking only the inner list being modified. Consider an implementation with k threads. The first solution would result in k threads contending for access to a resource, while the second solution will, assuming a uniform distribution, will result in k/16 threads contending for access to a resource. The important thing to note is that in the first example, we are locking for access to the entire array, while in the second case, we are locking for access to an inner list in the array.

shreeduh

Previously for every particle, the 16 cell_list elements contended to acquire a single lock, while in this case, each of them gets a unique lock, leading to less contention

sstritte

In the previous slide, we had one lock for the 2D array of lists. This led to high contention because whenever a thread needed to add a particle to a list, it needed to obtain the lock for the entire 2D array (so no other threads could update the 2D array).

On this slide, we have 16 locks, one corresponding to each of the 16 cell lists. Now, when a thread needs to update a list, it can obtain exclusive access to that list without locking all the other lists. This means other threads can be updating different lists in the 2D array.