Slide 40 of 48
-o4

The lock makes the program run mostly sequentially despite the many threads. It increases S, the inherent fraction of sequential execution, and thus lowers the maximum achievable speedup.
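Amdahl's law makes this bound concrete. Assuming a fraction S of the work is inherently sequential and the remainder is spread perfectly over p processors, the speedup is limited as below. The value S = 0.5 in the example is purely illustrative, not something measured from the slide's program.

```latex
\text{speedup}(p) \le \frac{1}{S + \frac{1 - S}{p}} \le \frac{1}{S}
% Illustrative example: if the lock serializes half of the loop's work (S = 0.5),
% then speedup(p) <= 1 / (0.5 + 0.5/p) < 2 for every thread count p.
```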

aperiwal

Locking the addition to the variable diff makes every update to it sequential. Each inner-loop iteration therefore does only a small amount of parallelizable arithmetic followed by a serialized update of diff, which significantly reduces the performance of the program. A better option is to have each thread accumulate a partial sum locally and then add it to the shared variable diff once, outside the loop (as shown on the next slide).
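A minimal C++ sketch of the partial-sum idea, assuming std::thread and std::mutex stand in for the threads and lock in the slide's code. The function name sum_abs_diffs_partial is hypothetical, and the grid update is simplified to summing |a[i] - b[i]| over two flat arrays (think of a as the new values and b as the previous ones).

```cpp
#include <cmath>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: sums |a[i] - b[i]| using num_threads threads.
// Each thread accumulates into a private local variable (no synchronization),
// then takes the lock exactly once to fold its partial sum into the shared total.
float sum_abs_diffs_partial(const std::vector<float>& a,
                            const std::vector<float>& b,
                            int num_threads) {
    float diff = 0.0f;     // shared accumulator
    std::mutex diff_lock;  // protects diff
    std::vector<std::thread> threads;
    const std::size_t n = a.size();

    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&, t] {
            // Contiguous slice of indices owned by thread t.
            const std::size_t begin = n * t / num_threads;
            const std::size_t end = n * (t + 1) / num_threads;

            float my_diff = 0.0f;  // private partial sum: no lock needed
            for (std::size_t i = begin; i < end; ++i)
                my_diff += std::fabs(a[i] - b[i]);

            // One lock acquisition per thread instead of one per element.
            std::lock_guard<std::mutex> guard(diff_lock);
            diff += my_diff;
        });
    }
    for (auto& th : threads) th.join();
    return diff;
}
```

The lock is now taken num_threads times in total rather than once per element, so the serialized fraction of the loop shrinks to almost nothing.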

ykt

An alternative to an explicit lock is to update diff with an atomic instruction. This would generate comparatively fewer cache invalidations, since acquiring and releasing a lock each write to the lock variable in addition to the write to diff. How much better it is depends on how the lock is implemented.
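A minimal C++ sketch of the lock-free alternative, assuming a C++11 compiler. std::atomic<float> only gained fetch_add in C++20, so this version uses a compare_exchange_weak loop instead. The helper names atomic_add and sum_abs_diffs_atomic are hypothetical, and the grid update is again simplified to summing |a[i] - b[i]| over flat arrays.

```cpp
#include <atomic>
#include <cmath>
#include <cstddef>
#include <thread>
#include <vector>

// Atomically add `value` to `target` with a compare-and-swap loop.
// On failure, compare_exchange_weak reloads `expected` with the current value,
// so the loop retries until no other thread has modified target in between.
void atomic_add(std::atomic<float>& target, float value) {
    float expected = target.load(std::memory_order_relaxed);
    while (!target.compare_exchange_weak(expected, expected + value,
                                         std::memory_order_relaxed)) {
    }
}

// Hypothetical helper: every thread adds each |a[i] - b[i]| in its slice
// directly into the shared accumulator, with no lock.
float sum_abs_diffs_atomic(const std::vector<float>& a,
                           const std::vector<float>& b,
                           int num_threads) {
    std::atomic<float> diff{0.0f};
    std::vector<std::thread> threads;
    const std::size_t n = a.size();

    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&, t] {
            const std::size_t begin = n * t / num_threads;
            const std::size_t end = n * (t + 1) / num_threads;
            for (std::size_t i = begin; i < end; ++i)
                atomic_add(diff, std::fabs(a[i] - b[i]));
        });
    }
    // join() synchronizes with each thread, so relaxed ordering is enough here.
    for (auto& th : threads) th.join();
    return diff.load();
}
```

Each update still contends for the cache line holding diff, so this removes the lock's extra traffic but not the serialization of the updates themselves.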

xiaozhuyfk

It might be more efficient to process a block of data per lock acquisition, so that the cost of each acquire is amortized over many updates.
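A minimal C++ sketch of amortizing the lock over a block, assuming a hypothetical block_size parameter (must be greater than zero) and the same simplification of the grid update to summing |a[i] - b[i]| over flat arrays. The function name sum_abs_diffs_blocked is hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: each thread walks its slice in blocks of block_size
// elements, sums a block locally, then takes the lock once for that block.
float sum_abs_diffs_blocked(const std::vector<float>& a,
                            const std::vector<float>& b,
                            int num_threads,
                            std::size_t block_size) {
    float diff = 0.0f;     // shared accumulator
    std::mutex diff_lock;  // protects diff
    std::vector<std::thread> threads;
    const std::size_t n = a.size();

    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&, t] {
            const std::size_t begin = n * t / num_threads;
            const std::size_t end = n * (t + 1) / num_threads;
            for (std::size_t blk = begin; blk < end; blk += block_size) {
                const std::size_t blk_end = std::min(blk + block_size, end);
                float block_diff = 0.0f;  // private sum for this block
                for (std::size_t i = blk; i < blk_end; ++i)
                    block_diff += std::fabs(a[i] - b[i]);
                // One acquire per block; released at the end of this iteration.
                std::lock_guard<std::mutex> guard(diff_lock);
                diff += block_diff;
            }
        });
    }
    for (auto& th : threads) th.join();
    return diff;
}
```

With block_size = 1 this degenerates to one lock acquire per element; larger blocks trade fewer acquires for coarser-grained updates to diff.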