jk2d

Another possible problematic scenario: one element in the input takes a long time to compute, while the rest are very fast. In that case, the majority of the work is the single long task, which runs on one thread, so it can cause a workload imbalance problem.

apk

The cost of acquiring the lock is relatively high, so it becomes a significant overhead.

chenh1

If N is large enough, this program will be nearly sequential: the execution of the critical section limits the speedup, even given unlimited resources.
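
To put a rough bound on that (a back-of-the-envelope sketch; t_w and t_c are assumed symbols for the average time of one primality test and the time spent in the critical section per element, not values from the slide): the critical sections must execute one at a time, so they alone lower-bound the parallel running time.

    % rough bound with assumed symbols t_w (one primality test) and t_c (one critical section)
    T_{\text{serial}} \approx N\, t_w \qquad \text{(the sequential version needs no lock)}
    T_{\text{parallel}}(P \to \infty) \;\ge\; N\, t_c \qquad \text{(critical sections serialize)}
    \text{speedup} \;=\; \frac{T_{\text{serial}}}{T_{\text{parallel}}} \;\le\; \frac{t_w}{t_c}

So no matter how many processors are available, the speedup is capped by the ratio of useful work per element to synchronization cost per element.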

holard

Would use of lockless primitives such as Fetch-And-Add be a performant substitute for the locks here?
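
Something like the following is what I have in mind (just a sketch, assuming C++11 std::atomic; N, the thread count, and the trial-division test_primality are stand-ins, since the slide's exact code isn't reproduced here):

    #include <atomic>
    #include <thread>
    #include <vector>

    // Stand-ins for the slide's data; names and sizes are assumptions.
    const int N = 1000000;
    int x[N];
    bool is_prime[N];
    std::atomic<int> counter(0);   // replaces the lock-protected i = counter++ from the slide

    // Simple trial-division test, standing in for the slide's test_primality().
    bool test_primality(int v) {
        if (v < 2) return false;
        for (int d = 2; (long long)d * d <= v; d++)
            if (v % d == 0) return false;
        return true;
    }

    void worker() {
        while (true) {
            // fetch_add atomically returns the old value and increments it,
            // so each index is still claimed by exactly one thread, with no lock.
            int i = counter.fetch_add(1, std::memory_order_relaxed);
            if (i >= N) break;
            is_prime[i] = test_primality(x[i]);
        }
    }

    int main() {
        for (int i = 0; i < N; i++) x[i] = i;
        std::vector<std::thread> threads;
        for (int t = 0; t < 8; t++) threads.emplace_back(worker);
        for (std::thread &th : threads) th.join();
        return 0;
    }

This drops the lock, but each thread still performs one atomic read-modify-write per element, so contention on the shared counter does not disappear; it just gets cheaper than a full lock/unlock pair.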

-o4

Similar to what @chenh1 said, the synchronization cost of acquiring the lock on a supercomputer (or simply a machine with many more cores than a typical laptop) will be high, because the probability of multiple threads waiting on the critical section is much higher.

unparalleled

How about this: we just get rid of the locks and just increment i. This might incur overhead due to duplicate computations, but that could potentially be less than the synchronization overhead. A more complicated version of this: instead of making is_prime[] a boolean array, make it an array of integers, where each slot can hold three values: 0 - not computed, 1 - prime, 2 - not prime. We initialize the array with all 0s, and in the test_primality function we do the computation only if is_prime[i] == 0. This is a kind of memoization to overcome the duplication overhead.
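
Roughly, I'm imagining something like this (only a sketch; I used std::atomic<int> for is_prime so the concurrent accesses are well-defined in C++, the skip-if-already-done check sits in the worker loop rather than inside test_primality, and N and the trial-division test_primality are made-up stand-ins for the slide's code):

    #include <atomic>
    #include <thread>
    #include <vector>

    const int N = 1000000;
    int x[N];
    // is_prime[i]: 0 = not computed, 1 = prime, 2 = not prime.
    // Globals are zero-initialized, so everything starts as "not computed".
    std::atomic<int> is_prime[N];

    // Stand-in trial-division test.
    bool test_primality(int v) {
        if (v < 2) return false;
        for (int d = 2; (long long)d * d <= v; d++)
            if (v % d == 0) return false;
        return true;
    }

    void worker() {
        // No shared counter and no lock: each thread walks the whole array
        // and skips slots that some other thread has already filled in.
        for (int i = 0; i < N; i++) {
            if (is_prime[i].load(std::memory_order_relaxed) != 0) continue;
            // Two threads can both pass the check above and test x[i] twice;
            // that duplicated work is the overhead being traded for the lock.
            is_prime[i].store(test_primality(x[i]) ? 1 : 2, std::memory_order_relaxed);
        }
    }

    int main() {
        for (int i = 0; i < N; i++) x[i] = i;
        std::vector<std::thread> threads;
        for (int t = 0; t < 4; t++) threads.emplace_back(worker);
        for (std::thread &th : threads) th.join();
        return 0;
    }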

Could this be a potential solution?

sampathchanda

@unparalleled: But did you think about how elements of x get scheduled across threads? With i = counter++, each thread takes one distinct element of x and computes test_primality on it, whereas in your case multiple threads may end up executing test_primality on the same element of x at the same time.

Simply put, there seems to be an issue with your algorithm: it doesn't ensure that threads work on different elements of x.

RomanArena

What does critical section mean? Does it mean the time for lock and unlock, i.e., the time for synchronization?

bazinga

Summarizing the problems in this code snippet that limit speedup:

  1. The critical section is executed sequentially.
  2. If all x[i] take the same time, hot spots are created in the critical section, since all threads arrive at it at the same time, and that causes stalls (which are not part of the sequential implementation).
  3. If a certain x[i] takes much longer to execute than the rest, there could still be workload imbalance.

All of these are a problem mainly when the input size is small; if the input size is large, the total synchronization overhead will be small compared to the computation time. Also, the "fine granularity partitioning" in this example means that a synchronization overhead is paid for every element computed. A better solution would amortize the synchronization overhead over a larger working set by using a coarser-granularity partitioning; however, fine granularity increases the chances of attaining good workload balance. So the main problem hinted at in this example is that there is always a trade-off between good workload balance (fine-granularity partitioning) and amortized synchronization overhead (coarse-granularity partitioning).
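
As a concrete illustration of the coarser-granularity option (a sketch only; GRANULARITY, the thread count, and the trial-division test_primality are made-up stand-ins, not the lecture's code), each thread can claim a chunk of elements per lock acquisition, so the synchronization cost is paid once per chunk rather than once per element:

    #include <algorithm>
    #include <mutex>
    #include <thread>
    #include <vector>

    const int N = 1000000;
    const int GRANULARITY = 10;    // tuning knob: larger chunks mean less synchronization
    int x[N];
    bool is_prime[N];
    int counter = 0;
    std::mutex counter_lock;

    // Stand-in trial-division test.
    bool test_primality(int v) {
        if (v < 2) return false;
        for (int d = 2; (long long)d * d <= v; d++)
            if (v % d == 0) return false;
        return true;
    }

    void worker() {
        while (true) {
            int start;
            {
                // Critical section: still serialized, but entered once per
                // chunk of GRANULARITY elements instead of once per element.
                std::lock_guard<std::mutex> guard(counter_lock);
                start = counter;
                counter += GRANULARITY;
            }
            if (start >= N) break;
            int end = std::min(start + GRANULARITY, N);
            for (int i = start; i < end; i++)
                is_prime[i] = test_primality(x[i]);
        }
    }

    int main() {
        for (int i = 0; i < N; i++) x[i] = i;
        std::vector<std::thread> threads;
        for (int t = 0; t < 8; t++) threads.emplace_back(worker);
        for (std::thread &th : threads) th.join();
        return 0;
    }

GRANULARITY is then exactly the knob in the trade-off described above: GRANULARITY = 1 is the original per-element scheme (best balance, most overhead), while very large chunks approach a static partitioning (least overhead, worst balance for uneven inputs).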

POTUS

@RomanArena A critical section is a section of the program where concurrent access to a shared resource can lead to unexpected or erroneous behavior. In this case, it refers to the line i = counter++.

(See the Wikipedia article on critical sections.)

shhhh

Would the cost of acquiring a lock in this program be greater or less than testing primality for x[i]?

srb

@shhhh It depends on the implementation of the lock, the algorithm used for testing primality, and the values in x. For example, the cost of testing the primality of 2 would likely be less than the cost of acquiring a lock, whereas testing the primality of a much larger number could well cost more. In later slides, we discuss decreasing the granularity (i.e., using coarser tasks) and ways of choosing/scheduling tasks that, among other benefits, minimize the number of times the lock is acquired, because there are definitely situations in which this overhead is a concern.