max

In this example, we speculated that the critical region (the locked portion of the previous code) accounted for a large fraction of the program's run time. To reduce that cost, we modified the code so the counter is incremented by a larger amount: each time a thread takes the lock it now claims several numbers to check rather than one, so the serialized piece of code is executed far fewer times.
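
To make this change concrete, here is a minimal sketch of what the coarse-grained version might look like, written against POSIX threads. Only test_primality is mentioned in this discussion; the counter, lock, GRANULARITY, and thread-count names, as well as the trial-division stub, are illustrative assumptions rather than the slide's actual code.

    /* Sketch: coarse-grained dynamic assignment of prime-testing work.
     * All names below are illustrative, not taken from the slide. */
    #include <pthread.h>
    #include <stdbool.h>
    #include <stdio.h>

    #define N           100000   /* candidates to test (illustrative)    */
    #define GRANULARITY 100      /* numbers claimed per lock acquisition */
    #define NUM_THREADS 4

    static int counter = 0;      /* next unclaimed candidate */
    static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
    static bool is_prime[N];

    /* Stand-in for the lecture's test_primality(): trial division. */
    static bool test_primality(int x) {
        if (x < 2) return false;
        for (int d = 2; (long)d * d <= x; d++)
            if (x % d == 0) return false;
        return true;
    }

    static void *worker(void *arg) {
        (void)arg;
        while (1) {
            /* Critical section: claim a whole chunk, not a single number. */
            pthread_mutex_lock(&counter_lock);
            int start = counter;
            counter += GRANULARITY;
            pthread_mutex_unlock(&counter_lock);

            if (start >= N) break;
            int end = (start + GRANULARITY < N) ? start + GRANULARITY : N;

            /* This work runs outside the lock, in parallel with other threads. */
            for (int i = start; i < end; i++)
                is_prime[i] = test_primality(i);
        }
        return NULL;
    }

    int main(void) {
        pthread_t threads[NUM_THREADS];
        for (int t = 0; t < NUM_THREADS; t++)
            pthread_create(&threads[t], NULL, worker, NULL);
        for (int t = 0; t < NUM_THREADS; t++)
            pthread_join(threads[t], NULL);

        int count = 0;
        for (int i = 0; i < N; i++)
            if (is_prime[i]) count++;
        printf("primes below %d: %d\n", N, count);
        return 0;
    }

Setting GRANULARITY to 1 recovers the fine-grained behavior (one number claimed per lock acquisition); with a larger value the lock is taken roughly N / GRANULARITY times instead of roughly N times, which is the overhead reduction discussed below.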

kayvonf

I'd like to clarify the discussion we had in lecture:

I think it's better to say that by increasing granularity, it is possible to reduce time spent in the critical section. Increasing granularity has two benefits:

  1. It reduces overhead. The lock and unlock operations above constitute overhead that does not exist in the sequential version of the program. By increasing the granularity of work assignment, the program takes the lock fewer times, and so this overhead is reduced.

  2. It reduces the likelihood of contention for the lock, or equivalently... it reduces the likelihood thread execution is serialized within the critical section. I was a bit loose in lecture when I called the critical section "sequential". Really, the critical section only triggers sequential execution of threads that are trying to enter it at the same time. Threads that are performing other work, such as executing test_primality, can certainly be running in parallel with a thread in the critical section.

jedavis

I think it would also be safe to say that for most distributions of per-job cost, increasing granularity will also tend to decrease the variability of the time each thread spends computing between critical sections relative to that interval's expected length, at the cost of increasing the expected length itself. Whether this is good or bad may vary with load, but it is secondary to the reduction in synchronization overhead.
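
A quick way to see this, under a simplifying i.i.d. assumption that is added here (not something jedavis stated): suppose each job's cost is independent with mean \mu and variance \sigma^2, and each chunk contains G jobs, so a thread computes for a time T_G between visits to the critical section. Then

    \mathbb{E}[T_G] = G\mu, \qquad \operatorname{Var}(T_G) = G\sigma^2, \qquad \frac{\sqrt{\operatorname{Var}(T_G)}}{\mathbb{E}[T_G]} = \frac{\sigma}{\mu\sqrt{G}}.

The expected interval length grows like G while its spread grows only like \sqrt{G}, so the intervals become longer but relatively more predictable, which is the trade-off described above.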

xiaowend

By increasing granularity, we hand each thread a larger amount of work each time it takes the lock. If the total count N is large enough, it always works well. For small N, it may cause an unbalanced workload distribution.

kfc9001

@xiaowend How does it always work well? You'll still have an unbalanced workload if one of a million processors takes 1000 times longer than the rest (this could actually happen). Then you'd still have to wait on the processor that unfortunately ended up with the most work.

xiaowend

@kfc9001 You are right. I should have stated my assumptions: (1) N is large enough, and (2) every piece of work costs roughly the same.

In fact, I intended to say that a larger granularity may lead to an unbalanced workload.
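
To put rough numbers on the imbalance concern (both the figures and the GRANULARITY constant from the sketch above are illustrative): with N = 10,000 candidates, a granularity of 1,000, and 20 threads, there are only 10,000 / 1,000 = 10 chunks to hand out, so at least 10 threads never receive any work, and the finish time is set by whichever thread draws the most expensive chunk. Keeping the granularity small relative to N divided by the thread count leaves enough chunks for the dynamic assignment to balance the load.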