In this slide, we are distributing the work so that each task computes 10 consecutive elements of the array. Although the tasks (shown in blue) are not all the same size, with some much larger than others, we waste far less time acquiring and releasing locks to distribute them. Given the different sizes of the tasks, we can then schedule them so that the workload across threads/cores is as even as possible.
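One common way to schedule tasks of different sizes evenly, sketched below under the assumption that task costs are known up front (the function name and example sizes are illustrative, not from the slide), is a greedy "longest task first" rule: repeatedly hand the largest remaining task to the currently least-loaded thread.

```python
# Sketch: greedy longest-task-first scheduling (hypothetical helper).
# Assign each task, largest first, to whichever thread currently has
# the least total work.
import heapq

def schedule_longest_first(task_sizes, num_threads):
    # Min-heap of (current_load, thread_id) so the least-loaded thread
    # is always at the top.
    loads = [(0, t) for t in range(num_threads)]
    heapq.heapify(loads)
    assignment = {t: [] for t in range(num_threads)}
    for size in sorted(task_sizes, reverse=True):
        load, t = heapq.heappop(loads)
        assignment[t].append(size)
        heapq.heappush(loads, (load + size, t))
    return assignment

# Example: ten uneven tasks spread across 3 threads.
tasks = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
result = schedule_longest_first(tasks, 3)
```

With these sizes the per-thread totals come out as 19, 18, and 18, which is close to the best possible split of 55 units of work over three threads.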
If granularity = 1: Each task is very small, so synchronization costs (e.g., acquiring and releasing locks, overhead that sequential code never pays) are relatively high.
If granularity = n: One huge task. Terrible workload balance.
It is often practical to pull granularity out of the program as a tunable parameter rather than hard-coding it.
One common question we face with granularity is "what happens when we set granularity too high?" When the granularity is set too high, we start to lose our ability to distribute the work evenly. The big challenge with granularity is setting it at a size that is small enough to distribute the work roughly evenly, but not so small that we lose a lot of time in the critical section.
We could also randomize the array, then statically partition it and assign a thread to each part. This way, on average, each thread receives an equal amount of work.
^ Array "randomization" produces great results in theory; in practice, randomly re-ordering an array usually carries a non-trivial cost.
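A minimal sketch of the randomize-then-partition idea, assuming a hypothetical helper (names are illustrative): shuffle once up front, then hand each thread a contiguous slice. The shuffle is the one-time "non-trivial cost" noted above.

```python
# Sketch: randomized static partitioning. Shuffle the array once so
# that a plain contiguous split gives each thread roughly equal
# expected work, regardless of where the expensive elements were.
import random

def randomized_static_partition(data, num_threads, seed=0):
    shuffled = data[:]                       # copy; leave caller's array intact
    random.Random(seed).shuffle(shuffled)    # the non-trivial re-ordering cost
    n = len(shuffled)
    # Near-equal contiguous slice boundaries for each thread.
    bounds = [n * t // num_threads for t in range(num_threads + 1)]
    return [shuffled[bounds[t]:bounds[t + 1]] for t in range(num_threads)]

parts = randomized_static_partition(list(range(12)), num_threads=4)
```

Each thread then processes its own slice with no further synchronization, which is the appeal of the static approach once the shuffle has been paid for.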
The general rule for choosing a granularity at which to exploit concurrency: fine-grained (small) tasks have the potential for better load balance (there are more tasks to divide among processes and hence more concurrency), but they lead to higher task management overhead, more contention, and more interprocessor communication than coarse-grained (larger) tasks.