It's helpful to think of the two extremes. If we have a granularity of N, then our problem is that one worker thread is doing all the work and we've essentially reverted back to the sequential program. On the other hand, we've already seen the problem with a granularity of 1, which is that we might get overwhelmed with the overhead of synchronization. In general, the trade-off is between synchronization cost and thread workload, where increasing granularity will decrease synchronization cost and increase thread workload while decreasing granularity will do the opposite.
The trade-off is between synchronization cost and workload balance. We want to have enough amount of tasks to achieve a good workload balance. Also, we want to have minimum tasks to reduce synchronization cost.
Another way we can think of granularity control as two extremes is static vs. dynamic assignment. Say we have N workers and we divide into exactly N large tasks. This is equivalent to static assignment and the way we divide the tasks is the problem of static assignment.
The more tasks we create, the less of an issue uneven division of work is because dynamic assignment will handle allocating the different size tasks. However, the overhead costs to doing dynamic assignment also increases.
Interestingly, it may be possible to sometime dynamically determine the granularity as well. In 15210's pasllab, there was a notion of cstmnt's which allow you to input a complexity function and some "magic" happens to determine when to run sequentially and when to run in parallel.
Though this example is not exactly the same as determining the granularity of tasks, I can see there being some parallel (no pun intended). Are there libraries that do dynamic granularity control in the context of tasks?
As mentioned above, a lot of optimization relies on the machine that its run on. In application, its not always known what machine will be running the code. How do programmers deal with this issue?
In summary, we need to strike a balance between amount of work and amount of overhead while also considering the machine we're running on when deciding task size. More tasks is likely to lead to more evenly distributed work. We usually want the number of tasks to be some multiple of the number of processors because we want each processor to be doing roughly the same amount of work. However, if we have too many tasks then we start incurring the cost of scheduling the tasks and we don't get enough work done compared to the effort it took to assign the work itself.
I think when programmers to remedy the problem of not knowing the machine the code will run on, the programmer can write code with multiple machines in mind and try to parametrize the things that will change with hardware like number of tasks, blocks, or threads.
In choosing the task size we can think about mean and variance.
We want the mean workload to be large, then we'll spend less time in context switching and synchronization overhead.
On the other hand we want the variance of workload to be small, so that threads won't waste time waiting for heavy tasks to complete.
In the case where we are grouping tasks together, choosing a task size is a trade-off: a larger task size results in a larger mean workload, but also larger variance.
Is it possible that the system overrides the programmers work distribution and scheduling decisions if it feels that it is sub optimal and can come up with a better distribution and scheduling policy?
@chuangxuean: I think the difficulty of that is that the system will need to predict the length of the work which is practically impossible.