This seems to have been the most important guideline for program 3, part 2 on assignment 1. Even though the machines only had 4 processors, dividing the work into 4 tasks didn't even come close to the maximum speedup possible. Personally, I started getting the maximum speedup I could once I increased to 80 tasks. This happened because as the tasks got smaller, there was more leeway with how they could be distributed to balance the workload.
However, one thing that confused me as I did that problem on assignment 1 is that the speedup didn't decrease even as I increased to 800 tasks (the maximum, since the image had 800 rows). I expected the benefit of finer granularity to fall off as more tasks were added, since every extra task adds some launch overhead. I guess this was just an example of the benefits of workload balancing canceling out the problems with overhead?
This comment was marked helpful 0 times.
@jmnash I agree; in this case, the overhead of launching another ispc task was more than offset by the fact that each task had a significant amount of work (calculating values for width elements, which in this case was 1200).
We didn't see a further increase in performance simply because there weren't enough cores to run those extra tasks simultaneously. However, the finer granularity did significantly even out the work distribution, as no core got 'overworked' by having the misfortune of being assigned most of the work (which is what happens when you only have 4 tasks; the two tasks covering the middle rows far outweigh the other two in terms of work required).
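To make the load-balancing argument concrete, here is a rough simulation sketch. It is not the assignment code: `row_cost` is a made-up cost profile in which the middle rows are the most expensive, task launch overhead is ignored, and I'm assuming tasks are handed out dynamically to whichever of the 4 workers frees up first (the real ispc task system may schedule differently). It also assumes the number of tasks divides the height evenly, as 4, 80 and 800 do.

```python
import heapq


def row_cost(row, height):
    # Hypothetical cost profile: rows near the middle of the image are up to
    # 10x as expensive as rows near the top and bottom.
    mid = (height - 1) / 2
    return 1.0 + 9.0 * max(0.0, 1.0 - abs(row - mid) / (height / 4))


def simulated_speedup(num_tasks, height=800, workers=4):
    # Each task covers a contiguous band of rows (assumes height % num_tasks == 0).
    rows_per_task = height // num_tasks
    task_costs = [
        sum(row_cost(r, height)
            for r in range(t * rows_per_task, (t + 1) * rows_per_task))
        for t in range(num_tasks)
    ]

    # Assumed dynamic assignment: the next task goes to the worker that
    # becomes free earliest. The heap holds each worker's finish time.
    finish_times = [0.0] * workers
    heapq.heapify(finish_times)
    for cost in task_costs:
        earliest = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest + cost)

    # Speedup = serial time / time at which the slowest worker finishes.
    return sum(task_costs) / max(finish_times)


if __name__ == "__main__":
    for n in (4, 80, 800):
        print(n, "tasks ->", round(simulated_speedup(n), 2))
```

In this model the 4-task run is stuck waiting on the two workers that got the expensive middle bands, while the 80- and 800-task runs both land close to 4x. Going from 80 to 800 tasks barely changes anything because there are still only 4 workers.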
However, something very different happens when you use an image with a much smaller width. I tried running the program with width = 4 (same height = 800). Here are my results:
width = 4
height = 800
There is actually a dip in performance when you increase the number of tasks; this is because each task now does far less work (only 4 elements per row instead of 1200), so the overhead cost of launching a task is now significant.
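A back-of-the-envelope model shows why the width matters so much. This is only a sketch under assumptions I'm making up for illustration: every pixel costs the same (`cost_per_pixel`), every task pays a fixed `launch_overhead`, and tasks run in rounds of 4 on 4 workers. The numbers are not measurements from the assignment machines.

```python
import math


def modeled_speedup(num_tasks, width, height=800, workers=4,
                    launch_overhead=50.0, cost_per_pixel=1.0):
    # Time for one core to render the whole image with no task overhead.
    serial_time = width * height * cost_per_pixel

    # Each task pays a fixed (hypothetical) launch cost plus its share of rows.
    rows_per_task = height / num_tasks
    task_time = launch_overhead + rows_per_task * width * cost_per_pixel

    # With uniform tasks, the workers process them in rounds of `workers`.
    rounds = math.ceil(num_tasks / workers)
    return serial_time / (rounds * task_time)


if __name__ == "__main__":
    for width in (1200, 4):
        for n in (4, 80, 800):
            print("width", width, "tasks", n, "->",
                  round(modeled_speedup(n, width), 2))
```

With width = 1200 the per-task work dwarfs the launch cost, so the modeled speedup stays near 4x even at 800 tasks. With width = 4 the launch cost dominates each task, and the modeled speedup falls steadily as the task count grows. Note that this model deliberately ignores load imbalance, which is the effect that made more tasks helpful in the wide image.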
This comment was marked helpful 3 times.
Excellent post @arjunh.