In the demo with 4 (or n) processors, would it have helped to have a scheduler/dispatcher to 'manage' the distribution of load? As it happened, the main reason the strategy did not speed up the process was the overhead caused by the redistribution.
I think it may have helped to have a scheduler, because then the distribution of work could have been handled by the scheduler rather than by the processors themselves. I do have a follow-up question, though: doesn't the work done by the scheduler add to the total work? In that case, would it still help the speedup? I would imagine that the distribution has to occur before the processors can do their respective calculations.
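To make the scheduler idea concrete, here's a minimal sketch of a centralized dispatcher using a shared work queue (all names here are made up for illustration, not from the demo). Each worker pulls the next task whenever it goes idle, so uneven task sizes balance out automatically; the queue operations themselves are the scheduling overhead being discussed.

```python
import queue
import threading

def run_with_dispatcher(tasks, n_workers):
    """Dispatch tasks from a shared queue: each worker pulls the next
    task when it becomes idle, so uneven task sizes balance out."""
    q = queue.Queue()
    for t in tasks:
        q.put(t)
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                t = q.get_nowait()
            except queue.Empty:
                return  # no work left; this worker is done
            r = t()  # do the actual task
            with lock:
                results.append(r)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results

# Usage: tasks of very different sizes still all get done, and idle
# workers keep pulling from the queue instead of sitting idle.
tasks = [lambda n=n: sum(range(n)) for n in (10, 10_000, 10, 10_000)]
print(sorted(run_with_dispatcher(tasks, n_workers=2)))
```

(Note this is a pattern sketch: with CPython's GIL the threads won't truly run CPU-bound work in parallel, but the dispatch logic is the same you'd use with processes.)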
@makingthingsfast, I agree with your points about the scheduler. The scheduler needs some strategy to distribute the work among the nodes, and the more sophisticated the strategy, the more overhead it incurs.
During the demo, I had originally thought that a work-stealing strategy would have helped speed up the computation. However, I imagine this still incurs some overhead (when a processor takes work from another processor), which probably cancels out any speedup gained from reducing the number of idle processors.
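Here's a toy simulation of that trade-off (all numbers and the fixed per-steal cost are made-up assumptions, not measurements from the demo): each worker owns a deque of task costs, runs its own tasks from the front, and when idle steals from the back of the busiest victim's deque, paying `steal_cost` each time.

```python
from collections import deque

def work_stealing_sim(task_costs_per_worker, steal_cost=1):
    """Toy work-stealing simulation. Each worker has a deque of task
    costs; an idle worker steals from the back of the longest victim's
    deque, paying a fixed steal_cost. Returns (makespan, steal_count)."""
    deques = [deque(c) for c in task_costs_per_worker]
    busy_until = [0] * len(deques)  # local clock per worker
    steals = 0
    while any(deques):
        w = min(range(len(deques)), key=lambda i: busy_until[i])  # next idle worker
        if deques[w]:
            busy_until[w] += deques[w].popleft()  # run own task (front of deque)
        else:
            victim = max(range(len(deques)), key=lambda i: len(deques[i]))
            if not deques[victim]:
                break  # no work anywhere
            busy_until[w] += steal_cost + deques[victim].pop()  # steal from back, pay overhead
            steals += 1
    return max(busy_until), steals

# Usage: a badly skewed static split on 2 workers. Without stealing the
# loaded worker alone takes 40; with stealing the makespan drops to 26
# even after paying the steal overhead twice.
print(work_stealing_sim([[10, 10, 10, 10], [1, 1, 1, 1]]))  # (26, 2)
```

So in this toy model stealing wins when the imbalance is large relative to the steal cost, and loses when tasks are already balanced, which matches the intuition that the overhead can cancel the gain.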
@makingthingsfast I think the scheduler's work should count towards the total work. Our hope is that using the scheduler (and incurring this extra work) reduces the span of the remaining (hopefully now better-parallelized) work.
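This trade-off can be made concrete with the greedy-scheduling bound (Brent's theorem), T_p <= W/p + S: extra scheduler work raises W but may lower the span S, and whether that pays off depends on p. The numbers below are invented purely for illustration.

```python
def greedy_bound(work, span, p):
    """Upper bound on parallel running time under a greedy scheduler
    (Brent's theorem): T_p <= W/p + S."""
    return work / p + span

# Hypothetical numbers: base computation has W=1000, S=100. Suppose the
# scheduler adds 100 units of work but cuts the span to 20.
print(greedy_bound(1000, 100, 1))  # 1100.0  baseline, 1 processor
print(greedy_bound(1100, 20, 1))   # 1120.0  scheduler overhead hurts at p=1
print(greedy_bound(1000, 100, 4))  # 350.0   baseline, 4 processors
print(greedy_bound(1100, 20, 4))   # 295.0   overhead pays off at p=4
```

With few processors the extra work dominates; with enough processors the shorter span wins, which is exactly the "extra work vs. reduced span" bet described above.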
Thinking a step further, I wonder whether, in more complicated situations, we should have yet another pre-computation step that determines whether it is efficient to run the scheduler at all.
@kipper, I think the work-stealing strategy may also not work as desired when work is offloaded to another processor but the original processor already had the data in its cache. The new processor will have to pull the data from memory again, which is an expensive operation in this case.