Previous | Next --- Slide 9 of 40
Back to Lecture Thumbnails
unihorn

Question: To combine the partial sums, we add them one by one to get O(P) running time. Is it possible to add them as a binary tree, using O(log(P)) time? What is about the implementation of reduceAdd function?

Thedrick

@unihorn Although it may be possible to split some of the addition across multiple processors, we probably wouldn't see any real speedup. In most cases P will not be that large and the communication across processors may even make it take longer than just doing it serially. Also, ${N^2 \over P} $ will most likely dominate the O(P) time.

miko

The parts of the program that are inherently sequential (for this case, it would be the step wherein we combine partial sums) usually are the major factors that limit its speedup. The graph on the next slide (slide 10) is good to look at to observe the effect of an increase in processors on maximum speedup, given a set fraction of the program that is inherently sequential.

bourne

Looking just at the partial sums, would it be possible to perform each level of the binary tree in parallel on one processor using SIMD with a noticeable speedup (no tasks)?

pebbled

@bourne The sums across one level of the tree are independent, so there's no reason you couldn't process as many in parallel as you have SIMD lanes on your processor. It could be some tricky code, however, to keep each program instance fed with input, when the input to each level is the output from the previous level.