Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2015

Previous | Next --- Slide 10 of 40

andymochi

In our third demo, I'm very curious about what the limitations were with the shout-solution proposed to find the max. This was the solution:

1) master broadcasts a number 2) all processors who have a higher number respond 3) master picks randomly from responses; ignores rest 4) Repeat (1-3) until no responses

At first, I thought this was a pretty bad solution, but after the demo realized it's actually pretty good. It also makes up for some of the limitations when it comes to 'shouting'.

In reality, I can see several limits to 'shouting' in general. -synchronization: if there are too many responses (e.g. 100 processors talking at once), the master could miss a response, or have to wait for responses. Picking at random (say, the first response that comes back) allows the master to ignore both of these problems. -efficiency: The overhead of shouting synchronization is usually high, but choosing a random result ignores that. It could obviously hit worst case runtime, but I think good on average. Assuming a good randomization strategy (wait for min(x responses, t time), then pick randomly). It also potentially halves the time for the expected communication overhead of shouting, under the right conditions. If network roundtrip time is highly varied between the 'nearest' and 'farthest' processors, then the master could send out the next number in the algorithm without waiting for a response from the farthest processor.

Despite these advantages, I don't think the strategy can be generalized well. It only works because we have a 'reduce' operation which is simple. The master does not need to worry about which processor it shouts to throughout the duration of the algorithm, because a processor will know exactly when it's done (when master shouts a number greater than it).

Next time, here's a solution that I think will be fun to execute - everybody keeps shouting until they hear a number that is greater than theirs. Last person shouting has the max. Question is... how many processors can we shout to in real life?

sbyeon

@andymochi Assuming all the processors don't share the same cache and since the latency to access the memory is expensive, I think having too many communications between processors would be inefficient in the real world.

ericwang

About the third experiment, I think we don't really need a 'master' in algorithm. Actually, getting rid of the 'master' could make the process faster.
During the class, I found that the only reason we need a 'master' is for choosing a random student to yell his/her number. An easier way to decide who should yell without a 'master' is: We can let everyone keep an eye on the students who stand in the front lines or at their right side. Once no one stand in these areas, the student yells his/her number. Other rules are the same as what we did in the class. So the 'master' is not necessary. And the maximum number will keep standing at the end of this algorithm.
In this case, each student needs to track some other students' status. But it's not too hard for a person to do so. Maybe also true for some computer systems. And there is no one in the class needs to make random decision quickly, which could be difficult for an indecisive people. I think in this particular case, P2P system could perform better.
Also, in other similar computer system design problems, master node may lead to a less robust system with single node failure or performance bottleneck. Additionally, we will need to bother how to elect a master from the whole cluster.

nujicles

Although it would take approximately the same amount of time as the method used in class, this solution is theoretically more optimal because it doesn't use the master. However, this is not very applicable to computer systems because each processor would have to monitor a subset of total processors and be able to communicate between each other. This would cause a much bigger overhead as the number of processors increased, and it would be more efficient to use a master that sent the same message or signal to each processor. I can see that the choosing of a random student or processor can cause an extra overhead, but perhaps we can get rid of this by cleverly choosing a student at a certain position each round.