Slide View : Parallel Computer Architecture and Programming : 15-418/618 Fall 2016

kgdmKZ

Expanding on these observations: Only 5 (the number of rows of seating in the lecture room) people were performing any computation at a given time, and so almost everyone was idle. Most people individually performed only a second or two of computation over the entire time. If the problem was something more difficult so that each person would spend significantly more time doing an individual subproblem, the resulting speedup would probably improve since computation costs would dominate. The maximum possible speedup would still be restricted if only one person per row performs a computation at a time, and there would still be communication overhead.

anamdev

While doing these demos, I noticed there were quite a few ideas floating around for parallelizing the computations. I found this interesting because we really have to think when choosing an optimal solution for a specific situation. I hope to learn and use different methods for parallelizing code in this class.

taichia

What would have been an efficient way to count the ages given the constraint we had?

EggyLv999

One way to improve would be to propagate the values toward the center instead of to one edge; this would allow twice as many people in each row to be adding at once.

Iamme

Alternatively, we could have added by column instead of by row since there are more columns. Algorithmically this is barely different, so it doesn't apply super well to all cases, but in this case it would have allowed about 4 times as many "processors" to be used at once. Of course, then the addition of the resulting row of numbers would have taken longer, but maybe we could apply EggyLv999's suggestion at that point.

ant

If we assume that all of us perform addition at the same rate (synonymous to equal clock speed and capability processors), then after an addition performed by each active student (each clock cycle of the equal clock speed processors), we'd be one step closer to the answer.

Ideally, we'd want to minimize the number of such steps. For a binary operation such as addition, the best way to do this would be to halve the number of students in each step, if I'm not wrong.

Interestingly, as in our classroom example, the stored results after each step of such a logarithmic time algorithm seem to move away from each other. This could pose a problem in some memory architectures. :)

Skyliner

Adding on lamme's suggestion, adding by column instead of by row wouldn't make a huge difference if you need to add the results of all column in the last row one by one, because in this case the Span is still the same as adding by row first.

Since the addition is associative, we will be able to parallel the last process. ie adding up the sum of each column in the last row. I would suggest using Divide and Conquer. Basically the same idea of MergeSort. However, this requires some overhead work to inform whom should talk to whom.

aravisha

I wonder if there is a way to accurately calculate what the optimum strategy or optimum number of cores would be to complete a particular task? i.e. would we be able to calculate exactly how many people in the class should have been used to calculate the total age?

Iamme

Skyliner, good point about the span. The problem I see with Divide and Conquer is communication. Remember we can only talk to the people next to us. So there would be a lot of communication overhead just passing sums down the line.