Slide 9 of 36
Thedrick

In demo 3, Kayvon asked us to figure out how many computer science courses we were all taking (count the number of 15-XXX courses taken by all students in 15-418). Without any specific direction, most of the class tried to get a count for their own row and then combine it with the rest of the rows in the room. This worked for individual rows, but the combining steps were messy, and it was unclear which rows had already been added when combining larger sections. This was one of the best 'real world' examples I have seen for explaining communication costs between threads/processes. It also showed how difficult it can sometimes be to determine the best way to write the 'combine step' of a parallel program.
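The row-by-row counting followed by a combine step is essentially a parallel reduction. A minimal sketch (function names and data are my own, not from the demo), where each "row" produces a partial count and the partials are then combined pairwise in a tree rather than in one messy free-for-all:

```python
def count_row(row):
    """Each row independently counts its own 15-XXX courses."""
    return sum(row)

def tree_combine(partials):
    """Combine partial counts pairwise until one total remains.

    This mirrors an organized 'combine step': about log2(n) rounds
    of pairwise adds instead of an ad-hoc sweep across all rows.
    """
    while len(partials) > 1:
        # pair up adjacent partials; an odd one carries over unchanged
        partials = [sum(partials[i:i + 2]) for i in range(0, len(partials), 2)]
    return partials[0]

rows = [[2, 1, 3], [0, 2], [1, 1, 1], [4]]    # courses per student, by row
partial_counts = [count_row(r) for r in rows]  # independent, parallelizable phase
total = tree_combine(partial_counts)           # combine phase
```

The point of the tree shape is exactly the confusion from the demo: every partial is consumed exactly once, so there is never any question of which rows have already been added.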

jpaulson

Also: A central coordinator or at least an agreed plan-of-action would have helped a ton.

asheng

Another interesting point to consider is that some of the students were not paying attention and ended up taking a while to answer the question.

From a parallel computing perspective, this is equivalent to one subunit taking much longer than expected to finish its share of the work. An important thing to keep in mind, then, is load balancing: for some data sets, initially assigning an apparently "equal" amount of work to each subunit is not the fastest or most efficient way to complete the task. Either more up-front analysis or some degree of dynamism may be required.
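The dynamic option mentioned above can be sketched with a shared work queue, so a fast worker picks up extra chunks instead of idling while a slow one (the student not paying attention) finishes. This is illustrative only; the names and chunk sizes are made up:

```python
import queue
import threading

def dynamic_sum(chunks, num_workers=4):
    """Sum a list of chunks with dynamic load balancing.

    Workers pull chunks from a shared queue, so neither a slow
    worker nor an unexpectedly large chunk forces the job to
    stick with a static, possibly unbalanced, partition.
    """
    work = queue.Queue()
    for chunk in chunks:
        work.put(chunk)

    partials = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                chunk = work.get_nowait()
            except queue.Empty:
                return  # no work left; this worker is done
            s = sum(chunk)
            with lock:
                partials.append(s)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partials)
```

The trade-off is that pulling from the shared queue is itself a (synchronized) communication cost, so chunks shouldn't be made too small.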

xiaowend

Also, we observed that when summing a single row, most students summed from the leftmost and rightmost ends simultaneously, and the students in the middle added the two partial sums. In the combining phase, the last combining step likewise happened in the middle of the class. From a computing perspective, this is like a machine where some nodes are connected "closely", so communication between them is fast, while other pairs take a long time to communicate (like Demo 2 in class). Sometimes we can exploit the particular topology of the network to design efficient algorithms.
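The "sum from both ends, meet in the middle" pattern can be sketched as below (a toy model with made-up data; in a real machine the two "ends" would be chains of cores, each passing a running sum to its fast nearest neighbor):

```python
def meet_in_middle_sum(values):
    """Accumulate from both ends of a row toward the middle.

    Models two neighbor-to-neighbor chains running in parallel:
    the longest chain of hand-offs drops from about n-1 steps
    (one sweep across the row) to about n/2, plus one final add
    where the two chains meet.
    """
    mid = len(values) // 2
    left_total = sum(values[:mid])    # left chain, passed rightward
    right_total = sum(values[mid:])   # right chain, passed leftward
    # the students in the middle combine the two halves
    return left_total + right_total

row = [1, 0, 2, 3, 1, 2]          # counts held by six students in a row
row_total = meet_in_middle_sum(row)
```

The arithmetic is the same as a plain sum; what changes is the shape of the communication, which is the whole point of matching the algorithm to the network.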

acappiello

Yes, the lack of organization definitely cost quite a bit of time. Our programs won't have to "figure out" how to combine data on the fly, since the combine step is explicitly spelled out in the code. So there is some difference between human and CPU communication overhead, but the demo still makes the same point.

The posts above raise many good points. Even if a program is written to take "perfect" advantage of the hardware it runs on, outside factors can still detract from performance. In particular, rarely is just one thing running on a computer. As an extreme example, suppose a program is optimized for 4 cores (e.g., it spawns 4 threads, each with 1/4 of the work), but another process is already fully utilizing one core. Then three threads run immediately while the fourth has to wait.
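That scenario can be modeled directly: with a static 4-way split, the job finishes only when its slowest thread does, so one busy core drags the whole program (toy numbers, not a benchmark):

```python
def static_partition_time(work_per_thread, slowdown):
    """Completion time of a statically partitioned parallel job.

    work_per_thread: ideal running time of each thread's share
    slowdown: per-core slowdown factor (1.0 = an otherwise idle core)

    With a static assignment, completion time is the max over
    threads, not the average: one straggler sets the finish line.
    """
    return max(w * s for w, s in zip(work_per_thread, slowdown))

# 4 equal shares of 10 time units each
ideal = static_partition_time([10] * 4, [1.0, 1.0, 1.0, 1.0])
# same shares, but one core is already fully busy (modeled as 2x slower)
contended = static_partition_time([10] * 4, [1.0, 1.0, 1.0, 2.0])
```

Here the contended run takes twice as long as the ideal one even though three quarters of the machine sat idle for half the time, which is why dynamic assignment (as discussed above) can help even on "dedicated" hardware.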