aznshodan

Can someone give me another example where fast does not necessarily mean efficient?

sgbowen

@aznshodan I believe the notion of efficiency we're working with involves getting results as fast as possible without doing any wasted or unnecessary computation.

This might happen if there's overlapping work (and you have an inefficient algorithm), such as in lots of dynamic programming problems. Imagine trying to solve subset sum (given a set S of numbers and a number n, does there exist a subset T whose elements sum to n?) by calculating the sum of every subset of S. Of course you'd see a speedup if you added processors and distributed the calculations, as you'd be able to calculate more sums at the same time. But this solution is not efficient, because you're repeating a lot of work and wasting cycles from your additional processors. For instance, if I've already calculated the sum of the elements in {1,2} and I want to find the sum for {1,2,3}, it would be faster to add 3 to the result I already have for {1,2} than to add all the elements individually again.
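To make the contrast concrete, here's a minimal sequential C++ sketch (made-up inputs, positive integers assumed) of the two approaches. The naive version parallelizes trivially but repeats work across subsets; the DP version reuses earlier results but has a serial dependency between iterations:

```cpp
#include <cstdint>
#include <iostream>
#include <vector>

// Naive approach: sum every subset from scratch. Each subset is
// independent, so this parallelizes trivially -- but it does
// O(n * 2^n) total work, re-adding shared prefixes over and over.
bool subsetSumNaive(const std::vector<int>& s, int target) {
    const std::uint64_t numSubsets = 1ULL << s.size();
    for (std::uint64_t mask = 0; mask < numSubsets; ++mask) {
        int sum = 0;
        for (std::size_t i = 0; i < s.size(); ++i)
            if (mask & (1ULL << i)) sum += s[i];
        if (sum == target) return true;
    }
    return false;
}

// DP approach (positive integers assumed): reuse previously computed
// reachable sums. Only O(n * target) work, but each outer iteration
// depends on the previous one, which makes it harder to parallelize.
bool subsetSumDP(const std::vector<int>& s, int target) {
    std::vector<bool> reachable(target + 1, false);
    reachable[0] = true;
    for (int x : s)
        for (int t = target; t >= x; --t)
            if (reachable[t - x]) reachable[t] = true;
    return reachable[target];
}

int main() {
    std::vector<int> s = {1, 2, 3, 7, 12};
    std::cout << subsetSumNaive(s, 10) << " " << subsetSumDP(s, 10) << "\n";  // 1 1
}
```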

I'd imagine this happens a lot when a sequential algorithm is lazily ported to a parallel algorithm, and the programmer doesn't give a lot of thought to how having more than one processor can dramatically change the solution space. (In which case, subset sum may not be a great example, since I don't know if there are any crafty ways to run the DP solution in parallel, but it's the first that came to my mind.)

Also, even if your algorithm is efficient, a 2x speedup with 10 processors (assuming all processors are working on the problem the whole time) means using five times as much energy: ten processors each running for half the original time. So it's also worth considering: is the 2x speedup worth having all those processors tied up and devoting more power to them?

JuneBot

@aznshodan The demo we did in class was a practical example of where fast did not translate to efficient, but I think a lot of it comes down to how you define "fast" and "efficient". If we measure "fast" by just how many seconds the addition in class took, the single-person computer took about 35 seconds. The two-person computer took about 25 seconds, and was faster. But if we measure "efficient" by the number of people-seconds (like man-hours), then the single-person computer used 35 people-seconds to compute, and the two-person computer took 50 people-seconds (2 people, each used 25 seconds). So the two-person computer was much less efficient in the computation.

By extension, if we choose to define "efficient" this way for all types of computations (although it's not really obvious why this is a good idea, aside from using a standard definition of efficiency), then any parallel computation on P processors that does not achieve a speedup on the order of P should be considered inefficient. This is not a sensible result, so we should really create some metric that balances speed against efficiency.
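For concreteness, one common way to formalize this is parallel efficiency = speedup / P, which is the same as the people-seconds comparison above (T1 / (P * Tp)). A minimal C++ sketch using the demo's numbers:

```cpp
#include <iostream>

// Parallel efficiency: speedup divided by processor count, which is
// the same as T1 / (P * Tp) -- i.e., the people-seconds comparison above.
double efficiency(double t1, double tp, int p) {
    return (t1 / tp) / p;
}

int main() {
    // In-class demo numbers: 35 s for one person, 25 s for two people.
    std::cout << efficiency(35.0, 25.0, 2) << "\n";  // prints 0.7 (70% efficient)
}
```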

regi

As Professor Kayvon mentioned in class, it's important to keep in mind when to look for "fast" vs. "efficient" performance. For example, it would be reasonable to use many processors to triple the frame rate in a video game from 10 to 30 fps, because here speed is more important to gameplay than efficiency. On the other hand, mobile apps require more efficient parallelization (a fast program that uses many processors may use a lot more power quickly and drain the battery).

Faust

@regi I was a bit confused by your examples. I thought that efficiency was a measure of how well one uses the hardware or resources given. Is the relationship between fast and efficient an either-or relationship? Isn't it possible for a program to be both fast and efficient? For example, let's say we get a 3x speedup using 10 processors for a video game, giving us 30 fps instead of 10. Because we have increased the speed to a playable frame rate, Kayvon mentioned that this may actually be a good result. Isn't this an example of us using the hardware efficiently?

regi

@Faust I don't believe fast and efficient are mutually exclusive-- it's certainly possible for a program to be both fast and efficient. But whether or not the example (3x speedup with 10 processors) is efficient probably depends on your definition. Ideally we would be looking for a 10x speedup with 10 processors, but that may be unreasonable as @JuneBot mentioned, so perhaps a different metric is needed here.

fgomezfr

@regi @Faust "Fast" and "efficient" are certainly not mutually exclusive, but they may be prioritized differently. Fast just means getting the job done quickly, but efficient means not wasting the resources available. If you divide the work among 10 processors and 7 of them go idle most of the time due to data dependencies, this is wasteful; maybe 3 processors could get the job done in the same time. Both implementations would be fast, but the second is more efficient, because it requires fewer resources, and you can use the remaining processors for a separate job.
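One simple way to quantify that waste: compare total busy time against total capacity (P processors times the wall-clock time). A sketch with made-up per-processor busy times:

```cpp
#include <algorithm>
#include <iostream>
#include <vector>

// Utilization: fraction of the machine's total capacity (P processors
// times the wall-clock run time) that was spent doing actual work.
double utilization(const std::vector<double>& busySeconds) {
    double wallClock = *std::max_element(busySeconds.begin(), busySeconds.end());
    double busyTotal = 0.0;
    for (double b : busySeconds) busyTotal += b;
    return busyTotal / (busySeconds.size() * wallClock);
}

int main() {
    // Made-up trace: 3 processors busy for the whole 10 s run,
    // 7 processors mostly idle waiting on data dependencies.
    std::vector<double> trace = {10, 10, 10, 1, 1, 1, 1, 1, 1, 1};
    std::cout << utilization(trace) << "\n";  // 0.37: most capacity wasted
}
```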

I dislike the game example here for the same reason. Efficiency is actually very important in games, because it allows us to do more with the time available on the hardware available. Efficiency should be measured relative to the available resources; your example implies that a PC or console game can afford to be less efficient than a mobile game, which won't satisfy many developers (or players!).

As a game programmer, you likely would not get to choose how many processors your game will run on. Instead, you want to be as efficient as possible to achieve high speed on a target machine. If you were building the machine the game will run on as well as writing the code, you could choose to add more cores rather than make your code more efficient.

slowsoul

The "fast and efficient" notion reminds me of the MPI project in 15640, where the parallel program was run on different number of machines and the execution time is recorded for each setting. In a resource-constrained setting, an optimal marginal benefit may be preferred, where the most speedup is gained adding only one machine. Here, efficiency is the key factor. However, if there are enough resources, one may choose the minimum number of machines where the execution time starts to stabilize. Speed is prioritized but efficiency is also considered. In both real world scenarios above, fast is not directly translated to efficient, efficiency is always considered in some way.

afa4

Here, we say that fast does not imply efficient. I think the other way around is true as well; that is, efficient does not imply fast. For example, we could have an algorithm that fully utilizes the resources it's given, but whose communication overhead far exceeds the benefits of parallelization. In that case, how can a programmer write a program that hits the sweet spot in terms of speed and efficiency?
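As a toy illustration of that tradeoff, here's a made-up cost model in which compute time shrinks as T1/P but communication overhead grows with P, so beyond some point adding processors makes the program slower as well as less efficient:

```cpp
#include <iostream>

int main() {
    // Toy cost model (made-up numbers): compute time scales as T1/P,
    // but per-processor communication overhead grows linearly with P.
    const double t1 = 100.0;  // hypothetical sequential time (s)
    const double c  = 0.5;    // hypothetical communication cost per processor (s)
    for (int p = 1; p <= 32; p *= 2) {
        double tp = t1 / p + c * p;
        std::cout << "P=" << p << "  time=" << tp
                  << "  speedup=" << t1 / tp
                  << "  efficiency=" << (t1 / tp) / p << "\n";
    }
    // Under this model, run time is minimized at P = sqrt(t1/c) ~= 14;
    // beyond that, communication outweighs the added parallelism and
    // the program gets slower while efficiency keeps falling.
}
```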

rmuthoo

What exactly do we mean by efficiency here? Is there even a definite metric to follow? If, using 10 processors, we get a speedup of 5 for some computation, and on adding another 10 the speedup increases to just 5.2, would the second case of using 20 processors be considered more efficient, or just faster? Does it depend on the importance of the computation (like the example of stock trading mentioned in class)? Do the idle/blocked times of the processors also count?

xSherlock

@rmuthoo The above comments describe the differences between efficiency and fastness pretty well, but I think it's important to highlight that there is no exact definition of 'efficiency' or 'fastness', and it's even less clear which one should be preferred. I think a very good analogy is bandwidth vs. latency. Both are relevant to any network programming, but neither is inherently more important. Each problem domain will require careful consideration to figure out how these two should be prioritized.

In your example, the 10-processor system was more efficient, and the 20-processor system was faster. I don't think the metric necessarily takes into account algorithmic details like idle/blocked time or duplicated computation; it's a bird's-eye-view metric.
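Concretely, using the speedup/P metric from earlier in the thread: 5/10 = 0.50 for the 10-processor case versus 5.2/20 = 0.26 for the 20-processor case, so doubling the machine count made the system slightly faster but roughly half as efficient.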

sanchuah

This reminds me that Google's distributed deep learning system took 81 machines to achieve a 12x speedup.
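(By the speedup/P metric discussed above, that works out to an efficiency of 12/81, or roughly 15%.)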