Slide 5 of 35
gbarboza

The point here is that measuring a parallel algorithm's speedup against itself (1 core vs. many cores) can be misleading: the parallel version running on 1 core can be far slower than an optimized sequential version of the algorithm running on 1 core.

Arnie

The different implementations also show why the choice of performance metric matters. For example, if our goal were to minimize contention, we would choose implementation 4. But that implementation also does a lot more work, so if we instead wanted to minimize work, we might choose implementation 3 (which has a larger contention problem).

martin

An optimized sequential implementation usually does the least total work, but the parallel version of an algorithm may still finish sooner. The reason is that the parallel version does more of its work at the same time, which makes up for the extra work it has to do, and the more cores the machine has, the greater the speedup it can achieve. Therefore, measuring a parallel program's speedup against the same parallel program on one core doesn't reveal the true speedup; the fair baseline is the optimized sequential version.
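
To make the difference between the two baselines concrete, here is a minimal sketch in Python. The timings are made-up numbers chosen only for illustration (they are not measurements from the lecture), and the function and variable names are hypothetical.

```python
def speedup(baseline_seconds, parallel_seconds):
    """Speedup is the baseline running time divided by the parallel running time."""
    if baseline_seconds <= 0 or parallel_seconds <= 0:
        raise ValueError("running times must be positive")
    return baseline_seconds / parallel_seconds


# Hypothetical timings in seconds (illustrative only, not measured).
t_seq_optimized = 10.0  # optimized sequential implementation on 1 core
t_par_1core = 25.0      # parallel implementation on 1 core (does extra work)
t_par_8core = 4.0       # parallel implementation on 8 cores

# Baseline = the parallel code itself on 1 core: looks very good.
relative = speedup(t_par_1core, t_par_8core)

# Baseline = the optimized sequential code: the speedup a user actually sees.
absolute = speedup(t_seq_optimized, t_par_8core)

print(f"speedup vs. parallel code on 1 core: {relative:.2f}x")
print(f"speedup vs. optimized sequential:    {absolute:.2f}x")
```

With these assumed numbers, the parallel code is still faster than the sequential one on 8 cores, but comparing it against itself on 1 core overstates the gain by the same factor that the parallel version is slower on a single core.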