arsenal
  1. Single-instruction-stream performance topped out because the optimizations have gotten better and better over time, to the point where no significant improvement is really possible. In instruction sets where many instructions are dependent on previous instructions, not much of the code can actually be optimized by trying to find instruction-level parallelism. Even with compilers trying to find parallelism in more clever ways, like searching for instructions in different parts of a program that can be executed in parallel, it's gotten to the point where improving ILP requires a lot of effort for little gain.

  2. Two of the main problems were (1) communication overhead and (2) uneven work distribution. When adding a set of numbers together, processors can compute components of the problem in parallel, but ultimately must combine their components to generate the final sum. This requires them to communicate their results, and there are costs associated with communication between processors. If the communication-to-computation ratio for each processor is high, then the parallel version can be no more efficient, or even less efficient, than a sequential addition. In addition, when a program is run in parallel, the speed is limited by the processor with the longest computation. Therefore, if one processor has been assigned most of the work, then all the other processors have to wait for that one to finish before their results can be joined together. If one processor is assigned 90% of the original work, for example, then the parallel version of the program will achieve at most 1/0.9, or about 1.11x, speedup, not factoring in other potential costs of parallelism. So, uneven distribution of work among the processors also prevents maximum speedup.
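
To make the 1.11x figure concrete, here is a rough sketch of the load-imbalance bound. It assumes communication is free and that parallel time equals the time of the busiest processor; the function name `max_speedup` and the work shares are made up for illustration.

```python
def max_speedup(work_shares):
    """Upper bound on speedup over one processor doing all the work.

    Assumption: no communication cost, so the parallel run takes as long
    as the processor holding the largest share.
    """
    sequential_time = sum(work_shares)   # one processor does everything
    parallel_time = max(work_shares)     # everyone waits for the busiest one
    return sequential_time / parallel_time


# One processor holds 90% of the work, two others split the rest.
unbalanced = [90, 5, 5]
# Same total work spread evenly over four processors.
balanced = [25, 25, 25, 25]

print(round(max_speedup(unbalanced), 2))  # 1.11
print(max_speedup(balanced))              # 4.0
```

Any real communication cost would push both numbers lower, so this is only an upper bound.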

kayvonf

@arsenal: Good! But one clarification. You meant to say instruction streams, not instruction sets!