Slide 33 of 46
khans

After a certain number of processors, we reach a point where adding a processor doesn't provide significant speedup. So we ask, is it worth it to add a processor? What else can we do?

lilli

@khans: Actually, this graph doesn't show adding processors; rather, it shows increasing the number of instructions a single processor can issue in a single clock cycle.

Because of dependencies between instructions, few programs contain groups of more than 4 or 5 instructions that can be processed together. Beyond this number, the marginal benefit of being able to issue more instructions at a time drops off.

planteurJMTLG

This graph comes from this paper: Culler, David E., Jaswinder Pal Singh, and Anoop Gupta. Parallel Computer Architecture: A Hardware/Software Approach, 1999.

It is interesting to note that the data doesn't come from a program executed on a real machine, but rather from a simulation on "an ideal machine with unlimited instruction fetch bandwidth, as many functional units as the program can use, and perfect branch prediction". Even with that, "90% of the time no more than four instructions issue in a cycle". So the speedup must be even lower than this under real conditions.

rrp123

From my understanding, this graph shows the amount of speedup we get on the y-axis and the number of instructions the processor is capable of executing in parallel on the x-axis. The likelihood that a large group of instructions (5 or more) can be executed in parallel at the assembly level is very low: due to dependencies, most programs have at most 4 instructions that can run in parallel at any point in time. Thus, making processors capable of issuing more than 4 instructions at a time doesn't give us much more speedup, but drastically increases the cost. So it isn't worth building such wide processors.

Abandon

I am wondering whether the speedup gained from a processor's instruction issue capability depends on the program it runs. The more independent instructions a program contains, the more speedup it can achieve. Meanwhile, a program mixing multiplications with additions should be more efficient than one with only multiplications or only additions, since a processor normally has different execution units for different kinds of instructions. So I am not sure it is okay to ignore the program's content when drawing this figure.

pk267

Is this graph for a single program? What if we have a number of programs loaded in memory using the same processor? These programs would be mostly independent of one another, and each would also have independent instructions within it. In that case, this number (4), beyond which there is no further speedup, seems really low.

slanka

@Abandon For ILP, the program content you mention does not actually factor in. For the superscalar processor we just added a duplicated second ALU (each ALU can do all arithmetic operations), so a program with only multiplication and a program with both multiplication and addition do not actually differ. The only thing that affects how many instructions can run in parallel is the dependencies between the instructions.