ramen_bar_delivery

In this example, if we executed only two instructions per cycle, we could still finish the computation in three cycles:

Cycle 1: x*x and y*y

Cycle 2: (x*x) + (y*y) and z*z

Cycle 3: (x*x + y*y) + (z*z)

This, along with the next slide, demonstrates that being able to execute more instructions per cycle does not guarantee a proportional speedup: dependencies between instructions limit how many of them can actually run in parallel.
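
A minimal C sketch of that dependency structure (the grouping into "cycles" is just comments; a real superscalar processor would schedule this itself):

```c
#include <stdio.h>

int main(void) {
    float x = 1.0f, y = 2.0f, z = 3.0f;

    /* Cycle 1: two independent multiplies can issue together. */
    float xx = x * x;
    float yy = y * y;

    /* Cycle 2: the add depends on cycle 1; z*z is still independent. */
    float xy = xx + yy;
    float zz = z * z;

    /* Cycle 3: the final add depends on both results above. */
    float result = xy + zz;

    printf("result = %f\n", result);
    return 0;
}
```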

tarabyte

In 15-210, we talked a lot about work and span, and this slide reminds me of those discussions. We defined span as the longest chain of dependent operations. It seems that even if we could execute infinitely many instructions at a time, the runtime here would still be limited by the span, which is 3 clock cycles (as @ramen_bar_delivery mentioned). Will we focus a lot on work and span in this class as well?
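
For what it's worth, plugging this slide into the 15-210 definitions (my own counting, not from the slide): work $W = 5$ (three multiplies plus two adds) and span $S = 3$ (the chain $x \cdot x \rightarrow x^2 + y^2 \rightarrow (x^2 + y^2) + z^2$). The standard work/span bound $T_P \ge \max(W/P,\ S)$ then gives $T_\infty \ge S = 3$ cycles, matching the slide.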

locked

Although programmers expect the program to execute one instruction at a time, executing strictly in order would be slow, because memory reads and writes are expensive. To hide memory latency, the processor reorders instructions. For example, if the program has issued a memory read and the result hasn't come back yet, the processor can execute other independent instructions while it waits, as in the sketch below.
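
A rough C sketch of the idea (illustrative only: the reordering happens inside the processor, not in the source code, and all names here are made up):

```c
#include <stdio.h>

/* Illustrative only: an out-of-order processor does this scheduling in
 * hardware; the source order does not change. */
int load_then_compute(const int *big_array, int i) {
    int loaded = big_array[i];  /* long-latency memory read */

    /* None of these depend on `loaded`, so the processor can
     * execute them while the read is still in flight. */
    int a = i * 3;
    int b = i + 7;
    int c = a * b;

    /* This add has a true dependence on the load, so it must wait. */
    return loaded + c;
}

int main(void) {
    int arr[16] = {0};
    arr[5] = 42;
    printf("%d\n", load_then_compute(arr, 5));
    return 0;
}
```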

chenboy

From this example, I notice that for a computation like $a = b \cdot b + c \cdot c + d \cdot d + \ldots$ with $N$ ($N > 1$) squared terms on the right-hand side, we can get the result in at least $1 + \lceil \log_2 N \rceil$ rounds: one round for all the squares in parallel, then a binary tree of additions.
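
As a sanity check with $N = 4$ (my example, assuming unlimited execution units), say $a = b^2 + c^2 + d^2 + e^2$:

Round 1: $b^2$, $c^2$, $d^2$, $e^2$ (all squares in parallel)

Round 2: $b^2 + c^2$ and $d^2 + e^2$

Round 3: $(b^2 + c^2) + (d^2 + e^2)$

which matches $1 + \lceil \log_2 4 \rceil = 3$ rounds.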