In this example, if we only executed two instructions per cycle, we could still finish the computation in three cycles:
Cycle 1: x*x and y*y
Cycle 2: (x*x) + (y*y) and z*z
Cycle 3: (x*x + y*y) + (z*z)
This, along with the next slide, demonstrates that being able to execute more instructions per cycle does not guarantee a speedup: most programs contain dependencies between instructions that limit how much of the work can actually run in parallel.
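The cycle breakdown above can be reproduced with a toy greedy list scheduler: issue up to `width` instructions per cycle, where an instruction is ready once everything it depends on has finished. The function and instruction names here are my own illustration, not anything from the slide.

```python
# Toy list scheduler: greedily issue up to `width` ready instructions
# per cycle, respecting dependencies.

def schedule(deps, width):
    """deps maps instruction -> set of instructions it waits on.
    Returns the schedule as a list of per-cycle instruction lists."""
    done, cycles = set(), []
    while len(done) < len(deps):
        ready = [i for i in deps if i not in done and deps[i] <= done]
        issued = ready[:width]          # issue at most `width` per cycle
        done.update(issued)
        cycles.append(issued)
    return cycles

# x*x + y*y + z*z from the slide: three multiplies, two dependent adds.
deps = {
    "xx": set(), "yy": set(), "zz": set(),
    "add1": {"xx", "yy"},               # (x*x) + (y*y)
    "add2": {"add1", "zz"},             # (x*x + y*y) + (z*z)
}
print(len(schedule(deps, width=2)))     # 3 cycles, matching the breakdown
```

Note that even raising `width` to 3 still gives three cycles here, since `add2` must wait for `add1`.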
tarabyte
In 15-210, we talked a lot about work and span. This slide reminds me a lot of those discussions. We defined span as the longest series of dependent operations. It seems that even if we could execute infinitely many instructions at a time, in this case, the runtime would still be limited by the span, which is 3 clock cycles (as @ramen_bar_delivery mentioned). Will we focus a lot on work and span in this class as well?
locked
Although programmers expect the program to execute one instruction at a time, running it that way would be really slow because memory reads and stores are expensive. To hide memory latency, the processor reorders instructions. For example, if the program has issued a memory read and hasn't gotten back the answer yet, the processor can execute other independent instructions while waiting for the answer.
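A back-of-the-envelope model of that latency hiding (the cycle counts are made-up toy numbers, not real hardware figures): suppose a load takes `mem_latency` cycles and there are `independent` one-cycle instructions that don't need the loaded value.

```python
# Toy model: an in-order core waits out the load before doing anything
# else; a core that reorders runs the independent work during the stall.
mem_latency, independent = 100, 30

in_order = mem_latency + independent        # stall, then do the rest
overlapped = max(mem_latency, independent)  # independent work hides the stall
print(in_order, overlapped)                 # 130 vs 100 cycles
```

The gap is exactly the amount of independent work that fits under the memory stall.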
chenboy
From this example, I notice that if we want to perform a task like $a = b*b + c*c + d*d + \dots$ where there are $N$ ($N > 1$) product terms on the right-hand side of the equation, we can get the result in at least $1 + \lceil \log_2 N \rceil$ rounds.
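That round count can be checked by simulation, assuming unlimited parallelism: one round squares all $N$ terms at once, then a pairwise-sum tree halves the number of partial sums each round. The function below is my own sketch of that process.

```python
import math

def rounds(n):
    """Rounds to compute b1*b1 + ... + bn*bn with unlimited parallelism:
    one round of squaring, then a pairwise-sum reduction tree."""
    r, values = 1, n                 # round 1: all n squares in parallel
    while values > 1:
        values = (values + 1) // 2   # each round pairs up partial sums
        r += 1
    return r

# Matches 1 + ceil(log2 N) for small N:
for n in range(2, 10):
    assert rounds(n) == 1 + math.ceil(math.log2(n))
print(rounds(3))  # 3 rounds, matching the slide's three-term example
```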