I don't quite understand this slide. What is the iteration order? Why three lines for every eight elements?
Cake
I think what this slide is saying is that, given that the elements traversed by the red lines have already been calculated, each new row of 8 elements requires on average 3 new cache lines to be loaded (specifically, the 3 lines corresponding to the row directly beneath the 8 elements to be calculated).
Suppose the last 8 elements of the 4th row are to be calculated next. From the previous accesses, we can assume that the 6 lines currently in the cache are those corresponding to the 3rd and 4th rows. However, calculating these 8 elements also needs data from the next row, so the 3 lines below must be brought into the cache (but not until after the required data has been read from the 3rd row!).
As with many other slides in this lecture, I think the theme here is that once you've successfully set up some initial conditions, you can achieve a better "average" than the naive implementation.
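To make the counting concrete, here is a rough sketch of the argument as a tiny simulation. It is not taken from the slide: it assumes an LRU cache that holds 6 lines, assumes each row of 8 elements (plus its neighbours) spans 3 cache lines, and assumes each step touches the row above first, then the current row, then the row below. The names `misses_per_row`, `LINES_PER_ROW`, and `CACHE_LINES` are made up for illustration.

```python
from collections import OrderedDict

LINES_PER_ROW = 3  # assumption: 8 elements plus neighbours span 3 cache lines
CACHE_LINES = 6    # assumption: the cache holds 6 lines in total


def misses_per_row(num_rows):
    """Return the number of cache misses incurred while updating each interior row."""
    cache = OrderedDict()  # keys are line ids, ordered from least to most recently used
    result = []
    for row in range(1, num_rows - 1):
        misses = 0
        # Assumed access order: row above, then current row, then row below.
        for r in (row - 1, row, row + 1):
            for k in range(LINES_PER_ROW):
                line = (r, k)
                if line in cache:
                    cache.move_to_end(line)  # hit: mark as most recently used
                else:
                    misses += 1
                    if len(cache) == CACHE_LINES:
                        cache.popitem(last=False)  # evict the least recently used line
                    cache[line] = None
        result.append(misses)
    return result


print(misses_per_row(6))  # prints [9, 3, 3, 3]
```

The first row pays 9 misses to set up the initial conditions, and every row after that pays only 3, which is the "3 lines for every 8 elements" figure. With a different access order or replacement policy the count may come out higher, so this should be read as the idealized case.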