Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2017

Performance Optimization II: Locality, Communication, & Contention

Slide 48 of 72


chenh1

I don't quite understand this slide. What is the iteration order here? Why three lines for every eight elements?

Cake

I think what this slide is saying is that, given the elements traversed by the red lines have already been computed, each new row of 8 elements requires, on average, 3 new cache lines to be loaded (specifically, the 3 lines covering the row directly beneath the 8 elements being computed).
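A minimal Python sketch of the counting above, under assumptions the thread does not state: 4 grid elements per cache line, every row starting on a line boundary, and the 8-element block beginning at column 1 (column 0 being a boundary cell). Under those assumptions the 8 elements straddle three lines rather than two, which is one possible reason the count is 3. The helpers `lines_spanned` and `new_lines_per_block_row` are hypothetical names used only for illustration.

```python
LINE_ELEMS = 4  # assumed: 4 grid elements per cache line


def lines_spanned(first_col, last_col, line_elems=LINE_ELEMS):
    """Count the distinct cache lines holding columns first_col..last_col
    (inclusive) of one row, assuming every row starts on a line boundary."""
    return last_col // line_elems - first_col // line_elems + 1


def new_lines_per_block_row(first_col, width=8):
    """Idealized steady state described above: the row above and the current
    row are already cached, so only the row below the block must be loaded."""
    return lines_spanned(first_col, first_col + width - 1)


if __name__ == "__main__":
    # Columns 1..8 straddle the lines holding columns 0-3, 4-7 and 8-11.
    print(new_lines_per_block_row(1))      # 3
    # The same 8 elements aligned to a line boundary would need only 2 lines.
    print(new_lines_per_block_row(0))      # 2
    # Steady-state cost: new lines loaded per element computed.
    print(new_lines_per_block_row(1) / 8)  # 0.375
```

The model deliberately ignores cold misses for the first rows of the block, which is why it only describes the steady-state average.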

Suppose the last 8 elements of the 4th row are computed next. From the previous accesses, we can assume that the 6 lines currently in the cache are the ones covering the 3rd and 4th rows. However, these 8 elements also need values from the row below, so the 3 lines of the 5th row must be brought into the cache (but only after the required values from the 3rd row have been read, since the 3rd row's lines are the ones that get evicted!).
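The point about reading the 3rd row before loading the 5th can be checked with a small cache simulation. This is a sketch under assumed parameters, not the slide's model: a fully associative 6-line cache with LRU replacement, 4 elements per line, a row-major grid of hypothetical width 36, a 5-point stencil (up, left, center, right, down), and an 8-column strip starting at column 9 so each row of the strip touches three lines. The "phased" order (all reads of the row above, then the current row, then the row below) is a hypothetical restructuring; in real code it would mean something like accumulating partial sums per element.

```python
from collections import OrderedDict

LINE_ELEMS = 4   # assumed: 4 grid elements per cache line
CACHE_LINES = 6  # assumed: the cache holds 6 lines
WIDTH = 36       # hypothetical grid width, a multiple of LINE_ELEMS


class LRUCache:
    """Fully associative cache with LRU replacement (an assumption)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # line id -> None, oldest first
        self.misses = 0

    def access(self, row, col):
        line = (row * WIDTH + col) // LINE_ELEMS  # row-major layout
        if line in self.lines:
            self.lines.move_to_end(line)  # mark as most recently used
        else:
            self.misses += 1
            self.lines[line] = None
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)  # evict least recently used


def misses_per_row(first_col, width, num_rows, phased):
    """Update a strip of `width` columns row by row with a 5-point stencil
    and return the number of cache misses charged to each row."""
    cache = LRUCache(CACHE_LINES)
    cols = range(first_col, first_col + width)
    per_row = []
    for i in range(1, num_rows + 1):
        before = cache.misses
        if phased:
            # Finish every read of row i-1 before touching row i+1.
            for j in cols:
                cache.access(i - 1, j)
            for j in cols:
                cache.access(i, j - 1)
                cache.access(i, j)
                cache.access(i, j + 1)
            for j in cols:
                cache.access(i + 1, j)
        else:
            # Per-element order: up, left, center, right, down.
            for j in cols:
                cache.access(i - 1, j)
                cache.access(i, j - 1)
                cache.access(i, j)
                cache.access(i, j + 1)
                cache.access(i + 1, j)
        per_row.append(cache.misses - before)
    return per_row


if __name__ == "__main__":
    print(misses_per_row(9, 8, 5, phased=True))   # [9, 3, 3, 3, 3]
    print(misses_per_row(9, 8, 5, phased=False))  # [9, 9, 9, 9, 9]
```

Under these assumptions the phased order pays for all nine lines on the first row and then settles at 3 new lines per 8 elements, while the per-element order keeps evicting lines it still needs and reloads all three rows every time.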

As with many other slides in this lecture, I think the common theme is that once you've set up the right initial conditions, you can achieve a better average (fewer cache lines loaded per element) than the naive implementation.