Slide 22 of 40
jedavis

One thing that was unclear to me from this slide alone during lecture was the access pattern of the program under discussion. Each update reads the rows above and below the row it's currently working on, and knowing that is important for understanding this slide's point about blocking for temporal locality. Cutting the grid into blocks of four, as on the right side, means the current row and the row below it are still in cache when the next row of the block is processed, so those two accesses hit. With the row-major strategy on the left, both rows would likely have been evicted by the time they are needed again, so you get two misses instead.
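To make the reuse concrete, here is a minimal sketch of which rows get touched. It assumes a stencil that reads the row above, the current row, and the row below; `rows_touched` is a hypothetical helper, not something from the lecture code.

```python
def rows_touched(i):
    """Rows read while updating row i (assumed: row above, current, row below)."""
    return {i - 1, i, i + 1}

i = 5
# Rows needed for row i that are needed again for row i + 1.
reused = rows_touched(i) & rows_touched(i + 1)
print(sorted(reused))  # [5, 6]: the current row and the row below it
```

Two of the three rows carry over to the next iteration, which is where the two potential hits come from, provided those rows are still in cache.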

bourne

Using the technique on the right will decrease capacity misses because it can load an entire block, traversed in the Z formation, and complete the calculation for that block while it is still in cache, so only the elements on the block's border will need to be loaded again. This does assume that the grid is large enough that, with the row-major order, by the time you finish one row and go to the next, the previous row will probably not be in cache any more (it has been evicted, so touching it again is a capacity miss).
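A rough way to check this claim is to count misses in a toy cache model. The sketch below is only an illustration under stated assumptions: a fully associative LRU cache, one grid element per cache line (so it only measures temporal locality), a 5-point stencil, and made-up values for `N`, `B`, and `CAPACITY`. None of these come from the slide.

```python
from collections import OrderedDict

def count_misses(order, capacity):
    """Count misses in a fully associative LRU cache of `capacity` elements."""
    cache = OrderedDict()
    misses = 0
    for i, j in order:
        # Assumed access pattern: the cell plus its four neighbors.
        for addr in ((i - 1, j), (i, j - 1), (i, j), (i, j + 1), (i + 1, j)):
            if addr in cache:
                cache.move_to_end(addr)        # mark as most recently used
            else:
                misses += 1
                cache[addr] = True
                if len(cache) > capacity:
                    cache.popitem(last=False)  # evict least recently used
    return misses

def row_major(n):
    """Visit interior cells one full row at a time."""
    return [(i, j) for i in range(1, n - 1) for j in range(1, n - 1)]

def blocked(n, b):
    """Visit interior cells one b-by-b block at a time."""
    order = []
    for bi in range(1, n - 1, b):
        for bj in range(1, n - 1, b):
            for i in range(bi, min(bi + b, n - 1)):
                for j in range(bj, min(bj + b, n - 1)):
                    order.append((i, j))
    return order

# Hypothetical sizes: the cache holds about one grid row, far less than three.
N, B, CAPACITY = 64, 8, 64
row_misses = count_misses(row_major(N), CAPACITY)
blk_misses = count_misses(blocked(N, B), CAPACITY)
print("row-major misses:", row_misses)
print("blocked misses:  ", blk_misses)
```

With these sizes the row-major order cannot keep a row in cache until the next pass needs it, while the blocked order reuses each block's rows before they are evicted, so the blocked count comes out lower. Making `CAPACITY` large enough to hold three full rows should remove most of the difference, which matches the assumption about grid size above.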