Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2017

Slide 27 of 51

ggm8

This code shows how you might nest blocking at different sizes so that you optimize locality at each level of the cache hierarchy, with blocks sized for the largest cache in the outermost loops and progressively smaller blocks further in. You could go one step further and add a level of blocking sized to the registers, so that you also exploit locality in the register file. Essentially, you want to reuse all the elements that fit in a given cache before evicting them to load new ones, which minimizes how often you pay the latency of going out to a slower level of memory.