Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2016

Perf Optimization II: Locality, Communication, & Contention

Slide 49 of 70

TomoA

In the top code, each operation loads two arrays from memory and writes its result into a third array, and this is repeated three times. In the bottom code, we skip the intermediate arrays for the first two operations entirely, so we only do one write after all the arithmetic is done. The advantage of the bottom code is that there are no temporary arrays and all the operations happen in a single pass, writing only once per element. Before, we had 6 reads and 3 writes for 3 operations, hence 1 operation per 3 memory operations. After, we have 4 reads and 1 write for 3 operations, hence 3 operations per 5 memory operations.
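The slide's code is not reproduced in this thread, so here is only a minimal sketch of the two versions being compared. It assumes the computation is E = D + (A + B) * C, done with three calls in the top version and one loop in the bottom version. The function names `unfused` and `fused` and the temporaries `tmp1` and `tmp2` are hypothetical, chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Each helper makes one pass: 2 loads and 1 store per element, for 1 math op. */
void add(size_t n, const float *a, const float *b, float *out) {
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

void mul(size_t n, const float *a, const float *b, float *out) {
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] * b[i];
}

/* "Top" version (assumed shape): three passes through memory.
 * Per element: 6 loads + 3 stores for 3 math ops. */
void unfused(size_t n, const float *A, const float *B, const float *C,
             const float *D, float *tmp1, float *tmp2, float *E) {
    assert(A && B && C && D && tmp1 && tmp2 && E);
    add(n, A, B, tmp1);     /* tmp1 = A + B    */
    mul(n, tmp1, C, tmp2);  /* tmp2 = tmp1 * C */
    add(n, tmp2, D, E);     /* E    = tmp2 + D */
}

/* "Bottom" version (assumed shape): one pass, no temporary arrays.
 * Per element: 4 loads + 1 store for 3 math ops. */
void fused(size_t n, const float *A, const float *B, const float *C,
           const float *D, float *E) {
    assert(A && B && C && D && E);
    for (size_t i = 0; i < n; i++)
        E[i] = D[i] + (A[i] + B[i]) * C[i];
}
```

Both functions compute the same E; the only difference is how many times each element travels between the processor and memory.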

Lawliet

Also, we are more likely to get cache hits in the bottom code. In the top code, assuming the arrays are too large to fit in the cache, we have to miss at the beginning of the mult and of the second add because we have just iterated through the whole array. By the time one pass finishes, the last elements are most likely still in the cache while the first few have already been evicted.

althalus

Does this mean that there is some kind of direct relation between arithmetic intensity and cache hits, since here a higher arithmetic intensity comes with a higher cache hit rate? Or are they used for different things?

yangwu

@althalus I think the definition of arithmetic intensity is computation / communication, so with more cache hits there is less communication cost.
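To put numbers on that definition, here is a small sketch that plugs in the per-element counts from TomoA's comment. The helper `arithmetic_intensity` is hypothetical, and it treats every load and store as communication; accesses that hit in the cache would presumably shrink the denominator further.

```c
#include <assert.h>

/* Hypothetical helper: math operations per memory operation.
 * Assumes every load and store counts as communication. */
double arithmetic_intensity(int math_ops, int loads, int stores) {
    assert(loads + stores > 0);  /* avoid dividing by zero */
    return (double)math_ops / (double)(loads + stores);
}
```

With the counts above, `arithmetic_intensity(3, 6, 3)` gives 1/3 for the top code and `arithmetic_intensity(3, 4, 1)` gives 3/5 for the bottom code.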