In the top code, each instruction loads its two input arrays from memory and writes its result into a third array, and this is repeated three times. In the bottom code, we skip the output array entirely for the first two instructions and do a single write after all three instructions are done. The advantage of the bottom code is that there are no intermediate temporary arrays: all the instructions are applied at once and the result is written only once. Before, we had 3 writes and 6 reads for 3 instructions, hence 1 instruction per 3 memory operations. After, we have 4 reads and 1 write for 3 instructions, hence 3 instructions per 5 memory operations.
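To make the counting above concrete, here is a rough Python sketch. The slide's actual code is not reproduced in this thread, so the expression `(A + B) * C + D`, the function names, and the `stats` counters are all assumptions chosen only to match the add, mult, add order described above.

```python
def new_stats():
    # Hypothetical counters, used only to tally memory traffic and math ops.
    return {"loads": 0, "stores": 0, "ops": 0}

def add(n, a, b, out, stats):
    for i in range(n):
        out[i] = a[i] + b[i]  # 2 loads, 1 store, 1 math op per element
    stats["loads"] += 2 * n
    stats["stores"] += n
    stats["ops"] += n

def mul(n, a, b, out, stats):
    for i in range(n):
        out[i] = a[i] * b[i]  # 2 loads, 1 store, 1 math op per element
    stats["loads"] += 2 * n
    stats["stores"] += n
    stats["ops"] += n

def unfused(n, A, B, C, D, stats):
    # "Top code" style: every step writes a full temporary array.
    tmp1, tmp2, E = [0.0] * n, [0.0] * n, [0.0] * n
    add(n, A, B, tmp1, stats)
    mul(n, tmp1, C, tmp2, stats)
    add(n, tmp2, D, E, stats)
    return E

def fused(n, A, B, C, D, stats):
    # "Bottom code" style: one pass, intermediates never touch an array.
    E = [0.0] * n
    for i in range(n):
        E[i] = (A[i] + B[i]) * C[i] + D[i]  # 4 loads, 1 store, 3 math ops
    stats["loads"] += 4 * n
    stats["stores"] += n
    stats["ops"] += 3 * n
    return E

def intensity(stats):
    # Math ops per memory operation (loads plus stores).
    return stats["ops"] / (stats["loads"] + stats["stores"])

if __name__ == "__main__":
    n = 8
    A, B, C, D = ([float(i + k) for i in range(n)] for k in range(4))
    s_top, s_bottom = new_stats(), new_stats()
    assert unfused(n, A, B, C, D, s_top) == fused(n, A, B, C, D, s_bottom)
    print(intensity(s_top), intensity(s_bottom))
```

Under these assumptions the ratio works out to 1/3 for the unfused version and 3/5 for the fused one, independent of `n`.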
Also, we are more likely to get cache hits in the bottom code. In the top code, if the arrays are too large to fit in the cache, we will miss at the beginning of the mult and of the second add because we have just finished iterating through the whole array. At that point the last elements are most likely the ones in the cache, not the first few elements that the next pass needs.
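The eviction argument can be illustrated with a toy model. This sketch assumes a fully associative LRU cache that tracks individual elements and allocates on writes; real caches work on lines and are usually set associative, so treat it as an illustration only. The names `count_misses` and `capacity` are made up for this example.

```python
from collections import OrderedDict

def count_misses(accesses, capacity):
    """Count misses for a sequence of element indices in a toy LRU cache."""
    cache = OrderedDict()
    misses = 0
    for addr in accesses:
        if addr in cache:
            cache.move_to_end(addr)  # hit: mark as most recently used
        else:
            misses += 1
            cache[addr] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used
    return misses

if __name__ == "__main__":
    n = 8
    # One pass writes the temporary array, the next pass reads it back.
    two_passes = list(range(n)) * 2
    print(count_misses(two_passes, capacity=4))  # array larger than the cache
    print(count_misses(two_passes, capacity=8))  # array fits in the cache
```

In this model, when the temporary array does not fit, the second pass misses on every element because the start of the array was evicted by the time the first pass finished. When it fits, only the first pass misses.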
Does this mean that there is some kind of direct relation between arithmetic intensity and cache hits?
Or are they used for different things?
I ask because it seems that a higher arithmetic intensity leads to a higher cache hit rate.
@althalus I think the definition of arithmetic intensity is computation / communication, so given more cache hits, there would be less communication cost.