With separate foo and bar functions, tmp is stored to memory and then loaded back, which wastes bandwidth. The compiler may detect this and optimize by keeping tmp in faster storage, such as a register.
stride16
The code provided on the slide only reads from memory once, written as:
output[i] = bar(foo(input[x]));
This code can be written in several different ways, which can lead to it being misinterpreted.
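To make the difference concrete, here is a minimal sketch in C of the two versions being compared. The bodies of foo and bar are hypothetical stand-ins (the slide's actual functions are not shown here), the function names foo_then_bar and foobar are made up for illustration, and a single index i is assumed for both arrays.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the slide's foo and bar. */
static float foo(float x) { return x * 2.0f; }
static float bar(float x) { return x + 1.0f; }

/* Unfused: two passes over the data. Every tmp[i] is stored to
 * memory by the first loop and loaded again by the second. */
void foo_then_bar(size_t n, const float *input, float *tmp, float *output) {
    for (size_t i = 0; i < n; i++)
        tmp[i] = foo(input[i]);
    for (size_t i = 0; i < n; i++)
        output[i] = bar(tmp[i]);
}

/* Fused: one pass. The intermediate value is a local variable,
 * which the compiler can keep in a register. */
void foobar(size_t n, const float *input, float *output) {
    for (size_t i = 0; i < n; i++) {
        float t = foo(input[i]);
        output[i] = bar(t);
    }
}
```

Per element, the unfused version does two loads and two stores (input, tmp, tmp, output), while the fused version does one load and one store. Both produce the same output; only the memory traffic differs.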