Slide 41 of 47
kayvonf

Note that if such instructions existed, the line float value = x[idx]; from our earlier sinx ISPC program could be implemented as a gather.

The line result[idx] = value; is a scatter.
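To make the two terms concrete, here is a minimal Python sketch of the semantics only, not of how the hardware carries them out: a gather fills each lane from an arbitrary index, and a scatter writes each lane's value to an arbitrary index. The function names gather and scatter and the lane counts are hypothetical choices for illustration.

```python
# Scalar model of a vector gather / scatter: one loop iteration per SIMD lane.
# Real hardware issues each as a single instruction; this only models the result.

def gather(base, indices):
    """Lane i receives base[indices[i]]; indices may be arbitrary."""
    return [base[idx] for idx in indices]


def scatter(base, indices, values):
    """Lane i writes values[i] to base[indices[i]].

    In this sketch, if two lanes target the same index, the later lane wins.
    """
    for idx, v in zip(indices, values):
        base[idx] = v


x = [10.0 * i for i in range(32)]
print(gather(x, [3, 0, 31, 7]))  # [30.0, 0.0, 310.0, 70.0]

result = [0.0] * 8
scatter(result, [6, 1, 4, 2], [1.0, 2.0, 3.0, 4.0])
print(result)  # [0.0, 2.0, 4.0, 0.0, 3.0, 0.0, 1.0, 0.0]
```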

mmp

Unfortunately, AVX2 provides only a gather instruction, not a scatter; scatter instructions did not arrive until AVX-512.

Xiao

Even if AVX2 supported both gather and scatter, they would still be significantly slower than loads and stores of contiguous blocks. This is why rearranging large data structures to avoid gathers and scatters as much as possible often yields noticeable speedups. In fact, the ISPC compiler emits performance warnings when it has to generate gathers and scatters.
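One common rearrangement of this kind (assumed here for illustration; the comment above does not name a specific one) is converting an array of structures (AoS) into a structure of arrays (SoA). The hypothetical Python sketch below computes which element indices 8 lanes touch when each lane reads the x field of one 3-field point: under AoS the indices are strided and call for a gather, while under SoA they are contiguous and fit a single vector load. The names aos_indices, soa_indices, and is_contiguous are made up for this example.

```python
# Hypothetical layout: points with 3 fields (x, y, z), processed 8 lanes at a time.
NUM_LANES = 8
FIELDS = 3  # x = 0, y = 1, z = 2


def aos_indices(field, lanes=NUM_LANES):
    # Array of structures: [x0, y0, z0, x1, y1, z1, ...]
    return [lane * FIELDS + field for lane in range(lanes)]


def soa_indices(field, n_points, lanes=NUM_LANES):
    # Structure of arrays: [x0 .. x(n-1), y0 .. y(n-1), z0 .. z(n-1)]
    return [field * n_points + lane for lane in range(lanes)]


def is_contiguous(indices):
    # True when consecutive lanes read consecutive elements.
    return all(b - a == 1 for a, b in zip(indices, indices[1:]))


print(aos_indices(0))              # [0, 3, 6, 9, 12, 15, 18, 21]  strided: gather
print(soa_indices(0, n_points=8))  # [0, 1, 2, 3, 4, 5, 6, 7]      contiguous load
```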

kayvonf

Challenge question: Why might an instruction that performs a gather be a more costly instruction than an instruction that simply loads a contiguous block of memory (in the case above: 16 contiguous values from memory)? For the computer architects in the class... what might be the challenges of implementing such an instruction?

tpassaro

Wouldn't a gather be more expensive because it does not read a contiguous block of memory? The gathered elements can be arbitrarily far apart, so the rest of a fetched cache line may be of no use to the other lanes. For example, suppose a cache line holds 16 values and consecutive elements in the gather are 17 elements apart. Then every lane's element lives in a different cache line, so there are no cache hits within the gather: each fetched line supplies exactly one useful value, and the other 15 values in it are useless to you. So not only do you waste time bringing in data that is discarded unused, but you also spend more time waiting on 16 separate line fetches from main memory instead of one.
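The arithmetic in this example can be checked with a short Python sketch. It assumes, as above, 16 values per cache line and 16 lanes, and additionally assumes the array starts at the beginning of a cache line; the helper names lines_touched and wasted_values are hypothetical. It counts how many distinct cache lines a 16-lane access needs and how many of the fetched values no lane uses.

```python
VALUES_PER_LINE = 16  # from the example: one cache line holds 16 values
NUM_LANES = 16        # one element per lane


def lines_touched(indices, values_per_line=VALUES_PER_LINE):
    """Number of distinct cache lines needed to serve all lanes.

    Assumes element 0 sits at the start of a cache line.
    """
    return len({idx // values_per_line for idx in indices})


def wasted_values(indices, values_per_line=VALUES_PER_LINE):
    """Values brought in with those lines that no lane actually uses."""
    return lines_touched(indices, values_per_line) * values_per_line - len(set(indices))


contiguous = list(range(NUM_LANES))                 # 0, 1, ..., 15
strided = [17 * lane for lane in range(NUM_LANES)]  # 0, 17, 34, ..., 255

print(lines_touched(contiguous), wasted_values(contiguous))  # 1 0
print(lines_touched(strided), wasted_values(strided))        # 16 240
```

Under these assumptions the contiguous load is served by a single line with nothing wasted, while the stride-17 gather needs 16 lines and discards 240 of the 256 values it fetches.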