For anyone else trying to understand the handwritten C + AVX intrinsics implementation, I found this reference guide very useful!
I found this to be pretty useful too.
I can't seem to find a reference for the reduce_add() function, but internally, does it try to parallelize the addition, or does it do a single pass like in the C - AVX implementation on the left?
reduce_add()
@funkysenior: https://ispc.github.io/ispc.html#reductions
In the (C U AVX) code, are we assuming 8 wide SIMD?
For anyone else trying to understand the handwritten C + AVX intrinsics implementation, I found this reference guide very useful!
I found this to be pretty useful too.
I can't seem to find a reference for the
reduce_add()
function, but internally, does it try to parallelize the addition, or does it do a single pass like in the C - AVX implementation on the left?@funkysenior: https://ispc.github.io/ispc.html#reductions
In the (C U AVX) code, are we assuming 8 wide SIMD?