Slide 31 of 69

For anyone else trying to understand the handwritten C + AVX intrinsics implementation, I found this reference guide very useful!


I found this to be pretty useful too.


I can't seem to find a reference for the reduce_add() function. Internally, does it parallelize the addition, or does it do a single serial pass like the C + AVX implementation on the left?


In the C + AVX code, are we assuming 8-wide SIMD?