Slide 31 of 69
afzhang

For anyone else trying to understand the handwritten C + AVX intrinsics implementation, I found this reference guide very useful!

ChandlerBing

I found this to be pretty useful too.

funkysenior15

I can't seem to find a reference for the reduce_add() function. Internally, does it try to parallelize the addition, or does it do a single pass like the C + AVX implementation on the left?
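
To make the two options in this question concrete, here is a minimal sketch in plain C of a single-pass sum versus a tree-style sum over one vector's lanes. It does not show how reduce_add() is actually implemented, which this thread does not establish. The names `LANES`, `reduce_add_sequential`, and `reduce_add_tree` are hypothetical, and the 8-lane width is an assumption (eight 32-bit floats fit in one 256-bit AVX register).

```c
#include <stddef.h>

/* Assumption: 8 float lanes, as in one 256-bit AVX register. */
#define LANES 8

/* Single pass: one addition per lane, each depending on the previous one. */
float reduce_add_sequential(const float v[LANES]) {
    float sum = 0.0f;
    for (size_t i = 0; i < LANES; i++) {
        sum += v[i];
    }
    return sum;
}

/* Tree reduction: log2(LANES) = 3 steps. Within each step the additions
 * are independent of one another, so hardware could perform them together. */
float reduce_add_tree(const float v[LANES]) {
    float tmp[LANES];
    for (size_t i = 0; i < LANES; i++) {
        tmp[i] = v[i];
    }
    /* Fold the upper half onto the lower half: widths 4, 2, 1. */
    for (size_t width = LANES / 2; width >= 1; width /= 2) {
        for (size_t i = 0; i < width; i++) {
            tmp[i] += tmp[i + width];
        }
    }
    return tmp[0];
}
```

Both versions perform the same number of additions overall. The tree version only shortens the chain of dependent additions, and because floating-point addition is not associative, the two can differ in the last bits on general inputs.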

lament

In the C + AVX code, are we assuming 8-wide SIMD?