Why is it bad to have the working set increased by SIMD_WIDTH?
Firephinx
@POTUS I think it's because of the inner for loop: you have to fetch BLOCKSIZE_K different A values one at a time and run the SIMD vector operation simd_muladd for each A value individually, instead of just doing dot products over SIMD_WIDTH A values at a time, as on the next slide. (It's just extra, unnecessary work and operations.)
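To make the two access patterns concrete, here is a rough sketch in plain C. It is not the slide code: `vec_t`, `vec_load`, `vec_splat`, the two function names, and the sizes are made up for illustration, and this `simd_muladd` is only a scalar emulation of the operation of the same name. The first function broadcasts one A value per `simd_muladd`; the second reads SIMD_WIDTH A values per `simd_muladd` and assumes the matching B values are contiguous.

```c
/* Illustrative sizes (assumption): BLOCKSIZE_K must be a multiple of SIMD_WIDTH. */
enum { SIMD_WIDTH = 4, BLOCKSIZE_K = 8 };

/* Hypothetical stand-in for a SIMD register, emulated with scalar lanes. */
typedef struct { float lane[SIMD_WIDTH]; } vec_t;

static vec_t vec_load(const float *p) {
    vec_t v;
    for (int i = 0; i < SIMD_WIDTH; i++) v.lane[i] = p[i];
    return v;
}

static vec_t vec_splat(float x) {
    vec_t v;
    for (int i = 0; i < SIMD_WIDTH; i++) v.lane[i] = x;
    return v;
}

/* Emulated multiply-add: returns acc + a * b, lane by lane. */
static vec_t simd_muladd(vec_t a, vec_t b, vec_t acc) {
    for (int i = 0; i < SIMD_WIDTH; i++) acc.lane[i] += a.lane[i] * b.lane[i];
    return acc;
}

/* Pattern 1: one A value at a time. Each a[k] is loaded as a scalar,
 * broadcast to all lanes, and combined with a row of B.
 * b is BLOCKSIZE_K x SIMD_WIDTH, row-major; c receives SIMD_WIDTH outputs. */
void muladd_per_a_value(const float *a, const float *b, float *c) {
    vec_t acc = vec_splat(0.0f);
    for (int k = 0; k < BLOCKSIZE_K; k++)
        acc = simd_muladd(vec_splat(a[k]), vec_load(&b[k * SIMD_WIDTH]), acc);
    for (int i = 0; i < SIMD_WIDTH; i++) c[i] = acc.lane[i];
}

/* Pattern 2: dot product, SIMD_WIDTH A values at a time.
 * b_col holds one column of B stored contiguously (e.g. pre-transposed). */
float dot_simd_width_at_a_time(const float *a, const float *b_col) {
    vec_t acc = vec_splat(0.0f);
    for (int k = 0; k < BLOCKSIZE_K; k += SIMD_WIDTH)
        acc = simd_muladd(vec_load(&a[k]), vec_load(&b_col[k]), acc);
    float sum = 0.0f;  /* horizontal reduction across lanes */
    for (int i = 0; i < SIMD_WIDTH; i++) sum += acc.lane[i];
    return sum;
}
```

Both functions compute the same products. What differs is how A is read (BLOCKSIZE_K scalar loads plus broadcasts versus BLOCKSIZE_K / SIMD_WIDTH vector loads) and that pattern 2 needs a reduction at the end. The sketch doesn't say which one is faster on real hardware.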