Previous | Next --- Slide 30 of 51
Back to Lecture Thumbnails
apadwekar

We now can take advantage of memory locality in A, B, and C instead of simply A and B in the previous slide.

mario

If written well, matrix multiplication can have the arithmetic intensity of O(block size).

kapalani

There are several optimized linear algebra libraries for pretty much every platform. Examples: cuBLAS, Eigen, LAPACK, Intel MKL, Armadillo, etc.

themj

For both A and B, you are loading in an entire vector of size SIMD_WIDTH allowing you to take advantage of SIMD-level parallelism. Why do you get SIMD_WIDTH*SIMD_WIDTH parallelism for C?