Slide 27 of 41
ZhuansunXt

The last three slides show three different ways to apply SIMD inside a block: vectorizing the innermost loop, the intermediate loop, or the outermost loop. Is the shape of the block used for the multiplication the only criterion for choosing among them? For example, in this slide, if the i dimension is small, it seems better to vectorize the innermost loop by transposing B.
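To make the comparison concrete, here is a minimal sketch in C of two of the variants, assuming square row-major blocks of edge length `n`. The function names (`transpose`, `matmul_inner_k`, `matmul_inner_j`) and the flat-array layout are my own assumptions for illustration, not code from the slides. The loops are written as plain scalar code whose innermost loop is unit-stride, so a compiler could plausibly auto-vectorize it; no particular SIMD width or intrinsic set is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Bt[j*n + k] = B[k*n + j]: each row of Bt holds one column of B. */
void transpose(const float *B, float *Bt, size_t n) {
    assert(n > 0);
    for (size_t k = 0; k < n; k++)
        for (size_t j = 0; j < n; j++)
            Bt[j * n + k] = B[k * n + j];
}

/* Variant 1: vectorize the innermost (k) loop.
 * With B transposed, both A's row and Bt's row are read with unit stride,
 * so the k loop is a dot product that ends in a horizontal reduction. */
void matmul_inner_k(const float *A, const float *Bt, float *C, size_t n) {
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            float sum = 0.0f;
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * Bt[j * n + k];
            C[i * n + j] += sum;
        }
    }
}

/* Variant 2: vectorize across j instead.
 * A[i][k] is broadcast to every lane, and rows of B and C are read with
 * unit stride, so no transpose and no horizontal reduction are needed. */
void matmul_inner_j(const float *A, const float *B, float *C, size_t n) {
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        for (size_t k = 0; k < n; k++) {
            float a = A[i * n + k];
            for (size_t j = 0; j < n; j++)
                C[i * n + j] += a * B[k * n + j];
        }
    }
}
```

Both functions accumulate the same product into `C`; what differs is which dimension supplies the vector lanes. That suggests block shape is at least one factor, since the vectorized dimension has to be long enough to fill the lanes, while the cost of the transpose and of the final reduction in the first variant are others.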