Slide View : Parallel Computer Architecture and Programming : 15-418/618 Fall 2016

Slide 19 of 69

tclarke

Correct me if I'm wrong, but you would want as large a block size as possible while the subblocks still fit in your cache, i.e., you would want (BLOCKSIZE_I^2 + BLOCKSIZE_J^2 + BLOCKSIZE_K^2) * sizeof(int) to fit in your L1 cache. You might get away with only one row each of the A subblock and the C subblock fitting at a time, since a row of those matrices is not revisited once it is done. But I am not sure if that specific row would be evicted from your L1 cache if all you had room for was one row from each of those subblocks.

ferozenaina

I think we want to choose the largest block sizes such that all three sub-blocks of A, B and C fit in the cache. The L1 data cache in a Core i7 is 32 KB per core (64 KB if you also count the instruction cache), which is around 8,000 4-byte integers. That should be sufficient. I agree with the size = (BLOCKSIZE_I^2 + BLOCKSIZE_J^2 + BLOCKSIZE_K^2) * sizeof(int).
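
To make the sizing arithmetic concrete, here is a minimal sketch in C. It takes the footprint formula from the comments above as given and assumes all three block sizes are equal. The function names `footprint_bytes` and `largest_square_block` are made up for illustration, and the cache size and element size are parameters rather than facts about any particular machine.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes occupied by the three sub-blocks, using the formula from the
 * discussion: (bi^2 + bj^2 + bk^2) * elem_size. */
size_t footprint_bytes(size_t bi, size_t bj, size_t bk, size_t elem_size) {
    return (bi * bi + bj * bj + bk * bk) * elem_size;
}

/* Largest common block size b (assumption: bi == bj == bk == b) whose
 * footprint fits in cache_bytes. Returns 0 if even b == 1 does not fit. */
size_t largest_square_block(size_t cache_bytes, size_t elem_size) {
    assert(elem_size > 0);
    size_t b = 0;
    /* Grow b while the next larger block would still fit. */
    while (footprint_bytes(b + 1, b + 1, b + 1, elem_size) <= cache_bytes) {
        b++;
    }
    return b;
}
```

With 4-byte elements, `largest_square_block(32 * 1024, 4)` gives 52 and `largest_square_block(64 * 1024, 4)` gives 73. This only counts capacity; it ignores associativity conflicts and anything else competing for the cache.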

I don't follow what you mean when you say "But I am not sure if that specific row would be evicted from your L1 cache if all you had room for was one row from each of those subblocks." We would also be storing only one column of the subblock of B at a time in the cache.
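
For the row and column question, it may help to look at a concrete loop nest. The following is only a sketch: the loop order, the single block size `b`, the row-major layout, and the name `blocked_matmul` are all assumptions for illustration, not the code from the slide.

```c
#include <assert.h>
#include <stddef.h>

/* C += A * B for n x n row-major int matrices, processed in b x b sub-blocks.
 * Partial blocks at the edges are handled by clamping the loop bounds. */
void blocked_matmul(size_t n, size_t b, const int *A, const int *B, int *C) {
    assert(b > 0);
    for (size_t ib = 0; ib < n; ib += b) {
        size_t i_end = (ib + b < n) ? ib + b : n;
        for (size_t jb = 0; jb < n; jb += b) {
            size_t j_end = (jb + b < n) ? jb + b : n;
            for (size_t kb = 0; kb < n; kb += b) {
                size_t k_end = (kb + b < n) ? kb + b : n;
                /* Work within one triple of sub-blocks. */
                for (size_t i = ib; i < i_end; i++) {
                    for (size_t j = jb; j < j_end; j++) {
                        int sum = C[i * n + j];
                        /* Row i of A's sub-block times column j of B's. */
                        for (size_t k = kb; k < k_end; k++) {
                            sum += A[i * n + k] * B[k * n + j];
                        }
                        C[i * n + j] = sum;
                    }
                }
            }
        }
    }
}
```

With this particular order, each row of the A and C sub-blocks is used for one value of `i` and then left alone for the rest of that block triple. The innermost loop does touch only one column of the B sub-block at a time, but every column is swept again for each new `i`, so the whole B sub-block is what benefits from staying resident. A different loop order would shift which sub-block is reused.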