Slide 33 of 47
scedarbaum

Are there any good heuristics for how much "blocking" one should do (e.g., does it ever make sense to use more than 4D blocking)? Ideally you'd know the full details of the cache environment you're working with and could derive the block size from that, but in practice you're likely deploying to a large number of machines with different hardware configurations.

lament

It is worth noting that the algorithm on the right requires the data to be stored in memory in an atypical layout, which is achieved by appropriate reindexing. Thus, unless you are handed the data in this format (which would require whoever produced the data to know how you were going to distribute it among threads), you must account for the overhead of the conversion. Hopefully that overhead is amortized by the many main memory accesses saved later.
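
To make the reindexing concrete, here is a minimal sketch of one way the conversion could look. It assumes a square n x n array stored row-major in a flat list, a block size b that divides n, and blocks laid out one after another with row-major order inside each block. The names `blocked_index` and `to_blocked` are made up for illustration, and the layout on the slide may differ in its details.

```python
def blocked_index(i, j, n, b):
    """Map element (i, j) of an n x n array to its offset in a block-major layout.

    Assumption: b divides n, blocks are ordered row-major, and elements
    within a block are also ordered row-major.
    """
    blocks_per_row = n // b
    block_row, block_col = i // b, j // b   # which block the element falls in
    row_in_block, col_in_block = i % b, j % b   # position inside that block
    block_id = block_row * blocks_per_row + block_col
    return (block_id * b + row_in_block) * b + col_in_block


def to_blocked(a, n, b):
    """Copy a flat row-major n x n list into the block-major layout."""
    if n % b != 0:
        raise ValueError("block size must divide n in this sketch")
    out = [None] * (n * n)
    for i in range(n):
        for j in range(n):
            out[blocked_index(i, j, n, b)] = a[i * n + j]
    return out


if __name__ == "__main__":
    # 4 x 4 array holding 0..15 in row-major order, split into 2 x 2 blocks.
    print(to_blocked(list(range(16)), 4, 2))
```

The conversion touches every element once, so it costs a single pass over the n * n array. That is the overhead that has to be paid back by the cache misses avoided in the later computation.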