Slide 41 of 72
bochet

I think at the end of class, Kayvon mentioned one way to avoid cold misses, but I didn't quite get it. Can someone explain this?

Also, a communication miss refers to the case where the data is cached on one machine but is needed on another, so communication is required to bring it over.

-o4

@bochet From what I heard, the way to reduce cold misses is to dedicate a thread to fetch all the necessary data for a thread block so that the other threads in that block can access the data directly from the cache. Please correct me if I misheard! More details are welcome!
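
Roughly what that looks like in CUDA (a minimal sketch; the kernel name, `TILE`, and the arrays are just made up for illustration): the threads of a block cooperatively stage a tile of global memory into shared memory, so later accesses by any thread in the block are served on-chip instead of taking a cold miss again. The compute here is kept trivial just to show the pattern; the real payoff comes when the staged tile is reused by many threads (e.g., matrix or stencil tiles).

```cuda
// Sketch only: launch with blockDim.x == TILE; `in`, `out`, and `alpha`
// are illustrative names, not from the lecture.
#define TILE 256

__global__ void scale_with_staging(const float* in, float* out,
                                   int n, float alpha) {
    __shared__ float tile[TILE];

    int gid = blockIdx.x * blockDim.x + threadIdx.x;

    // Cooperative fetch: each thread brings in one element of the block's tile.
    if (gid < n)
        tile[threadIdx.x] = in[gid];
    __syncthreads();  // the whole tile is now resident on-chip

    // Subsequent accesses come from shared memory instead of missing to DRAM.
    if (gid < n)
        out[gid] = alpha * tile[threadIdx.x];
}
```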

l8b

@-o4 I believe that's what I heard as well. By using a separate thread to fetch the data into the cache in parallel, we can hide the latency of the cold miss. Since this requires another thread, it's only possible in parallel programs, which is why cold misses are unavoidable in sequential programs.
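
A minimal double-buffering sketch of that latency hiding (kernel and array names are hypothetical, and a single-block launch is assumed to keep the indexing simple): each thread issues the load for the next tile before computing on the current one, so the cold-miss latency of tile t+1 overlaps with the work on tile t instead of stalling the block.

```cuda
// Sketch only: launch with blockDim.x == TILE and gridDim.x == 1.
#define TILE 256

__global__ void sum_tiles(const float* in, float* out, int num_tiles) {
    __shared__ float tile[TILE];

    float next = in[threadIdx.x];   // fetch the first tile up front
    float acc  = 0.0f;

    for (int t = 0; t < num_tiles; ++t) {
        tile[threadIdx.x] = next;   // publish the prefetched element to the block
        __syncthreads();

        // Software prefetch: start the load for the next tile *before*
        // computing on the current one, so the memory access is in flight
        // while the loop below does useful work.
        if (t + 1 < num_tiles)
            next = in[(t + 1) * TILE + threadIdx.x];

        for (int i = 0; i < TILE; ++i)   // compute on the current tile
            acc += tile[i];
        __syncthreads();   // don't overwrite tile[] until everyone is done
    }
    out[threadIdx.x] = acc;
}
```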

machine6

I believe this technique is called prefetching.

yes

Fetching all data within a thread block could help reduce cold misses, but wouldn't that potentially result in additional capacity misses if the cache simply isn't big enough?

ayy_lmao

I am unsure why prefetching is beneficial. It seems like the other threads in the block would be stuck waiting for the dedicated thread to load the data (which would take a long time, since all of its accesses are misses?).

paracon

I think Prof. Kayvon was talking about hardware prefetching.