Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2015

Slide 14 of 59

rokislt10

It seems that the synchronization time for the 4D blocked layout is also appreciably less than for the 2D blocked layout. Is there a reason for this? Is there some inherent property that links less data transfer time to less synchronization time?

ekr

I think this was brought up in class. The majority of synchronization time is due to some threads waiting for other threads to finish at barriers. The 4D Blocked layout reduces contention, and so there's less time spent waiting at barriers for other threads to catch up.

kayvonf

@ekr. Correct... except, it reduces communication, not contention.

ekr

Oops, I meant communication between processors. The rearrangement reduces artifactual communication, so less of each communicated cache line is wasted on data the receiving processor doesn't need.
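
To make the cache-line point concrete, here is a minimal sketch that counts how many cache lines hold elements owned by more than one processor under each layout. The grid size, tile size, and elements-per-line values are illustrative assumptions, not numbers from the slide, and the array is assumed to start on a cache-line boundary.

```python
# Sketch: count cache lines that contain elements owned by more than one
# processor, for a row-major ("2D") layout and a block-major ("4D") layout.
# N, BLOCK, and LINE_ELEMS are assumed values chosen for illustration.

N = 32           # grid is N x N elements
BLOCK = 8        # each processor owns one BLOCK x BLOCK tile
LINE_ELEMS = 16  # elements per cache line (e.g. 64-byte line, 4-byte floats)


def index_2d(i, j):
    """Row-major offset: each full grid row is contiguous in memory."""
    return i * N + j


def index_4d(i, j):
    """Block-major offset: each BLOCK x BLOCK tile is contiguous in memory."""
    blocks_per_row = N // BLOCK
    block_id = (i // BLOCK) * blocks_per_row + (j // BLOCK)
    within = (i % BLOCK) * BLOCK + (j % BLOCK)
    return block_id * BLOCK * BLOCK + within


def shared_lines(index_fn):
    """Number of cache lines whose elements belong to more than one tile."""
    owners = {}
    for i in range(N):
        for j in range(N):
            line = index_fn(i, j) // LINE_ELEMS
            tile = (i // BLOCK, j // BLOCK)
            owners.setdefault(line, set()).add(tile)
    return sum(1 for tiles in owners.values() if len(tiles) > 1)


print("row-major shared lines:  ", shared_lines(index_2d))
print("block-major shared lines:", shared_lines(index_4d))
```

With these assumed sizes, every cache line in the row-major layout holds elements from two processors' tiles, while no line in the block-major layout does.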

byeongcp

It seems like in the 2D blocked layout, the processors spend as much time waiting for data as they spend using that data for computation. Clearly, if we knew that we were going to use the static grid assignment, we would prefer the 4D blocked layout. Is it possible to choose either layout (i.e. the programmer needs to tell GPUs to use the 4D layout over the 2D one), or are GPUs smart enough to figure out the memory layout on their own?

HLAHat

I believe the programmer would have to set up the memory layout and tell the processor what to do. I don't think the "rules" for choosing a layout are well-defined enough for it to be automatic.
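
If the layout is indeed the programmer's job, one way it might look is an explicit copy from row-major order into block-major order before the computation starts. The sketch below is only an illustration under that assumption; the function name `to_blocked` and its parameters are hypothetical, not from the lecture.

```python
# Sketch: the programmer explicitly rearranges a row-major n x n grid so that
# each block x block tile becomes contiguous (a "4D blocked" layout).
# to_blocked is a hypothetical helper written for this example.

def to_blocked(flat, n, block):
    """Return a copy of the row-major list `flat` in block-major order."""
    if n % block != 0:
        raise ValueError("block size must divide the grid dimension")
    out = [None] * (n * n)
    blocks_per_row = n // block
    for i in range(n):
        for j in range(n):
            block_id = (i // block) * blocks_per_row + (j // block)
            within = (i % block) * block + (j % block)
            out[block_id * block * block + within] = flat[i * n + j]
    return out


# A 4 x 4 grid split into 2 x 2 tiles: each tile's four elements end up adjacent.
grid = list(range(16))
print(to_blocked(grid, 4, 2))
```

The hardware sees only addresses, so in this sketch the indexing arithmetic in the program is what determines which elements share a cache line.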