Slide View : 15-418 Spring 2013

monster

The black bars in the graph show the time spent waiting for data that comes from other threads. The 4D blocking of the grid reduces communication because it has fewer elements on the border than the 2D block pattern.

stephyeung

By plotting these graphs, we can see what to optimize next. The crosshatched bars are roughly level in both graphs, meaning our workload assignment is well balanced. However, the communication bars are significantly higher in the graph on the right than in the one on the left, so we could try to capture more locality.

chaominy

SGI Origin 2000 is a family of mid-range and high-end servers manufactured by SGI. The largest installation was ASCI (Accelerated Strategic Computing Initiative) Blue Mountain at Los Alamos National Labs, with 48 Origin 2000 series 128-CPU systems all connected via HIPPI for a total of 6144 processors. See more here.

TeBoring

4D layout is different from 4D assignment.

4D layout is about how the data is laid out in memory (to decrease artifactual communication through the cache).

4D assignment is about how work is assigned to threads (to decrease inherent communication between neighboring threads).
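
To make the layout half of this distinction concrete, here is a minimal Python sketch. It is not taken from the slide: the function names, the 16x16 grid, the 4x4 blocks, and the 8-element cache line are all assumptions chosen for illustration, and the grid size is assumed to be a multiple of the block size. It compares where one thread's block lands in memory under a plain row-major (2D) layout and under a blocked (4D) layout.

```python
def offset_2d(i, j, n):
    """Row-major offset of element (i, j) in an n x n grid."""
    return i * n + j


def offset_4d(i, j, n, b):
    """Offset of (i, j) when each b x b block is stored contiguously.

    Conceptually the array is indexed as
    [block_row][block_col][row_in_block][col_in_block].
    Assumes n is a multiple of b.
    """
    blocks_per_row = n // b
    block_row, in_row = divmod(i, b)
    block_col, in_col = divmod(j, b)
    block_id = block_row * blocks_per_row + block_col
    return block_id * (b * b) + in_row * b + in_col


def block_offsets(offset_fn, block_row, block_col, b):
    """Memory offsets of every element in one thread's b x b block."""
    return [offset_fn(i, j)
            for i in range(block_row * b, (block_row + 1) * b)
            for j in range(block_col * b, (block_col + 1) * b)]


def lines_touched(offsets, line_elems):
    """Number of distinct cache lines that cover the given offsets."""
    return len({off // line_elems for off in offsets})


# Hypothetical sizes: 16x16 grid, 4x4 blocks, 8 elements per cache line.
N, B, LINE = 16, 4, 8
two_d = block_offsets(lambda i, j: offset_2d(i, j, N), 1, 1, B)
four_d = block_offsets(lambda i, j: offset_4d(i, j, N, B), 1, 1, B)

print("2D layout, lines touched by block (1, 1):", lines_touched(two_d, LINE))
print("4D layout, lines touched by block (1, 1):", lines_touched(four_d, LINE))
```

With these sizes, the block touches four cache lines in the 2D layout, and each of those lines also holds four elements of the neighboring block, so two threads end up sharing lines. In the 4D layout the same block occupies one contiguous range covering two lines that belong to that block alone. The work assignment is identical in both cases; only the layout changes.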

TeBoring

I have a guess for why synchronization on the right is longer.

First, the 2D block assignment doesn't actually allocate the same amount of work to each thread (this can be seen from the bottom bars).

Second, because the work is unbalanced, threads spend more time at barriers waiting for the last thread to arrive. Thus, synchronization time is longer.

One more interesting thing I found is that, even though work is unbalanced in the 2D assignment, the finish time is still the same. I guess this is because of the barrier.

Correct me if I am wrong.

kayvonf

@TeBoring: Yes, I think it's safe to assume that this example is running over many solver iterations (and perhaps also over many time steps), and there's a barrier at the end of each phase. That's why the total height of the bars is the same. Any imbalance in communication costs or processing gets added to the data wait or sync wait bars so the total is always the same.

I think the increased sync time in the 2D case is actually due to imbalance in both compute (busy) and communication costs. Notice how there's even more variation in the black bars than in the hatched bars. Both of these effects could increase the time spent waiting to sync all the threads up at barriers.
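
The accounting described here can be sketched in a few lines of Python. This is only an illustrative model under assumptions: the function name and the per-thread numbers are made up, not measured from the graph, and the phase is assumed to end at a single barrier that releases every thread when the slowest one arrives.

```python
def phase_breakdown(busy, data_wait):
    """Model one barrier-terminated phase.

    busy[t] and data_wait[t] are hypothetical per-thread times.
    A thread reaches the barrier after busy[t] + data_wait[t], and all
    threads leave the barrier when the slowest thread arrives.
    Returns the phase time and each thread's sync-wait time.
    """
    arrival = [b + d for b, d in zip(busy, data_wait)]
    phase_time = max(arrival)
    sync_wait = [phase_time - a for a in arrival]
    return phase_time, sync_wait


# Made-up numbers for four threads (arbitrary time units).
busy = [10, 12, 9, 11]
data_wait = [2, 5, 1, 6]

phase_time, sync_wait = phase_breakdown(busy, data_wait)
totals = [b + d + s for b, d, s in zip(busy, data_wait, sync_wait)]

print("phase time:", phase_time)
print("sync wait per thread:", sync_wait)
print("total bar height per thread:", totals)
```

In this model every thread's total comes out equal to the phase time, which matches the equal bar heights in the graph, and any extra spread in busy or data-wait time shows up directly as more sync wait for the threads that arrive early.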
