bochet

I am trying to understand tree-structured communication.

Is a distributed work queue one example of tree-structured communication? Based on my understanding, a tree structure means consulting the local queue first before visiting the global queue.

axiao

From my understanding, tree-structured communication is a way to achieve the same result as flat communication, but by distributing the communication across different nodes or threads, which reduces contention.

For example, suppose each thread has a number and we need to add all the numbers together. One way to do it is with flat communication: use a single global variable and have every thread add its number to it. This requires a lot of synchronization, since all the threads access the same variable (contention).
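Here is a minimal sketch of the flat version (my own illustrative C++, not from the lecture): every thread adds its value into a single shared atomic counter, so all of the updates serialize on that one variable.

```cpp
#include <atomic>
#include <iostream>
#include <thread>
#include <vector>

int main() {
    std::vector<int> vals = {3, 1, 4, 1, 5, 9, 2, 6}; // one value per thread
    std::atomic<int> sum{0};                          // the single shared variable

    // Flat communication: every thread updates the same shared counter,
    // so all of the additions contend on (and serialize at) `sum`.
    std::vector<std::thread> workers;
    for (int v : vals) {
        workers.emplace_back([&sum, v] { sum.fetch_add(v); });
    }
    for (auto& t : workers) t.join();

    std::cout << "sum = " << sum.load() << '\n'; // prints 31
    return 0;
}
```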

One way to reduce the contention is to use more global variables, assign each thread a specific global variable to add its number to, and then sum those global variables together at the end. We can take this idea further: pair up the threads and have one thread in each pair add the pair's 2 numbers together. Then pair up the threads that now hold partial sums and repeat (this is basically just the reduce operation). This is an example of tree-structured communication, and there is no contention since no 2 threads are ever trying to access the same resource/variable.
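And here is the same reduction done tree-style (again my own sketch): in each round, a thread combines its value with a partner's value at distance `stride`, so no two threads ever write the same element.

```cpp
#include <cstddef>
#include <iostream>
#include <thread>
#include <vector>

int main() {
    std::vector<int> vals = {3, 1, 4, 1, 5, 9, 2, 6}; // one value per thread

    // Tree-structured reduction: in each round, thread i adds in the value
    // held by its partner at distance `stride`. No two threads write the
    // same element, so there is no contention on any single variable.
    for (std::size_t stride = 1; stride < vals.size(); stride *= 2) {
        std::vector<std::thread> workers;
        for (std::size_t i = 0; i + stride < vals.size(); i += 2 * stride) {
            workers.emplace_back([&vals, i, stride] {
                vals[i] += vals[i + stride];
            });
        }
        for (auto& t : workers) t.join(); // end of this round of the tree
    }

    std::cout << "sum = " << vals[0] << '\n'; // prints 31
    return 0;
}
```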

pagerank

The pattern in the right-hand graph is called AllReduce in some parallel computing frameworks. It was first implemented in the high-performance computing library MPI, and it is also implemented in Vowpal Wabbit by John Langford on top of Hadoop.
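If anyone wants to see what that looks like in MPI, here is a minimal sketch (my own example, using the standard C API from C++): each rank contributes one integer and every rank receives the total.

```cpp
#include <mpi.h>
#include <iostream>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int local = rank;  // each rank's contribution
    int total = 0;     // filled in on every rank

    // MPI_Allreduce combines the values from all ranks (often using a tree
    // or ring internally) and delivers the result to every rank.
    MPI_Allreduce(&local, &total, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    std::cout << "rank " << rank << " sees total " << total << '\n';
    MPI_Finalize();
    return 0;
}
```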

rav

Tree-structured communication would have something like O(log N) times more latency than flat communication when there is no contention. But if we have high contention, any given node in the tree only ever has 2 threads communicating with it (the fan-in/out of the tree), so contention at any single point is bounded.
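To put rough numbers on it (my own example, not from the slide): with N = 1024 threads, the tree takes about log2(1024) = 10 combining steps, while the flat version needs only 1 step but can have up to 1024 threads contending for the same variable.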

rrp123

I understand that we would likely implement tree-structured communication in situations of high contention, but what if we don't know in advance whether there will be contention? Would it be better to just implement flat communication in that case, since it has lower latency when contention is low?