jazzbass

Here is the code that was proposed in class to avoid deadlock on the initial sends and recvs. The idea is that workers with even tids do their sends first and their recvs afterward, while workers with odd tids do their recvs first and their sends afterward:


if (tid % 2 == 0) {
  // do sends first, then recvs
  if (tid != 0)
    send(&localA[1,0], sizeof(float)*(N+2), tid-1, MSG_ID_ROW);
  if (tid != get_num_threads()-1)
    send(&localA[rows_per_thread-2,0], sizeof(float)*(N+2), tid+1, MSG_ID_ROW);

  if (tid != 0)
    recv(&localA[0,0], sizeof(float)*(N+2), tid-1, MSG_ID_ROW);
  if (tid != get_num_threads()-1)
    recv(&localA[rows_per_thread-1,0], sizeof(float)*(N+2), tid+1, MSG_ID_ROW);
}
else {
  // do recvs first, then sends
  recv(&localA[0,0], sizeof(float)*(N+2), tid-1, MSG_ID_ROW);   // we know that tid != 0
  if (tid != get_num_threads()-1)
    recv(&localA[rows_per_thread-1,0], sizeof(float)*(N+2), tid+1, MSG_ID_ROW);

  send(&localA[1,0], sizeof(float)*(N+2), tid-1, MSG_ID_ROW);   // we know that tid != 0
  if (tid != get_num_threads()-1)
    send(&localA[rows_per_thread-2,0], sizeof(float)*(N+2), tid+1, MSG_ID_ROW);
}
cacay

There are type systems that statically forbid deadlocks (see https://www.cs.cmu.edu/~fp/papers/concur10.pdf for example).

Types become even more useful in a concurrent setting, since it is almost impossible to test for deadlocks: code that works on one machine could fail on another. A proper language should not burden the programmer with ruling out deadlocks by hand.

jiajunbl

@jazzbass I think there is a subtle thing to note in the code you presented: with unbuffered communication, where send/recv block until the matching call is posted, the exchange actually ends up being sequential.

For the tid % 2 == 0 case, the code sends to tid-1 first: send(&localA[1,0], sizeof(float)*(N+2), tid-1, MSG_ID_ROW);

But for the tid % 2 == 1 case, the code recvs from tid-1 first and then recvs from tid+1.

This means the sends and recvs match up in a chain rooted at thread 0 rather than in parallel: every thread is blocked until thread 0's send completes, and only one send/recv pair can be fulfilled at a time. I think having the odd threads recv from tid+1 first and then from tid-1 would make the code a bit more parallel.

This could also be fixed easily if we assume a buffered channel of size one, since each send would then complete without waiting for the matching recv.
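To make that concrete, here is a minimal sketch of the buffered-channel idea, assuming an MPI setting rather than the abstract send()/recv() used above (MPI_Bsend returns as soon as the message is copied into the attached buffer, which plays the role of the buffered channel). The function name exchange_ghost_rows and the flat row-major layout of localA with row stride N+2 are my own assumptions, not part of the original code:

/* Sketch only: MPI stands in for the course's abstract message-passing API. */
#include <mpi.h>
#include <stdlib.h>

void exchange_ghost_rows(float *localA, int rows_per_thread, int N) {
  int tid, nprocs;
  MPI_Comm_rank(MPI_COMM_WORLD, &tid);
  MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

  /* Attach a buffer large enough for the two outgoing row messages. */
  int msg_bytes = (int)(sizeof(float) * (N + 2));
  int buf_bytes = 2 * (msg_bytes + MPI_BSEND_OVERHEAD);
  void *buf = malloc(buf_bytes);
  MPI_Buffer_attach(buf, buf_bytes);

  /* Buffered sends complete locally, so every rank can send both boundary
     rows before posting any recv -- no even/odd ordering trick is needed. */
  if (tid != 0)
    MPI_Bsend(&localA[1 * (N + 2)], N + 2, MPI_FLOAT, tid - 1, 0, MPI_COMM_WORLD);
  if (tid != nprocs - 1)
    MPI_Bsend(&localA[(rows_per_thread - 2) * (N + 2)], N + 2, MPI_FLOAT, tid + 1, 0, MPI_COMM_WORLD);

  if (tid != 0)
    MPI_Recv(&localA[0], N + 2, MPI_FLOAT, tid - 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
  if (tid != nprocs - 1)
    MPI_Recv(&localA[(rows_per_thread - 1) * (N + 2)], N + 2, MPI_FLOAT, tid + 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

  /* Detach waits until the buffered messages have been delivered. */
  MPI_Buffer_detach(&buf, &buf_bytes);
  free(buf);
}

Because both buffered sends complete without waiting for the matching recv, the circular dependency never forms and no rank can block another before its own recvs are posted.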

jazzbass

@jiajunbl You're right, nice catch! The same thing happens with the second set of sends/recvs. Here's the updated code:


if (tid % 2 == 0) {
  // do sends first, then recvs
  if (tid != 0)
    send(&localA[1,0], sizeof(float)*(N+2), tid-1, MSG_ID_ROW);
  if (tid != get_num_threads()-1)
    send(&localA[rows_per_thread-2,0], sizeof(float)*(N+2), tid+1, MSG_ID_ROW);

  if (tid != 0)
    recv(&localA[0,0], sizeof(float)*(N+2), tid-1, MSG_ID_ROW);
  if (tid != get_num_threads()-1)
    recv(&localA[rows_per_thread-1,0], sizeof(float)*(N+2), tid+1, MSG_ID_ROW);
}
else {
  // do recvs first, then sends
  if (tid != get_num_threads()-1)
    recv(&localA[rows_per_thread-1,0], sizeof(float)*(N+2), tid+1, MSG_ID_ROW);
  recv(&localA[0,0], sizeof(float)*(N+2), tid-1, MSG_ID_ROW);   // we know that tid != 0

  if (tid != get_num_threads()-1)
    send(&localA[rows_per_thread-2,0], sizeof(float)*(N+2), tid+1, MSG_ID_ROW);
  send(&localA[1,0], sizeof(float)*(N+2), tid-1, MSG_ID_ROW);   // we know that tid != 0
}
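For what it's worth, here is a sketch of how this final even/odd version might look against a real message-passing API, assuming MPI. MPI_Ssend (synchronous send) is used because, like the unbuffered send assumed in this thread, it completes only once the matching recv has been posted. The function name and the flat row-major layout of localA with row stride N+2 are assumptions, not part of the original code:

/* Sketch only: even ranks send then recv, odd ranks mirror them so every
   synchronous send meets a recv that is already (or about to be) posted. */
#include <mpi.h>

void exchange_ghost_rows_even_odd(float *localA, int rows_per_thread, int N) {
  int tid, nprocs;
  MPI_Comm_rank(MPI_COMM_WORLD, &tid);
  MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
  int W = N + 2;   /* row width, including the padding columns */

  if (tid % 2 == 0) {
    /* even ranks: send both boundary rows, then receive both ghost rows */
    if (tid != 0)
      MPI_Ssend(&localA[1 * W], W, MPI_FLOAT, tid - 1, 0, MPI_COMM_WORLD);
    if (tid != nprocs - 1)
      MPI_Ssend(&localA[(rows_per_thread - 2) * W], W, MPI_FLOAT, tid + 1, 0, MPI_COMM_WORLD);
    if (tid != 0)
      MPI_Recv(&localA[0], W, MPI_FLOAT, tid - 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    if (tid != nprocs - 1)
      MPI_Recv(&localA[(rows_per_thread - 1) * W], W, MPI_FLOAT, tid + 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
  } else {
    /* odd ranks: recv from tid+1 then tid-1, send to tid+1 then tid-1,
       matching the ordering of the pseudocode above */
    if (tid != nprocs - 1)
      MPI_Recv(&localA[(rows_per_thread - 1) * W], W, MPI_FLOAT, tid + 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Recv(&localA[0], W, MPI_FLOAT, tid - 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    if (tid != nprocs - 1)
      MPI_Ssend(&localA[(rows_per_thread - 2) * W], W, MPI_FLOAT, tid + 1, 0, MPI_COMM_WORLD);
    MPI_Ssend(&localA[1 * W], W, MPI_FLOAT, tid - 1, 0, MPI_COMM_WORLD);
  }
}

With this pairing, every even rank's first send is matched by its odd neighbor's first recv, so all boundary exchanges can proceed in parallel instead of serializing through thread 0.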