Every thread needs two sends and two receives. To maximize parallelism, one way I can think of is:
step 1: the odd-index threads send data "forward" (i->i+1), and the even-index threads receive it (i<-i-1).
step 2: the odd-index threads send data "backward" (i->i-1), and the even-index threads receive it (i<-i+1).
step 3, 4: vice versa (the even-index threads send, the odd-index threads receive).
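
To make the four steps concrete, here is a minimal Python sketch that only builds the schedule as (sender, receiver) pairs and checks it; it does not do any real communication. The names `exchange_schedule` and `check` are made up for illustration, and the sketch assumes the threads form a ring with an even number of members (so odd and even indices alternate all the way around), which the description above does not state.

```python
from collections import Counter


def exchange_schedule(n):
    """Return the four steps as lists of (sender, receiver) pairs.

    Assumption: n threads arranged in a ring, n even and at least 4.
    """
    if n < 4 or n % 2 != 0:
        raise ValueError("this sketch assumes an even ring size of at least 4")
    odd = range(1, n, 2)
    even = range(0, n, 2)
    return [
        [(i, (i + 1) % n) for i in odd],   # step 1: odd threads send forward
        [(i, (i - 1) % n) for i in odd],   # step 2: odd threads send backward
        [(i, (i + 1) % n) for i in even],  # step 3: even threads send forward
        [(i, (i - 1) % n) for i in even],  # step 4: even threads send backward
    ]


def check(n):
    """Verify that no thread is used twice within a step and that every
    thread ends up with exactly two sends and two receives."""
    sends, recvs = Counter(), Counter()
    for step in exchange_schedule(n):
        busy = [t for pair in step for t in pair]
        if len(busy) != len(set(busy)):
            return False
        for sender, receiver in step:
            sends[sender] += 1
            recvs[receiver] += 1
    return all(sends[t] == 2 and recvs[t] == 2 for t in range(n))
```

Within each step every thread is either sending or receiving, never both, so all pairs in a step can proceed at the same time. On a non-periodic chain the two end threads would simply skip the step in which their neighbor does not exist.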