Why: Say the two machines/threads/etc (I'll say threads here) for a pair of adjacent blocks of rows each call SEND to each other at the same time. Each SEND would then wait for the other thread to call RECEIVE, which in turn cannot happen because it is waiting for SEND to complete. So neither thread can call RECEIVE to un-block the other thread. So there is a chance of this deadlocking.
Fix: One simple way to fix this would be to impose an ordering on communication between threads. For example, perhaps each thread must wait for the thread "above" it (the thread responsible for the blocks further along the Y axis of the grid) to send it a message, by RECEIVEing first. The top thread would not need to wait for any others, so eventually this linear chain of dependencies would complete. We can then repeat the process in the opposite direction.
This has the unfortunate side effect of serializing all communication, although the computation is still parallel. We could probably design more complex schemes that would reduce the serialization.
This comment was marked helpful 0 times.
Another fix suggested in class was to have all of the threads with even PIDs wait to RECEIVE from the ones with odd PIDs to SEND, and then switch around so that odd PIDs wait for even ones. This parallelizes the communication, instead of doing it serially.