We recompute the edge values of each block instead of spending time communicating them between blocks. Because the block size is so large (256x32), the redundant computation is a small tradeoff for a large gain in parallel performance.
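To put a rough number on that tradeoff, here is a minimal sketch that estimates the extra work from recomputing edges. It assumes a halo one cell wide on every side of a 256x32 block; the halo width and the function name `halo_overhead` are illustrative assumptions, not taken from the actual code.

```python
def halo_overhead(width: int, height: int, halo: int = 1) -> float:
    """Fraction of extra cells computed when each block also recomputes
    a halo of `halo` cells on every side (assumed layout, for illustration)."""
    interior = width * height
    # The padded block includes the halo on both sides in each dimension.
    padded = (width + 2 * halo) * (height + 2 * halo)
    return (padded - interior) / interior


if __name__ == "__main__":
    # 256x32 block with an assumed one-cell halo
    print(f"{halo_overhead(256, 32):.1%}")  # prints 7.1%
```

Under these assumptions the redundant work is about 7% of the block, and it grows with the halo width, so the estimate should be redone if the real edge region is wider.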


