Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2016

Previous | Next --- Slide 37 of 48

Fantasy

I think the reduceAdd in previous slide allows freedom for implementation to deal with the sum to diff more intelligently. Maybe there will be temporary partial sum for each working thread and then sum them to diff at the very end. Here by using a lock, all the sum will go to shared variable diff immediately in the loop and all working threads become sequential at that point. Following loops cannot start unless the sum of current loop is done. It will potentially be a huge performance bottleneck.

kayvonf

@Fantasy. Good observation. I like that comment. Yes, a good implementation of reduceAdd from the data-parallel formulation would likely implement the reduction efficiently by accumulating into local partial sums like we do in the more efficient version of this code shown later in this lecture on slide 41.