Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2015

Programming for Performance: Work Assignment


jcarchi

Can someone explain this slide to me? I was in the bathroom when this was explained.

byeongcp

Could someone explain why diff works with 3 elements and not 2 in this example? I think Kayvon mentioned it in class, but I didn't get it.

Also, would using 1 barrier give a significant speedup over using 3 barriers (since each thread can still only work ahead 1 iteration at a time)?

Faust

Kayvon mentioned that it may be possible with two, but not for this example. With 2 elements, on iteration 0 we would update diff[0] and then check diff[0] while fast threads are already updating diff[1]. The problem is that a fast thread that has moved one iteration ahead could clear the value in diff[0] (resetting it for the iteration after that) before we check the diff[0] value from iteration 0. This is not a problem for 3 elements, because the barrier will only let the fast thread get one iteration ahead. Thus, instead of overwriting the diff[0] value, it will write into diff[2] and we won't have any problems!
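To make the index argument concrete, here is a small sketch that only does the slot arithmetic (no threads). It assumes, since the slide code is not reproduced in this thread, that in iteration k a thread accumulates into diff[k % N], resets diff[(k + 1) % N] for the next iteration, waits at the barrier, and then checks diff[k % N]. The function name `slot_conflicts` is made up for illustration.

```python
def slot_conflicts(num_slots, iterations=6):
    """Return the iterations k at which a fast thread, already in
    iteration k + 1, resets the slot that a slow thread still has to
    check for iteration k.

    Assumed loop structure (not taken verbatim from the slide):
    iteration k accumulates into slot k % num_slots and resets
    slot (k + 1) % num_slots before the barrier.
    """
    conflicts = []
    for k in range(iterations):
        # Slot the slow thread reads after barrier k.
        checked_by_slow = k % num_slots
        # Slot the fast thread resets while it is in iteration k + 1.
        reset_by_fast = ((k + 1) + 1) % num_slots
        if reset_by_fast == checked_by_slow:
            conflicts.append(k)
    return conflicts


print(slot_conflicts(2))  # with 2 slots, every iteration has the conflict
print(slot_conflicts(3))  # with 3 slots, the list is empty
```

With 2 slots, (k + 2) % 2 is always equal to k % 2, so the reset always lands on the slot that is still waiting to be checked. With 3 slots it never does.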

cube

@jcarchi He's creating 3 different diff values that we cycle through updating (that's why it's an array now).

The reason this works is that the single global diff variable was really the only reason we needed 3 separate barriers. So, if we set it up so that a given iteration, the previous iteration, and the next iteration all use different diff variables (which one we update is given by index), then we don't have to keep 3 barriers around, because there are no conflicts when updating/checking/resetting the separate diff variables. See @Faust's explanation for why this wouldn't work with only 2 diff variables.
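Here is a rough sketch of what the one-barrier loop could look like, written in Python rather than the slide's pseudocode. It is not the code from the slide: the names `run_solver` and `worker`, the tolerance, and the fake per-thread "work" (a value that halves every iteration) are all assumptions made so the example runs on its own.

```python
import threading

TOLERANCE = 1e-3  # assumed convergence threshold


def run_solver(num_threads=4, max_iters=50):
    """Run a toy solver loop with three diff slots and one barrier per
    iteration. Returns how many iterations each thread executed."""
    diff = [0.0, 0.0, 0.0]               # three copies of the global diff
    lock = threading.Lock()
    barrier = threading.Barrier(num_threads)
    iterations_run = [0] * num_threads

    def worker(tid):
        index = 0                         # thread-local slot index
        for it in range(max_iters):
            # Stand-in for the real grid computation.
            my_diff = 1.0 / (2 ** it)

            with lock:                    # atomically update this iteration's slot
                diff[index] += my_diff

            # Reset the slot the NEXT iteration will accumulate into.
            # Nobody reads or accumulates into it until after the barrier.
            diff[(index + 1) % 3] = 0.0

            barrier.wait()                # the only barrier in the loop

            iterations_run[tid] = it + 1
            # Every thread sees the same total here, so all of them
            # leave the loop in the same iteration.
            if diff[index] / num_threads < TOLERANCE:
                break
            index = (index + 1) % 3

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return iterations_run


print(run_solver())
```

While a slow thread is still checking diff[index] for iteration k, a fast thread can only be adding to the next slot and resetting the one after that, so all threads agree on the convergence test and stop after the same number of iterations.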