Can someone explain this slide to me? I was in the bathroom when this was explained.
byeongcp
Could someone explain why diff works with 3 elements and not 2 in this example? I think Kayvon mentioned it in class, but I didn't get it.
Also, would using 1 barrier give a significant speedup over using 3 barriers (since each thread can still only work ahead 1 iteration at a time)?
Faust
Kayvon mentioned that it may be possible with two, but not for this example. With 2 elements, on iteration 0 we would update diff[0] and then check diff[0] while fast threads are already updating diff[1]. The problem is that a fast thread that has run ahead into the next iteration could clear the value in diff[0] before a slower thread has checked the diff[0] value from iteration 0. This is not a problem with 3 elements, because the barrier only lets the fast thread get one iteration ahead. So instead of overwriting the diff[0] value, it writes into diff[2] and we won't have any problems!
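Here is a rough sketch of that index arithmetic, not the slide's code. It assumes that in iteration i each thread adds into diff[i % k] and clears diff[(i + 1) % k] for the next iteration, and that a fast thread is at most one iteration ahead of the slowest one. The function name `collides` is made up for illustration.

```python
def collides(k, iterations=12):
    """Return True if a thread that is one iteration ahead can clear the
    slot a slow thread still has to check (under the assumed scheme)."""
    for i in range(iterations):
        # Slow thread: has passed the barrier of iteration i, about to check this slot.
        checked_by_slow = i % k
        # Fast thread: already in iteration i + 1, clearing the slot for iteration i + 2.
        cleared_by_fast = (i + 2) % k
        if checked_by_slow == cleared_by_fast:
            return True
    return False

print(collides(2))  # True: with 2 slots, (i + 2) % 2 == i % 2
print(collides(3))  # False: with 3 slots the two indices never match
```

Under these assumptions the clash only happens when the number of slots divides 2, which is why 3 is enough here.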
cube
@jcarchi He's creating 3 different diff values that we cycle through updating (that's why it's an array now).
This works because the single global diff variable was really the only reason we needed 3 separate barriers. If we set it up so that a given iteration, the previous iteration, and the next iteration all use different diff variables (which one we update is given by index), then we don't have to keep 3 barriers around, because there are no conflicts when updating/checking/resetting the separate diff variables. See @Faust's explanation for why this wouldn't work with only 2 diff variables.
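To make the one-barrier structure concrete, here is a minimal sketch using Python's `threading.Barrier`. It is not the code from the slide: `run_solver`, the thread count, the tolerance, and the fake "computation" (each thread contributes 1 / 2^iteration) are all assumptions chosen so the loop terminates predictably.

```python
import threading

def run_solver(num_threads=4, tolerance=0.01):
    diff = [0.0, 0.0, 0.0]              # three copies of the shared diff
    lock = threading.Lock()
    barrier = threading.Barrier(num_threads)
    iterations_done = [0] * num_threads

    def worker(tid):
        index = 0                        # thread-local: which diff slot is current
        iteration = 0
        while True:
            # Stand-in for the real grid update (assumption for this sketch).
            my_diff = 1.0 / (2 ** iteration)
            with lock:
                diff[index] += my_diff   # accumulate into this iteration's slot
            diff[(index + 1) % 3] = 0.0  # reset the slot the next iteration will use
            barrier.wait()               # the only barrier in the loop
            iteration += 1
            if diff[index] < tolerance:  # every thread sees the same completed sum
                break
            index = (index + 1) % 3
        iterations_done[tid] = iteration

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return iterations_done

print(run_solver())
```

Every thread should report the same iteration count, since they all check the same fully accumulated slot after the barrier, and a thread that runs ahead only touches the other two slots.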