Previous | Next --- Slide 18 of 57

Am I right in saying that the program on the right works faster than a sequential loop to add the elements of the array since it computes partial sums of chunks of the array potentially in parallel?


@captainFlint Yes!


@captainFlint: Yes. The program on the right divides the array into "programCount" chunks (not necessarily contiguous), which can be processed in parallel using SIMD instructions on a single core. We can also achieve parallelism across multiple cores with tasks in ISPC, but that wasn't shown in this example.


Is there any way for a programming language to automatically detect race conditions? On the flip side, is there any way race conditions can be put to good use?


Reduction functions:

You can also compute a variety of reductions across the program instances. For example, reduce_add() adds together the values of a given variable across all of the active program instances.


I will try to explain why the program on the left-hand side gives a compilation error.

Inside "foreach", the variable i is not uniform since it differs across instances. Hence, the value of x[i] is also instance-specific. Thus, adding them all to a single uniform variable sum doesn't work (I'm guessing there might be race conditions, etc.). The RHS avoids the issue by maintaining a per-instance partial sum, and eventually runs a reduce_add to compute the total sum.

Is my understanding correct?


@kidz You might like to look into the Rust programming language, whose type system statically rules out data races.

About the uniform keyword, is it actually considered a part of the type? For example, we do get a type error on the left because, while the array itself may be uniform, the elements are not. This is reflected in its type: uniform float*. Would it be possible (in theory or in practice) to create a type (uniform float)*?


Here the compiler is preventing the race condition at compile time, so it can never happen at runtime. This kind of race condition has a very obvious characteristic: the left-hand value is uniform while the right-hand value is varying. If you want to call it detection, it's detection through static analysis.


On the right, is declaring uniform float sum at the beginning necessary? No program instance is using it directly, so shouldn't float sum = reduce_add(partial) suffice?


@xingdaz The uniform qualifier means that it is shared among all the program instances within the gang, so yes, it is necessary. The program on the right creates a new partial variable for each program instance (therefore, no uniform) to avoid race conditions, but the entire gang is still computing the sum, so there needs to be only one sum (therefore, uniform).


Yes, the program on the right runs faster because it parallelizes across chunks of the array, processing the chunks simultaneously.


@kidz, Go is another example of a language with a built-in race detector. See Go Race Detector for more info. If you build your program with the -race flag, the compiler inserts extra code to log memory accesses, and then when you run the program the runtime system will report any race conditions that occur. As you might imagine, this creates a fairly large overhead, but it's good for debugging concurrent programs.



Adding to @PandaX: you can read more about the various reduction functions (reduce_add is one of them) here: reduction functions