Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2017

jocelynh

When a gang of program instances is spawned, each instance of the function gets access to two variables:

programCount: gives the number of instances that have been spawned for this function.
programIndex: gives the id of your specific instance.

In the example sin(x) program, these variables are used in the loop so that each instance computes sin(x[i]) for every programCount-th element (every fourth element in this example), starting at index programIndex.
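To make the interleaving concrete, here is a minimal sketch in plain C (not ISPC) that emulates a gang with an ordinary loop and records which instance handles each element. The names PROGRAM_COUNT, run_instance, run_gang and owner are made up for illustration, and the gang size of 4 is an assumption taken from the "fourth" in the comment above. Real ISPC runs the instances together in SIMD lanes rather than one after another.

```c
#include <assert.h>

/* Assumed gang size for this sketch (plays the role of programCount). */
#define PROGRAM_COUNT 4

/* Work done by ONE emulated program instance.
   i plays the role of the uniform loop counter: every instance sees the
   same sequence i = 0, 4, 8, ... The only per-instance difference is
   the program_index that gets added to it. */
static void run_instance(int program_index, int n, int *owner)
{
    assert(program_index >= 0 && program_index < PROGRAM_COUNT);
    for (int i = 0; i < n; i += PROGRAM_COUNT) {
        int idx = i + program_index;
        if (idx < n)                    /* guard for n not divisible by the gang size */
            owner[idx] = program_index; /* the slide's program would compute sin(x[idx]) here */
    }
}

/* Emulate the gang by running each instance in turn. */
void run_gang(int n, int *owner)
{
    for (int program_index = 0; program_index < PROGRAM_COUNT; program_index++)
        run_instance(program_index, n, owner);
}
```

Calling run_gang(10, owner) fills owner with 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, so instance 0 handles elements 0, 4 and 8, instance 1 handles elements 1, 5 and 9, and so on.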

hweetvpu

From the ISPC documentation (which is really nice): the set of program instances is mapped to the SIMD lanes of the current processor, leading to excellent utilization of hardware SIMD units and high performance.

programCount is usually no more than 2-4x the hardware SIMD width. When the gang is wider than the hardware vector, ISPC converts each operation into instructions that fit the SSE width (e.g. 8 instances on a 4-wide SSE instruction set).

-o4

The performance benefit of always using uniform when expressing such semantics can be found here.

muchanon

What is the effect of using the uniform modifier unnecessarily? For example, if the float value were specified as uniform, would it cause any issues?

jkorn

uniform indicates a variable that has exactly the same value across all program instances. In your case, if we declared value as uniform float instead of float, I'm not sure whether it would be a compile-time error or whether it would still run and just be incorrect (perhaps sharing values across program instances that shouldn't be shared). Either way, value depends on programIndex, which is specific to each program instance. So value takes on different values in different program instances, which by definition contradicts what uniform represents.

ZoSo

I think declaring the float value as uniform will generate a compile-time error in this case. This is from the Intel documentation: "It's legal to add two uniform variables together and assign the result to a uniform variable, but assigning a non-uniform (i.e., varying) value to a uniform variable is a compile-time error."
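To illustrate why value cannot be uniform, here is a small sketch in plain C (not ISPC) that models a varying float as one slot per program instance. The names varying_float, load_varying and is_uniform are hypothetical, and the gang size of 4 is assumed. Note that ISPC rejects the assignment based on the declared types at compile time; the runtime check below only shows the underlying reason.

```c
#include <assert.h>

/* Assumed gang size for this sketch (plays the role of programCount). */
#define PROGRAM_COUNT 4

/* A "varying" float, emulated as one slot per program instance. */
typedef struct { float lane[PROGRAM_COUNT]; } varying_float;

/* Emulates value = x[i + programIndex]: each instance loads a
   different element, so the result generally differs per lane. */
varying_float load_varying(const float *x, int i)
{
    assert(i >= 0);
    varying_float v;
    for (int program_index = 0; program_index < PROGRAM_COUNT; program_index++)
        v.lane[program_index] = x[i + program_index];
    return v;
}

/* Returns 1 only if every lane holds the same value, i.e. the value
   could be represented by a single shared (uniform) slot. */
int is_uniform(varying_float v)
{
    for (int k = 1; k < PROGRAM_COUNT; k++)
        if (v.lane[k] != v.lane[0])
            return 0;
    return 1;
}
```

With x = {1, 2, 3, 4}, load_varying(x, 0) holds four different values, so no single uniform slot could represent it without discarding three of them.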

shhhh

Could someone explain to me how this piece of code works? I'm confused about certain parts of it. i is a uniform value, so it's the same across all the threads, no? When i is incremented by programCount, this new value should be shared across all the threads, so how do the threads do different work? Also, what is programIndex?

shhhh

Never mind. I had the purposes of programIndex and programCount switched in my head, so the code didn't make sense. I've got it now.

mak

If I remember correctly, it was mentioned in class that programCount (i.e. the number of instances in the gang) is specified by the user at compile time. Wouldn't it be better if it were controlled by ISPC itself, since it has all the necessary information about the hardware architecture to make the most efficient assignment decision?