Previous | Next --- Slide 2 of 56
Back to Lecture Thumbnails
yetianx

I got lost at this slide. Why neither of the two loops are parallelized by ISPC?

The outer for-loop is run in parallel, right?

kayvonf

@yetianx: A solid question. Anyone?

arjunh

When we call the sinx function, a gang of n program instances is created (where n is equal to the SIMD width of the processor). Each program instance executes the sinx function on disjoint sets of array elements. So, the program instances run in parallel on the processor.

However, the execution of a single program instance is entirely sequential. The program instance obtains its data set to work with via the outer for loop, via a set of indices into the input array. The inner for loop is simply used to compute the sin of a specific array element.

So, if the SIMD width of the processor was 4 and we had a set of 12 array elements, then the work would be divided among the program instances in the following manner (in terms of indices of array elements):

  • Program Instance 0: 0, 4, 8
  • Program Instance 1: 1, 5, 9
  • Program Instance 2: 2, 6, 10
  • Program Instance 3: 3, 7, 11

Note that using the foreach loop instead of a for loop would not change this fact. The ISPC compiler may choose a different way of assigning work to the program instances; the foreach construct just serves as a hint to the ISPC compiler that it is safe to launch multiple program instances to execute operations on the data set. By doing so, the programmer is no longer responsible for assigning work to the program instances.

Edit: foreach just points out that the iterations of the loop can be executed using any instance in the gang in any order, as clarified by @kavyonf below. So, the ISPC compiler could potentially choose a completely different program instance assignment than the one shown in the program on the left, such as this:

  • Program Instance 0: 0, 1, 2,
  • Program Instance 1: 3, 4, 5
  • Program Instance 2: 6, 7, 8
  • Program Instance 3: 9, 10, 11

Such an assignment would not be optimal from a spatial locality perspective (see here, but it is a perfectly legal assignment policy for the ISPC compiler to adopt, as each program instance works on a different set of data elements.

kayvonf

@arjunh: Perfect (and super clear). Except... the statement "the foreach construct just serves as a hint to the ISPC compiler that it is safe to launch multiple program instances to execute operations on the data set".

No launching is going on at the start of the foreach loop. The instances have already been launched, as you state, at point of call to sinx.

The foreach construct declares that it is acceptable to execute iterations of the loop using any instance in the gang and in any order.