I got lost at this slide. Why is neither of the two loops parallelized by ISPC?
The outer for-loop is run in parallel, right?
This comment was marked helpful 0 times.
kayvonf
@yetianx: A solid question. Anyone?
This comment was marked helpful 0 times.
arjunh
When we call the sinx function, a gang of n program instances is created (where n is equal to the SIMD width of the processor). Each program instance executes the sinx function on a disjoint set of array elements. So, the program instances run in parallel on the processor.
However, the execution of a single program instance is entirely sequential. The program instance obtains its data set to work with via the outer for loop, which gives it a set of indices into the input array. The inner for loop is simply used to compute the sine of a specific array element.
So, if the SIMD width of the processor was 4 and we had a set of 12 array elements, then the work would be divided among the program instances in the following manner (in terms of indices of array elements):
Program Instance 0: 0, 4, 8
Program Instance 1: 1, 5, 9
Program Instance 2: 2, 6, 10
Program Instance 3: 3, 7, 11
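The interleaved assignment above can be reproduced with a small Python sketch. It is only a model of the gang, not the ISPC code from the slide: `interleaved_assignment`, `gang_size`, and `program_index` are hypothetical names, with `gang_size` and `program_index` standing in for ISPC's built-in `programCount` and `programIndex`.

```python
def interleaved_assignment(num_elements, gang_size):
    """Return, for each program instance, the list of element indices it processes."""
    assignment = [[] for _ in range(gang_size)]
    # The outer loop advances by gang_size, like a loop that steps by programCount.
    for base in range(0, num_elements, gang_size):
        for program_index in range(gang_size):
            idx = base + program_index
            if idx < num_elements:  # guard for sizes that are not a multiple of gang_size
                assignment[program_index].append(idx)
    return assignment


for instance, indices in enumerate(interleaved_assignment(12, 4)):
    print(f"Program Instance {instance}: {indices}")
```

With 12 elements and a gang of 4, this prints the same four index lists as the table above.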
Note that using the foreach loop instead of a for loop would not change this fact. The ISPC compiler may choose a different way of assigning work to the program instances; the foreach construct just serves as a hint to the ISPC compiler that it is safe to launch multiple program instances to execute operations on the data set. By doing so, the programmer is no longer responsible for assigning work to the program instances.
Edit: foreach just points out that the iterations of the loop can be executed using any instance in the gang in any order, as clarified by @kayvonf below. So, the ISPC compiler could potentially choose a completely different program instance assignment than the one shown in the program on the left, such as this:
Program Instance 0: 0, 1, 2
Program Instance 1: 3, 4, 5
Program Instance 2: 6, 7, 8
Program Instance 3: 9, 10, 11
Such an assignment would not be optimal from a spatial locality perspective (see here), but it is a perfectly legal assignment policy for the ISPC compiler to adopt, as each program instance works on a different set of data elements.
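To make the spatial locality point concrete, here is a rough Python sketch that lists which array indices the whole gang touches on each loop step under the two assignments. The helper names are hypothetical, and the sketch assumes the element count divides evenly by the gang size and that all instances advance through their iterations together.

```python
def blocked_assignment(num_elements, gang_size):
    """Give each instance one contiguous block (assumes even division)."""
    per_instance = num_elements // gang_size
    return [list(range(i * per_instance, (i + 1) * per_instance))
            for i in range(gang_size)]


def indices_touched_per_step(assignment):
    """Step k holds the indices that all instances touch on their k-th iteration."""
    return [list(step) for step in zip(*assignment)]


interleaved = [[0, 4, 8], [1, 5, 9], [2, 6, 10], [3, 7, 11]]
blocked = blocked_assignment(12, 4)

print(indices_touched_per_step(interleaved))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
print(indices_touched_per_step(blocked))      # [[0, 3, 6, 9], [1, 4, 7, 10], [2, 5, 8, 11]]
```

In the interleaved case each step touches four adjacent elements, which can typically be served by one contiguous vector load. In the blocked case each step touches elements three apart, which generally needs a strided or gather-style access.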
This comment was marked helpful 6 times.
kayvonf
@arjunh: Perfect (and super clear). Except... the statement "the foreach construct just serves as a hint to the ISPC compiler that it is safe to launch multiple program instances to execute operations on the data set".
No launching is going on at the start of the foreach loop. The instances have already been launched, as you state, at the point of the call to sinx.
The foreach construct declares that it is acceptable to execute iterations of the loop using any instance in the gang and in any order.
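One way to see why that declaration is safe for this loop: each iteration reads and writes only its own element, so the order of execution cannot change the result. The Python sketch below illustrates this under stated assumptions. `sinx_element` is a Taylor-series stand-in for the inner loop (the slide's code is not reproduced in this thread), and `run_iterations` is a hypothetical helper that executes the iterations in whatever order it is given.

```python
import math
import random


def sinx_element(x, terms):
    """Taylor-series approximation of sin(x); a stand-in for the inner loop."""
    value = x
    numer = x * x * x
    denom = 6.0  # 3!
    sign = -1.0
    for j in range(1, terms + 1):
        value += sign * numer / denom
        numer *= x * x
        denom *= (2 * j + 2) * (2 * j + 3)
        sign = -sign
    return value


def run_iterations(xs, terms, order):
    """Execute the loop body once per index, in the given order."""
    result = [0.0] * len(xs)
    for i in order:
        result[i] = sinx_element(xs[i], terms)  # touches only element i
    return result


xs = [0.1 * i for i in range(12)]
in_order = run_iterations(xs, 5, list(range(12)))

shuffled = list(range(12))
random.Random(0).shuffle(shuffled)  # an arbitrary but repeatable ordering
out_of_order = run_iterations(xs, 5, shuffled)

print(in_order == out_of_order)                    # True
print(abs(in_order[5] - math.sin(xs[5])) < 1e-6)   # True
```

If the loop body depended on a value written by another iteration, the two orderings could disagree, and handing the iterations to foreach would no longer be safe.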
This comment was marked helpful 4 times.