The parallelism occurs when the function is called rather than within the logic of the function itself. So even though everything runs in parallel, none of the loops are actually parallelized.
To elaborate on what @efficiens said, the parallelism occurs when a gang of program instances is created, each responsible for executing sinx() with a unique programIndex.
I'm still not quite clear. If I use the "foreach" keyword to explicitly state that the loop iterations are independent, does that mean the function itself has parallelism? Would the answer above then become yes?
@Fantasy, I think that in the example above the program only specifies, for each program instance, the set of instructions that instance needs to run. So none of the iterations are parallelized by ISPC. When we use foreach, it is the job of ISPC (the compiler) to assign the work to the program instances, which is why we can say it is parallelized by ISPC.
So is it true that if foreach is used, then we would say that the outer loop has parallelism, but in this case, we just say that the function itself is parallelized? Because in either case program instances are created when the function is called, so the function itself is running in parallel, right? Do we only say the loop itself is parallelized if we leave it to the compiler to decide which program instances will compute which iterations (using foreach)?
I agree with @jerryzh. The outer loop itself only covers 1/programCount of the total work. It is the result of decomposing the work for parallel execution, but it contains no parallelism itself, since all the work in the outer loop is done by a single program instance.
@huehue, I think you are right: in either case the program may end up running in parallel. But pay attention to the question: "Which iteration of the loop(s) are parallelized by ISPC".
I don't know if this pertains to this slide, but I think I'm still a little confused by abstraction versus implementation. Does that play a role here? Is the idea that either the programmer or the compiler decides which independent chunks run in parallel?
@makingthingsfast, I believe so. This function represents an implementation, where the programmer has specified exactly which parts will run in parallel. So the loop isn't "parallelized by ISPC"; the programmer has done it.
If they had used the foreach abstraction instead, you could say that ISPC had done the parallelization.
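To make the foreach side of this concrete, here is a rough sketch in Python rather than ISPC, so it is only an emulation. The names taylor_sin, interleaved, blocked, and foreach_sinx are made up for illustration, the Taylor-series body is assumed to resemble the sinx() under discussion, and the two assignment policies are hypothetical stand-ins for whatever mapping the ISPC compiler actually picks. The gang is simulated with an ordinary sequential loop.

```python
def taylor_sin(x, terms):
    """Per-element loop body: a sequential Taylor expansion of sin(x)."""
    value = x
    numer = x ** 3
    denom = 6          # 3!
    sign = -1
    for j in range(1, terms + 1):
        value += sign * numer / denom
        numer *= x * x
        denom *= (2 * j + 2) * (2 * j + 3)
        sign *= -1
    return value


def interleaved(n, program_count):
    """Hypothetical policy: iteration i goes to instance i % program_count."""
    return [i % program_count for i in range(n)]


def blocked(n, program_count):
    """Hypothetical policy: each instance gets one contiguous chunk."""
    chunk = -(-n // program_count)  # ceiling division
    return [i // chunk for i in range(n)]


def foreach_sinx(xs, terms, program_count, assign=interleaved):
    """Emulates foreach: the body is written once, 'assign' picks the owners."""
    owner = assign(len(xs), program_count)
    result = [0.0] * len(xs)
    for p in range(program_count):       # sequential stand-in for the gang
        for i, x in enumerate(xs):
            if owner[i] == p:            # instance p only does its own iterations
                result[i] = taylor_sin(x, terms)
    return result, owner


xs = [0.0, 0.5, 1.0, 1.5]
by_interleave, _ = foreach_sinx(xs, terms=5, program_count=2, assign=interleaved)
by_block, _ = foreach_sinx(xs, terms=5, program_count=2, assign=blocked)
print(by_interleave == by_block)
```

The loop body never mentions a program index. Swapping the assignment policy changes which instance computes each element but not the results, which is the sense in which the mapping is left to the compiler.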
So the function itself is not parallelized by ISPC. But if we change the question to "which iterations of the loops are parallelized by the programmer?", would the answer be the outer loop?
@CC, I think that's correct. The programmer parallelizes the outer loop by specifying a mapping of the computations to each gang member by their programIndex.
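Here is a small Python emulation of that manual mapping, in case it helps. It is not ISPC: PROGRAM_COUNT, indices_for_instance, and run_gang are hypothetical names, the gang size of 4 is an arbitrary choice, and the sketch assumes the element count is a multiple of the gang size. It only shows which elements each program instance would touch when the programmer computes the index as i + programIndex.

```python
PROGRAM_COUNT = 4  # hypothetical gang size, chosen only for illustration


def indices_for_instance(program_index, n, program_count=PROGRAM_COUNT):
    """Elements one program instance touches under the interleaved mapping.

    Mirrors the pattern: for (i = 0; i < N; i += programCount)
                             idx = i + programIndex
    Assumes n is a multiple of program_count.
    """
    return [i + program_index for i in range(0, n, program_count)]


def run_gang(n, program_count=PROGRAM_COUNT):
    """Sequential stand-in for a gang: one entry per program instance."""
    return {p: indices_for_instance(p, n, program_count)
            for p in range(program_count)}


assignment = run_gang(8)
for p, idxs in assignment.items():
    print(f"instance {p} handles elements {idxs}")
```

Each instance walks its own short list one element at a time, and together the lists cover every element exactly once. The loop is just the bookkeeping for that mapping.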
The loop is not parallelized. It is merely used to map the computation to each program instance. Parallelism starts when this ISPC function is called.
I believe that whether we use for or foreach, none of the loops are parallelized. Repeating what @PandaX said, even if we use the foreach keyword, ISPC will only decompose the loop. The actual parallelization happens at the call site. Correct me if I'm wrong.
Nice discussion here!
The syntax foreach is just a way for the programmer to tell the compiler to do the job of assigning work to program instances instead of the programmer manually doing it him or herself.
When the compiler sees foreach(i = 1... n), it maps all of the iterations of the for loop to specific program instances within a gang.
So in both cases, whether we use for or foreach, the execution of the loop iterations mapped to a given program instance is not being parallelized. The parallelization happened when the work was assigned to each program instance and the function was called.
I can see how this would be confusing, because when you write foreach as the programmer, you're probably visualizing all program instances in a gang being assigned iterations of the for loop and executing them in parallel. But that all actually happens before the function is called, in code the compiler produces because you wrote "foreach ...".
When the slide says that none of the iterations of the loop are being parallelized by ISPC it means that there is no parallelization within the set of for loop iterations mapped to a given program instance. Yes, there is parallelism across program instances, but all the code within this function is being run by a single program instance sequentially.
Think of the function as a sequence of instructions that are run sequentially by each program instance for each of the iterations of the for loop that got mapped to it by the compiler (or the programmer in the case of a for loop instead of a foreach).
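As an analogy only, here is a Python sketch where the "gang" is launched at the call site and the function body is a plain sequential loop. Threads are just a stand-in here; a real ISPC gang is typically realized with SIMD lanes rather than threads. The names program_instance and call_as_gang, the gang size of 4, and the squaring body are all assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

PROGRAM_COUNT = 4  # hypothetical gang size


def program_instance(program_index, xs, result, program_count=PROGRAM_COUNT):
    """What one instance runs: an ordinary sequential loop, nothing parallel."""
    steps = []
    for i in range(program_index, len(xs), program_count):
        result[i] = xs[i] * xs[i]   # placeholder body, stands in for the real work
        steps.append(i)             # record the order this instance worked in
    return steps


def call_as_gang(xs, program_count=PROGRAM_COUNT):
    """The call site: this is where the instances are created."""
    result = [None] * len(xs)
    with ThreadPoolExecutor(max_workers=program_count) as pool:
        traces = list(pool.map(
            lambda p: program_instance(p, xs, result, program_count),
            range(program_count)))
    return result, traces


result, traces = call_as_gang([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
print(result)
print(traces)
```

All the concurrency lives in call_as_gang. Inside program_instance each trace is a strictly ordered list of the iterations that instance was given.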
Just reiterating my inference from this slide:
Actually nothing is parallel until this function is called.
The outermost loop is what runs in parallel. The parallelism begins at the time the function is called, when the entire gang executes it all at once.