happyfeet

Answer: The foreach loop is parallelized by ISPC because it is a construct the programmer uses to indicate to ISPC that each iteration of the loop is independent and can be executed in parallel. ISPC takes advantage of this and parallelizes the loop accordingly. The inner for loop, however, is not parallelized, since ISPC always treats regular for loops as sequential constructs.
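
For reference, the foreach version on this slide looks roughly like the following (a sketch reconstructed from the lecture's Taylor-series sinx; exact names and constants may differ):

    export void sinx(uniform int N, uniform int terms,
                     uniform float x[], uniform float result[])
    {
        // foreach: these iterations are independent; the compiler decides
        // how to assign them to the gang's program instances
        foreach (i = 0 ... N)
        {
            float value = x[i];
            float numer = x[i] * x[i] * x[i];
            uniform int denom = 6;   // 3!
            uniform int sign = -1;

            // the inner for loop runs sequentially within each instance
            for (uniform int j = 1; j <= terms; j++)
            {
                value += sign * numer / denom;
                numer *= x[i] * x[i];
                denom *= (2 * j + 2) * (2 * j + 3);
                sign *= -1;
            }
            result[i] = value;
        }
    }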

nkindberg

I don't think that description is completely accurate.

As Kayvon put it in lecture, there is no parallelism expressed in the code on this or the previous slide. ISPC does not parallelize loops; it creates program instances, each of which runs sinx. So what you actually have is multiple instances of the function running in parallel. The ISPC compiler assigns iterations of the foreach loop to different program instances, but the loop itself is not parallel. foreach is really no different from making the assignment yourself as on the previous slide, except that you're letting the compiler make the assignment instead. The manual assignment on the previous slide is one example of an assignment ISPC could make for the foreach loop here.
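
To make the comparison concrete, the previous slide's manual version is something like this sketch (interleaved assignment, assuming N is a multiple of programCount):

    export void sinx(uniform int N, uniform int terms,
                     uniform float x[], uniform float result[])
    {
        // manual assignment: instance programIndex handles elements
        // programIndex, programIndex + programCount, ...
        for (uniform int i = 0; i < N; i += programCount)
        {
            int idx = i + programIndex;
            float value = x[idx];
            float numer = x[idx] * x[idx] * x[idx];
            uniform int denom = 6;
            uniform int sign = -1;
            for (uniform int j = 1; j <= terms; j++)
            {
                value += sign * numer / denom;
                numer *= x[idx] * x[idx];
                denom *= (2 * j + 2) * (2 * j + 3);
                sign *= -1;
            }
            result[idx] = value;
        }
    }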

martin

An easier way to think of this is: once a call to an ISPC function is made, a gang of program instances starts running the ISPC function in parallel, and control returns to the caller when all the instances have finished. It doesn't matter whether we use foreach or programCount/programIndex; those are just ISPC syntax, which gives programmers the flexibility either to assign the work manually or to let the compiler do the job.

jedavis

I am failing to grok the semantics of "parallelize" and what is meant by the presence or absence of parallelism in the code here, I think. I understand what the SIMD instructions that the compiler generates will do and how the compiler will get there, but how exactly are we defining "parallelize" for the purpose of this discussion?

nkindberg

By parallelize, we mean introduce parallelism (i.e. create instances/threads/tasks/etc.). The foreach loop doesn't mean 'do this in parallel'. It simply means: at compile time, divide up this work among the program instances that will be executing this function. The parallelization happens where the program instances are spawned, since that is what creates things that run in parallel, and that occurs outside of the function. When each spawned instance of sinx executes the function body, it knows which indices in the range 0..N it should work on based on the division the compiler did (think of it as creating a for loop with programCount/programIndex like on the previous slide), and each instance executes the code inside the function completely sequentially.
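
Conceptually, the compiler might lower the foreach into a per-instance loop like the one below (a sketch only; ISPC's actual assignment strategy and remainder handling may differ):

    // one possible per-instance lowering of foreach (i = 0 ... N)
    for (uniform int base = 0; base < N; base += programCount)
    {
        int i = base + programIndex;  // this instance's element this round
        if (i < N)                    // guards the case where N is not a
        {                             // multiple of the gang size
            // ...foreach body, executed sequentially by this instance...
        }
    }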

I think it's important to keep in mind that we are working at the SPMD abstraction level here. The abstraction doesn't say how those instances of sinx should be run in parallel, just that there will be instances of it running in parallel. SIMD instructions are one possible way to implement running the instances in parallel.

jpaulson

The foreach loop says "these loop iterations are independent work". ISPC uses that information to decide on a good way to parallelize the loop.

The actual code is being executed "in parallel" in the sense that 8 iterations (for a gang of width 8) are being done by 8 different ALUs at the same time.
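
The gang width comes from the compile target. For example (command line based on the ispc documentation; flags and target names may vary by version):

    # an 8-wide target gives programCount == 8, so each 'round' of the
    # foreach maps onto one 8-wide SIMD operation
    ispc --target=avx2-i32x8 sinx.ispc -o sinx_ispc.o -h sinx_ispc.h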

kuity

This is a little late, but I am feeling slightly confused by this discussion.

Isn't dividing up the work among the "gang of program instances" something like creating threads to run the function in parallel? So why is it that there is no parallelism expressed in this code?

pebbled

@kuity You're right, the program instances of the gang are each running the function in parallel, much like multiple forked threads running it in parallel. Kayvon says there is no parallelism expressed in this code because there is nothing explicitly parallel about it. Taken in a non-SIMD environment, this is a purely sequential program, iterating from 0..N. It's entirely up to the ISPC compiler to give meaning to programCount and to utilize the parallel SIMD lanes, whereas pthreads code would be explicitly parallel: you, the programmer, would be instructing which threads run which functions.
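
For contrast, here is a minimal pthreads sketch (names and the two-way split are made up for illustration) where the parallelism is explicit in the source:

    #include <pthread.h>

    typedef struct { int start, end; float* x; float* result; } args_t;

    // each thread runs this sequential loop over its assigned slice
    static void* sinx_chunk(void* p) {
        args_t* a = (args_t*)p;
        for (int i = a->start; i < a->end; i++)
            a->result[i] = a->x[i];  // stand-in for the real sin(x) computation
        return NULL;
    }

    void parallel_sinx(int N, float* x, float* result) {
        // explicit parallelism: the programmer creates the thread
        // and decides which half of the work it gets
        pthread_t t;
        args_t upper = { N / 2, N, x, result };
        pthread_create(&t, NULL, sinx_chunk, &upper);

        args_t lower = { 0, N / 2, x, result };
        sinx_chunk(&lower);   // main thread does the other half
        pthread_join(t, NULL);
    }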