Slide 16 of 57

Error: int idx not declared


I am a little confused about the statement "these are the iterations the instances in a gang cooperatively must perform". I am not sure what "cooperatively perform" means; I thought that the loop iterations declared by foreach are supposed to be independent of each other.


@1pct The way I understand it is that there are many instances ("workers") in the gang. And, in "foreach (i = 0 .. N)", we have N "tasks" to run, and thus the instances ("workers") collectively finish the "tasks" independently in parallel. Does it make sense?


It may help to keep in mind that foreach is just a shortcut a programmer can take to implement the strategy mentioned on slide 10, where each program instance does computations on noncontiguous input elements. As mentioned in class, it's reasonable to bet the compiler will choose this implementation as opposed to the less optimal implementation shown on slide 13.


Can someone explain what the slides mean by a "static interleaved assignment"?


@efficiens I guess "static" means the assignment is determined at compile time, as opposed to "dynamic", which means the assignment is determined at run time. And "interleaved" is like the first implementation we discussed on a previous slide (slide 14), as opposed to the "blocked" one on slide 15.


I think when we refer to static interleaved assignment, it denotes something like slide 9, where we specifically told program instance i to operate on all elements of the array such that arrayIndex % programCount == i. It's static because just by looking at our code before running it, we know how work is distributed among program instances. In this slide, with the foreach abstraction, we don't know what the assignment is going to look like at run time. We're essentially saying we want this loop to be executed N times and giving ISPC the responsibility of mapping those iterations onto the available program instances. In that sense, with foreach, the assignment isn't static anymore.


A followup question from the discussion above: are there any "dynamic" implementations of the foreach abstraction (different from the ISPC implementation) that determine the optimal assignment (interleaved, blocked, etc.) at compile/run time? Since, I can imagine, there certainly exist cases where the blocked assignment will outperform the interleaved one.


Going back to some of the code handed out for Assignment 1, what is the difference between launch[N] and foreach in this example? Could we use the ISPC keyword launch here?


The launch keyword actually has quite different semantics than foreach. From a low-level perspective foreach expresses SIMD parallelism, and so it will (almost always) be mapped to vectorized instructions. Launch, on the other hand, specifies multi-core parallelism, and so it will be mapped using some thread-based implementation; it could either be by launching N pthreads, or by launching 1 pthread per processor, and then performing work stealing.

At a more abstract level, foreach is very declarative; it only specifies that this piece of work is fully parallelizable, and does not tell the compiler how to partition the data. Launch, on the other hand, is more imperative; the function that is launched tells the compiler how to partition the data, and the compiler must follow that exactly. Thus, foreach gives the compiler the freedom to partition the data, while launch dictates the partitioning through the code of the launched function.

You can see the various ways that tasks can be implemented by looking at the ISPC repo. Look at this file here for more details.


@mbperez Very nice post. The only clarification I would make is that you write "parallelism" when the semantics really dictate "potential parallelism" (independent work that can be run in parallel).