Slide 9 of 47
xiaowend

The difference between this code and the previous versions is that here each "foreach" iteration is independent. We don't care how the iterations are distributed among the program instances, and they can be computed in any order.

akashr

You mentioned in lecture that the calculations over the array in this code can be done in any order. Question: Is the compiler smart enough to realize that it can walk through the array x in order, so that we need the fewest data loads from memory? Or do we have to arrange that manually, like we did 2 slides ago?

joe

In response to akashr's question, I actually feel that the interleaved assignment is not only the smartest choice but also the easiest for the compiler to implement. So it is a safe bet that the iterations are done in order. Number one rule though: No Guarantees!

mmp

This is a good example of the value of the difference between abstraction and implementation.

The ISPC language/execution model spec does allow the compiler to try to choose an ordering that makes memory access as coherent or regular as possible; that could be an interesting thing to try.

The current implementation doesn't do that, though; it currently ends up doing a straightforward mapping of the iteration range to program instances. I think this is actually a reasonable approach for two reasons.

First, I would guess that such an optimization would be nice for simple cases, but would be less frequently useful for more complex programs. For the simple ones, it's easy enough to rewrite the loop to be more coherent anyway.

Second, one problem with this kind of "smart" compiler optimization is that, paradoxically, it can be a pain for the programmer because it makes it hard to have a clear mental model of what the compiler will do. For example, if you're a good programmer and you expect that optimization to kick in in some case but then it doesn't (due to some unknown implementation limitation in the compiler), then what do you do? What if it kicks in when you don't want it to, and you really did mean to write the loop that way, thank you very much? In general, this sort of optimization makes it harder to predict what the compiler output will be, which is undesirable for performance-oriented programmers. (That said, if the programmer isn't performance oriented, this sort of thing would make more sense, since one might assume they won't do anything themselves on this front, so better to at least sometimes see a performance benefit from doing it for them.)

Now, given that last point, one might argue that it would be better if the ISPC spec defined a particular execution order for "foreach" loops, so that programmers can reason about the memory access patterns that will result when they use this construct! (At least the current implementation provides that predictability.)
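To make the "straightforward mapping of the iteration range to program instances" described above concrete, here is a minimal Python model of it. This is only a sketch of the behavior as described in this thread, not ISPC code and not a statement of what the spec guarantees. The function name `foreach_mapping` and the parameter `program_count` (standing in for ISPC's `programCount`) are hypothetical, introduced for illustration.

```python
def foreach_mapping(n, program_count):
    """Model a foreach over range(n): return one list per gang step,
    each holding (program_instance, iteration_index) pairs.

    Assumption: iterations are handed out in consecutive chunks of
    program_count, and instance `lane` takes index base + lane.
    """
    steps = []
    for base in range(0, n, program_count):
        # Instances whose index would fall past the end of the range
        # are simply idle (masked off) in the final, partial step.
        step = [(lane, base + lane)
                for lane in range(program_count)
                if base + lane < n]
        steps.append(step)
    return steps


steps = foreach_mapping(10, 4)
for i, step in enumerate(steps):
    print("step", i, step)
```

Under this model, every gang step touches a run of consecutive array elements, which is what gives the programmer a predictable memory access pattern.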

raphaelk

Question: It seems like using foreach gives the compiler more flexibility than having the programmer write code that uses programIndex to manually assign work to specific instances. But looking at the discussion, foreach seems to be simply a convenience for us coders... So, are there any benefits to using programIndex? Will having control over how work is assigned via programIndex give us some chance to optimize or speed up our program? It seems like there might be "smart" ways to create gangs with an even amount of work, but that might come with a good amount of overhead...
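One way to see what control over assignment buys (or costs) is to compare two manual schemes a programmer could write with programIndex. The sketch below is a Python model, not ISPC. The helper names `interleaved`, `blocked`, and `is_contiguous` are hypothetical, and it assumes for simplicity that the array length is a multiple of the gang size.

```python
def interleaved(n, program_count):
    """Per gang step, the indices touched when instance `lane`
    handles lane, lane + program_count, lane + 2*program_count, ..."""
    return [[step * program_count + lane for lane in range(program_count)]
            for step in range(n // program_count)]


def blocked(n, program_count):
    """Per gang step, the indices touched when each instance
    handles its own contiguous block of n // program_count elements."""
    per_instance = n // program_count
    return [[lane * per_instance + step for lane in range(program_count)]
            for step in range(per_instance)]


def is_contiguous(indices):
    """True if the gang touches consecutive elements in this step."""
    return all(b - a == 1 for a, b in zip(indices, indices[1:]))


for name, scheme in (("interleaved", interleaved), ("blocked", blocked)):
    steps = scheme(8, 4)
    print(name, steps, [is_contiguous(s) for s in steps])
```

In the interleaved scheme each step touches consecutive elements, which can typically be fetched with a single vector load. In the blocked scheme each step touches elements spaced a block apart, which generally requires a more expensive gather. So the choice of assignment can matter for performance, even though both schemes cover the same work.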