Data-parallel expressions should be supported by the compiler. It would be a real win if the compiler could generate data-parallel code automatically, so that we don't have to write it by hand. However, that might be very difficult to do.
I could see a compiler spawning a new thread for each iteration of the forall loop and waiting for all of them to finish before continuing with the code that follows the loop.
@nate Creating so many threads for a small amount of work may slow the program down due to the overhead of spawning a thread.
In this example, the programmer points out the parallelism in the program but leaves it up to the system to distribute the work at runtime. That is declarative programming (it only says WHAT to do). In the previous example, by contrast, the programmer does it imperatively (says HOW to execute) by explicitly partitioning the work and assigning it to the threads.
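To make the "HOW" side concrete, here is a rough sketch in C of what explicit partitioning might look like. The names block_range, range_t, and square_block are hypothetical, and squaring each element is only a stand-in for the loop body on the slide. Each thread would be handed its own contiguous block of indices:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type: half-open index range [start, end). */
typedef struct {
    size_t start;
    size_t end;
} range_t;

/* Hypothetical helper: the block of indices that thread `tid` owns when
   `n` elements are split across `num_threads` threads. The first
   (n % num_threads) threads each get one extra element. */
range_t block_range(size_t n, size_t num_threads, size_t tid) {
    assert(num_threads > 0 && tid < num_threads);
    size_t base = n / num_threads;
    size_t extra = n % num_threads;
    range_t r;
    r.start = tid * base + (tid < extra ? tid : extra);
    r.end = r.start + base + (tid < extra ? 1 : 0);
    return r;
}

/* Stand-in for the work one thread does on its own block. */
void square_block(const float *x, float *y, range_t r) {
    for (size_t i = r.start; i < r.end; i++) {
        y[i] = x[i] * x[i];
    }
}
```

With 10 elements and 4 threads, this gives thread 0 the range [0, 3), thread 1 [3, 6), thread 2 [6, 8), and thread 3 [8, 10). The programmer still has to create the threads and call square_block in each one, which is exactly the bookkeeping the forall version hides.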
It's amusing to take a look back at these slides and realize that this magical fictitious language can exist in the form of OpenMP -- in the red section, just add a single #pragma.
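For anyone curious, a minimal sketch of what that could look like in C. The function square_all and its loop body are hypothetical stand-ins, not the code from the slide. The only OpenMP-specific part is the one directive above the loop:

```c
#include <stddef.h>

/* Hypothetical example: the iterations are independent, so OpenMP is
   free to split them across threads however it likes. */
void square_all(const float *x, float *y, int n) {
    /* Declares the loop parallel; how iterations are divided among
       threads is left to the OpenMP implementation. */
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        y[i] = x[i] * x[i];
    }
}
```

With GCC or Clang this needs the -fopenmp flag to actually run in parallel. Without the flag the pragma is ignored and the loop runs serially with the same result.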