Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2017

Slide 20 of 60

lilli

Tasks differ from 'foreach': with tasks, you declare multiple blocks of computation, possibly on different data, to be independent of each other, and ISPC can run them on different cores. With 'foreach', we are only concerned with performing the same block of computation on different data in parallel, across the program instances of a gang on a single core.

unparalleled

As with SIMD program instances, tasks running on different cores execute independently of each other, and it does not matter which core runs first. The difference is that SIMD needs high instruction coherence to deliver a performance benefit, whereas task-level parallelism has no such requirement.

srb

Here is more information about ISPC's tasks: http://ispc.github.io/ispc.html#task-parallel-execution

I found it especially interesting to work out how many tasks to launch, since so many factors are involved. From the User's Guide linked above: "In general, one should launch many more tasks than there are processors in the system to ensure good load-balancing, but not so many that the overhead of scheduling and running tasks dominates the computation."
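To make the trade-off in that quote concrete, here is a small C++ cost model (not ISPC, and not a measurement). Everything in it is an assumption for illustration: the names simulate_makespan and skewed_costs, the skewed per-item costs, the fixed per-task overhead, and the rule that the next task always goes to whichever worker frees up first.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical cost model: finishing time when `num_tasks` contiguous chunks
// of `item_cost` are handed out, in order, to whichever of `num_workers`
// workers becomes free first. Each task also pays `per_task_overhead`.
long long simulate_makespan(const std::vector<long long>& item_cost,
                            std::size_t num_workers,
                            std::size_t num_tasks,
                            long long per_task_overhead) {
    assert(num_workers > 0 && num_tasks > 0 && num_tasks <= item_cost.size());

    // Min-heap of the times at which each worker becomes free.
    std::priority_queue<long long, std::vector<long long>,
                        std::greater<long long>> free_at;
    for (std::size_t w = 0; w < num_workers; ++w) free_at.push(0);

    const std::size_t n = item_cost.size();
    long long makespan = 0;
    for (std::size_t t = 0; t < num_tasks; ++t) {
        // Task t covers items [begin, end).
        const std::size_t begin = t * n / num_tasks;
        const std::size_t end = (t + 1) * n / num_tasks;
        long long work = per_task_overhead;
        for (std::size_t i = begin; i < end; ++i) work += item_cost[i];

        // The earliest-free worker takes the task and runs it to completion.
        const long long finish = free_at.top() + work;
        free_at.pop();
        free_at.push(finish);
        makespan = std::max(makespan, finish);
    }
    return makespan;
}

// Skewed workload: item i costs i + 1 units, so later chunks are heavier.
std::vector<long long> skewed_costs(std::size_t n) {
    std::vector<long long> cost(n);
    for (std::size_t i = 0; i < n; ++i) cost[i] = static_cast<long long>(i) + 1;
    return cost;
}
```

With 64 items, 4 workers, and an overhead of 10 units per task, simulate_makespan(skewed_costs(64), 4, 4, 10) gives 914, because the worker holding the heaviest chunk finishes long after the others. Splitting into 16 tasks gives 656. With 64 tasks the overhead alone adds 640 units of work, so the result cannot drop below 680, which is already worse than the 16-task case.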

apr

Relating to this, a question I have from Assignment 1: when there are more tasks than cores, do all of the tasks always run concurrently (with software or hardware context switching)? Or does ISPC internally make syscalls that let it schedule only one task on each core, and then load another task onto a core once it has finished executing the previous one? Since we don't care about effective throughput in the middle of the execution, wouldn't the second approach, where we dispatch tasks one by one, be better?
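For reference, the second option (one task per worker at a time, each run to completion) can be sketched in plain C++ as a fixed pool of threads pulling task indices from a shared counter. This is only an illustration of the idea, not a description of how ISPC's task system is implemented; run_tasks and chunk_sums are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper (not ISPC's API): run tasks 0..num_tasks-1 on a fixed
// pool of `num_workers` threads. Each worker claims the next unclaimed task
// index and runs that task to completion before claiming another.
void run_tasks(std::size_t num_tasks, std::size_t num_workers,
               const std::function<void(std::size_t)>& task) {
    assert(num_workers > 0);
    std::atomic<std::size_t> next_task{0};

    auto worker = [&]() {
        for (;;) {
            // fetch_add hands each index to exactly one worker.
            const std::size_t t = next_task.fetch_add(1);
            if (t >= num_tasks) return;
            task(t);
        }
    };

    std::vector<std::thread> pool;
    for (std::size_t w = 0; w < num_workers; ++w) pool.emplace_back(worker);
    for (std::thread& th : pool) th.join();
}

// Example use: task t sums its own contiguous slice of `data`.
std::vector<long long> chunk_sums(const std::vector<int>& data,
                                  std::size_t num_tasks,
                                  std::size_t num_workers) {
    assert(num_tasks > 0);
    std::vector<long long> sums(num_tasks, 0);
    const std::size_t n = data.size();
    run_tasks(num_tasks, num_workers, [&](std::size_t t) {
        const std::size_t begin = t * n / num_tasks;
        const std::size_t end = (t + 1) * n / num_tasks;
        long long s = 0;
        for (std::size_t i = begin; i < end; ++i) s += data[i];
        sums[t] = s;  // each task writes only its own slot, so no lock is needed
    });
    return sums;
}
```

In this scheme there are never more runnable threads than workers, so with one worker per core no context switching between tasks is needed, and a worker that draws a short task simply picks up the next one.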

MichaelJordan

ISPC tasks are a programming abstraction that lets users spread different parts of the workload across multiple cores. This is an important concept because on machines like the ones in our labs, we want to benefit from both each core's SIMD execution (with the gang abstraction) AND simultaneous execution on multiple cores.

A question for others: We've seen reasons SIMD speedup isn't maximized with some worst-case inputs. What might be an example of when multi-core speedup is negatively affected?