kayvonf

Question: We talked about the ISPC gang abstraction in class, but we did not discuss the semantics of the ISPC "task" abstraction. Someone explain the semantics of an ISPC task. Then describe one possible implementation of tasks. Better yet, take a look at the code for how tasks are implemented in ISPC and tell us how it's actually done... it's right there for you in the Assignment 1 starter code in common/tasksys.cpp.

vrkrishn

Whereas the ISPC "gang" is an SPMD abstraction, I feel that ISPC tasks are more of a Multiple Instruction, Multiple Data (MIMD) abstraction. From Wikipedia, MIMD is:

"Machines using MIMD have a number of processors that function asynchronously and independently. At any time, different processors may be executing different instructions on different pieces of data."

Although there is no description of how the different tasks are run, the lack of a defined execution order among tasks, and the inability to predict when each task will be executing each instruction in its function, seem to hint that MIMD is the right abstraction here.

Now onto the implementation. Looking at the tasksys.cpp file, I noticed that much of the file's contents is determined by the specific system or by user input. From the GitHub project for ISPC:

"Under Windows, a simple task system built on Microsoft's Concurrency Runtime is used (see tasks_concrt.cpp). On OSX, a task system based on Grand Central Dispatch is used (tasks_gcd.cpp), and on Linux, a pthreads-based task system is used (tasks_pthreads.cpp). When using tasks with ispc, no task system is mandated; the user is free to plug in any task system they want, for ease of interoperating with existing task systems."

The underlying task system is completely pluggable! Each of the underlying task structures, such as TaskGroup, TaskGroupBase, and TaskSys, has a different definition depending on which system the user pairs with the ISPC compiler.
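
To make the "pluggable" point concrete, here is a minimal sketch of the kind of boundary such a task system sits behind. The names and signatures below are purely illustrative (they are not the actual entry points in tasksys.cpp, which have varied across ISPC versions): compiler-generated code calls a launch routine for each group of tasks and a sync routine to wait for them, and any implementation of those calls, even a trivial serial one, counts as a legal task system.

```cpp
// Illustrative sketch only; not the real ISPC runtime interface.
// ISPC-generated code calls a small set of C entry points; whatever
// task system the user links in provides their definitions.

// A task: a function pointer, an argument blob, and the task's index
// within the group of tasks launched together.
typedef void (*TaskFn)(void *data, int taskIndex, int taskCount);

extern "C" {

// Trivial "task system": run each task immediately on the calling thread.
// Tasks may legally run in any order (including serially), so this is
// correct, just not parallel.
void exampleLaunch(TaskFn fn, void *data, int taskCount) {
    for (int i = 0; i < taskCount; i++)
        fn(data, i, taskCount);
}

// With the serial implementation above, there is nothing left to wait for.
void exampleSync() {}

} // extern "C"
```

The per-platform files mentioned in the quote (Concurrency Runtime, Grand Central Dispatch, pthreads) are then just different parallel implementations sitting behind the same kind of boundary.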

tcz

It was stressed that it's important to separate abstraction from implementation. But based on this slide ("a 'task' that is used to achieve multi-core execution"), it seems to me that the gang abstraction has some level of inherent implementation details, considering that you need another abstraction for multi-core execution. As far as I can tell, only the implementation is preventing the single program from running on the other cores, not the abstraction itself.

EDIT: Maybe I misunderstood the slide. It says the task is used to achieve multi-core execution, but after thinking more about @vrkrishn's comment, I think a more accurate phrase would be that the task is used to perform those other parallel tasks that cannot be run on a single core. Specifically, MIMD tasks.

kayvonf

@tcz: you make some really good points. To dig deeper, read the creator of ISPC's comments about the level of abstraction achieved on slide 7.

ps. I'm still waiting for someone to tell us how ISPC tasks are implemented!

yanzhan2

I think ISPC tasks are created with their own arguments and mapped onto pthreads for execution. There is a limited number of pthreads (say, 4), but there can be many tasks; each task is an instance placed on a task list. Each pthread runs tasks from the list until the list becomes empty. Task management is written in software, while pthread control is handled by the OS.
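
Here is a minimal sketch of that scheme, assuming a simplified interface rather than the exact code in tasksys.cpp: launched tasks are appended to a shared list, a small fixed pool of pthreads claims tasks through an atomic counter and runs them until the list is exhausted, and a sync waits for the workers to finish.

```cpp
// Minimal pthread-based task-system sketch (illustrative; not the actual
// code in tasksys.cpp). Launched tasks sit in a shared list; a small
// fixed pool of pthreads claims tasks via an atomic counter and runs
// them until the list is exhausted.
#include <pthread.h>
#include <atomic>
#include <vector>

typedef void (*TaskFn)(void *data, int taskIndex, int taskCount);

struct Task { TaskFn fn; void *data; int taskIndex; int taskCount; };

static std::vector<Task> taskList;     // all tasks launched so far
static std::atomic<int>  nextTask(0);  // index of the next unclaimed task

// Stands in for real work on one piece of a decomposed problem.
static void exampleTask(void *, int /*taskIndex*/, int /*taskCount*/) {}

// Each worker pthread repeatedly claims the next task index and runs it,
// exiting once every task in the list has been claimed.
static void *workerLoop(void *) {
    while (true) {
        int i = nextTask.fetch_add(1);
        if (i >= (int)taskList.size())
            return nullptr;
        Task &t = taskList[i];
        t.fn(t.data, t.taskIndex, t.taskCount);
    }
}

int main() {
    // "Launch" 16 tasks.
    for (int i = 0; i < 16; i++)
        taskList.push_back({exampleTask, nullptr, i, 16});

    // A limited number of worker pthreads (e.g., 4), far fewer than tasks.
    const int numWorkers = 4;
    pthread_t workers[numWorkers];
    for (int w = 0; w < numWorkers; w++)
        pthread_create(&workers[w], nullptr, workerLoop, nullptr);

    // "Sync": wait for the workers, and therefore all tasks, to finish.
    for (int w = 0; w < numWorkers; w++)
        pthread_join(workers[w], nullptr);
    return 0;
}
```

The real code is more involved (task groups, handles for sync, worker threads reused across launches), but the many-tasks-on-a-list, few-worker-threads structure is the same idea described above.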

mmp

@tcz: you're right that it's the implementation that is intentionally not doing multi-core stuff under the covers.

A different design might have said, "given a foreach loop, automatically decompose the loop domain into disjoint regions and run a task to process each region, in turn using SIMD on each core to help with the region loops". This would certainly be handy in some cases, and less work for users who just wanted to fill the processor with computation as easily as possible.

The reason for this design, for what it's worth, is a strong desire for performance transparency. Choosing to run threads and decompose into tasks introduces meaningful overhead in some cases, and in some cases it isn't what the programmer wanted. (Maybe the overhead was enough that there wasn't a speedup, etc.) In general, I wanted to be sure that an experienced programmer reading ispc code could reason effectively about the assembly output the compiler would generate.

Especially for a tool that's intended for good programmers who want to write fast code, I think it's important that the compiler not surprise the programmer in its output. These surprises are inevitable if the compiler is, for example, applying heuristics to decide if a loop is worth turning into tasks--you might add one more term to an addition expression and cross the point from 'no tasks' to 'tasks' and see an unexpected slowdown in your program because the compiler got the heuristic wrong!

Much better, IMHO, for the programmer to be in full control of this sort of thing.

Of course, other languages can be designed with different goals (or more user friendliness :-) ), and their designers may make different decisions.