Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2015

Faust

I found the in-class comment on this slide very helpful: a student pointed out that a good way to view this is to imagine all the tasks sitting in a central pile. When a worker thread finishes its current task, it grabs the next one from the pile. I believe this is also known as round robin assignment.
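The "central pile" picture can be sketched in a few lines of C++. This is only an illustration of the idea, not ISPC's actual task system: the function name `run_tasks_dynamic` and the use of a shared atomic counter as the pile are assumptions made for the example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper (not part of ISPC): runs `num_tasks` tasks on a pool of
// `num_workers` threads. Returns, for each task, the id of the worker that ran it.
std::vector<int> run_tasks_dynamic(int num_tasks, int num_workers) {
    assert(num_tasks >= 0 && num_workers > 0);
    std::vector<int> ran_by(num_tasks, -1);

    // The "central pile": index of the next task nobody has claimed yet.
    std::atomic<int> next_task{0};

    auto worker = [&](int worker_id) {
        while (true) {
            // Atomically grab the next unclaimed task from the pile.
            int task = next_task.fetch_add(1);
            if (task >= num_tasks) break;  // pile is empty, worker is done
            ran_by[task] = worker_id;      // stand-in for the real task body
        }
    };

    std::vector<std::thread> pool;
    for (int w = 0; w < num_workers; ++w) pool.emplace_back(worker, w);
    for (auto& t : pool) t.join();
    return ran_by;
}
```

Which worker ends up running which task is decided at run time, so a worker that draws short tasks simply comes back to the pile more often. That is usually called dynamic assignment; a strict round-robin scheme would instead fix task i to worker i mod N ahead of time.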

rhnil

If I understand this slide correctly, the ISPC compiler will generate a worker thread pool for task processing rather than create one pthread for each task, and the pool size depends on the number of cores available in the processor. But I'm confused about whether this mechanism still leaves some uncertainty at runtime. Since the implementation uses pthreads, the OS is responsible for assigning threads to cores. How can we be sure that each worker thread will map to one physical core? If at some point the OS decides to assign more than one worker thread to a single core and context switch between them, then I'm afraid performance will degrade and the cache will be polluted.

top

@rhnil I had the same question. Is there some communication that goes on that allows for each thread to run on separate cores or do we just have to trust that the OS will do the best it can when assigning the threads?

kayvonf

@rhnil and @top: you have to trust the OS. But modern OSes are smart enough to do a good job most of the time.

kayvonf

If you do want to give the OS hints about where to schedule a thread, operating systems typically provide some form of system call API to do so.

For example, on Linux check out: sched_setaffinity():

http://man7.org/linux/man-pages/man2/sched_setaffinity.2.html

On Windows, check out: SetThreadAffinityMask():

https://msdn.microsoft.com/en-us/library/windows/desktop/ms686247(v=vs.85).aspx
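As a rough, Linux-only sketch of what such a hint looks like, the snippet below pins the calling thread to a single CPU with sched_setaffinity(). The helper names `first_allowed_cpu` and `pin_current_thread_to_cpu` are made up for this example, and it assumes glibc with the GNU extensions enabled (g++ turns these on by default).

```cpp
#include <sched.h>  // sched_setaffinity, sched_getaffinity, cpu_set_t (Linux/glibc)
#include <cassert>

// Hypothetical helper: returns the lowest-numbered CPU the calling thread is
// currently allowed to run on, or -1 on failure.
int first_allowed_cpu() {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0) return -1;
    for (int c = 0; c < CPU_SETSIZE; ++c) {
        if (CPU_ISSET(c, &set)) return c;
    }
    return -1;
}

// Hypothetical helper: restricts the calling thread to `cpu` only.
// Returns true if the kernel accepted the request.
bool pin_current_thread_to_cpu(int cpu) {
    if (cpu < 0 || cpu >= CPU_SETSIZE) return false;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    // A pid of 0 means "the calling thread".
    return sched_setaffinity(0, sizeof(set), &set) == 0;
}
```

A worker thread could call `pin_current_thread_to_cpu(i)` with its own index when it starts up. Pinning only restricts where that one thread may run; the OS can still schedule other threads on the same core.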

HingOn

We know that ispc tasks running on the vector lanes in a single core are synchronized (running the same instruction with masks). But if we are dealing with multiple cores, how are the ispc tasks synchronized across the cores?

kayvonf

@HingOn: tasks running on the vector lanes? Are you sure?

HingOn

Oh right, it should be the parallelizable code in independent iterations of an ispc function...

kayvonf

@HingOn: One more time ;-) "parallelizable code in independent iterations?..." (see slide 2)

HingOn

right, it should be the code in foreach loop iterations...

gryffolyon

The number of worker threads that are created by ISPC tasks -- is this something that the programmer can tweak?