Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2016

Perf Optimization I: Work Distribution and Scheduling

Previous | Next --- Slide 10 of 64

c0d3r

If we knew additional information about test_primality, we could potentially distribute tasks in a smarter way.

For example, if we knew that larger ints took test_primality more time (in general), we could sort x in decreasing order before the loop. This is like implementing a sorted greedy job scheduler.

althalus

@c0d3r, wouldn't that make it semi-static again since the program is unable to recognize at runtime that larger ints take longer? And also, someone mentioned that we could randomize the array and statically partition it to have a more even cost for each core.

msfernan

Professor Kayvon mentioned in class that ISPC was an example of dynamic task assignment.

Therefore in assignment 1 in mandelbrot ispc with tasks, the work (in this case the computation of a pixel in the Mandelbrot set) was assigned to tasks that were executed on all the cores of the processor.

In the first part of assignment 1 in mandelbrot-thread, the work (in this case the row-wise computation of the pixels in the mandelbrot set) was assigned to pthreads statically (before runtime by you the programmer). Each pthread was executed on a different core.

yangwu

since all tasks in CUDA go through scheduler first, my feel is all tasks are scheduled 'dynamically', the real static part comes from how you decompose the job

Richard

In the example in this slide, the strategy for dynamic scheduling is actually using a work queue (but not distributed queues in the following slides).

I'm wondering: as a programmer, how I can choose the strategy. Is it via using programming tricks like above? Or are there interfaces where I can directly tell operating system which strategy I want?

xx420y0los4wGxx

@Richard I'm not sure if you're asking about the decision to use dynamic vs static, or the implementation, but we can see in OpenMP an example of the latter.

In terms of decision, the way I think of it is this. Within dynamic each task comes from some kind of structure and accessing that structure has some inherent overhead. The decision depends on if this overhead outweighs the cost of a couple threads having unbalanced assignment.