Slide 17 of 64
williamx

If we know the workload of all the tasks, would it be efficient to sort the tasks from longest to shortest, and then schedule the tasks in that order?

aabhagwa

Is there value in migrating tasks in the middle of execution? If this can be done with low overhead, we might be able to get a good workload balance even if the task size distribution is unexpected.

CSGuy

This slide mentions that some knowledge of the workload is required. What about tasks that perform faster when scheduled on the same core (say, because they use shared memory)? Or tasks that perform slower on the same core, say because each uses about a core's full cache and they could avoid cache misses by running on different cores?

ctabrizi

@CSGuy, those are good questions. I'd be interested in seeing some examples of that.

googlebleh

@CSGuy Wouldn't that require even more specialized knowledge of the program? In addition to knowing the expected runtime of the program ahead of time, you'd have to know what memory accesses the program is going to make, and when: if one program is done relying on the cache, we can schedule in another program that wants to use it. This sounds like maximizing the utilization of the cache.

In my opinion, that optimization is more suited to a scheduling decision made by the programmer (i.e. static scheduling assignment). In order to let this be dynamically assigned, the programmer/compiler would have to store this extra info, and the CPU would incur the extra overhead of making the scheduling decision at runtime (which is more complicated than simply comparing expected runtimes).

Don't get me wrong though, I do agree it's a good idea to speed up execution by trying to improve cache utilization.

koala

If we know the workload of all the tasks and schedule from the largest to the smallest, this greedy strategy (longest processing time first) achieves a makespan within 4/3 of optimal with very little overhead. However, fully knowing the workload up front is probably difficult for many parallel algorithms.
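A minimal sketch of the longest-processing-time (LPT) greedy rule described above, assuming per-task costs are known in advance (function and variable names here are illustrative, not from the lecture):

```python
import heapq

def lpt_schedule(task_times, num_workers):
    """Longest Processing Time first: sort tasks by descending cost,
    then greedily assign each task to the currently least-loaded worker.
    Graham showed this achieves makespan <= (4/3 - 1/(3m)) * optimal."""
    # Min-heap of (current load, worker id), so the least-loaded worker
    # is always at the top.
    loads = [(0, w) for w in range(num_workers)]
    heapq.heapify(loads)
    assignment = {w: [] for w in range(num_workers)}
    for t in sorted(task_times, reverse=True):
        load, w = heapq.heappop(loads)
        assignment[w].append(t)
        heapq.heappush(loads, (load + t, w))
    makespan = max(load for load, _ in loads)
    return assignment, makespan

# Example: 7 tasks on 2 workers; total work is 24, so 12 is optimal.
_, makespan = lpt_schedule([7, 5, 4, 3, 2, 2, 1], 2)
print(makespan)  # 12
```

Note this sketch ignores dependencies between tasks; with a dependency DAG, only ready tasks can be placed, which is where the difficulty (and the need for runtime knowledge) really comes in.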

blah329

@aabhagwa Migrating a task in the middle of its execution may hurt performance, because the task may have some of its data resident in the caches private to its current processor. As a result of the migration, those newly written values may need to be flushed down the memory hierarchy, and the task will also incur cold cache misses on the new processor it executes on.

sandeep6189

@blah329 Good points. However, there are scenarios where we might wish to move the current job to another machine that has more resources (cycles, memory), or where we can predict an impending workload imbalance. Incurring a one-time performance hit from cache invalidations and page faults (assuming we don't migrate often) is worthwhile in those cases.

kapalani

To add to the above discussion, we might also want to migrate threads to a different core for power reasons, if doing so would leave one core completely idle so that it can be shut down.

o_o

This kind of feels like the greedy scheduling algorithm from 15-251: schedule the largest tasks first, then go down the list from largest to smallest (as long as they aren't dependent on one another), and hope we can get close to the optimal schedule.

nishadg

How does the scheduler estimate which tasks are "long"? Is there a simple heuristic for doing this with little added overhead?