Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2017

Perf Optimization I: Work Distribution and Scheduling

Slide 56 of 64

zale

The stalling join model requires all threads to read and write the descriptor for block A, which can require a large number of accesses to non-local memory (especially if, for instance, the threads run on separate chips or even separate machines). Also, if the spawned task is very large, the cost of locking and unlocking the descriptor to access it could be large.
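
To make the descriptor traffic concrete, here is a rough sketch of a stalling join. It is an illustration only, not the runtime's actual implementation: the `JoinDescriptor` class, its `spawn_count`/`done_count` fields, the `accesses` counter, and `run_block` are all hypothetical names, and a Python `threading.Condition` stands in for whatever lock the real scheduler would use. Each spawned task updates the shared record once when it finishes, and the forking thread touches it once more to wait at the sync.

```python
import threading


class JoinDescriptor:
    """Hypothetical bookkeeping record for one spawn/sync block."""

    def __init__(self, spawn_count):
        self.spawn_count = spawn_count   # tasks spawned in this block
        self.done_count = 0              # tasks that have finished so far
        self.accesses = 0                # times any thread locked the record
        self._cond = threading.Condition()

    def task_done(self):
        # Every finishing task must lock the shared record to update it.
        with self._cond:
            self.accesses += 1
            self.done_count += 1
            if self.done_count == self.spawn_count:
                self._cond.notify_all()

    def wait_all(self):
        # Stalling join: the forking thread blocks until all tasks are done.
        with self._cond:
            self.accesses += 1
            self._cond.wait_for(lambda: self.done_count == self.spawn_count)


def run_block(num_tasks):
    """Spawn num_tasks workers, then stall at the sync point."""
    desc = JoinDescriptor(num_tasks)
    results = [0] * num_tasks

    def worker(i):
        results[i] = i * i      # stand-in for the real task body
        desc.task_done()

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_tasks)]
    for t in threads:
        t.start()
    desc.wait_all()             # the forking thread waits here
    for t in threads:
        t.join()
    return desc, results
```

In this sketch the number of locked accesses grows with the number of spawned tasks (one per task plus one for the waiting thread), which is the contention and non-local traffic described above when the record lives in another chip's or machine's memory.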

machine6

Correct me if I'm wrong, but doesn't the greedy model also make all of its threads access the descriptor?