The stalling join model requires all threads to read and write the descriptor for block A, which can mean many accesses to non-local memory (especially if, for instance, the threads run on separate chips or even separate machines). Also, if the spawned task is very large, the cost of locking and unlocking the descriptor could be significant.
machine6
Correct me if I'm wrong, but doesn't the greedy model also make all of its threads access the descriptor?
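The following is a minimal Python sketch that compares the two join policies. It assumes the descriptor is just a lock-protected count of outstanding children. `JoinDescriptor`, `stalling_join`, and `greedy_join` are hypothetical names and do not come from any real scheduler. Under that assumption, every finishing child writes to the descriptor in both models. The difference is which thread runs the continuation: under stalling join it is the spawning thread, which also waits on the descriptor, and under greedy join it is whichever child finishes last.

```python
import threading


class JoinDescriptor:
    """Hypothetical sync descriptor: a lock-protected count of outstanding children."""

    def __init__(self, num_children):
        self.outstanding = num_children
        self.updates = 0                        # how many times children wrote to it
        self.lock = threading.Lock()
        self.all_done = threading.Condition(self.lock)

    def child_done(self):
        """Decrement the counter under the lock; return True if this was the last child."""
        with self.lock:
            self.outstanding -= 1
            self.updates += 1
            last = self.outstanding == 0
            if last:
                self.all_done.notify_all()
            return last


def stalling_join(work_items):
    """Stalling join: the spawning thread blocks on the descriptor, then runs the continuation."""
    desc = JoinDescriptor(len(work_items))
    results = [None] * len(work_items)

    def child(i):
        results[i] = work_items[i] * 2          # stand-in for real work
        desc.child_done()                       # every child touches the descriptor

    threads = [threading.Thread(target=child, args=(i,)) for i in range(len(work_items))]
    for t in threads:
        t.start()
    with desc.lock:                             # the spawner reads the descriptor too
        while desc.outstanding > 0:
            desc.all_done.wait()
    for t in threads:
        t.join()
    return sum(results), desc.updates           # continuation runs on the spawning thread


def greedy_join(work_items):
    """Greedy join: whichever child finishes last runs the continuation."""
    if not work_items:                          # no children, so nobody would continue
        return 0, 0
    desc = JoinDescriptor(len(work_items))
    results = [None] * len(work_items)
    continuation = []

    def child(i):
        results[i] = work_items[i] * 2          # stand-in for real work
        if desc.child_done():                   # every child still touches the descriptor
            continuation.append(sum(results))   # only the last child continues

    threads = [threading.Thread(target=child, args=(i,)) for i in range(len(work_items))]
    for t in threads:
        t.start()
    for t in threads:                           # demo only: a real greedy scheduler would
        t.join()                                # not wait here, it would go steal other work
    return continuation[0], desc.updates
```

In both functions, the second returned value equals the number of children, because each child updates the descriptor exactly once. What the stalling version adds is the spawning thread's own wait on the descriptor.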