Can local queue access be made significantly faster than stealing from another thread's queue? It seems like a thread must still acquire and release a lock every time it accesses its own queue, since some other thread might steal from the queue at any time.
Probably with the lock-free stealing implementation that Cilk uses, as introduced in the later half of the lecture?
Is it possible to have a global monitor that watches the queues and has the ability to adjust the workload in each of them? It could then observe a queue that is being consumed slowly and move tasks from that queue to another.
Can someone explain why a distributed work queue leads to increased locality?
@kipper I think it's because the sub-tasks spawned by a thread are put back into that same thread's queue. As long as those sub-tasks are not stolen by other threads, the same context is reused for the computation.
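To make that concrete, here is a minimal sketch of a per-worker queue, assuming the owner pushes and pops at one end while thieves take from the other. The names `WorkerQueue`, `push_local`, `pop_local`, `steal`, and `demo_order` are all made up for illustration, and the mutex is there only to keep the sketch short; it is not the lock-free scheme from the lecture.

```cpp
#include <cassert>
#include <deque>
#include <mutex>
#include <utility>

// Hypothetical per-worker task queue (illustrative names, not from the lecture).
// A mutex keeps the sketch simple; a real runtime would likely avoid it.
template <typename Task>
class WorkerQueue {
public:
    // Owner: a spawned sub-task goes onto the owner's own queue.
    void push_local(Task t) {
        std::lock_guard<std::mutex> guard(mutex_);
        tasks_.push_back(std::move(t));
    }

    // Owner: take the most recently pushed task (LIFO), whose data is
    // the most likely to still be in this thread's cache.
    bool pop_local(Task& out) {
        std::lock_guard<std::mutex> guard(mutex_);
        if (tasks_.empty()) return false;
        out = std::move(tasks_.back());
        tasks_.pop_back();
        return true;
    }

    // Thief: take the oldest task from the opposite end (FIFO).
    bool steal(Task& out) {
        std::lock_guard<std::mutex> guard(mutex_);
        if (tasks_.empty()) return false;
        out = std::move(tasks_.front());
        tasks_.pop_front();
        return true;
    }

private:
    std::mutex mutex_;
    std::deque<Task> tasks_;
};

// Single-threaded walk-through of the access order.
inline void demo_order() {
    WorkerQueue<int> q;
    q.push_local(1);
    q.push_local(2);
    q.push_local(3);
    int t = 0;
    bool ok = q.pop_local(t);  // owner gets the newest task
    assert(ok && t == 3);
    ok = q.steal(t);           // thief gets the oldest task
    assert(ok && t == 1);
    (void)ok;
}
```

If nothing is stolen, every task a worker spawns is also run by that worker, which is the locality argument above.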
As explained in later slides, it's better to steal more work rather than less, because stealing is expensive in terms of synchronization and communication. The more work we steal, the smaller the chance that we'll need to steal again soon.
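One way to picture "steal more rather than less" is a steal-half policy, sketched below. This is an assumption for illustration, not necessarily what the lecture or Cilk does: the `Worker` struct and `steal_half` function are hypothetical, and integer task ids stand in for real tasks. The point is that a single synchronized operation moves several tasks at once.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>

// Hypothetical worker state; int task ids stand in for real closures.
struct Worker {
    std::mutex m;
    std::deque<int> tasks;
};

// Move the older half (rounded up) of the victim's tasks to the thief
// while holding both locks once. Returns the number of tasks moved.
std::size_t steal_half(Worker& victim, Worker& thief) {
    assert(&victim != &thief);
    // std::lock acquires both mutexes without risking deadlock.
    std::lock(victim.m, thief.m);
    std::lock_guard<std::mutex> gv(victim.m, std::adopt_lock);
    std::lock_guard<std::mutex> gt(thief.m, std::adopt_lock);

    std::size_t n = (victim.tasks.size() + 1) / 2;
    for (std::size_t i = 0; i < n; ++i) {
        thief.tasks.push_back(victim.tasks.front());
        victim.tasks.pop_front();
    }
    return n;
}
```

With five queued tasks, one call moves three of them, so the thief pays the synchronization cost once instead of three times.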
@kipper I think it is because the task queue lives in the thread's own memory space, whereas with a shared task queue the data has to be fetched from outside the thread.