Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2016

jrgallag

With the hand-over-hand locking technique we used for the linked-list, deadlock is impossible. This is because we always acquire locks in ascending order: if we have a lock, we never ask for a lock on an earlier node. Therefore there cannot be circular wait, and therefore cannot be deadlock.

0xc0ffee

Circular linked lists were mentioned in lecture and you would run into deadlock problems in that case. (This is as the ascending order invariant is no longer true).

For example, if you have a two-element circular linked list (silly example, I know, but)

X -> Y -> X

Thread 0 grabs a lock on X
Thread 0 grabs a lock on Y
Thread 0 releases the lock on X
Thread 1 grabs the lock on X
Thread 0 begins a delete on Y
Thread 1 begins a delete on X

Then neither thread can delete their respective node, as they each need the next pointer for the previous node, but cannot obtain it as the other thread has the prev node. Deadlocked :/

ppwwyyxx

There is a trade off in selecting the granularity.

We can have one lock for every several nodes instead of every one node, then there will be less overhead in traversing and storage, but less parallelism as well.

mperron

I think we can have a "middle ground" implementation and initialize a fixed array of lock of a size proportional to the number of threads. We can acquire a lock by using the address of the item we want to lock mod the length of the lock list. This allows us to decide the granularity of locks by choosing the number of total locks at we initialization of the lock array. However, I believe this could result in deadlock if we need to acquire multiple locks, so a more sophisticated scheme might be necessary to prevent this.

ppwwyyxx

Apart from the granularity, another potential way to optimize the linked list is to have non-exclusive lock (e.g. reader-writer lock) to allow parallel accesses.

aeu

Is there a general consensus on the difference between the performance of lock-free data structures (that use atomic operations) vs. fine-grained locking? We've learned about how both kinds of operations impact the performance of all the cores on a high core count machine. I wonder whether it is better to use one or the other when, for example, you're programming on ~60 cores for a Xeon Phi.

Perpendicular

@mperron There is a problem with a middle ground approach in that we need to account for modifications to the list. (Additions and deletions). The linked list may become too small or too big and we may not see the best speedup.

memebryant

@aeu, I don't believe there is a general consensus on this topic.

To the best of my knowledge, the tradeoffs come from locks being inherently slower under high contention than a short instruction sequence and an atomic commit operation. Fine grained locking doesn't have the ability to solve this problem, it just tries to reduce the amount this occurs by hopefully avoiding contention. I believe deciding which to use comes down to whether your access patterns can avoid contention, which suggests that lockfree structures should in general be more efficient at higher numbers of cores.

bdebebe

@aeu The above point touched on most of it, but I also think it's worth noting that the discussion not only deals with contention, but overhead tradeoffs as well.