Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2015

Fine-Grained Synchronization and Lock-Free Programming

Previous | Next --- Slide 17 of 31

kayvonf

Question: this implementation is a little non obvious. First, can someone explain how it works. And second, how does it (like the last slide) avoid the need for atomic operations?

vrazdan

So whenever you take an element of the queue, rather than remove the node and it's space in memory, we merely move the head forward by one, and return the former head. This is fine, as the next time we pop we will no longer have a reference to the previous element, even though it's still there in memory.

Push is where not only new elements are added at the tail (which is unchanged from before), we use this q->reclaim. Looking at the while loop guard, we see that we would stop when q->reclaim is pointing to the head of the list (and is initialized as such). Reclaim keeps track of the address of the original head of the list whenever an element is added. I say the original head of the list, as when we pop we lost the reference to that element as we merely pushed head forward. So now whenever we push a new element, we start at where the old head was, and continue to delete those nodes until we are back at the current head of the list.

There is no need for atomic operations in the while loop in push, as the list has no knowledge or other reference to the nodes that were removed other than reclaim, so no one else would be changing it. When doing the while loop guard check, if reclaim gets compared against a head that is just about to change in a pop operation, it's fine as then removing that node will just happen in the next call to push.

HLAHat

So, I guess this works because the code guarantees that nodes are only deleted when we're sure they are no longer connected to anything useful. The nodes between head and reclaim are all guaranteed to have no references to any critical nodes (that is, nodes actually in the queue). In here, the head node pulls double duty as a "fence" around the actual queue and a marker for the start of the queue. Reclaim just marks off where the "dead" nodes begin.

BigFish

There is no need for atomic operations since only the producer (single thread) is modifying the queue. Each time pop() is called, it simply moves the head to the next node and return the value directly without modifying the queue. When the push() function is called, it put the new node to the tail of the queue. However, in the same time, it free all nodes between q->reclaim and q->head, which are all nodes popped out between two consecutive calls of push().

russt17

Does anyone know why the deletion occurs in the push and not the pop? Is there some advantage to this?

cgjdemo

@russt17 I think one reason is that each time we do a pop, the head pointer moves to the next node. This results in head pointer not equaling reclaim pointer. So if we put reclaim code in pop, we will trigger the reclaim every time we call pop.

subliminal

Can someone explain why this implementation is needed, and the one on the last slide, minus the wraparound logic, won't work?

Kapteyn

@subliminal The previous slide uses an array to store the queue. That is why it is bounded because the array is not dynamically resizable. If you want an unbounded queue and you want to keep using arrays instead of a linked list (as we do in this example) you would need to be able to resize your array each time the index of the tail element in the queue exceeds the current array size. You also need to keep track of what the head and tail indices are in the array.

Furthermore, how are you going to make sure you don't run out of memory to hold the growing array or that you're not taking up more space than necessary? As you pop, the head index of the array moves farther away from index 0, leaving a bunch of wasted space at the beginning of the array.

You could occasionally shift all elements in the array back so that they begin at index 0 and then shrink the array size to fit the number of elements in the queue but this is expensive and annoying. So we might be better off using a linked list and the reclaim method described here.

subliminal

@Kapteyn: Thanks! I think I framed my question a little badly though - what I meant was, why do we need the reclamation technique? Why can't pop do the deletes itself? Is this just an amortization of the delete operation, or is it needed for correctness?

funkysenior15

I think @subliminal is right. Another thing to notice is that pop() always writes to q->head and push() always writes to q->tail, and writes to these two addresses will never overlap because of the "if" check in pop().

It doesn't seem like allowing pop() to do the work of reclaiming will do this program any harm.