Slide 30 of 43

Can anyone explain how this works? Like what atomic_circ_increment does?


@CC. It's an atomic circular increment: it bumps the counter and wraps it back to 0 when it reaches P, all as a single atomic operation.

void atomic_circ_increment(int* x) {
  // executed atomically
  *x = *x + 1;
  if (*x == P)
    *x = 0;
}
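The slide leaves the atomicity mechanism abstract. One plausible way to realize it is a C11 compare-exchange loop that also returns the old value (which is what lock() wants for my_element). This is a sketch, not the slide's implementation; the function name and the value of P are assumptions:

```c
#include <stdatomic.h>

#define P 4  /* number of lock slots; value assumed for illustration */

/* Atomically advance *x to (*x + 1) % P and return the old value,
 * using a C11 compare-exchange loop. On CAS failure, `old` is
 * refreshed with the current value and we retry. */
int atomic_circ_increment_cas(atomic_int* x) {
    int old = atomic_load(x);
    int next;
    do {
        next = (old + 1) % P;
    } while (!atomic_compare_exchange_weak(x, &old, next));
    return old;
}
```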

Can you elaborate more specifically how this works? I still fail to see why this is a solid implementation. Thanks!


@above You can think of the array as a sort of queue. When a thread calls lock, it atomically grabs the next available position in the array, then spins while the status of that position is 1. Once the status of its position becomes 0, the spin ends and the thread holds the lock. When it unlocks, it sets the status of its position back to 1, since it no longer holds the lock, and then notifies the next waiting thread (the next position in the circular array) by setting that position's status to 0. The next thread waiting on the lock then acquires it, and so on.
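The queue behavior described above can be sketched end to end. This is my own reconstruction in C11 atomics, not the slide's exact code; the names, the use of a compare-exchange loop for the atomic circular increment, and P = 4 are all assumptions:

```c
#include <stdatomic.h>

#define P 4  /* max number of contending threads (assumed) */

typedef struct {
    atomic_int status[P];   /* 0 = this slot holds (or may take) the lock */
    atomic_int head;        /* next slot to hand out */
} arraylock_t;

void arraylock_init(arraylock_t* l) {
    atomic_store(&l->head, 0);
    atomic_store(&l->status[0], 0);          /* first arrival acquires immediately */
    for (int i = 1; i < P; i++)
        atomic_store(&l->status[i], 1);      /* everyone else waits */
}

/* Each thread remembers which slot it was assigned (*my_element). */
void arraylock_acquire(arraylock_t* l, int* my_element) {
    /* atomically take the next slot, wrapping modulo P */
    int old = atomic_load(&l->head), next;
    do { next = (old + 1) % P; }
    while (!atomic_compare_exchange_weak(&l->head, &old, next));
    *my_element = old;
    while (atomic_load(&l->status[*my_element]) == 1)
        ;  /* spin until our slot is cleared */
}

void arraylock_release(arraylock_t* l, int my_element) {
    atomic_store(&l->status[my_element], 1);            /* give up our slot */
    atomic_store(&l->status[(my_element + 1) % P], 0);  /* hand off to next slot */
}
```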

We need an atomic increment (via atomic_circ_increment) since multiple lock calls can happen at the same time. However, the increment in unlock doesn't have to be atomic (plain circ_next), because unlock is only called by the single thread holding the lock, never by several threads simultaneously.

It is of course possible for buggy code to call unlock from multiple threads simultaneously, but that is a misuse of the lock, and the behavior in that case is undefined.


If I understand this code correctly, there is an array of "flags", one per thread, indicating whether that thread holds the lock (value 0) or not (value 1). The thread currently executing in the critical section has value 0 and all waiting threads have value 1; when it finishes, it sets its own flag back to 1 ("I'm done, so I give up my slot") and sets the next element in the array to 0 (letting the next waiting thread start executing). Also, as @kayvonf mentioned, the circular increment eventually wraps back to index 0 after P acquisitions, restarting the process.


So this method is similar to the previous (ticket lock) one, except the 'ticket' can never overflow?


@lol: "Hence when it unlocks, it puts the status of its position as 0, as it no longer holds the lock". I think you are referring to l->status[my_element] = 1 here, so the status should be set to 1 and not 0, right?

Also, I don't get why in unlock we do l->status[circ_next(my_element)] = 0. Why not simply void unlock(lock *l) { l->status[my_element] = 0; }?


@eknight7 Fixed the typo. If the status of a thread's my_element slot is 0, it holds (or will hold) the lock; if it is 1, it does not hold the lock. So the thread calling unlock needs to set its status (l->status[my_element]) to 1. Now it no longer holds the lock.

Now suppose it does nothing after this. Then the states of l->status for the other elements have not changed; since those threads are all waiting for the lock, their statuses must all be 1. Hence the thread performing the unlock has to notify some other thread to move on. In this case, we notify the next one waiting in the circular array / queue.

Also, suppose no other thread is waiting for the lock when the current thread unlocks. Then any thread wanting the lock in the future must be able to acquire it immediately. When we set l->status[circ_next(my_element)] = 0, the next thread to arrive gets position circ_next(head) = circ_next(my_element), which is exactly the position the last thread set to 0, so it can proceed right away.


@lol: I think what's confusing me now is that some lock implementations use 0 to denote that the lock is free, while others use 0 to denote that the lock is NOT free.

For this array-based lock, a thread waiting for the lock blocks while (l->status[my_element] == 1) (so 0 = lock free, 1 = lock NOT free). So when a thread is unlocking, shouldn't it set l->status[my_element] to 0?


@eknight7 Since it's an array-based lock, whether the lock is locked or unlocked doesn't depend on a single variable but on the state of the whole array, and on maintaining an invariant. I think your confusion is that you are using "lock free / lock NOT free" as the states of a single-variable implementation. The right interpretation is per thread: either it possesses (or can/will possess) the lock (its status is 0 and all other statuses are 1), or it does not have the lock (its status is 1 and exactly one other status is 0). It is subtly different.

So when a thread holds the lock, its status is 0 but all other threads' statuses are 1, so they all block on while (status == 1). When the holding thread unlocks, it sets its own status to 1, so it no longer has the lock; at this point every thread's status is 1. Then the unlocking thread sets the next thread's status to 0. That thread falls out of its while (status == 1) loop and now possesses the lock.
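The invariant described above (at any quiescent moment, exactly one slot in the array is 0) can be spelled out with a tiny checker. This helper is hypothetical, not from the slide:

```c
/* Count how many slots are 0. The array-lock invariant says this
 * should be exactly 1 whenever no lock/unlock call is in flight. */
int count_zero_slots(const int status[], int p) {
    int zeros = 0;
    for (int i = 0; i < p; i++)
        if (status[i] == 0)
            zeros++;
    return zeros;
}
```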


Why do we need the atomic increment? Since one thread does some work and then essentially signals the next one to continue, couldn't we get by with setting the next array value in the regular way?


@aeu the atomic increment is only for when acquiring the lock. And this is because multiple threads might be trying to acquire the lock at the same time.

You're right that we don't need (and don't use) an atomic increment when unlocking since only the thread that currently has the lock will be unlocking. (circ_next() is just for making sure we don't go out of the array bounds)
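Since circ_next needs no atomicity, a plausible definition is just a plain modular successor. The slide doesn't show it, so this is an assumed implementation (P = 4 assumed as well):

```c
#define P 4  /* number of lock slots (assumed) */

/* Plain, non-atomic circular successor. Safe to use in unlock()
 * because only the single lock-holding thread calls it. */
int circ_next(int i) {
    return (i + 1) % P;
}
```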


One thing to note is that there are only P processors, so at most P threads contend for the lock at once. However, I wonder what happens when the OS preempts a thread while it is holding the lock.


My understanding is that status is an array keeping a status (0/1) for every processor, and head is the index to hand to the next arriving processor. But the comment on the status array, "padded to keep off same cache line", confuses me.

One reason could be to avoid cache line invalidations during updates, but what happens if the number of processors P is larger than the number of elements that fit in one cache line?


@Allerrors I think the differences between this and the ticket lock, aside from overflow being impossible, are: in the ticket lock, every processor reads and writes the same variable, so when one processor writes to it on unlock, every other processor must drop its cached copy of that line. Here, every processor has its own status element, padded so that each copy sits on a different cache line; on unlock, no processor except the one holding the next element needs to drop its cache line.


Here's my understanding: head points to the next empty slot. If a thread wants the lock, it registers itself at the slot head points to and increments head. The thread then waits on the slot it registered until the slot's value becomes 0. (So head actually points to the beginning of the empty slots, or equivalently to the end of the queue of waiting threads.) When the thread holding the previous slot finishes, it sets the next slot to zero, so the thread waiting on that slot can enter the critical section.


We also need padding in the array because, ideally, each thread's array element lives on its own cache line, so that a modification to one array slot won't invalidate the cache lines other threads are spinning on.
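The padding the slide's comment refers to can be expressed as a struct per element. A sketch assuming a 64-byte cache line (the constant and type name are my own, not from the slide):

```c
#define CACHE_LINE 64  /* assumed cache line size in bytes */

/* Each status flag occupies a full cache line, so a write to one
 * thread's slot invalidates no other thread's cached copy. */
typedef struct {
    volatile int v;
    char pad[CACHE_LINE - sizeof(int)];
} padded_int;
```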


I think the initialization of status[P] should be: one of the P values (the slot head starts at) is 0, and the others are 1. Is that right?
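That initial state can be written out as an init routine. The slide doesn't show initialization, so this is a sketch with assumed names and P = 4:

```c
#define P 4  /* number of lock slots (assumed) */

typedef struct {
    int status[P];
    int head;
} lock_t;

/* Slot 0 (where head starts) is free; all other slots wait. */
void lock_init(lock_t* l) {
    l->head = 0;
    l->status[0] = 0;            /* the first arrival acquires immediately */
    for (int i = 1; i < P; i++)
        l->status[i] = 1;        /* everyone else spins */
}
```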


To me, this lock strategy works like a "zero" being circularly passed along the array. A thread trying to grab the lock waits for the "zero" to arrive at its position, and a thread releasing the lock passes the "zero" on to the next position. Atomic operations ensure each thread's position is unique, which makes it a solid solution for both correctness and performance, i.e. O(1) interconnect traffic per lock release.

However, this is a trade-off between coherence overhead and space utilization: the lock now needs O(P) padded entries instead of a single variable.


I think a drawback of the array lock is that it can cause unbalanced load, because it hands out the lock strictly in the order processors took their tickets. Say I have 8 tasks on 4 processors and the even-indexed processors are assigned heavier work; the strict FIFO ordering would then cause load imbalance.


Each processor can switch among multiple threads while each thread tries to acquire the same lock. In that case, the array-based lock will not work, since we no longer know an upper bound on the number of threads.