Slide 35 of 64

I don't really see a substantial difference between atomic and lock. Yes, they differ in that we can use separate locks lock(x) and lock(y) so the two regions don't interfere with each other, while atomic{} can't make that distinction. But this doesn't seem substantial... Can anyone explain?


An operation is atomic if it appears to the system to happen all at once. Having a lock on an operation means that only the thread with the lock can perform the operation at one time. They're slightly different ideas.


@Richard I think it's easiest to start with the semantics for 'atomic,' and then consider which of these behaviors locking also meets.

Here, the atomic keyword means the block will be: (1) all-or-nothing, (2) local-only-until-committed, (3) capable-of-being-serialized.

Which of these three do locks implement in their own right?


@0xc0ffee I think that's a good way to build intuition. Could you explain what "all-or-nothing" means here? I'd guess locking at least satisfies (3).


@Richard all-or-nothing is just a rephrasing of atomicity :)

Yeah - I think I agree. Locking is only guaranteed to provide (3). (1) and (2) don't necessarily hold - locking certainly isn't all-or-nothing, and it certainly isn't isolated: other threads can observe the effects halfway through.


@0xc0ffee I see! If one thread steps into atomic{}, all other threads in the program will pause working. However, locking can only prevent another thread from entering the same locking area.


@Richard - it's not quite that all other threads will pause their work. I think you might be misunderstanding what 'isolated' means. Slide 39 has a pretty good description :)


@0xc0ffee I see, I had misunderstood atomicity... Atomicity means the instructions within atomic{} are treated as a whole: either none of them executes, or all of them execute. One way to realize atomicity is lock(), by locking all the relevant code that may contend. However, there are faster solutions, such as "undoing" the instructions if a conflict occurs.

Basically, lock can be used for atomicity purpose, but can also be used for other purposes. Atomicity can be realized using lock, but can also be realized by other methods. Also, a locked area isn't guaranteed to be atomic, unless you carefully avoid any unlocked instruction conflicting with the locked area.


@thomasts Don't those two properties (happening all at once and only having one thread perform the operations at a time) imply each other though?

Also, what other uses does lock() have? Anytime the lock/unlock pair are used, couldn't we technically call that atomicity?


@totofufu Let me try answering this. For example, suppose you lock all the instructions that write to address p, but don't lock the instructions that read p. Then the reads can execute while another thread is inside the locked region. However, if you make that region atomic (rather than using lock() and unlock()), then while thread0 is executing the atomic region, a thread1 that reads address p will either get the value from before thread0 entered the atomic region, or the value from after thread0 finished it - but never an intermediate value.


We talked in lecture about how critical/atomic regions can become lock-free in certain scenarios (e.g., the program is actually single-threaded, so no concurrent access can happen). Apart from these static (compile-time) scenarios, there might be runtime-only situations that remove the need for an atomic region that would otherwise cause locking. Does OpenMP (or other libraries) detect these runtime scenarios in order to skip locking a critical section? I would assume it doesn't, because detection would be expensive - maybe as expensive as just using fine-grained locking where applicable. I've looked online but have been unable to find an answer.


To summarize what's been discussed above and extend a bit, I will compare and contrast the "atomic" and "critical" primitives in OpenMP. The example I will consider is the following increment of a shared counter:

#pragma omp atomic   (or: #pragma omp critical)
count++;

Ultimately, the effect on "count" is the same with either "atomic" or "critical." However, an "atomic" addition is roughly 25x more expensive than a normal addition, while a "critical" addition is roughly 200x more expensive. Here's why:

  • "atomic" operation relies on the hardward to provide it directly, so naturally it only includes a few (e.g. increments). Also, since it's hardware-dependent, it might not be portable to certain platforms. However, it is really cheap, and should be preferred to "critical" when possible.
  • "critical" operations protect a general sequence of code, and uses a much heavier-weight lock to achieve this.

In relevance to this slide, notice that the "atomic" operation can also be implemented with a lock, but that would defeat the purpose really. Also, the use of "atomic" and "critical" does not guarantee the program's correctness if they are used incorrectly.


@Richard, thanks for your earlier comment on how a lock can be used for atomicity - it helped me understand. That is true, but by itself a lock does not provide atomicity. It also does not provide isolation (local-until-committed), which atomic instructions give you automatically.


@althalus You are right. Just re-edited that. Thanks.


If "critical" is around 8 times more expensive than "atomic," then why ever use "critical"? I'm somewhat confused by that. They do basically the same thing, but one incurs much greater overhead than the other. Or is it that the "relies on the hardware to provide it directly" part only works for certain operations, so only some things can be made atomic?


@krillfish Well, the reason is that "critical" applies to much more general use cases.

"atomic" only allows the few expressions as listed here: The reason why they are so limited is that they are directly mapped to hardware level instructions (if possible).

On the other hand, "critical" can guarantee the mutual exclusion of, essentially, any block of code, and a common implementation is to use a mutex. A mutex supports lock() and unlock() functions, which can in turn be implemented with hardware level concurrency primitives. However, this incurs a much heavier overhead compared to "atomic".


Okay, that makes a lot more sense. Since "atomic" is limited to only a few operations, that means the hardware can make more specific decisions on how to deal with that contention. But for "critical," the hardware can't make any assumptions, so the overhead will be much greater. Thanks! :)


Lock() can also be used for I/O blocking, which "atomic" cannot achieve, I believe (though I guess this is better implemented using condvars).


It is important to note that atomicity can be implemented with lock/unlock, but that is not the only way for it to be achieved.