Just thought I'd clarify the two big points because they took me a second to understand. Here's my interpretation, please correct me if I'm wrong:
When a thread blocks, the OS can take it out of its execution context on the core and swap another thread in. Blocking is an OS-level scheduling decision; it has nothing to do with the core itself switching between its hardware execution contexts.
The first point is saying that in a performance-critical program we usually don't have more worker threads than there are execution contexts (adding more just means more time spent by the OS swapping threads onto execution contexts). So, since there is no shortage of execution contexts, blocking won't free anything up anyway.
I'm still confused about when to use busy waiting and when to use blocking. Can anyone give me a concrete example in real world about it?
@ESINNG say there are only 2 processes sharing the same resource in a system, and the critical section is only one arithmetic instruction. Busy waiting is better in this case, since the overhead of a context switch is larger than just staying in a while loop for a few cycles. If many processes share the same resource and/or the critical section is large, the accumulated busy-waiting time for some process can be much longer than the overhead of a context switch, so blocking would be better.
Are you sure that the length of the critical section matters? I think what's important is how long you expect to be waiting before entering the critical section. Busy waiting would be bad even for a one-line critical section if you expected it to be a long while before you'd acquire the resource, since in that case you would be wasting processing time.
As I understand it, an example in which busy waiting would be better would be if you had two threads that both wanted access to a boolean variable protected by a mutex. If each thread only acquires the mutex to write to the variable and then immediately releases it, we would expect the amount of time either thread waits for the mutex to be very small, so busy waiting would be better. However, if both processes were taking turns writing large amounts of data to a file, blocking might be better.
I don't know enough to resolve this debate, but it seems like the length of the critical section would matter.
Consider two identical programs A and B:
do a bunch of arithmetic
While A is doing arithmetic, where is B? Doesn't B need to check if it has access to the spinlock?
@ak47 B would be constantly spinning, checking to see if it has the lock, and using all the resources of the processor it's running on. However, this is a bad program to run on two processors, because none of the code is parallelizable.
Despite its drawback, spin locking is very common because it's a lot faster than a blocking mutex on a multiprocessor system. In the Linux kernel, spinlocks are the fundamental locking type (https://www.kernel.org/pub/linux/kernel/people/rusty/kernel-locking/c93.html).
@ak47 To rephrase: B would check if it has access to the spinlock - in fact, it would do so repeatedly, thus making the arithmetic on A take even longer.
I read through all the comments, but still can't get the point. Can someone give another example to illustrate this problem? Thank you!
Here is an example we actually ran into in our assignment, although it's less about other processes on the machine and more about resources within the same program.
We had 'tasks', each with a 'done' variable and a 'param' variable, and we spun until 'done' held the value we expected. The issue was that the task was small enough to fit on one cache line, so reading 'done' in a spin loop was causing massive cache-coherence problems for the other threads that wanted to read and then write the same line.
edit: screw writing code like this, see this gist: https://gist.github.com/floodric/91e0195b12dc9b21a6ea
I found a great explanation on stackoverflow
Quick Question: Do OS schedulers actually know that in certain cases busy-waiting is preferable to blocking?
OS schedulers don't know anything about the critical resource or when the mutex is going to be released. They count on the programmer to make the correct decision of which type of lock to use.
I think the idea here is that a blocking lock can be expensive because of the instructions needed to put a thread to sleep and wake it up later. In the common multi-core case where each lock is held for a relatively short time, a spin lock is preferable even though it wastes some CPU time. In the real world, choosing between a spin lock and a mutex can be subtle. However, modern systems support a hybrid locking strategy for mutexes: the thread trying to acquire the lock first spins for a short period of time and then "backs off" to a blocking state.
@BigFish Thanks! that stack overflow link should be included in the slide.
@landuo good explanation thanks!
MPI is a concrete example of software choosing to always busy-wait instead of block. One might think this would be inefficient, but it's the default because we generally own the machine.
MPI is generally used in high-performance supercomputing environments where the running program is the only intensive program on the machine. In this situation it makes sense to launch a number of processes exactly matching the number of execution units. Choosing to block in this scenario wouldn't make sense: there is no other process waiting to run, so we would simply pay the overhead of a context switch out and back in for the waiting process.