Slide View : 15-418/618 Spring 2014

Slide 39 of 56

Dave

Would this method be less efficient from the processor's perspective than a Barrier? A while loop like that kinda makes me cringe...


kayvonf

@Dave: What specifically makes you cringe? Or in other words, how would you implement barrier(2) in the following code to not make yourself cringe?

Thread 0:

// do stuff to produce x 
x = 1;
barrier(2);

Thread 1:

// do stuff here
barrier(2);
print x;


Dave

@kayvonf I guess that was really my question, whether barrier is implemented using something other than loops. My reaction to a loop like that may just be an artifact of my Proxylab days, but having to perform millions of checks a second to see if a value has changed seems inefficient. It doesn't sound like such a thing exists, but I was wondering whether modern CPUs might have some gating feature that shuts down a core until the barrier condition is met. Am I wrong about a while(flag==0) loop being taxing on a CPU? Or about how CPUs work? Or both?
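For concreteness, here is one way a loop-based barrier(2) could look. This is only a sketch of a sense-reversing spin barrier written with C++ atomics, not how any particular library implements barriers; the names SpinBarrier and run_barrier_example are made up for this example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical sense-reversing spin barrier for a fixed number of threads.
class SpinBarrier {
public:
    explicit SpinBarrier(int num_threads)
        : num_threads_(num_threads), waiting_(0), sense_(false) {
        assert(num_threads > 0);
    }

    void wait() {
        // Remember the current phase; it cannot flip before this thread arrives.
        const bool my_sense = sense_.load(std::memory_order_relaxed);
        if (waiting_.fetch_add(1, std::memory_order_acq_rel) == num_threads_ - 1) {
            // Last thread to arrive: reset the counter, then release the others.
            waiting_.store(0, std::memory_order_relaxed);
            sense_.store(!my_sense, std::memory_order_release);
        } else {
            // Busy-wait: this is the while (flag == 0) style loop.
            while (sense_.load(std::memory_order_acquire) == my_sense) {
            }
        }
    }

private:
    const int num_threads_;
    std::atomic<int> waiting_;
    std::atomic<bool> sense_;
};

// Mirrors the two-thread example: thread 0 produces x, thread 1 reads it.
int run_barrier_example() {
    SpinBarrier barrier(2);
    int x = 0;
    int observed = -1;
    std::thread t0([&] { x = 1; barrier.wait(); });
    std::thread t1([&] { barrier.wait(); observed = x; });
    t0.join();
    t1.join();
    return observed;
}
```

Calling run_barrier_example() should return 1, since thread 1 cannot pass the barrier until thread 0 has written x. With GCC or Clang on Linux, building this may need -pthread.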


kayvonf

You are completely correct, however...

You're taught in other classes that busy-waiting is bad because the assumption is that some other process might want the core, so the spinning process should be swapped out until the condition it is waiting for is met. However, if you are worried about performance, it's unlikely you're multi-tasking on the machine. If that's the case, and there's nothing else for the core to do, there's potentially no performance loss from spinning. In fact, swapping T1 out so the core sits idle could hurt performance: once the flag is set, it could take much longer for T1 to get placed on the CPU again, increasing the overall time between the flag being set and T1 observing the update.
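To make the "swap the thread out" alternative concrete, here is a sketch of a wait that blocks in the OS instead of spinning, using a standard C++ mutex and condition variable. The names BlockingFlag and run_blocking_example are hypothetical. The waiter uses no CPU while asleep, but it has to be rescheduled after set() is called, which is where the extra latency described above can come from.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical one-shot flag whose waiters sleep instead of spinning.
class BlockingFlag {
public:
    void set() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            flag_ = true;
        }
        cv_.notify_all();  // wake any sleeping waiters
    }

    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        // Sleeps until flag_ is true; the predicate handles spurious wakeups.
        cv_.wait(lock, [this] { return flag_; });
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool flag_ = false;
};

// T0 produces x and sets the flag; T1 sleeps until the flag is set.
int run_blocking_example() {
    BlockingFlag ready;
    int x = 0;
    int observed = -1;
    std::thread t1([&] { ready.wait(); observed = x; });
    std::thread t0([&] { x = 1; ready.set(); });
    t0.join();
    t1.join();
    assert(observed == 1);  // sanity check for this sketch
    return observed;
}
```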

That said...

1. You might not want to busy-wait because busy waiting burns power.
2. You might not want to busy-wait in a multi-threaded CPU (e.g., consider a Hyper-threaded situation) because you are burning instruction issue slots that could be used to execute instructions from other hardware threads.
3. In light of 1 and 2 above, you might be interested in the implementation of the x86 mwait and monitor instructions. Together they can be used to tell the core to sleep the execution context until a specific condition has occurred, such as a write to a specified memory address. Sounds pretty useful, eh? (Note all this is handled without OS intervention, so the thread is never swapped out of the hardware execution context it is currently assigned to.)

mwait: http://x86.renejeschke.de/html/file_module_x86_id_215.html
monitor: http://x86.renejeschke.de/html/file_module_x86_id_175.html
A nice reference from Intel about using monitor and mwait.
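As far as I know, monitor and mwait are normally usable only from privileged code, so the sketch below does not use them. It substitutes a different, lighter-weight technique: the x86 pause instruction (via the _mm_pause intrinsic), which hints to the core that the thread is in a spin-wait loop and is aimed at the power and hyper-threading concerns above. The names cpu_relax, spin_until_set and run_pause_example are made up for this example, and the yield() fallback for non-x86 targets is an assumption.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#if defined(__x86_64__) || defined(_M_X64)
#include <immintrin.h>  // _mm_pause
#endif

// Tell the core this thread is spin-waiting.
inline void cpu_relax() {
#if defined(__x86_64__) || defined(_M_X64)
    _mm_pause();
#else
    std::this_thread::yield();  // assumed fallback on other targets
#endif
}

// Spin until another thread stores a nonzero value into flag.
void spin_until_set(const std::atomic<int>& flag) {
    while (flag.load(std::memory_order_acquire) == 0) {
        cpu_relax();
    }
}

// T0 produces x and sets the flag; T1 spins (politely) until it is set.
int run_pause_example() {
    std::atomic<int> flag{0};
    int x = 0;
    int observed = -1;
    std::thread t1([&] { spin_until_set(flag); observed = x; });
    std::thread t0([&] { x = 1; flag.store(1, std::memory_order_release); });
    t0.join();
    t1.join();
    assert(observed == 1);  // sanity check for this sketch
    return observed;
}
```

The thread still occupies its hardware execution context the whole time, so this is closer to plain spinning than to mwait. It only makes the spin loop less costly for the sibling hardware thread.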


tchitten

A quick summary of the monitor and mwait instructions: The monitor instruction defines a memory range to watch, starting at the address specified in eax. The width of the monitored range is processor-specific and can be queried with the cpuid instruction. Once monitor has been executed, mwait can be used to enter an "implementation dependent optimized state", and execution will not resume until a write to the monitored address range or an interrupt occurs. Various other parameters to both mwait and monitor can be passed in other registers.
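The cpuid check mentioned above can be sketched as follows. This assumes GCC or Clang on x86, where <cpuid.h> provides __get_cpuid; on other compilers or architectures the function simply reports "not supported". MonitorInfo and query_monitor_info are hypothetical names. The bit positions used are CPUID leaf 1, ECX bit 3 for monitor/mwait support, and CPUID leaf 5, EAX[15:0] and EBX[15:0] for the smallest and largest monitor-line sizes in bytes.

```cpp
#include <cassert>
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <cpuid.h>  // __get_cpuid (GCC and Clang)
#define SKETCH_HAVE_CPUID_H 1
#endif

// What the processor reports about monitor/mwait.
struct MonitorInfo {
    bool supported = false;            // CPUID.01H:ECX bit 3
    unsigned smallest_line_bytes = 0;  // CPUID.05H:EAX[15:0]
    unsigned largest_line_bytes = 0;   // CPUID.05H:EBX[15:0]
};

MonitorInfo query_monitor_info() {
    MonitorInfo info;
#ifdef SKETCH_HAVE_CPUID_H
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    // Leaf 1: feature flags; ECX bit 3 indicates monitor/mwait.
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        info.supported = ((ecx >> 3) & 1u) != 0;
    }
    // Leaf 5: monitor/mwait parameters, including the monitored line sizes.
    if (info.supported && __get_cpuid(5, &eax, &ebx, &ecx, &edx)) {
        info.smallest_line_bytes = eax & 0xFFFFu;
        info.largest_line_bytes = ebx & 0xFFFFu;
    }
#endif
    return info;
}
```

Virtual machines may hide the feature bit, so a result of "not supported" does not necessarily describe the underlying hardware.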
