Slide 47 of 65
mschervi

This diagram shows how multi-threading can be used to hide stalls. When thread 1 stalls waiting on a read from memory, the processor switches to thread 2, and so on. In this example, by the time thread 4 runs and stalls, thread 1's memory request has returned and it is runnable again. This way the processor is always doing useful work (except perhaps for the overhead of switching between contexts), even though individual threads are stalled.
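A toy simulation may make this concrete. The sketch below is my own illustration, not taken from the slide: the cycle counts and the switch-on-stall policy are assumptions. Each thread alternates a few compute cycles with a fixed-latency memory stall, and the core switches to another ready thread whenever the current one stalls:

```python
# Toy model of a core hiding memory stalls by switching among
# hardware threads ("switch-on-stall"). All numbers are made up:
# each thread alternates RUN_CYCLES of compute with a STALL_CYCLES
# memory wait, repeated BURSTS times.

RUN_CYCLES = 2      # compute cycles before each memory request
STALL_CYCLES = 6    # latency of the memory request
BURSTS = 3          # compute/stall pairs per thread

def simulate(num_threads):
    """Return (total cycles to finish all threads, cycles the core idled)."""
    ready_at = [0] * num_threads        # earliest cycle each thread can run
    bursts_left = [BURSTS] * num_threads
    clock = 0
    idle = 0
    while any(b > 0 for b in bursts_left):
        # run the unfinished thread that became ready earliest
        candidates = [i for i in range(num_threads) if bursts_left[i] > 0]
        i = min(candidates, key=lambda t: ready_at[t])
        if ready_at[i] > clock:         # no thread is runnable: core idles
            idle += ready_at[i] - clock
            clock = ready_at[i]
        clock += RUN_CYCLES             # execute until the thread stalls
        bursts_left[i] -= 1
        if bursts_left[i] > 0:          # issue memory request; thread blocks
            ready_at[i] = clock + STALL_CYCLES
    return clock, idle
```

With one thread the core idles through every stall; with four threads the stalls overlap with the other threads' compute and the core never idles, which is the latency-hiding effect the diagram shows.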

nkindberg

Because the processor executed the other three threads until they blocked, thread 1 actually took longer in wall-clock time to finish than if the processor had simply waited out its stall: other threads were still executing when it became ready again. This led to a brief discussion of scheduling fairness in class. In practice you would need a better scheduling policy than "run every other thread until it stalls before returning to the first," since it could be a long time before that happens, or it might never happen at all if another thread never stalls.

vvallabh

Is there a way to estimate how long a thread will take before you run it? Or even to estimate how long each stall will take? That could let you optimize the order in which you run these threads (for example, if thread 1 took longest you might want to run it last rather than first). Or am I just missing something?

kayvonf

@vvallabh: great question! So good in fact that good answers to that question might get you published in a computer architecture conference.

miko

Does the processor only switch between threads whenever it runs into a stall? Correct me if I'm wrong, but I originally thought it was possible to switch between threads even without something that causes execution to stall.

mburman

@miko Before I try to answer this, I think it's important to note that we are talking about hardware threads here. If you look at the difference between slide 44 and slide 45, you'll notice that the execution context is replicated 4 times over in the latter. This implies you can store 4 independent execution contexts on that core (4 hardware threads).

With hardware threads, there is no "context switching". Context is stored directly on the core for each thread. Remember, when an OS thread is context switched out, its execution context is saved to memory and a different thread's context is restored - this takes a lot of cycles.

The answer to your question is on slide 52. Kayvon talks about it 62 minutes into lecture 2. It is basically down to implementation - you might run an instruction from a different execution context every cycle, or wait till a stall before you do.
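To illustrate the first of those two policies, here is a tiny sketch (hypothetical workload and function name, my own assumptions) of fine-grained interleaving, where the core issues from a different ready context every cycle rather than waiting for a stall:

```python
# Sketch of fine-grained (per-cycle) interleaving: each cycle the
# core moves round-robin to the next hardware context and issues an
# instruction from it if that context still has work.

def interleave(num_threads, instructions_per_thread):
    """Return the issue order: which thread issues on each cycle."""
    remaining = [instructions_per_thread] * num_threads
    order = []
    t = 0
    while any(remaining):
        if remaining[t] > 0:
            order.append(t)
            remaining[t] -= 1
        t = (t + 1) % num_threads   # next context every cycle
    return order

# With 2 threads of 3 instructions each, issue alternates strictly:
# interleave(2, 3) -> [0, 1, 0, 1, 0, 1]
```

The contrast with switch-on-stall is that no single thread ever monopolizes the pipeline, at the cost of each thread only issuing every Nth cycle.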

I'm also going to add that, for OS threads, scheduling is handled by the OS' scheduler. The scheduler may choose to context switch from one thread to another at its discretion.

kayvonf

Good answer, @mburman. The only thing I'll add is that when we talk about contexts in this class, we now need to be precise about which context we are referring to. There's a hardware execution context (register state, PC, etc.) defined by the hardware architecture, and there's a software process context defined by the OS (the hardware context, plus info in the operating system PCB, virtual memory mappings, open file descriptors, etc.). The OS is responsible for the mapping of OS processes (or OS threads) to hardware contexts. The chip is responsible for choosing which of its hardware contexts to execute instructions from. An OS context switch changes the mapping of OS threads to hardware contexts and should be considered an expensive operation.
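As a rough illustration of that distinction (the field names are my own, and a real PCB holds much more), the two kinds of context might be sketched as:

```python
# Illustrative sketch only: the two notions of "context" above.
from dataclasses import dataclass, field

@dataclass
class HardwareContext:
    """State the core replicates once per hardware thread."""
    registers: list          # architectural register file
    program_counter: int     # PC

@dataclass
class ProcessContext:
    """State the OS tracks per process: a hardware context plus PCB info."""
    hw: HardwareContext                             # register state, PC, ...
    page_table_base: int                            # virtual memory mappings
    open_files: list = field(default_factory=list)  # open file descriptors
```

Switching hardware contexts just means the core starts issuing from a different replicated HardwareContext already on chip; an OS context switch has to save a thread's hardware state out to memory and restore another's, which is why it costs many cycles.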