Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2015

Slide 13 of 69

TDollasAKATMoneys

Can somebody elaborate on why latency hiding cannot help with this problem? Is it because, if latency hiding incorrectly predicts the data the processor is requesting, it would consume more of the already limited bandwidth?

yuel1

Think of it this way. You (memory) and your friend (processor) are separated by some distance, and you're shouting instructions at him/her. Latency is the time it takes for the sound waves to travel from you to your friend. Bandwidth is the number of instructions you can reasonably shout per unit of time and still have him/her understand what to do.

Now suppose your friend can execute the instructions much faster than you can shout them. Even if we ignore the speed of sound (latency in this case), you would still be limited by how many instructions you can shout and your friend can understand (bandwidth).

Hope this helps :)

TDollasAKATMoneys

Thanks for the analogy! It makes a lot more sense now.

kayvonf

@yuel1: excellent analogy!

Kapteyn

Latency hiding is when the processor works on data it already has while waiting for memory to send the data it needs for some other work.

In the first quiz, we saw that context switching between two threads kept the processing unit from ever going idle. In that situation, latency and bandwidth were such that when we needed data from main memory, we could issue a load, do some other work, and find that the requested data had already arrived by the time that work finished. This is not the case when we are bandwidth bound.

Say we have X processing units that can each process 1 unit of data faster than the bus can send X units of data to them. This means we have low arithmetic intensity relative to our bandwidth limit. Each unit can switch between two execution contexts.

In this case we will always have some idle units. Even if the load for a work step from context 2 is issued right before the unit begins a work step from context 1, the current work step will finish before the next unit of data arrives. So no amount of context switching/latency hiding can prevent some units from being idle. Context switching is still better than none at all, but because we are bandwidth limited, there will still be periods when some units are idle.
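
To put rough numbers on this, here is a minimal Python sketch of a simplified steady-state model. It is my own illustration, not something from the slide: the function name `utilization`, its parameters, and the example values are all assumptions. It treats each context as "issue a load, wait `latency`, compute for `compute_time`", and assumes the bus delivers `bus_rate` units of data per unit of time, shared evenly by all processing units.

```python
def utilization(num_units, compute_time, bus_rate, latency, contexts):
    """Approximate fraction of time each processing unit is busy.

    Simplified model (all names and parameters are hypothetical):
      num_units    -- number of processing units sharing the bus (X above)
      compute_time -- time for one unit to process 1 unit of data
      bus_rate     -- units of data the bus delivers per unit of time
      latency      -- time between issuing a load and the data arriving
      contexts     -- execution contexts each unit can switch between
    """
    # Latency limit: each context spends latency + compute_time per work
    # step, of which compute_time is useful. Interleaving `contexts` of
    # them fills the gaps, up to 100% busy.
    latency_bound = min(1.0, contexts * compute_time / (latency + compute_time))

    # Bandwidth limit: each unit receives bus_rate / num_units units of
    # data per unit of time, and each one keeps it busy for compute_time.
    bandwidth_bound = min(1.0, bus_rate * compute_time / num_units)

    # The unit can be no busier than the tighter of the two limits.
    return min(latency_bound, bandwidth_bound)


if __name__ == "__main__":
    # 4 units, bus delivers 2 data units per time step: bandwidth bound.
    for k in (1, 2, 4, 8):
        print(k, utilization(num_units=4, compute_time=1.0,
                             bus_rate=2.0, latency=3.0, contexts=k))
```

With these made-up numbers, going from 1 to 2 contexts raises utilization because latency is being hidden, but adding contexts beyond that changes nothing: the bandwidth term caps how busy the units can be, no matter how many contexts there are to switch between.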