Slide 39 of 65
pd43

I am still a bit confused on the difference between memory latency and bandwidth. Going along with the laundry analogy from class, would it be correct to think of latency as the time we have to wait for the machine to finish after pressing the start button? Under this analogy, would bandwidth be the number of clothes that fit into the machine during one load? Or the total number of clothes that can be washed in one hour?

I am confused about how latency and bandwidth are related. Maybe a pipe/water flow analogy would be helpful.

kayvonf

The latency of an operation is the time to complete the operation. Some examples:

  • The time to complete a load of laundry is one hour. The time between placing an order on Amazon and it arriving on your doorstep (with Amazon prime) is two days.
  • If you turn on a faucet (assuming the pipe connected to the faucet starts with no water in it), the latency is the time for water entering the far end of the pipe to arrive in your sink.
  • If I type ping www.google.com I get a round trip time of 30 ms, so I can estimate that the one-way latency of communicating between my computer and Google's servers is approximately 15 ms.
  • If there's no traffic on the roads, it takes 5 minutes to drive to Giant Eagle from my house, regardless of how wide the roads are.
  • If a processor requests data from memory, it doesn't get the first bit back for hundreds of clock cycles.
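The ping example above can be written out as a tiny sketch. The RTT value is illustrative, and halving it assumes the forward and return paths take equal time:

```python
# Estimating one-way latency from a measured round-trip time,
# as in the ping example above. The RTT here is illustrative.
rtt_ms = 30.0            # measured round-trip time (e.g., reported by ping)
one_way_ms = rtt_ms / 2  # assumes a symmetric path
print(one_way_ms)        # 15.0
```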

Bandwidth is a rate, typically stuff per unit time.

  • A bigger washer would increase your bandwidth (more clothes per hour)
  • A wider pipe would increase the flow into your sink (more gallons per second)
  • A multi-lane road increases bandwidth (cars per second)
  • A fast internet connection gives you higher bandwidth (100 gigabits per second)
  • Once memory receives a request, it responds by sending eight bytes per clock cycle over the memory bus.

Note in the above examples: there are two ways you can increase the rate at which things get done: make the channel wider (get a bigger pipe, add lanes, use a wider memory bus) and push stuff through it faster (increase the water pressure, raise the speed limit, increase the bus clock rate).
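Those two knobs can be made concrete with a quick calculation. The numbers below are illustrative (an 8-byte-wide bus clocked at 1 GHz), not any particular machine's specs:

```python
# Bandwidth = channel width x transfer rate.
bus_width_bytes = 8           # bytes transferred per clock (bus width)
bus_clock_hz = 1_000_000_000  # 1 GHz bus clock

bandwidth_bytes_per_sec = bus_width_bytes * bus_clock_hz
print(bandwidth_bytes_per_sec / 1e9)  # 8.0 GB/s

# Doubling either the width or the clock doubles the bandwidth:
print(2 * bus_width_bytes * bus_clock_hz / 1e9)  # 16.0 GB/s
```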

miko

@kayvonf Just to clarify, does the value of memory latency include the computation time itself, or is it solely composed of the travel time of information to and from the CPU?

To place this into context, in your example "If there's no traffic on the roads, it takes 5 minutes to drive to Giant Eagle from my house, regardless of how wide the roads are.", is the time you spent shopping in Giant Eagle included? Or would the latency be only made up of the 10 minutes you spent driving on the road?

dmatlack

@miko It sounds like it's only the time to get there, though I'm not entirely sure. See the example with pinging Google. If the round trip time is 30 ms to ping, then the latency is roughly 15 ms.

kayvonf

@miko. Latency is the time to complete an operation. The answer to your question depends on the precise operation one is asking the latency of.

For example, L1 cache latency might mean the number of cycles between the processor requesting a value as part of an instruction and that data arriving in the CPU's instruction pipeline for use. Analogously, you could define memory latency as the time between issuing the load instruction and the data arriving at the instruction pipeline for use (assuming the data was not in cache and had to be retrieved from memory).

However, you could also define memory latency as time to first bit of response from memory, or time until the load request (e.g., a cache line transfer) is fulfilled in the cache. The latter would include the time to first bit + the time to send one 64-byte cache line over the bus.

effective_latency_of_transfer = time_to_first_bit + size_of_transfer / link_bandwidth
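Plugging numbers into the formula above (the cycle counts here are illustrative, matching the hundreds-of-cycles example earlier and a 64-byte cache line sent 8 bytes per cycle):

```python
# Worked instance of:
#   effective_latency_of_transfer = time_to_first_bit + size_of_transfer / link_bandwidth
time_to_first_bit = 100  # cycles until memory sends the first data (illustrative)
size_of_transfer = 64    # bytes in one cache line
link_bandwidth = 8       # bytes delivered per cycle over the memory bus

effective_latency = time_to_first_bit + size_of_transfer / link_bandwidth
print(effective_latency)  # 108.0 cycles
```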

If a friend asked for a glass of milk, and I had none, I'd say the latency of me getting him a glass of milk would be the time to drive to Giant Eagle, purchase the milk, and drive home.

In the Google example above, I treated the operation as one-way communication: sending data from my computer to Google. That's why I took the ping time of 30 ms and divided by two. If you cared about round-trip latency, then you'd use the full 30 ms.