Slide 43 of 79

I don't understand why an increase in bandwidth does not necessarily lead to a decrease in latency. For example, if I have 1 KB of data and 1 KB/s of bandwidth, then the latency is 1 s. When the bandwidth is increased to 1 MB/s, shouldn't the latency drop to 0.001 s?


I believe latency would be the amount of time it takes for each byte to be transferred, while bandwidth is a measure of how many bytes you can transfer in a given amount of time. From your example, with just the given information you cannot calculate latency. The latency could be 1 second, meaning that each byte transfer takes 1 second: no matter whether you can transfer 1, 1,000, or 1,000,000 bytes simultaneously, each one alone would still take 1 s.
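The distinction above can be sketched with the usual first-order model: total transfer time is a fixed latency term plus a size-dependent bandwidth term. All numbers below are made up for illustration.

```python
# Minimal sketch: time to move a message = fixed latency + size/bandwidth.
# Raising bandwidth shrinks only the second term; the latency term is
# untouched. Numbers are illustrative, not from any real system.

def transfer_time(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Time from issuing a request to receiving the last byte."""
    return latency_s + size_bytes / bandwidth_bytes_per_s

size = 1024  # 1 KB payload

slow = transfer_time(size, latency_s=1.0, bandwidth_bytes_per_s=1024)
fast = transfer_time(size, latency_s=1.0, bandwidth_bytes_per_s=1024 * 1000)

print(slow)  # 2.0   (1 s latency + 1 s of transfer)
print(fast)  # 1.001 (1000x the bandwidth, but latency now dominates)
```

This is why the 1 KB / 1 KB/s arithmetic in the question only measures the bandwidth term; the latency term is a separate property of the system.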


Conversely, I'm curious why decreasing latency doesn't increase bandwidth. I had this impression, but I believe Kayvon stated otherwise.

Since bandwidth is a ratio of size to time, wouldn't reducing time increase the ratio overall?


@arcticx I think your argument is correct for a single fixed task (like moving 1KB data), but bandwidth and latency are two different metrics.

Like the "moving paper" example the professor gave. When the professor wants to increase bandwidth, he can go to the TAs first and then carry more paper at one time. Going to the TAs increases the latency, but since he carries a lot more paper, the bandwidth also increases. One key difference between this example and yours is that the total workload in "moving paper" is not fixed: the professor increases bandwidth by carrying much more data than the 1 KB in your scenario.


@arcticx I think of latency as a measure of the time delay to receive the data. For example, say you wanted to send a very small amount of data from Pittsburgh to San Francisco. Even if you hypothetically had infinite network bandwidth, you cannot transmit data faster than the speed of light, so latency will not necessarily decrease.


@arcticx, using the analogy from class, latency is like the speed of the cars, and bandwidth is the total number of cars passing through the road in a period of time. We can increase bandwidth by increasing the speed of the cars so that more cars pass in a given amount of time, or by widening the road so that more cars can pass simultaneously. Sometimes cars cannot run side by side, so bandwidth also depends on how well we can utilize the lanes. To me, latency is more like one factor in bandwidth than a consequence of it.


I found an example here about latency and bandwidth in internet connections. It says that high latency corresponds to a webpage loading slowly as a whole. So high latency and high bandwidth would mean there's a delay before the webpage starts downloading, but once it does, all the content shows up immediately. In contrast, low bandwidth would mean that once the webpage starts downloading, its content, such as images, shows up one by one and takes a while to load completely. Low latency would mean the webpage starts downloading very quickly. Would this example be a correct representation of the difference between latency and bandwidth?


@0xc0ffee Consider caches. Once data is put into the L1 cache, it doesn't need to be retrieved from memory again, so the second access is much faster: the latency improves. But the rate of data transfer (the bandwidth) stays constant throughout, so latency changed without bandwidth changing.


So from what I understand, if a program requests more memory at a single point in time than the bandwidth can service, the effective memory latency increases at that point for some of the requests. To reduce the latency of those requests we could increase our bandwidth, but this only brings them down to the minimum latency of the other requests. Therefore, low bandwidth can increase average observed latency in practice. Is this correct?


Here's another example: computer memory. DDR3 memory comes in several variants (DDR3-2133, -2400, -2800, etc.), each with certain timings associated with it: CAS, RAS, and so on. They pretty much all have the same absolute timings regardless of speed grade (some slower parts have slightly faster timings, but that's about it).

Let's consider DDR3-1600 vs DDR3-3200. Both will take 10ns before you see the first word of your data. The difference here is DDR3-3200 has twice the transfer rate of DDR3-1600. So you would be done getting all your data in "half" the time, but the latency until you saw your first piece of data was still 10ns.

The latency is the same, even though the bandwidth has doubled.
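The DDR3 comparison above can be worked out numerically. The 10 ns first-word latency is from the post; the peak transfer rates (12.8 GB/s and 25.6 GB/s) are assumed here as the nominal per-channel figures for those speed grades.

```python
# Sketch of the DDR3-1600 vs DDR3-3200 comparison: same first-word
# latency, doubled transfer rate. Rates are assumed nominal figures.

def read_time_ns(bytes_requested, first_word_latency_ns, gb_per_s):
    """Time until the last byte arrives. 1 GB/s == 1 byte/ns."""
    return first_word_latency_ns + bytes_requested / gb_per_s

cache_line = 64  # bytes

t_1600 = read_time_ns(cache_line, 10.0, 12.8)
t_3200 = read_time_ns(cache_line, 10.0, 25.6)

# The first word still arrives after 10 ns in both cases; only the
# tail of the transfer finishes sooner on the faster part.
print(round(t_1600, 1))  # 15.0
print(round(t_3200, 1))  # 12.5
```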



I think it is best to consider latency and bandwidth as properties of the physical memory system. Once a request can be serviced by the memory system, the result is returned in x time, where x is the latency. Depending on the memory system, we might be able to submit a request while the system is saturated by placing it into some buffer; however, the time from when the memory system can begin servicing a request to when it returns the result remains x. In general, though, if we are making many memory requests, a higher bandwidth can reduce the time from when we want to submit a request to when we get the result: the time we wait is the time for the memory system to become unsaturated plus the latency.
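That waiting-time decomposition can be sketched as a toy queueing model. The service latency and acceptance rate below are made-up numbers, not from any real memory system.

```python
# Toy model of a saturated memory system: observed wait = time for the
# system to drain the requests ahead of us + the fixed service latency.
# All constants are illustrative.

SERVICE_LATENCY_S = 0.010    # x: time from acceptance to result
ACCEPT_RATE_REQS_PER_S = 100  # bandwidth: requests accepted per second

def observed_wait(requests_ahead):
    """Time from wanting to submit a request to receiving its result."""
    drain_time = requests_ahead / ACCEPT_RATE_REQS_PER_S
    return drain_time + SERVICE_LATENCY_S

print(observed_wait(0))   # 0.01 -- unsaturated: just the latency
print(observed_wait(50))  # 0.51 -- saturated: queueing delay dominates
```

Doubling the accept rate shrinks the drain term but never touches the 10 ms service latency, matching the point above.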


I think I get the idea of Bandwidth vs. Latency, please correct me if I am wrong:

  1. Bandwidth is the maximum throughput of the system; it is a property of the system. It can alternatively be understood as the maximum number of bits (information) that can be transmitted per unit time (second). If we think of a unit time slot as a carrier of information, then bandwidth is the capacity of this carrier.
  2. Latency is the time from the initiation of a memory request to the fulfillment of that request. This time is determined not only by the bandwidth (the maximum possible throughput) but also by the transmission overhead of the medium. That is, no matter how much information one wants to transmit, one must pay this overhead, and the overhead has nothing to do with the amount of information transmitted.

To further elaborate this with the fetching paper analogy:

Only one person can walk in the hallway from room A to room B at a time, i.e. no two people can walk side by side. Each person can carry 10 piles of paper; the amount of paper a person can carry at once is the bandwidth.

There are some piles of paper in room B, and a person in room A requests 1,000 piles. A round trip from A to B takes 60 s. If we send out one person, the total time is 6,000 s, with a bandwidth of 10 piles/person. But if we send out 100 people, one per second, the total time is about 160 s, while the bandwidth is still 10 piles/person. If sending one person per second is the maximum departure rate (otherwise a person who starts later would run into a person who started earlier), then we can see that the actual bandwidth is 10 piles/second.
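The arithmetic in this analogy can be checked directly. Whether the pipelined total comes out as 159 s or 160 s depends only on whether the first person leaves at t = 0 or t = 1.

```python
# Sketch of the paper-fetching numbers above: 1000 piles requested,
# 10 piles per person, 60 s round trip, departures at most 1/s.

PILES_NEEDED = 1000
PILES_PER_PERSON = 10
ROUND_TRIP_S = 60
DEPART_INTERVAL_S = 1

trips = PILES_NEEDED // PILES_PER_PERSON  # 100 trips either way

# One person doing all trips back to back: purely serial.
serial_time = trips * ROUND_TRIP_S
print(serial_time)  # 6000

# 100 people pipelined, departing one second apart (first at t = 0):
# the last person leaves at t = 99 s and returns 60 s later.
pipelined_time = (trips - 1) * DEPART_INTERVAL_S + ROUND_TRIP_S
print(pipelined_time)  # 159

# Each individual trip (the latency) is still 60 s; what improved is
# the rate at which paper arrives (the bandwidth).
```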


I think @randomthread has made a good point. Bandwidth and latency are properties of the memory system, and they do not depend on each other (as opposed to what some of the comments above suggest). Latency is not the TOTAL time taken to complete a memory request, and we cannot find that total time by looking at the bandwidth alone. Bandwidth and latency are each just part of the calculation, and they are not derived from each other.


Three ways to reduce latency:

  1. Reduce the access time to each level of the storage hierarchy.
  2. Reduce the likelihood that data accesses will incur high latency.
  3. Reduce the frequency of potential high-latency events in the application.