I understand that x takes up 4 bytes because it's an int, and that it's written in memory as 1000 instead of 0001 because of the system's endianness, but why does it begin at an offset instead of at byte 0 of the line? Does this have to do with looking at some of the lower-order bits of x's address and using them somehow to get an offset into the line?
Yes, that's right. Each cache line stores more than a byte, so we'll actually fetch all the data in the 64 byte block containing x and store it in the cache.
Looking at x's address, we figure out where to find it in the cache by breaking the address down into a tag, set index, and block offset. For example, if each b is a bit of x's address, we can break it down as follows:
x = bbbb bbbb bbbb bbbb bbbb bbbb bbbb bbbb
    |        tag         ||   index  ||off|
The index tells us where in the cache to look, the tag whether we have a match, and the offset how far to look inside the block. Here the offset for x would be 4.
Write-back and write-through are cache policies for write hits. A write-through cache immediately writes to main memory when a cache line is updated. A write-back cache only writes to main memory when the line is evicted (or the cache is flushed). This is more efficient because a write-back cache absorbs a sequence of writes to the same address and writes only the last value to memory, when the cache line is evicted.
Write-allocate and write-no-allocate are cache policies for write misses. A write-allocate cache first loads the block into the cache and then performs the write as a hit. A write-no-allocate cache writes directly to main memory without loading the block.
Dirty bit: Used for a write-back cache policy. Marks whether the cache line has been changed since being loaded. If it's marked, then when the cache line is evicted, its value will be written to the next lower level in the memory hierarchy.
Tag: Uniquely identifies the block stored in the cache line.
I had a realization in lecture today: in assignment 1, problem 5 (BLAS saxpy), the reason the bandwidth consumed by the program is TOTAL_BYTES = 4 * N * sizeof(float); even though there were 2 reads and 1 write (i.e. 3 accesses) was the write-allocate (load-on-write) cache behavior. So the one write in the code actually translates to one read into the cache, then one write to memory (immediately in the write-through case, or when the line is evicted in the write-back case).
We could thus speed up saxpy pretty significantly by somehow telling the CPU to do a direct write to memory, without first reading the line into cache.
In a write-back cache, if a data value is changed multiple times but, at eviction, is back to the original value it was loaded with, will it be recognized as ultimately not being changed?
In a direct-mapped cache, the cache contains several sets, and each set contains exactly one cache line.
Given the address of a word, we interpret it as |tag||set index||block offset| and then perform the following steps (assuming it is a cache hit):
1. Set selection: use the set index to pick the set.
2. Line matching: check that the line's valid bit is set and its tag equals the address's tag.
3. Word extraction: use the block offset to find the word within the line's block.
There are two other kinds of cache organization:
Set Associative Cache, where each set contains multiple cache lines.
Fully Associative Cache, where one single set contains all the cache lines.
See more in CSAPP.
What about the cache line state? Is it referring to the MESI (modified, exclusive, shared, invalid) protocol state?
@PandaX which one is more favorable? Set Associative Cache or Fully Associative Cache?
It seems that the disadvantage of set-associative and fully associative caches vs. direct-mapped caches is in the line-matching step. In a direct-mapped cache, since there is only one cache line per set, you just check whether that one line matches the tag of the memory you're looking for. In the associative caches, you have more lines to check: a fully associative cache has a whole cache's worth of lines to search, a set-associative cache fewer, and a direct-mapped cache just one line.
However, the advantage of having more than one line per set is that you have fewer evictions if accessing multiple lines with the same set index.
Question: do fully associative caches just omit the set index, since everything is in the same set?
@traveling_saleswoman I think there are no index bits in a fully associative cache. Suppose one cache line holds M (a power of 2) bytes of data; then lg(M) bits are used for the offset and all the other bits are used for the tag.
You can refer to this picture.
@Hil That is a beautiful picture. From now on, I will remember my caches as dragons' bookshelves. Thank you very much for the link.
If we can write to memory from cache in a nonblocking fashion, (aside from memory bus congestion) why not have write-through cache? Why not write something to memory in the background?
@aeu what do you mean by background?
@krombopulos_michael I believe it will still be recognized as changed, since there is no way to determine its previous value without checking memory. It's easier to just write the value back, even though it might be the same.
@aeu If it is the case that write-through is nonblocking, another drawback would be power consumption.
I'm not familiar with the mechanics of writing through to memory. Could somebody chime in on the specifics? Is it a one-way interaction that leaves the cache available for use after the message sends, as @aeu implies, or is it a two-way communication?
If it is a two-way communication, then another drawback of write-through is the time when that cache line cannot be used while it is communicating with memory.
Is it possible to have a write through and write allocate cache? How would that work? Would it write the value in memory and then load it into cache again?
@huehue Sure. It would simply load the line into the cache (just like a write-back cache) but also update memory.
Write-through and write-back are policies for when there is a write hit in a cache.
Write-through cache, on a hit, directs the write to the cache and through to the underlying permanent storage before confirming completion to the host. This ensures data updates are safely stored, but has the disadvantage that the write still experiences latency based on writing to that storage. Write-through caching is good for applications that write and then re-read data frequently, since the data is in the cache and reads have low latency.
Write-back cache is where, on a write hit, the write is directed to the cache and completion is immediately confirmed to the host. This gives low latency and high throughput for write-intensive applications, but there is a data-availability risk because the only copy of the written data is in the cache; the data is exposed to loss until it is staged to external storage.
@bysreg: Were you able to find out the answer? If yes, please post! Thanks.