Slide 35 of 41
tcz

For those of you keeping score at home, I think the optimization described introduces a problem where two processors may write to the same memory location, but they don't tell the otter processors until the write is complete.

As a concrete example, Jill may load memory address foo, then write 3 to foo in her local cache and put it back in global memory. Sometime while this is happening, her buddy Jack loads foo into his cache, increments it, and then tries to put it back. Finally, Jill remembers to tell everybody, "Hey y'all, I wrote 3 to foo" and assumes everybody knows what's what. Now Jack is confused - did he increment correctly? We'll never know!
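If it helps to see the lost update concretely, here's the interleaving above as a toy Python simulation (just a dict standing in for memory and local variables standing in for the two private caches - not real hardware, and no invalidations are broadcast, which is exactly the bug):

```python
# Toy model: shared memory plus each processor's private cached copy.
# No invalidation is sent before the write-backs, reproducing the
# broken interleaving from the example above.
memory = {"foo": 0}

jill = memory["foo"]     # Jill loads foo into her cache (sees 0)
jill = 3                 # Jill writes 3 in her local cache
jack = memory["foo"]     # meanwhile Jack loads foo -- still sees 0
memory["foo"] = jill     # Jill's write-back: memory now holds 3
jack += 1                # Jack increments his stale copy: 0 + 1
memory["foo"] = jack     # Jack's write-back silently clobbers the 3

print(memory["foo"])     # prints 1
```

With coherence this outcome is impossible: the two writes would be serialized, and Jack would either see Jill's 3 and produce 4, or increment first and then be overwritten by the 3 - but Jill's write could never be silently lost to a stale increment like this.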

So instead, we can use a write-back buffer. But why doesn't a write buffer have the same problem? It seems to me, based on the description in slide 31, that the effects of the write buffer only kick in when Jill's cache line containing foo is evicted. Wouldn't it be more important to ensure the invalidation message gets broadcast at an appropriate time?

Yuyang

@tcz

First and foremost, I LOVE otters!!!!!!!! otter processors are sooooo cute!!!!!!

And less importantly:

So I believe that the purpose of introducing the write-back buffer is not to solve the problem mentioned in your first paragraph. The write-back buffer is just a separate optimization we made to hide memory latency. The use case for a write-back buffer is as follows:

Some time ago, I acquired the M state for cache line X, I modified it, and it stays dirty in the cache (because we are using write-back instead of write-through). However, later on I need to access another cache line Y that conflicts with X, so I must evict X and bring Y in. This is expensive, because it takes two data transactions. So to cover that up, we bring Y in and put the dirty line X into the write-back buffer (until the data line is not so busy and we can flush X to memory).

Remember that at this point no other processor has cache line X valid, because I have it in the M state. Now, if some processor wants to BusRd or BusRdX X, I will check both my tags and state and also my write-back buffer. I will see that I have X in the modified state, so I will respond with "Hold on, I have it dirty, you need to wait for me to flush it to memory", and then do the actual flush.

This preserves memory coherence, and the scenario you describe would never happen for a line in the write-back buffer. :D
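To make the "snoops must check the write-back buffer too" idea concrete, here is a minimal sketch (the class and method names like `snoop_bus_rd` are made up for illustration, and this ignores almost all of the real protocol - only the M-state bookkeeping needed to show the buffer is modeled):

```python
class Cache:
    """Toy write-back cache for one processor. Only enough state to
    illustrate why snoops must also check the write-back buffer."""

    def __init__(self):
        self.lines = {}      # addr -> (state, data)
        self.wb_buffer = {}  # dirty lines evicted but not yet flushed

    def evict(self, addr):
        # On a conflict miss, a dirty (M-state) line parks in the
        # write-back buffer instead of going straight to memory.
        state, data = self.lines.pop(addr)
        if state == "M":
            self.wb_buffer[addr] = data

    def snoop_bus_rd(self, addr, memory):
        # Key point: check the write-back buffer as well as the tags;
        # otherwise another processor would read a stale value from
        # memory while the dirty line sits in the buffer.
        if addr in self.wb_buffer:
            memory[addr] = self.wb_buffer.pop(addr)  # flush first
            return True  # "hold on, I had it dirty"
        if addr in self.lines and self.lines[addr][0] == "M":
            data = self.lines[addr][1]
            memory[addr] = data            # flush the dirty line
            self.lines[addr] = ("S", data)
            return True
        return False


memory = {"X": 0}
jill = Cache()
jill.lines["X"] = ("M", 3)  # Jill holds X dirty
jill.evict("X")             # conflict with Y: X parks in the buffer
# Jack's BusRd for X reaches Jill's snooper before the lazy flush:
assert jill.snoop_bus_rd("X", memory)
assert memory["X"] == 3     # Jack reads the up-to-date value
```

If the snoop skipped the `wb_buffer` check, Jack's BusRd would have been served the stale 0 from memory, which is exactly the hazard tcz was worried about.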

idl

@yuyang so re: your example, how is putting the dirty line X into the write-back buffer and flushing to memory at a later time better than just immediately evicting and flushing X? My understanding is that if X is needed again even before it is flushed, that processor can just load it back without needing those extra data transactions, since it is already in the M state.

Yuyang

@idl: I guess that is one thing... But I think the major thing is that an eviction needs two data operations (load new data, write back old data), which is very expensive. We might as well break it into two pieces rather than occupy the data line for too long and make all the other data operations queue up behind us. By splitting it up, the 'write old data' half can be postponed until sometime later when the data line is not that busy, or until it is absolutely needed :D

jinsikl

@idl another reason to buffer was covered here. Basically, by buffering writes, the cache doesn't have to wait while memory is handling that write. This can lead to higher throughput. But as others have mentioned above, this optimization leads to complications in regard to maintaining coherence.