Slide View : Parallel Computer Architecture and Programming : 15-418/618 Fall 2016


carnegieigenrac

Why does the Intel architecture allow a read to be reordered ahead of an earlier write, but not other combinations of operations? If fences are used appropriately, wouldn't it be okay to relax all of these memory-ordering constraints?

BBB

I would imagine because it's cheap to maintain the other orderings.

Reads are easy to keep in order, since the processor that generates the read has to wait for the read to finish before graduating any other memory accesses anyway.

Write orderings are more expensive to maintain, since a processor can fire off writes and continue executing before they complete.

So then why does Intel keep write -> write ordered? I would guess it makes software easier to write, and there's demand for the feature.

tcm

Good questions. From a performance point of view, Intel (or any other processor company) would prefer to relax the constraints as much as possible. The problem, however, is the large amount of legacy x86 code that was not written with proper memory fences, plus the fact that many programmers struggle to insert fences correctly. TSO (total store ordering) is a very conservative approach in terms of correctness (it breaks relatively little code compared with the more aggressive options), but it does allow write buffering to occur.
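To make the write-buffering point concrete, here is a rough sketch of a toy model, not real hardware. Each core queues its writes in a FIFO store buffer, and a later read may go to memory before those writes have drained. The names `Core`, `Outcome`, and `run_store_buffer_test` are made up for illustration, and the model runs one fixed schedule rather than exploring all interleavings.

```cpp
#include <cassert>
#include <deque>
#include <utility>

// Toy model (an assumption for illustration): two shared memory locations
// and a per-core FIFO store buffer that retires writes to memory later.
struct Core {
    std::deque<std::pair<int, int>> store_buffer;  // (address, value), oldest first

    // A write is queued instead of going to memory immediately.
    void write(int addr, int value) { store_buffer.emplace_back(addr, value); }

    // A read checks this core's own buffered writes (newest first),
    // then falls back to shared memory.
    int read(const int* memory, int addr) const {
        for (auto it = store_buffer.rbegin(); it != store_buffer.rend(); ++it)
            if (it->first == addr) return it->second;
        return memory[addr];
    }

    // Retire buffered writes in FIFO order, so write->write order is kept.
    void drain(int* memory) {
        while (!store_buffer.empty()) {
            memory[store_buffer.front().first] = store_buffer.front().second;
            store_buffer.pop_front();
        }
    }
};

struct Outcome { int r0; int r1; };

// Core 0: x = 1; r0 = y.    Core 1: y = 1; r1 = x.
// With fence_after_write set, each core drains its buffer before reading,
// which is roughly what a full fence forces.
Outcome run_store_buffer_test(bool fence_after_write) {
    int memory[2] = {0, 0};
    const int X = 0, Y = 1;
    Core c0, c1;

    c0.write(X, 1);
    c1.write(Y, 1);
    if (fence_after_write) {
        c0.drain(memory);
        c1.drain(memory);
    }
    Outcome out{c0.read(memory, Y), c1.read(memory, X)};

    c0.drain(memory);
    c1.drain(memory);
    return out;
}
```

In this model, `run_store_buffer_test(false)` yields `r0 == 0` and `r1 == 0`, an outcome that sequential consistency forbids, while `run_store_buffer_test(true)` yields 1 for both reads. Only the write -> read order is relaxed here; the FIFO buffer still retires each core's writes in program order.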

Regarding why Intel doesn't relax write->write ordering, one quick answer is that preserving it prevents the example on Slide 18 from breaking (although it does not prevent the example on Slide 19 from breaking).
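As a rough illustration of code that depends on write->write ordering, consider the usual "write the data, then set a flag" pattern. Whether this matches the Slide 18 example is an assumption, since the slide is not reproduced in this thread. The names `payload`, `ready`, and `run_message_passing` are hypothetical. The sketch uses C++ `std::atomic` with release/acquire ordering, which states the required ordering explicitly instead of relying on the hardware.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names, for illustration only.
int payload = 0;                  // ordinary data, written before the flag
std::atomic<bool> ready{false};   // flag that publishes the data

// Returns the payload value the consumer observed after seeing the flag.
int run_message_passing() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int observed = -1;

    std::thread producer([] {
        payload = 42;                                  // write 1: the data
        ready.store(true, std::memory_order_release);  // write 2: the flag, kept after write 1
    });
    std::thread consumer([&observed] {
        // Spin until the flag becomes visible.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        observed = payload;  // the release/acquire pair guarantees this sees 42
    });

    producer.join();
    consumer.join();
    return observed;
}
```

If the two writes in `producer` could become visible in the opposite order, the consumer might see the flag set and still read the old `payload`. On x86, compilers typically implement the release store as a plain store, because TSO already keeps writes in order; on more relaxed architectures the same source generally needs a fence or a special store instruction.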

nba16235

A question I have in mind after reviewing this chapter for the exam:

Is this lecture trying to convince us that relaxing memory consistency is necessary for processor design? I'm a little bit confused about the theme.

tcm

It's not so much about convincing you of this as about explaining why relaxing memory consistency is extremely important for achieving reasonable performance, which is why all processors support something weaker than sequential consistency. Otherwise, one might wonder why machines aren't simply sequentially consistent, since that would certainly be most convenient for programmers.