The notion of "in program order" is interesting. Is it fair to say that the burden is on the programmer to write race-condition-free software, but that within each sequential program, read-after-write and write-after-write must behave as expected?
I think that's a fair assumption. There's not much the hardware can do to prevent race conditions that the programmer introduces into the program. If the program reads A twice while it holds 0, and then writes A+1 back into A twice, A ends up holding 1. However, each individual read and write should still do what is expected given the order in which the reads and writes actually happened.
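To make the A example concrete, here is a minimal sketch that replays one possible interleaving step by step. It assumes the two reads and two writes come from two threads; the names `memory`, `read`, `write`, `t1_local`, and `t2_local` are made up for illustration. Everything runs in a single thread so the outcome is deterministic.

```python
# Hypothetical shared memory holding one variable, A, initialised to 0.
memory = {"A": 0}

def read(addr):
    """Return the value most recently written to addr."""
    return memory[addr]

def write(addr, value):
    """Store value at addr, replacing whatever was there."""
    memory[addr] = value

# One possible interleaving of two unsynchronised increments.
t1_local = read("A")      # thread 1 reads 0
t2_local = read("A")      # thread 2 also reads 0
write("A", t1_local + 1)  # thread 1 writes 1
write("A", t2_local + 1)  # thread 2 writes 1 again

final_value = read("A")
print(final_value)  # 1, not 2
```

Every read here returned exactly what was last written, so the memory behaved correctly. The lost increment comes entirely from the order in which the program issued its operations.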
I don't think the hardware has any concept of a (software-level) race condition. For example, suppose the programmer writes code that doesn't properly lock a variable in a critical region, and multiple threads perform all sorts of interleaved reads and writes. As long as those reads see the latest write, the hardware is correctly upholding its part of the contract, which is to return the value of the latest write to any read, even though the program has a logical bug at the software level.
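One way to picture that contract is a small checker over a trace of operations on a single location, sketched below. The trace format and the name `reads_see_latest_write` are assumptions for illustration, and the trace is taken to be listed in the order the memory system performed the operations.

```python
def reads_see_latest_write(trace, initial=0):
    """Return True if every read in the trace returns the most recent write.

    trace is a list of (thread, op, value) tuples for one memory location,
    listed in the order the operations were performed.
    """
    latest = initial
    for _thread, op, value in trace:
        if op == "write":
            latest = value
        elif op == "read" and value != latest:
            return False
    return True

# Racy at the software level (a lost update), yet every read is up to date.
racy_trace = [
    ("T1", "read", 0),
    ("T2", "read", 0),
    ("T1", "write", 1),
    ("T2", "write", 1),
    ("T1", "read", 1),
]

# Here a read returns an old value after a newer write: the hardware's fault.
stale_trace = [
    ("T1", "write", 1),
    ("T2", "read", 0),
]

print(reads_see_latest_write(racy_trace))   # True
print(reads_see_latest_write(stale_trace))  # False
```

The racy trace passes because the check only looks at whether each read matches the latest write. It has no idea that the program meant to increment twice.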