Isn't the first "when" better than the second "when"? Can anyone give an example where it would be better for memory to assume that one of the caches will service the request until the snoop results are valid? It seems to me that memory should always be prepared to respond.
Reading from memory is slower than reading from a cache. Since all caches are on the interconnect and can observe BusRdX requests, any cache that holds the most recent copy of the line should send it to the requesting cache. This will probably be faster than having memory start responding to the request and then abort once a snoop result says a cache can supply the line.
I think there might also be the case where a cache has written to the line, so the copy in memory is stale. If memory responds to a request immediately, the requestor may get stale data, right? (I think that if the dirty bit is 1, the modified data should first be flushed to memory.)
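To make the stale-data case concrete, here is a minimal sketch of a toy model with one memory line and two write-back caches. Everything in it (the `Cache` class, `read_memory_first`, `read_after_snoop`, the address `0x40`) is made up for illustration and is not from the lecture; it only assumes that a dirty owner supplies the line and flushes it when snooped.

```python
# Toy model: one line of memory and two caches; all names are illustrative only.
memory = {0x40: 1}                      # address -> value held in DRAM

class Cache:
    def __init__(self):
        self.lines = {}                 # address -> (value, dirty)

    def write(self, addr, value):
        self.lines[addr] = (value, True)   # write-back: memory is not updated

    def snoop(self, addr):
        """Return the value if this cache holds the line dirty, else None."""
        entry = self.lines.get(addr)
        if entry is not None and entry[1]:
            return entry[0]
        return None

def read_memory_first(addr, caches):
    # Memory answers immediately without waiting for snoop results.
    return memory[addr]

def read_after_snoop(addr, caches):
    # Memory waits; a dirty owner supplies the line (and flushes it) instead.
    for c in caches:
        value = c.snoop(addr)
        if value is not None:
            memory[addr] = value            # flush so memory is current again
            c.lines[addr] = (value, False)  # owner's copy is now clean
            return value
    return memory[addr]

owner, other = Cache(), Cache()
owner.write(0x40, 2)                    # memory still holds the old value 1
stale = read_memory_first(0x40, [owner, other])
fresh = read_after_snoop(0x40, [owner, other])
print(stale, fresh)                     # prints "1 2"
```

In this sketch the requestor that takes memory's immediate answer sees the old value 1, while the one that waits for the snoop result sees the owner's value 2, and memory is brought up to date by the flush.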
There's also the problem of clogging memory bandwidth. If memory starts a read for every request it snoops, but the correct data is already present in a cache somewhere, those reads take up valuable bandwidth that might be needed elsewhere.
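A rough way to see the bandwidth cost is to count memory reads under the two policies. This is only a sketch: the `trace` below is invented for illustration (each entry says whether some cache can supply the line), and `memory_reads` is a hypothetical helper, not a model of any real memory controller.

```python
# Hypothetical trace of bus requests: True means some cache can supply the line.
trace = [True, False, True, True, False, True, True, True]

def memory_reads(trace, speculative):
    """Count DRAM reads started for a trace of bus requests."""
    if speculative:
        return len(trace)                        # a read starts on every request
    return sum(1 for hit in trace if not hit)    # only when no cache responds

spec = memory_reads(trace, speculative=True)
wait = memory_reads(trace, speculative=False)
wasted = spec - wait
print(spec, wait, wasted)                        # prints "8 2 6"
```

With this made-up trace, six of the eight speculative reads are wasted. The trade-off is that waiting for snoop results adds latency on the two requests that memory does have to service, so which policy wins depends on how often a cache can supply the line.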