Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2016

Dracula08MS

For the dance-hall organization, I noticed that each processor has its own local cache. Does that mean each processor needs to check whether the data in its cache is up to date? Other processors may have modified the data without the cache reflecting the change.

Khryl

For the dance-hall organization, does memory bandwidth benefit from multiprocessor parallelism? If four processors each load the same amount of different data from memory, is it faster to do the loads in parallel than one after another? Or is the memory bus always saturated, so that there is no difference?

PandaX

@Dracula08MS Yes, I think this is why we have volatile type qualifiers in C/C++.

zvonryan

The implementations resemble the switching fabrics of networks. Would the ideas and patterns from switching apply here? For example, could the processors and memory be connected by a rearrangeably nonblocking network such as a Benes network?

TanXiaoFengSheng

@Dracula08MS, I think memory coherence depends on the consistency model the system implements, much as it would in a distributed environment.

vincom2

@Dracula08MS you might want to look into the MESI protocol, variants of which I believe are used by most multicore processors for caching.

Araina

Adding some details:

Bus: A bus is a shared interconnect used for connecting multiple components of a computer on a single chip or across multiple chips. Connected entities either place signals on the bus or listen to signals being transmitted on it, but the bus can carry signals from only one entity at a time. Buses are popular communication media for broadcasts in computer systems.

Crossbar: A crossbar is a non-blocking switching element with N inputs and M outputs used for connecting multiple components of a computer where, typically, N = M. The crossbar can simultaneously transport signals on any of the N inputs to any of the M outputs as long as multiple signals do not compete for the same input or output port. Crossbars are commonly used as basic switching elements in switched-media network routers.
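To make the contrast concrete, here is a minimal sketch under a toy model in which a transfer is just an (input port, output port) pair and time is counted in slots. The names `Request`, `bus_slots`, and `crossbar_conflict_free` are made up for illustration and do not correspond to any real hardware interface.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// A transfer request as (input port, output port). Hypothetical toy model.
using Request = std::pair<std::size_t, std::size_t>;

// A bus carries signals from one entity at a time, so k requests need k slots.
std::size_t bus_slots(const std::vector<Request>& reqs) {
    return reqs.size();
}

// An N x M crossbar can carry every request in the same slot only if no two
// requests compete for the same input port or the same output port.
bool crossbar_conflict_free(const std::vector<Request>& reqs,
                            std::size_t n_inputs, std::size_t n_outputs) {
    std::vector<bool> in_used(n_inputs, false);
    std::vector<bool> out_used(n_outputs, false);
    for (const Request& r : reqs) {
        assert(r.first < n_inputs && r.second < n_outputs);  // ports must exist
        if (in_used[r.first] || out_used[r.second]) return false;
        in_used[r.first] = true;
        out_used[r.second] = true;
    }
    return true;
}
```

With requests (0,1), (1,0), (2,2) on a 4 x 4 crossbar, `crossbar_conflict_free` returns true, so all three can proceed together, while `bus_slots` reports three slots for the same set on a bus. Adding a second request to output 1 makes the crossbar check fail.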

ote

Just a clarification about volatile type qualifiers. I was reading about them here, and it says that "The volatile type qualifier declares an item whose value can legitimately be changed by something beyond the control of the program in which it appears, such as a concurrently executing thread." So in the case of the dance-hall organization, where each processor has its own cache, could a volatile variable in the cache be modified by a program running on a completely different processor?

ArbitorOfTheFountain

In general, for per-processor L1 and L2 caches, there is a cache coherence protocol, for example the MESI protocol (https://en.wikipedia.org/wiki/MESI_protocol). This means that caches are guaranteed by hardware to hold up-to-date values whenever it "matters". In other words, the hardware exposes an interface to memory through which the program can read and write without worrying about out-of-date values being present in a cache. There are many protocols that achieve this guarantee, each with its own benefits and drawbacks.
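For intuition, here is a heavily simplified sketch of MESI-style state changes for a single cache line shared by a few private caches. It assumes every access is serialized and that the line holds one `int`; real implementations work through bus snooping or directories and are far more involved. The `CoherentLine` type and its members are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Mesi { Modified, Exclusive, Shared, Invalid };

// One cache line tracked across several private caches (toy model).
struct CoherentLine {
    int memory = 0;            // value held in main memory
    std::vector<Mesi> state;   // per-cache MESI state for this line
    std::vector<int> value;    // per-cache copy, meaningful unless Invalid

    explicit CoherentLine(std::size_t n_caches)
        : state(n_caches, Mesi::Invalid), value(n_caches, 0) {}

    // Processor p reads the line.
    int read(std::size_t p) {
        assert(p < state.size());
        if (state[p] != Mesi::Invalid) return value[p];  // hit: no traffic
        bool others_hold_copy = false;
        for (std::size_t q = 0; q < state.size(); ++q) {
            if (q == p || state[q] == Mesi::Invalid) continue;
            if (state[q] == Mesi::Modified) memory = value[q];  // write back
            state[q] = Mesi::Shared;  // M or E holders drop to Shared
            others_hold_copy = true;
        }
        value[p] = memory;
        state[p] = others_hold_copy ? Mesi::Shared : Mesi::Exclusive;
        return value[p];
    }

    // Processor p writes the whole line.
    void write(std::size_t p, int v) {
        assert(p < state.size());
        for (std::size_t q = 0; q < state.size(); ++q) {
            if (q == p) continue;
            if (state[q] == Mesi::Modified) memory = value[q];  // write back
            state[q] = Mesi::Invalid;  // invalidate every other copy
        }
        value[p] = v;
        state[p] = Mesi::Modified;
    }
};
```

In this model, if cache 0 writes 42 and cache 1 then reads, the read misses, forces cache 0 to write back, and returns 42 with both caches in the Shared state. That is the sense in which the program never observes a stale copy.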

bysreg

@PandaX I think it does not have to be volatile. It could be any variable, like a global variable or a static class variable, that multiple threads have write access to.
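As a rough illustration of such a variable, the sketch below has several threads increment one global counter. It uses `std::atomic` because hardware cache coherence on its own does not make a read-modify-write such as `counter++` indivisible. The names `shared_counter` and `count_in_parallel` are invented for this example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// A global that every worker thread writes to. std::atomic makes each
// increment a single indivisible read-modify-write.
std::atomic<long> shared_counter{0};

// Run n_threads workers that each add 1 to the counter
// increments_per_thread times, then return the final count.
long count_in_parallel(int n_threads, long increments_per_thread) {
    assert(n_threads > 0 && increments_per_thread >= 0);
    shared_counter.store(0);
    std::vector<std::thread> workers;
    for (int t = 0; t < n_threads; ++t) {
        workers.emplace_back([increments_per_thread] {
            for (long i = 0; i < increments_per_thread; ++i) {
                shared_counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (std::thread& w : workers) w.join();  // join orders the final load
    return shared_counter.load();
}
```

Calling `count_in_parallel(4, 100000)` returns 400000. With a plain `long` (or a `volatile long`) in place of the atomic, the program would contain a data race and increments could be lost.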

jpd

@PandaX the "volatile" keyword just disables certain optimizations that remove memory reads (i.e., keeping a value in a register instead of repeatedly accessing memory); this is useful mainly for interacting with memory-mapped buses and devices. It doesn't necessarily affect cache semantics, since the cache is a feature of the processor rather than of the language. Although there are ways to bypass the cache (there's a specific x86 instruction, and a gcc intrinsic for it), volatile does not do that.
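A small sketch of the memory-mapped use case, assuming a made-up device whose status word sets bit 0 when it is ready. The function name `polls_until_ready` and the constant `kReadyBit` are hypothetical, and in the example the "register" is just an ordinary variable standing in for a device address.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kReadyBit = 0x1;  // hypothetical bit layout

// Poll a status word until the ready bit is set, giving up after max_polls
// reads. Because the pointee is volatile, the compiler must emit a real load
// on every iteration instead of reusing a value it already has in a register.
// Returns the number of reads performed, or -1 if the bit never appeared.
int polls_until_ready(const volatile std::uint32_t* status, int max_polls) {
    assert(status != nullptr);
    for (int polls = 1; polls <= max_polls; ++polls) {
        if (*status & kReadyBit) return polls;
    }
    return -1;
}
```

Note that this says nothing about caches: the load may still be served from a cache, and the qualifier provides neither atomicity nor ordering between threads.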