From the lecture, I gather that this idea has significantly changed the design direction for processors, but I'm still wondering how far this change of direction has gone. Before multi-core, most of the transistors on a chip were used to optimize single-instruction-stream processing. Is it now the case that no transistors are used for this, and they go toward packing in more cores instead? Or have chip makers simply moved on to adding more cores IN ADDITION to all the circuitry for optimizing single instruction streams? Somewhere in the middle? Can anyone give concrete (even if rough) numbers for how the transistor budget for single-instruction-stream optimization vs. extra cores has changed?
@jmc Actually, I think adding more cores and optimizing single-instruction-stream processing are both ongoing trends in today's processor design. Take GPUs as an example: they have a great number of cores, and each core has multiple ALUs and execution contexts. So I think chip makers are optimizing on both fronts. This is possible because transistor density is still growing exponentially, which lets manufacturers put more parts into each core and more cores into each processor.
@jmc Your intuition is correct. As you can see from any modern processor description, caches, out-of-order execution logic, branch predictors, etc. certainly still exist. But they are no longer the sole way to obtain higher processing throughput. Today's multi-core CPUs have multiple, simple cores (by simple I mean not loaded with all the bells and whistles needed to deliver maximum single-threaded performance). Modern GPUs have far more cores, and those cores are even simpler than their CPU counterparts.
There's a continuum between the best single-threaded performance and the extreme end of throughput-oriented processor design. That's one of the big takeaways of this lecture. Multi-core CPUs and GPUs use the same concepts, but their architects make different decisions to build the best processor for the set of workloads they most care about.
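To make the continuum a bit more concrete, here is a rough toy model in Python. It is only a sketch under stated assumptions, not data from any real chip: the transistor budget is in arbitrary units, the budget is split evenly across cores, and per-core performance is assumed to scale with the square root of the transistors in that core (a rule of thumb often called Pollack's rule). The function name `design_point` and the budget value are made up for illustration.

```python
import math


def design_point(transistor_budget: float, num_cores: int) -> dict:
    """Toy model: split a fixed transistor budget evenly across cores."""
    per_core = transistor_budget / num_cores
    # ASSUMPTION (Pollack's rule of thumb): single-thread performance of a
    # core grows roughly with the square root of its transistor count.
    single_thread = math.sqrt(per_core)
    # Peak throughput assumes a perfectly parallel workload on all cores.
    throughput = num_cores * single_thread
    return {
        "cores": num_cores,
        "single_thread": single_thread,
        "throughput": throughput,
    }


if __name__ == "__main__":
    BUDGET = 1024.0  # arbitrary units, not a real transistor count
    for n in (1, 4, 16, 64):
        p = design_point(BUDGET, n)
        print(f"{p['cores']:>3} cores: "
              f"single-thread={p['single_thread']:.1f}, "
              f"throughput={p['throughput']:.1f}")
```

Under these assumptions, going from 1 big core to 64 small ones cuts single-thread performance by 8x while raising peak throughput by 8x. A CPU architect picks a point nearer the few-big-cores end and a GPU architect a point nearer the many-small-cores end, and the throughput gain only materializes if the workload actually has that much parallelism.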