bob_sacamano

Why can't each core be made "fancier" with branch predictors and out-of-order buffers at the cost of slightly larger chip area (considering that transistors are still getting smaller)?

tcm

That is an option, certainly, and the pictures on the slides probably over-simplify the tradeoffs. Notice that slide 13 says "fancy branch predictor"; even lean cores have some type of branch predictor. If you look through the branch prediction literature, you will notice that, like many other predictors, branch predictors grow quite a bit larger as you try to squeeze out more accuracy. You can build a relatively small predictor if your goal is 90% accuracy; if your goal is 98% accuracy, however, then you are looking at a large amount of hardware.
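
To make the size/accuracy tradeoff concrete, here is a minimal sketch of the simplest scheme: a bimodal table of 2-bit saturating counters. The table size, the PC hashing, and the toy branch trace are all assumptions for illustration; real high-accuracy designs from the literature (gshare, perceptron, TAGE) add history registers and tagged tables on top of something like this, which is where the area grows quickly.

```c
/* Minimal sketch of a bimodal branch predictor: a table of 2-bit
 * saturating counters indexed by (hashed) branch PC. TABLE_BITS and
 * the trace below are made-up values for illustration only. */
#include <stdint.h>
#include <stdio.h>

#define TABLE_BITS 10                      /* 2^10 = 1024 counters */
#define TABLE_SIZE (1u << TABLE_BITS)

static uint8_t counters[TABLE_SIZE];       /* each holds 0..3 */

/* Predict taken when the counter is in state 2 or 3. */
static int predict(uint32_t pc) {
    uint32_t idx = (pc >> 2) & (TABLE_SIZE - 1);  /* drop low PC bits */
    return counters[idx] >= 2;
}

/* Train the saturating counter with the actual outcome. */
static void train(uint32_t pc, int taken) {
    uint32_t idx = (pc >> 2) & (TABLE_SIZE - 1);
    if (taken  && counters[idx] < 3) counters[idx]++;
    if (!taken && counters[idx] > 0) counters[idx]--;
}

int main(void) {
    /* Toy trace: one branch that is taken 9 of every 10 iterations. */
    int correct = 0, total = 10000;
    for (int i = 0; i < total; i++) {
        int taken = (i % 10) != 0;
        correct += (predict(0x400) == taken);
        train(0x400, taken);
    }
    printf("accuracy: %.1f%%\n", 100.0 * correct / total);
    return 0;
}
```

A bigger table mostly buys you less destructive aliasing between branches that hash to the same counter; the last few points of accuracy come from history-based structures that cost disproportionately more area.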

A key metric for processor designers is performance benefit per unit area: if I spend X% of my die on a larger predictor, how much does that improve performance relative to using the same area to increase the core count by X%? You don't get linear speedup from more cores, so the advantage of the additional cores has to be discounted accordingly. There isn't a simple formula for this, and different design groups have prioritized different things in their designs, but that is roughly how the thought process goes; a back-of-the-envelope version of the comparison is sketched below.
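
Here is a rough sketch of that comparison using Amdahl's law. The parallel fraction, the 1.15x single-thread gain, and the 15% per-core area cost are all made-up numbers for illustration, not measurements; the point is only that the answer depends heavily on the workload.

```c
/* Back-of-the-envelope comparison of "fewer, fancier cores" versus
 * "more, leaner cores" on a fixed die. All numbers are hypothetical. */
#include <stdio.h>

/* Amdahl's-law speedup with parallel fraction f on n cores,
 * where each core runs single-threaded code s times faster. */
static double speedup(double f, double n, double s) {
    return s / ((1.0 - f) + f / n);
}

int main(void) {
    double f = 0.95;            /* assumed parallel fraction of the workload */
    double base_cores = 16.0;   /* cores that fit with lean cores */

    /* Option A: each core is 15% larger (bigger predictor etc.) and
     * gets an assumed 1.15x single-thread boost, so fewer cores fit. */
    double a = speedup(f, base_cores / 1.15, 1.15);

    /* Option B: keep lean cores and use the area for ~15% more cores. */
    double b = speedup(f, base_cores * 1.15, 1.0);

    printf("fancier cores: %.2fx   more lean cores: %.2fx\n", a, b);
    return 0;
}
```

With these particular numbers the two options come out close at f = 0.95, the fancier core wins as f shrinks, and the extra cores win as f approaches 1, which is exactly why there is no simple formula.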

If you are interested in reading further, you might want to contrast these two articles that appeared in the same issue of the same journal back in 1997:

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.472.7021&rep=rep1&type=pdf

http://www.inf.ed.ac.uk/teaching/courses/pa/Papers/billion_chipmultiprocessor.pdf