arjunh

I don't think we discussed branch prediction in this lecture, but it's worth a quick overview.

The idea behind branch prediction is quite similar to the one behind prefetching: most programs have predictable patterns of data access and conditional execution.

For example, a program that accesses data from an array usually does so in a linear fashion (from index 0 up to the last index, in increments of 1). A program that instead accesses memory in a near-random fashion would not be able to exploit the benefits that prefetching mechanisms provide.
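
To make the contrast concrete, here is a minimal sketch (the function and parameter names are mine, not from the lecture) of the two access patterns: the first loop's stride-1 pattern is exactly what a hardware prefetcher can anticipate, while the second loop's data-dependent addresses defeat it.

```c
#include <stddef.h>

/* Stride-1 traversal: the fixed, predictable stride is easy for a
 * hardware prefetcher to detect, so most loads are serviced from cache. */
long sum_sequential(const int *a, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Near-random traversal: the next address depends on the contents of
 * perm, which the prefetcher cannot anticipate, so many loads miss.
 * (perm is assumed to hold a random permutation of 0..n-1.) */
long sum_random(const int *a, const size_t *perm, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[perm[i]];
    return sum;
}
```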

Branch prediction works in the following fashion (a small sketch of a predictor follows the list):

  1. The processor notices an upcoming branch by looking ahead in the instruction stream (processors can fetch/decode multiple instructions at a time).
  2. It predicts which way the branch will go, then starts fetching/decoding instructions along the predicted path and even begins executing them.
  3. When the branch is actually resolved, the prediction is checked. If it was correct, the processor simply continues down the path it has already started. If not, the processor has to discard the results computed along the incorrect path and start fetching/decoding instructions on the correct path.
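
To give a feel for step 2, here is a toy sketch of one common prediction scheme, a 2-bit saturating counter. This particular scheme and the names below are my own illustration, not something covered in the lecture; real predictors keep tables of such counters indexed by (hashed) branch address and history.

```c
#include <stdbool.h>

/* Toy model of a single 2-bit saturating counter: states 0-1 predict
 * "not taken", states 2-3 predict "taken". */
typedef struct { unsigned state; } bp_counter_t;   /* state in 0..3 */

static bool bp_predict(const bp_counter_t *c) {
    return c->state >= 2;            /* predict taken if in the upper half */
}

static void bp_update(bp_counter_t *c, bool taken) {
    if (taken && c->state < 3) c->state++;         /* strengthen toward taken */
    else if (!taken && c->state > 0) c->state--;   /* strengthen toward not taken */
}
```

The hysteresis in the counter means a single surprising outcome (e.g., a loop exit) does not immediately flip the prediction for a branch that is otherwise almost always taken.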

In essence, this notion of discarding results from the wrongly predicted path is quite similar to how results from untaken branch paths are masked off and discarded in a SIMD model of execution.

A rather famous example of how branch prediction can lead to a significant speed-up (or slow-down, depending on how the code is written) can be found here (which incidentally also happens to feature the highest-voted answer on Stack Overflow).
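
For reference, here is the gist of that example, re-sketched from memory (the constants and structure are approximate, not a copy of the original post): the loop contains a data-dependent branch, and sorting the data first makes that branch almost perfectly predictable, which is where the speed-up comes from.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 20)

/* Comparison function for qsort; only needed if the sort below is enabled. */
static int cmp_int(const void *a, const void *b) {
    return *(const int *)a - *(const int *)b;
}

int main(void) {
    static int data[N];
    for (int i = 0; i < N; i++)
        data[i] = rand() % 256;

    /* Uncomment to sort first: the branch below then becomes
     * "false, false, ..., true, true, ...", which the predictor learns
     * almost perfectly. Left unsorted, it mispredicts roughly half the time. */
    /* qsort(data, N, sizeof(int), cmp_int); */
    (void)cmp_int;   /* silence unused-function warning while the sort is commented out */

    long long sum = 0;
    clock_t start = clock();
    for (int pass = 0; pass < 100; pass++)
        for (int i = 0; i < N; i++)
            if (data[i] >= 128)      /* data-dependent branch */
                sum += data[i];
    double secs = (double)(clock() - start) / CLOCKS_PER_SEC;

    printf("sum = %lld, time = %.2f s\n", sum, secs);
    return 0;
}
```

One caveat: an aggressive optimizer may turn the inner branch into a conditional move or vectorize the loop, which hides the effect, so the difference is most visible at modest optimization levels.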

kayvonf

Modern CPUs have hardware prefetchers that analyze the pattern of loads and stores a program makes in order to predict what addresses might be loaded in the future. Like branch prediction or out-of-order instruction execution, this is done automatically and opaquely by the hardware, without the knowledge of the software. Earlier in this lecture I referred to these components of a processor by the technical term "stuff that makes a single instruction stream run faster".

However, most instruction sets (including IA64) do include prefetch instructions that a compiler can emit to help deliver greater performance. These instructions are taken as "hints" by the processor. (It may choose to ignore them, since removing a prefetch will not change program correctness.)
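
As a concrete illustration of such a hint: GCC and Clang expose a `__builtin_prefetch` builtin that compiles down to the target's prefetch instruction when one exists. The prefetch distance of 16 elements below is just a guess; the right value depends on the machine and the loop body.

```c
#include <stddef.h>

/* Sketch of software-inserted prefetch hints. Deleting the
 * __builtin_prefetch line leaves the result unchanged, which is exactly
 * why the processor is free to ignore the hint. */
long sum_with_prefetch(const int *a, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + 16 < n)
            __builtin_prefetch(&a[i + 16], /*rw=*/0, /*locality=*/3);
        sum += a[i];
    }
    return sum;
}
```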