kayvonf

Question: I'd like to see a student attempt to answer the two "self-check" questions at the bottom of this slide.

What do I mean when I say single instruction stream performance?

And is superscalar execution an optimization that makes a single instruction stream go faster, or does it accelerate a program containing many instruction streams?

Kaharjan
  1. SISD performance is mainly related to improving clock frequency.
  2. It accelerates a program containing many instruction streams.
kayvonf

@Kaharjan: Are you sure about your answer to number 2?

Take a look at this slide and its comments as well as the definition of superscalar execution.

Kaharjan

Oh, I was wrong, thank you. But I am still a little bit confused.

Does a single instruction stream refer to one fetch/decode unit? In Flynn's taxonomy, it looks like a single instruction stream refers to one fetch/decode unit.

I feel that the superscalar processor architecture in the A Modern Multi-Core Processor slide is very similar to the MISD architecture in Flynn's taxonomy. Is that right?

kayvonf

A single instruction stream refers to a single list of instructions. (Some people say a "single thread of control" or a "single thread of execution")

Given one instruction stream, if that instruction stream has ILP, then a processor with multiple fetch/decode units and multiple instruction execution units can run multiple instructions from that one instruction stream at the same time.
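
To make this concrete, here's a tiny C++ sketch (my own example, not from the slides) of a single instruction stream with ILP. The three arithmetic operations have no dependences on each other, so a superscalar core with multiple fetch/decode and execution units could issue them in the same clock.

```c++
// Hypothetical example: one instruction stream with instruction-level
// parallelism. x, y, and z do not depend on each other, so a superscalar
// processor can execute these instructions simultaneously even though
// they all come from the same instruction stream.
void ilp_example(float a, float b, float c, float d, float* out) {
    float x = a * b;  // independent of y and z
    float y = c * d;  // independent of x and z
    float z = a + c;  // independent of x and y
    out[0] = x;
    out[1] = y;
    out[2] = z;
}
```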

In Flynn's taxonomy:

SISD = Kayvon's simple one-instruction per clock processor.

MIMD = If we define MIMD as a processor core executing two different instructions from the same instruction stream on different pieces of data, then we are talking about superscalar execution. If we define MIMD as two completely different instruction streams (different threads) running at the same time on different pieces of data, then we are talking about a modern multi-core processor.

It's okay to think of both of these designs as examples of MIMD, although they are very different implementations. The first is fine-grained parallelization of individual instructions within a single thread of control (a single instruction stream). The second is coarse-grained parallelization across completely different threads (different instruction streams); see the sketch below for that second case. (Flynn's taxonomy fails to provide sufficient clarity about these differences, so I don't find it very helpful.)
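
Here is a small C++ sketch (again my own illustration, not from the lecture) of the coarse-grained case: two completely different instruction streams, each running its own code on its own data.

```c++
#include <thread>
#include <vector>

// Two different instruction streams (threads) operating on different data:
// the coarse-grained, multi-core sense of MIMD.
void scale(std::vector<float>& v, float s)  { for (float& x : v) x *= s; }
void offset(std::vector<float>& v, float o) { for (float& x : v) x += o; }

int main() {
    std::vector<float> a(1024, 1.0f), b(1024, 2.0f);
    std::thread t1(scale,  std::ref(a), 3.0f);  // instruction stream 1
    std::thread t2(offset, std::ref(b), 1.0f);  // instruction stream 2
    t1.join();
    t2.join();
    return 0;
}
```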

SIMD = exactly the SIMD execution I talked about in class. The lecture slides are very consistent with Flynn's taxonomy here. If you want to learn more, some people think it is important to point out the difference between two implementations of SIMD execution: explicit SIMD, where the processor executes vector instructions generated by a compiler, and implicit SIMD (also called SIMT), where the compiler generates scalar (non-vector) instructions and the processor runs those instructions on many pieces of data in parallel. You can read more on slide 37 and slide 38 of this lecture. CPUs typically use explicit SIMD; GPUs use implicit SIMD.
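
To illustrate the explicit SIMD style, here is a short sketch using AVX intrinsics (my own example, and it assumes a CPU with AVX support). The point is that the vector instructions appear directly in the instruction stream; in the implicit/SIMT style, the code would instead be ordinary scalar code and the hardware would map it across many data elements.

```c++
#include <immintrin.h>

// Explicit SIMD: each intrinsic below operates on 8 floats with a single
// vector instruction that the programmer (or compiler) emits directly.
void scale8(const float* in, float* out, int n, float s) {
    __m256 vs = _mm256_set1_ps(s);
    int i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 v = _mm256_loadu_ps(in + i);               // load 8 floats
        _mm256_storeu_ps(out + i, _mm256_mul_ps(v, vs));  // multiply and store 8 floats
    }
    for (; i < n; ++i)  // scalar cleanup for the remaining elements
        out[i] = in[i] * s;
}
```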