Slide View : Parallel Computer Architecture and Programming : Tsinghua Summer 2017

Slide 77 of 86

kayvonf

There is an interesting detail on this slide.

Here I drew a processor with two different types of execution units. One is a regular unit that performs ordinary operations on scalar ("single") data values. The other is a SIMD execution unit that performs 8-wide vector operations on 8-element vectors of data.
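To make the difference between the two units concrete, here is a toy Python model (not real ISA code) of adding two arrays on such a core. The names `scalar_add`, `vector_add`, and `add_arrays` are hypothetical; the only detail taken from the slide is the 8-wide vector unit. Under this model a 20-element array needs two 8-wide vector instructions plus four scalar instructions for the leftover elements, instead of 20 scalar instructions.

```python
VECTOR_WIDTH = 8  # width of the SIMD unit drawn on the slide


def scalar_add(x, y):
    """Toy 'scalar instruction': one addition on one pair of values."""
    return x + y


def vector_add(xs, ys):
    """Toy 'vector instruction': the same addition applied to all 8 lanes."""
    if len(xs) != VECTOR_WIDTH or len(ys) != VECTOR_WIDTH:
        raise ValueError("vector operands must have exactly 8 elements")
    return [x + y for x, y in zip(xs, ys)]


def add_arrays(a, b):
    """Add two equal-length lists; return (result, scalar_ops, vector_ops).

    Full 8-element chunks go to the vector unit, and the leftover tail
    (fewer than 8 elements) is handled one element at a time by the scalar unit.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    n = len(a)
    full = n - n % VECTOR_WIDTH  # elements covered by whole vectors
    out, scalar_ops, vector_ops = [], 0, 0
    for i in range(0, full, VECTOR_WIDTH):
        out.extend(vector_add(a[i:i + VECTOR_WIDTH], b[i:i + VECTOR_WIDTH]))
        vector_ops += 1
    for i in range(full, n):
        out.append(scalar_add(a[i], b[i]))
        scalar_ops += 1
    return out, scalar_ops, vector_ops


if __name__ == "__main__":
    result, s_ops, v_ops = add_arrays(list(range(20)), [1] * 20)
    print(s_ops, v_ops)  # 4 scalar instructions, 2 vector instructions
```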

Note that these are superscalar cores. But since each core has only one vector execution unit and one scalar execution unit, for a core to execute two instructions at once, one of the instructions must be a scalar instruction and the other must be a vector instruction. The core cannot use its superscalar execution capability when the two independent instructions available to issue are both vector instructions or both scalar ones. This is an additional constraint on the core's ability to take advantage of instruction-level parallelism.
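The dual-issue rule described above can be checked with a toy in-order issue model. It is a sketch under stated assumptions, not a description of any real core: every instruction is independent, instructions issue strictly in program order, and each instruction is encoded only by the unit it needs ('S' for the scalar unit, 'V' for the vector unit). The function name `cycles_needed` is hypothetical.

```python
def cycles_needed(stream):
    """Clocks to issue `stream` on a toy in-order dual-issue core.

    Each instruction is 'S' (needs the scalar unit) or 'V' (needs the 8-wide
    vector unit). Assumes all instructions are independent, so the only
    limit is that each unit can accept one instruction per clock.
    """
    if any(op not in ("S", "V") for op in stream):
        raise ValueError("instructions must be 'S' or 'V'")
    cycles, i = 0, 0
    while i < len(stream):
        busy = set()  # units already claimed this clock
        # Issue in program order until the next instruction's unit is taken.
        while i < len(stream) and stream[i] not in busy:
            busy.add(stream[i])
            i += 1
        cycles += 1
    return cycles


if __name__ == "__main__":
    for s in ["SVSV", "VVVV", "SSSS", "SSVV"]:
        print(s, cycles_needed(s))
```

In this model a mixed stream such as "SVSV" issues two instructions per clock (2 clocks), while "VVVV" or "SSSS" falls back to one per clock (4 clocks) even though the instructions are independent. "SSVV" takes 3 clocks because in-order issue cannot move the first V ahead of the second S.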

xielei

How can we distinguish between superscalar execution and simultaneous multi-threading from the way the boxes are drawn? More than one execution context box means that multi-threading is supported, but multiple fetch/decode boxes have two meanings in this class. One is superscalar execution, as illustrated on this slide; the other is in lecture 2, slide 60, where multiple fetch/decode boxes mean simultaneous multi-threading. (However, a bright box drawn together with a dark one means superscalar.)

The question above is less important, but:

Superscalar execution needs at least two fetch/decode units, and so does (simultaneous) multi-threading. (Multi-threading also needs at least two execution contexts, but superscalar execution does not.) So if I have a core with multiple contexts and multiple fetch/decode units, is there no way to know whether it is superscalar or simultaneously multi-threaded?

kayvonf

@xielei: This is a good question. Your understanding of the pictures is correct, and my pictures are not detailed enough to distinguish between the cases you describe. You'd need more information about the dispatch rules the core supports to really understand how it behaves. It is true that a picture with two execution contexts per core (blue boxes) and two decode units (orange boxes) might illustrate a core designed to:

Run one thread per clock (interleaved multi-threading) with support for superscalar execution within the thread.
Run two threads per clock via simultaneous multi-threading (SMT), but not support superscalar execution within a thread (at most one instruction per thread per clock).
Be able to run any mixture of instructions from one or two threads. This is Intel Hyper-Threading: it can run two instructions from one thread (superscalar execution with interleaved multi-threading) or one instruction from each of two threads (simultaneous multi-threading without superscalar execution).
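The three behaviors in the list can be compared with a toy issue model. Everything in it is an illustrative assumption, not a description of real hardware or of Intel's actual Hyper-Threading policy: two issue slots per clock, one-clock latency, in-order issue within each thread, and each thread described only by which of its instructions depend on their immediate predecessor. The functions `interleaved_superscalar`, `smt_one_per_thread`, and `any_mix` are hypothetical names, and `any_mix` uses just one simple greedy rule out of many possible ones.

```python
# Toy model (assumptions, not real hardware): two issue slots per clock,
# one-clock latency, in-order issue within a thread. A thread is a list of
# flags; flag i is True when instruction i depends on instruction i - 1 of
# the same thread, so the two cannot issue in the same clock.


def _take(thread, pos, limit):
    """Count how many in-order instructions (at most `limit`) can issue
    together from `thread`, starting at index `pos`."""
    issued = 0
    while issued < limit and pos + issued < len(thread):
        if issued > 0 and thread[pos + issued]:
            break  # depends on an instruction issued earlier this clock
        issued += 1
    return issued


def _unfinished(threads, pos):
    """Indices of threads that still have instructions left to issue."""
    return [t for t in range(len(threads)) if pos[t] < len(threads[t])]


def interleaved_superscalar(threads):
    """One thread per clock (round-robin), up to two instructions from it."""
    pos, turn, cycles = [0] * len(threads), 0, 0
    ready = _unfinished(threads, pos)
    while ready:
        tid = min(ready, key=lambda t: (t - turn) % len(threads))
        pos[tid] += _take(threads[tid], pos[tid], 2)
        turn = (tid + 1) % len(threads)
        cycles += 1
        ready = _unfinished(threads, pos)
    return cycles


def smt_one_per_thread(threads):
    """Up to two threads per clock, at most one instruction from each."""
    pos, cycles = [0] * len(threads), 0
    ready = _unfinished(threads, pos)
    while ready:
        for tid in ready[:2]:
            pos[tid] += 1
        cycles += 1
        ready = _unfinished(threads, pos)
    return cycles


def any_mix(threads):
    """Two slots per clock from one or two threads (one simple greedy rule)."""
    pos, cycles = [0] * len(threads), 0
    ready = _unfinished(threads, pos)
    while ready:
        if len(ready) >= 2:
            for tid in ready[:2]:  # one instruction from each of two threads
                pos[tid] += 1
        else:  # only one thread has work: use both slots on it if possible
            tid = ready[0]
            pos[tid] += _take(threads[tid], pos[tid], 2)
        cycles += 1
        ready = _unfinished(threads, pos)
    return cycles


if __name__ == "__main__":
    a = [False, False, False, False]  # four independent instructions
    b = [False, True, True, True]     # a four-instruction dependency chain
    for policy in (interleaved_superscalar, smt_one_per_thread, any_mix):
        print(policy.__name__, policy([a]), policy([a, b]))
```

With a single thread of independent instructions, the first and third policies finish in 2 clocks, while one-instruction-per-thread SMT needs 4. Adding a second thread whose instructions form a dependency chain pushes the interleaved policy to 6 clocks, because the chained thread leaves a slot empty on each of its turns, while the other two policies need 4. This greedy sketch is far simpler than a real core's dispatch logic, but it shows why being able to mix both modes is attractive.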

But it would be very complicated to try and indicate all this information in a figure.