kayvonf

Question: The key phrase on this slide is that a processor must execute instructions in a manner that "appears" as if they were executed in program order. This is a key idea in this class.

What is program order?

And what does it mean for the results of a program's execution to appear as if instructions were executed in program order?

And finally... Why is the program order guarantee a useful one? (What if the results of execution were inconsistent with the results that would be obtained if the instructions were executed in program order?)

xiaozhuyfk

In my understanding, executing instructions in a manner that "appears" as if they were executed in program order means that superscalar execution should produce exactly the same result (register contents, memory, CPU state, ...) as if the instructions had been executed sequentially.
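
To make the "exact same result" idea concrete, here is a tiny sketch (my own made-up C fragment, not from the slide). A superscalar core could issue the first two statements in the same cycle because they are independent, but nothing a programmer can observe distinguishes that from running them one at a time:

    /* Toy example (mine, not from the slide). The first two statements
       are independent, so a superscalar core could issue them together;
       the third depends on both and must wait. Either way the observable
       result is the same as strict sequential execution. */
    int visible_result(int a, int b, int c, int d) {
        int x = a + b;   /* independent of the next statement */
        int y = c * d;   /* independent of the previous statement */
        return x + y;    /* depends on both x and y */
    }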

holard

Re: Why is the program order guarantee a useful one? The program order is (usually) the order of execution intended by the programmer. Not executing in a fashion consistent with program order can lead to incorrect behavior/results. This type of error could be particularly difficult for a programmer to track down, since the observed behavior would not be consistent with the program as written.

paracon

For superscalar execution you would need multiple execution datapaths in the processor design, so that you could execute independent instructions in parallel on a single processor. These independent instructions run truly in parallel, rather than being interleaved (as they would be in a concurrent-but-not-parallel scenario). I hope my interpretation is accurate; correct me if it isn't.

200

"Appear" means that without revealing any internal state inside the CPU which is hidden from programmers, there is no way to determine whether the instructions are executed sequentially or not. We can think of this kind of abstraction as the separation of concerns principle in software design. The main benefit of this approach is that the hardware implementation can be optimized independently of software. It also helps to make the software easier to understand and design.

poptarts

In addition to everything mentioned above, I think the idea of the semantics of a program is a useful way to approach program order. Executing a piece of code in program order yields a result and creates side effects—the semantics of that program (i.e. the meaning of the code).

So, for superscalar execution, the processor must preserve the semantics of the program. As long as the meaning is intact, it does not matter what the exact ordering of execution is.

Additionally, techniques such as instruction pipelining and branch prediction can be exploited (with help from software such as a compiler, which can reason about the 'meaning' of a piece of code) while preserving the program's semantics.
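
As a concrete (made-up) illustration of a transformation that preserves the program's meaning while exposing more independent work, a compiler can unroll an integer reduction into two accumulators; this is my own sketch, not something from the lecture:

    /* Sketch (my own example): for ints, both versions compute the same sum,
       but the second has two independent dependence chains that a
       superscalar core can advance in parallel. */
    int sum_serial(const int *a, int n) {
        int sum = 0;
        for (int i = 0; i < n; i++)
            sum += a[i];              /* each iteration depends on the last */
        return sum;
    }

    int sum_unrolled(const int *a, int n) {
        int s0 = 0, s1 = 0;
        for (int i = 0; i + 1 < n; i += 2) {
            s0 += a[i];               /* chain 0 */
            s1 += a[i + 1];           /* chain 1, independent of chain 0 */
        }
        if (n % 2) s0 += a[n - 1];    /* leftover element when n is odd */
        return s0 + s1;
    }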

chenh1

The superscalar speedup comes from replicating execution units such as the arithmetic logic unit, while the pipelining speedup comes from overlapping work across the entire processor (core). The rise of superscalar CPUs came with RISC microprocessors, because RISC architectures free up transistors and die area that can be used to include multiple execution units.

However, a limitation of superscalar execution is the degree of intrinsic parallelism in the instruction stream. There is also the overhead of the dependency-checking logic, especially in the presence of branches.
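
To make the "intrinsic parallelism" limit concrete, here is a toy example of my own: a serial dependence chain where extra execution units don't help, because every operation needs the result of the one before it.

    /* Toy example (mine, not from the lecture): a pure dependence chain.
       Each multiply-add reads the x produced by the previous iteration,
       so issue width doesn't matter; the latency of the chain dominates. */
    long chain(long x, int n) {
        for (int i = 0; i < n; i++)
            x = x * x + 1;    /* depends on the previous value of x */
        return x;
    }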

life

I had a presumption that only processors can decide which instructions to execute in parallel. After looking it up on the Wikipedia page, it turns out that is a misconception: ILP is actually achieved by a combination of design techniques in both processors and compilers. The page also has brief introductions to instruction pipelining and branch prediction, which are two other techniques for exploiting ILP besides superscalar execution, as mentioned by poptarts. https://en.wikipedia.org/wiki/Instruction-level_parallelism

o_o

The program order guarantee is useful because it lets the programmer know that the result of their code will be as if it were run in sequential order. This way, the programmer can be certain that the output is always the same regardless of how many execution units the processor uses. If this guarantee didn't exist, the programmer would never be sure whether an incorrect output was an error in their code or an artifact of superscalar execution. It is also helpful because if the programmer stepped through their code by hand, the result they would get would match what they see when they actually run it.

pk267

This slide seems to imply that ILP is done by the hardware (processor). But Wikipedia (https://en.wikipedia.org/wiki/Instruction-level_parallelism) says it can be done by either hardware (the processor) or software (the compiler). So which one is correct?

rav

ILP is achieved through an amalgamation of techniques both at the processor level (superscalar execution, out-of-order execution, branch prediction) and at the compiler level (software pipelining, instruction scheduling). Superscalar execution requires multiple execution resources (and corresponding datapaths). The Wikipedia page also outlines the limitations of superscalar processing, namely the degree of intrinsic parallelism and the cost of checking dependencies on the fly. I imagine the cost of checking dependencies would scale poorly with the number of instructions the processor looks at when deciding which ones can be executed in parallel. https://en.wikipedia.org/wiki/Superscalar_processor
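
Here is roughly how I picture that on-the-fly dependence check (a naive software sketch of my own, not how real issue logic is implemented): every instruction in the window is compared against every earlier one, so the work grows roughly quadratically with window size.

    /* Naive sketch (my own simplification): detect read-after-write
       dependences among instructions in a small issue window. */
    typedef struct {
        int dst;       /* destination register */
        int src[2];    /* source registers */
    } Instr;

    static int raw_dependent(Instr later, Instr earlier) {
        /* RAW hazard: a source of 'later' is the destination of 'earlier'.
           (WAR/WAW hazards are ignored here for brevity.) */
        return later.src[0] == earlier.dst || later.src[1] == earlier.dst;
    }

    /* Mark instructions with no dependence on an earlier instruction
       still in the window; roughly O(n^2) pairwise comparisons. */
    static void mark_issuable(const Instr w[], int n, int issuable[]) {
        for (int i = 0; i < n; i++) {
            issuable[i] = 1;
            for (int j = 0; j < i; j++)
                if (raw_dependent(w[i], w[j]))
                    issuable[i] = 0;
        }
    }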

coffee7

@pk267 in addition to what @rav has said, it is important to note that superscalar execution is one way of exploiting ILP. However, as the Wikipedia article you linked says, there are other ways to approach ILP as well.