Jing

Kayvon said in class that these days the CPU is smart enough to perform ILP "dynamically," instead of statically before the real computation happens. This is contrary to what I imagined at first, because I thought that dynamically figuring out which parts can be parallelized would itself waste cycles, and doing it on the fly instead of at compile time doesn't seem worthwhile in every case (when no ILP exists, we still waste cycles detecting it). Is this true?

vrazdan

The section about ILP, and Professor Kayvon's remark that hardware engineers had been working on parallelism 10-15 years ago, reminded me of this article I read: http://www.lighterra.com/papers/modernmicroprocessors/. Originally written in 2002, it details the difference between superscalar execution (executing instructions in parallel) and superpipelining (I think this means breaking each instruction into more, smaller pipeline stages so that the longest gate delay per stage is minimized). It also talks about how some processors rearrange unrelated instructions for out-of-order execution, just as Prof. Kayvon showed in class. I recommend the article as a nice overview of what we discussed today, since the author also relates the concepts to actual processors.

byeongcp

I'm not familiar with the optimizations performed by compilers or with ILP, but I'm still curious about this. Would it be possible for the optimizations performed by a compiler to get rid of independent instructions that the processor could otherwise exploit, which could end up slowing things down? (That is, it might have been faster with less optimization, leaving many independent instructions for the processor to exploit.)
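
Something like this toy C sketch (my own, not from the lecture) is what I have in mind. Both functions compute the same dot product, but the first combines the products with a shallow tree of adds, while the second chains every addition through sum:

    float dot_tree(const float *a, const float *b) {
        /* Lots of ILP: the four multiplies are mutually independent,
           and the adds form a tree of depth two. */
        float p0 = a[0] * b[0], p1 = a[1] * b[1];
        float p2 = a[2] * b[2], p3 = a[3] * b[3];
        return (p0 + p1) + (p2 + p3);
    }

    float dot_chain(const float *a, const float *b) {
        /* Little ILP: each += depends on the previous sum, so the
           additions must execute one after another. */
        float sum = a[0] * b[0];
        sum += a[1] * b[1];
        sum += a[2] * b[2];
        sum += a[3] * b[3];
        return sum;
    }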

ragnarok451

@jing Yes, ILP happens dynamically (though apparently it can also be done at compile time). I'm not sure, but I think dynamic ILP is better because it lets the processor schedule around information that only becomes available at run time, such as cache misses. This PowerPoint from Pitt has more info about the actual hardware algorithms used for ILP:

http://people.cs.pitt.edu/~cho/cs2410/current/lect-ilp_4up.pdf
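
As a toy example of what I mean (made-up names, just an illustration): the latency of a load depends on whether it hits in the cache, which a compiler cannot know statically, but which an out-of-order core discovers at run time, letting it execute independent work in the meantime:

    int overlap_miss(const int *big_array, int idx, int x, int y) {
        int v = big_array[idx]; /* latency depends on a cache hit or miss,
                                   which is unknown at compile time */
        int w = x + y;          /* independent: an out-of-order core can
                                   execute this while the load is in flight */
        return v * w;           /* only this result has to wait for the load */
    }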

jazzbass

I think what's confusing about this slide is the difference between ILP and superscalar execution. Most processors support ILP via pipelining: the idea is to split every instruction into several independent "phases," so that many instructions can be making progress at the same time (each of them in a different phase of the pipeline). More information here: http://en.wikipedia.org/wiki/Classic_RISC_pipeline

Superscalar execution is not supported by all processors. A superscalar processor can issue multiple instructions in a single clock cycle, on a single core. Look at the image at the top right of the Wikipedia article: http://en.wikipedia.org/wiki/Superscalar. In the image, two instructions are executed at the same time in a five-stage pipeline. To execute instructions in parallel, a superscalar processor must know which instructions are independent of one another.
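
For instance (a made-up sketch, not from the article):

    int two_wide(int a, int b, int c, int d) {
        /* Independent: a 2-wide superscalar core can issue both of
           these adds in the same clock, each down its own pipeline. */
        int x = a + b;
        int y = c + d;

        /* Dependent: the second add reads u, so it cannot issue until
           the first produces it; this pair is serialized. */
        int u = x + c;
        int v = u + d;

        return y + v;
    }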

lament

Out of curiosity, have any of the course staff contributed to the noted wikipedia pages? Do any of the course staff know anyone who has contributed to the wikipedia pages here mentioned?

oulgen

I'm curious about how parallel execution deals with side effects. In this case, there are two side effects:

  1. Exceptions
  2. Flags (overflow, carry, etc.)

Let's assume that there are two execution units, that both are doing arithmetic operations, and that they raise different exceptions in their respective units. When these exceptions need to be handled, how does the processor decide what to do?

arjunh

@oulgen That's a great question; I'm pretty much a novice in computer architecture, but from what I understand, instruction parallelism techniques check for dependencies before deciding which instructions should be executed together.

There are algorithms such as:

  • Scoreboarding: each instruction logs all its dependencies and is executed only when the scoreboard determines that there are no conflicts with previously issued, still-incomplete instructions.
  • Tomasulo's algorithm: uses a technique called register renaming to resolve dependencies. I may be wrong, since I haven't taken comp. arch or compilers, but this technique feels closely linked to static single assignment (SSA); see the sketch after this list.
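
Here's a rough C-level sketch of the renaming idea (my own illustration, not real hardware output). Reusing one name creates a false write-after-write dependence; giving each value its own name, which is essentially what SSA form does, removes it:

    /* Before renaming: t is reused, so the second multiply has a false
       (write-after-write) dependence on the first and must wait for it. */
    float reused(float a, float b, float c, float d, float e, float f) {
        float t, x, y;
        t = a * b;
        x = t + c;
        t = d * e;  /* overwrites t: WAW hazard with the first multiply */
        y = t + f;
        return x + y;
    }

    /* After renaming (one name per value): the two multiplies no longer
       share a destination and can execute in parallel. */
    float renamed(float a, float b, float c, float d, float e, float f) {
        float t1 = a * b;
        float x  = t1 + c;
        float t2 = d * e;
        float y  = t2 + f;
        return x + y;
    }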

The details are pretty advanced and you certainly don't need to know them for this class, but in short, you can treat the flags and exception status like you would any other dependency.

Just to make it clear: understanding data dependencies is a crucial part of parallel programming, and you will definitely have many opportunities to apply your skills to identify such dependencies and determine how to modify or eliminate them. However, you will be working at a much higher level than the actual instruction pipeline; ILP is generally not something you'll have much control over, since it's a superscalar-processor optimization.

sam

From what I know about computer architecture, in the case of an exception, all the instructions in the pipeline at the time of the exception are flushed, and the machine restarts execution from the instruction that caused the exception.

admintio42

How would we know if the compiler parallelized some part of the program for us?