Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2016

yimmyz

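The key difference between interleaved multi-threading and simultaneous multi-threading is where the core draws instructions from on each clock: with interleaved multi-threading it chooses instructions from one thread, while with simultaneous multi-threading it can choose instructions from multiple threads. (Slide 57)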
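
To make the difference concrete, here is a toy Python simulation. It is a sketch under simplifying assumptions, not how any real core schedules: `Thread`, `schedule`, and the `ilp` field are hypothetical names; each thread is assumed to always have `ilp` independent instructions ready (no memory stalls); and the core can issue up to `width` instructions per clock. The interleaved policy takes instructions from exactly one thread per clock, while the simultaneous policy fills the issue slots from several threads.

```python
from dataclasses import dataclass

@dataclass
class Thread:
    tid: int        # hypothetical thread id
    remaining: int  # instructions this thread still has to run
    ilp: int        # assumed: independent instructions it can issue per clock (>= 1)

def schedule(threads, width, policy):
    """Toy issue model. Returns one list per clock of the thread ids issued that clock."""
    n = len(threads)
    rr = 0  # round-robin starting point, for fairness between threads
    clocks = []
    while any(t.remaining > 0 for t in threads):
        issued = []
        if policy == "interleaved":
            # Choose exactly ONE runnable thread this clock.
            for k in range(n):
                t = threads[(rr + k) % n]
                if t.remaining > 0:
                    take = min(width, t.ilp, t.remaining)
                    t.remaining -= take
                    issued += [t.tid] * take
                    rr = (rr + k + 1) % n
                    break
        elif policy == "simultaneous":
            # Fill the issue slots with instructions from ANY threads that have work.
            slots = width
            for k in range(n):
                t = threads[(rr + k) % n]
                take = min(slots, t.ilp, t.remaining)
                t.remaining -= take
                slots -= take
                issued += [t.tid] * take
            rr = (rr + 1) % n
        else:
            raise ValueError(f"unknown policy: {policy}")
        clocks.append(issued)
    return clocks

def demo(policy):
    # Two threads with 4 instructions each, one instruction of ILP, 2-wide issue.
    threads = [Thread(0, 4, 1), Thread(1, 4, 1)]
    return schedule(threads, width=2, policy=policy)

imt = demo("interleaved")
smt = demo("simultaneous")
print(len(imt), imt)
print(len(smt), smt)
```

In this setup the interleaved policy leaves one issue slot empty every clock and needs 8 clocks, while the simultaneous policy fills both slots with instructions from different threads and needs 4.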

blairwaldorf

Multicore processor: multiple independent processing cores on one chip, each running its own instruction stream.

SIMD execution: "single instruction, multiple data". One instruction is decoded and broadcast to all ALUs, which each perform that instruction on different data.

Coherent control flow: the same instruction sequence is applied to every element; this is needed for efficient SIMD execution.

Hardware multithreading: the hardware (processor) keeps the state of several threads and decides which thread's instructions run when.

Interleaved multithreading: one thread is chosen to run at any given clock

Simultaneous multithreading: on each clock, the core chooses instructions from multiple threads to run on its ALUs.

Memory latency: the time between the processor requesting data from memory and receiving it.

Memory bandwidth: the rate at which data can be moved between memory and the processor (e.g., in GB/s).

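Bandwidth-bound application: an application whose performance is limited by memory bandwidth, i.e., by how fast data can be requested from and delivered by memory rather than by how fast the processor can compute.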

Arithmetic intensity: the ratio of math operations to data access operations.
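
A small worked example ties the last few definitions together. This is a hedged sketch: `arithmetic_intensity` and `is_bandwidth_bound` are hypothetical helpers, and the machine numbers (1 trillion math ops/s, 100 GB/s, 4-byte floats) are made up for illustration, not taken from any real processor. The idea is that memory can only feed `bandwidth / bytes_per_access * intensity` math operations per second; if that is below what the ALUs can execute, the program is bandwidth bound.

```python
def arithmetic_intensity(math_ops, mem_accesses):
    """Ratio of math operations to data access operations."""
    return math_ops / mem_accesses

def is_bandwidth_bound(intensity, bytes_per_access, peak_ops_per_s, bandwidth_bytes_per_s):
    # How many accesses per second can memory supply?
    accesses_per_s = bandwidth_bytes_per_s / bytes_per_access
    # Math ops per second the program could sustain if memory were the only limit.
    ops_fed_by_memory = accesses_per_s * intensity
    # If memory cannot keep the ALUs busy, the program is bandwidth bound.
    return ops_fed_by_memory < peak_ops_per_s

# Element-wise multiply C[i] = A[i] * B[i] over N floats:
# per element, 1 multiply and 3 accesses (load A[i], load B[i], store C[i]).
N = 1_000_000
ai = arithmetic_intensity(math_ops=N, mem_accesses=3 * N)

# Hypothetical machine (made-up numbers): 1e12 math ops/s, 100 GB/s of bandwidth.
PEAK_OPS = 1e12
BANDWIDTH = 100e9
print(ai, is_bandwidth_bound(ai, 4, PEAK_OPS, BANDWIDTH))
```

Here memory can supply 25 billion 4-byte accesses per second, enough for only about 8.3 billion multiplies per second, far below the assumed peak, so this kernel is bandwidth bound. Doing more math per access (higher arithmetic intensity) is what moves a program toward being compute bound.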

IntergalacticPeanutMaker

"Explicit SIMD" --> parallelization performed at compile time: the compiler emits vector instructions (CPUs)

"Implicit SIMD" --> the compiler creates scalar instructions, and the hardware does the parallelization (GPUs)

MaxFlowMinCut

Coherent control flow is the same thing as coherent execution, correct?

kayvonf

@MaxFlowMinCut: yes, you are correct. Coherent control flow on this slide is the same concept as coherent execution on slide 36.
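
Here is a toy Python sketch of why coherence matters for SIMD. It is an illustration under stated assumptions, not a model of any particular hardware: `simd_if_else`, the branch costs, and the rule that a branch is skipped only when no lane takes it are all assumptions. All lanes share one instruction stream, so when lanes disagree on the branch, both sides run and the non-participating lanes are masked off (idle).

```python
def simd_if_else(xs, then_cost=4, else_cost=4):
    """Toy model of one SIMD 'if (x > 0) y = 2*x; else y = -x;' across len(xs) lanes.

    Returns (results, cycles, utilization). Costs are made-up cycle counts.
    """
    if not xs:
        raise ValueError("need at least one lane")
    lanes = len(xs)
    mask = [x > 0 for x in xs]           # per-lane branch outcome
    active_then = sum(mask)
    active_else = lanes - active_then
    results = [None] * lanes
    cycles = 0
    useful_lane_cycles = 0
    # Assumed: a branch is skipped only when NO lane takes it.
    if active_then:
        for i in range(lanes):
            if mask[i]:                  # masked-off lanes sit idle
                results[i] = 2 * xs[i]
        cycles += then_cost
        useful_lane_cycles += then_cost * active_then
    if active_else:
        for i in range(lanes):
            if not mask[i]:
                results[i] = -xs[i]
        cycles += else_cost
        useful_lane_cycles += else_cost * active_else
    # Fraction of lane-cycles that did useful work.
    utilization = useful_lane_cycles / (cycles * lanes)
    return results, cycles, utilization

coherent = simd_if_else([1, 2, 3, 4, 5, 6, 7, 8])
divergent = simd_if_else([1, -1, 2, -2, 3, -3, 4, -4])
print(coherent)
print(divergent)
```

With the coherent input every lane takes the `then` side, so only that side runs (4 cycles) and utilization is 1.0. With the alternating input both sides run (8 cycles) while half the lanes idle on each, so utilization drops to 0.5.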

lol

@kayvonf So when we talk about hardware multithreading with multiple contexts, this entails both IMT and SMT: at each clock (let's say there are 2 fetch/decode (F/D) units), the processor can choose 2 instructions from the pool of instructions available from the 2 or more hardware threads. If there are, say, 4 hardware threads, then we can have the following execution sequence:

i: 1 & 1

i+1: 1 & 3

i+2: 2 & 4

i+3: 2 & 2

There isn't a notion of context switching at this level, right? The processor can access all contexts at the same time?

So then is SMT a capability of the processor? Either it supports it or it doesn't.