Here is a concise summary for review purposes (correct me if I'm wrong):

Multi-core: Multiple processing cores on one chip, each able to run its own instruction stream

SIMD execution: "Single instruction, multiple data". One fetch/decode unit feeds multiple ALUs, which execute the same instruction on different data elements.

Coherent control flow: The same instruction sequence applies to all elements being processed together; required for efficient SIMD execution

Hardware multi-threading - Interleaved multi-threading: Each clock, the core (in hardware) chooses one thread and runs an instruction from that thread on the ALUs

Hardware multi-threading - Simultaneous multi-threading: Each clock, core chooses instructions from multiple threads to run on ALUs

Memory latency: The time between the processor requesting data and receiving it

Memory bandwidth: The rate at which the memory system can move data to and from the processor (e.g., in GB/s)

Arithmetic intensity: Ratio of math to data access operations
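To see how the latency and bandwidth definitions interact, here is a minimal Python sketch of a toy model (an assumption, not a description of any real memory system): fetching a block costs a fixed latency plus the block size divided by the bandwidth. The function name `transfer_time_ns` and the 100 ns / 20 GB/s figures are hypothetical.

```python
def transfer_time_ns(num_bytes: int, latency_ns: float, bandwidth_gb_per_s: float) -> float:
    """Toy model: time (ns) from issuing a request until num_bytes have arrived.

    1 GB/s is 1 byte per nanosecond, so num_bytes / bandwidth gives nanoseconds.
    """
    return latency_ns + num_bytes / bandwidth_gb_per_s


if __name__ == "__main__":
    LATENCY_NS = 100.0          # hypothetical latency
    BANDWIDTH_GB_PER_S = 20.0   # hypothetical bandwidth
    for size in (64, 4096, 1 << 20):
        t = transfer_time_ns(size, LATENCY_NS, BANDWIDTH_GB_PER_S)
        print(f"{size:>8} bytes: {t:10.1f} ns")
```

With these made-up numbers a 64-byte request is dominated by latency (3.2 ns of transfer vs. 100 ns of waiting), while a 1 MiB transfer is dominated by bandwidth.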

firebb

I think the distinction between interleaved multi-threading and simultaneous multi-threading is a little confusing. Wikipedia seems to distinguish them by whether several instructions are issued in the same cycle, much like the difference between concurrency and parallelism.

Interleaved multi-threading

Cycle i: an instruction from thread A is issued.

Cycle i + 1: an instruction from thread B is issued.

Cycle i + 2: an instruction from thread C is issued.

Simultaneous multi-threading

Cycle i: instructions j and j + 1 from thread A and instruction k from thread B are simultaneously issued.

Cycle i + 1: instruction j + 2 from thread A, instruction k + 1 from thread B, and instruction m from thread C are all simultaneously issued.

Cycle i + 2: instruction j + 3 from thread A and instructions m + 1 and m + 2 from thread C are all simultaneously issued.
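The cycle listings above can be reproduced with a toy issue model. This Python sketch is illustrative only: it assumes round-robin thread selection for interleaved multi-threading, a 3-wide issue core for SMT, and that we are told how many independent instructions each thread has ready in each cycle. The names `interleaved_schedule` and `smt_schedule` are hypothetical.

```python
from typing import Dict, List


def interleaved_schedule(threads: List[str], cycles: int) -> List[List[str]]:
    """Interleaved MT (toy model): one thread is chosen per cycle, round-robin,
    and only that thread issues an instruction in that cycle."""
    return [[threads[cycle % len(threads)]] for cycle in range(cycles)]


def smt_schedule(ready_per_cycle: List[Dict[str, int]], issue_width: int) -> List[List[str]]:
    """SMT (toy model): each cycle, fill up to issue_width issue slots with
    ready instructions drawn from any threads (in the dict's insertion order)."""
    schedule: List[List[str]] = []
    for ready in ready_per_cycle:
        slots: List[str] = []
        for thread, count in ready.items():
            take = min(count, issue_width - len(slots))
            slots.extend([thread] * take)
            if len(slots) == issue_width:
                break
        schedule.append(slots)
    return schedule


if __name__ == "__main__":
    print(interleaved_schedule(["A", "B", "C"], 3))
    # Ready-instruction counts mirroring cycles i, i + 1, i + 2 in the SMT listing.
    ready = [{"A": 2, "B": 1}, {"A": 1, "B": 1, "C": 1}, {"A": 1, "C": 2}]
    print(smt_schedule(ready, issue_width=3))
```

Each inner list names the thread of every instruction issued in that cycle: the interleaved schedule never has more than one entry per cycle, while the SMT schedule mixes threads within a single cycle.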

kayvonf

@firebb I posted a clarification of interleaved vs. simultaneous multi-threading on slide 58.

You also may wish to take a look at the review figures at the end of this lecture.

kayvonf

Question: Why is arithmetic intensity such an important term to know (and think about when optimizing programs)?

boba

We can improve the efficiency of bandwidth-bound applications by increasing their arithmetic intensity.
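One common way to make this concrete is a roofline-style bound (a standard model, not something stated in this thread): attainable throughput is the smaller of the machine's peak compute rate and arithmetic intensity times memory bandwidth. Here intensity is measured in flops per byte, and the peak and bandwidth values are made up for illustration.

```python
def attainable_gflops(intensity_flops_per_byte: float,
                      peak_gflops: float,
                      bandwidth_gb_per_s: float) -> float:
    """Roofline-style upper bound on throughput (GFLOP/s).

    Memory delivers bandwidth_gb_per_s GB/s and each byte supports
    intensity_flops_per_byte operations, so memory alone permits
    intensity * bandwidth GFLOP/s; the core can never exceed its peak.
    """
    return min(peak_gflops, intensity_flops_per_byte * bandwidth_gb_per_s)


if __name__ == "__main__":
    PEAK_GFLOPS = 1000.0         # hypothetical compute peak
    BANDWIDTH_GB_PER_S = 100.0   # hypothetical memory bandwidth
    for intensity in (0.25, 2.0, 20.0):
        bound = attainable_gflops(intensity, PEAK_GFLOPS, BANDWIDTH_GB_PER_S)
        print(f"intensity {intensity:5.2f} flop/byte -> at most {bound:7.1f} GFLOP/s")
```

Below 10 flop/byte (peak divided by bandwidth), these hypothetical numbers say the program is bandwidth bound, so raising its arithmetic intensity directly raises the achievable throughput; above that point, more intensity no longer helps.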

lya

Arithmetic intensity is defined as the ratio of math operations to data access operations. Ultimately we want a program to spend its time doing arithmetic, but every operation needs its operands fetched, and those accesses do no useful computation on their own. Because loading data from memory (or disk) has high latency and the memory system has limited bandwidth, data accesses can become the bottleneck for data-intensive programs. Estimating a program's arithmetic intensity therefore helps us tell whether it is limited by computation or by memory traffic, so we know what to optimize.
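To make "estimating the arithmetic intensity" concrete, here is a Python sketch that counts operations for computing E = (A*B + C) * F elementwise, once through a temporary array D written between two passes and once with both steps fused into a single loop. Loop fusion is chosen here purely for illustration, and the counts assume every array element read or written is one data access (caches ignored); `two_pass` and `fused` are hypothetical names.

```python
from typing import List, Tuple

Result = Tuple[List[float], int, int]  # (output array, math ops, data accesses)


def two_pass(a: List[float], b: List[float], c: List[float], f: List[float]) -> Result:
    """E = (A*B + C) * F via a temporary array D that is written and re-read."""
    math_ops = mem_ops = 0
    n = len(a)
    d = [0.0] * n
    for i in range(n):
        d[i] = a[i] * b[i] + c[i]  # 3 loads + 1 store; 1 mul + 1 add
        mem_ops += 4
        math_ops += 2
    e = [0.0] * n
    for i in range(n):
        e[i] = d[i] * f[i]  # 2 loads + 1 store; 1 mul
        mem_ops += 3
        math_ops += 1
    return e, math_ops, mem_ops


def fused(a: List[float], b: List[float], c: List[float], f: List[float]) -> Result:
    """Same result in one loop; the intermediate is never written to an array."""
    math_ops = mem_ops = 0
    n = len(a)
    e = [0.0] * n
    for i in range(n):
        e[i] = (a[i] * b[i] + c[i]) * f[i]  # 4 loads + 1 store; 2 mul + 1 add
        mem_ops += 5
        math_ops += 3
    return e, math_ops, mem_ops


if __name__ == "__main__":
    n = 1000
    a = [1.0] * n
    b = [2.0] * n
    c = [3.0] * n
    f = [4.0] * n
    for name, fn in (("two-pass", two_pass), ("fused", fused)):
        _, math_ops, mem_ops = fn(a, b, c, f)
        print(f"{name}: intensity = {math_ops / mem_ops:.2f}")
```

Both versions do the same 3 math operations per element, but the fused loop needs 5 data accesses per element instead of 7, so under this counting its intensity rises from 3/7 (about 0.43) to 3/5 (0.6).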
