Here is a concise summary for review purposes (correct me if I'm wrong):
Multi-core: Multiple independent processing cores on one chip, each running its own instruction stream
SIMD execution: "Single instruction, multiple data". One fetch/decode unit issues a single instruction, and multiple ALUs execute that same instruction on different data elements.
Coherent control flow: The same instruction sequence applies to every element being processed; this is needed for SIMD execution to run efficiently
Hardware multi-threading - Interleaved multi-threading: Each clock, the hardware (the core itself) chooses one thread and runs an instruction from that thread on the ALUs
Hardware multi-threading - Simultaneous multi-threading: Each clock, the core chooses instructions from multiple threads to run on its ALUs in the same cycle
Memory latency: The time between the processor requesting data and that data arriving
Memory bandwidth: The rate at which the memory system can deliver data to the processor (e.g., in GB/s)
Arithmetic intensity: The ratio of math operations to data access operations in a program
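To see why coherent control flow matters for SIMD, here is a toy simulation, a minimal sketch rather than how any real ISA works. It assumes a hypothetical 8-wide SIMD unit (`SIMD_WIDTH`) and models each branch path as a single instruction broadcast to all lanes; lanes that did not take a path are masked off. The function name `simd_branch` is made up for illustration.

```python
# Toy model of an 8-wide SIMD unit running a branch:
#     if x > 0: y = 2 * x   else: y = -x
# SIMD_WIDTH and the one-instruction-per-path cost are assumptions.

SIMD_WIDTH = 8  # hypothetical number of ALUs (lanes) sharing one instruction


def simd_branch(xs):
    """Run the branch on SIMD_WIDTH lanes with per-lane masking.

    Returns (results, useful_lane_slots, total_lane_slots).
    """
    assert len(xs) == SIMD_WIDTH
    mask = [x > 0 for x in xs]  # per-lane branch outcome
    ys = [0] * SIMD_WIDTH
    issued = 0  # instructions broadcast to all lanes
    useful = 0  # lane slots that did real work

    # "then" path: broadcast once if at least one lane takes it
    if any(mask):
        issued += 1
        for lane in range(SIMD_WIDTH):
            if mask[lane]:
                ys[lane] = 2 * xs[lane]
                useful += 1

    # "else" path: broadcast once if at least one lane takes it
    if not all(mask):
        issued += 1
        for lane in range(SIMD_WIDTH):
            if not mask[lane]:
                ys[lane] = -xs[lane]
                useful += 1

    return ys, useful, issued * SIMD_WIDTH


if __name__ == "__main__":
    coherent = [1, 2, 3, 4, 5, 6, 7, 8]
    divergent = [1, -2, 3, -4, 5, -6, 7, -8]
    for xs in (coherent, divergent):
        ys, useful, total = simd_branch(xs)
        print(ys, f"utilization = {useful / total:.0%}")
```

With the coherent input every lane does useful work (100% utilization). With the divergent input both paths must be issued and half of the lane slots are masked off each time, so utilization drops to 50%.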
I think the definitions of interleaved vs. simultaneous multi-threading are a little confusing. Wikipedia seems to distinguish them by whether several instructions (from different threads) are issued in the same cycle, much like the difference between concurrency and parallelism.
Interleaved multi-threading:
Cycle i: an instruction from thread A is issued.
Cycle i + 1: an instruction from thread B is issued.
Cycle i + 2: an instruction from thread C is issued.
Simultaneous multi-threading:
Cycle i: instructions j and j + 1 from thread A and instruction k from thread B are simultaneously issued.
Cycle i + 1: instruction j + 2 from thread A, instruction k + 1 from thread B, and instruction m from thread C are all simultaneously issued.
Cycle i + 2: instruction j + 3 from thread A and instructions m + 1 and m + 2 from thread C are all simultaneously issued.
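The two issue patterns can be reproduced with a small simulation. This is a toy sketch under stated assumptions, not a model of real hardware: interleaved issue is modeled as one instruction per cycle from one thread (round-robin), and simultaneous issue fills up to `width` slots per cycle, with each thread offering at most `ilp` independent instructions per cycle and threads scanned in a fixed order. The function names and parameters are hypothetical.

```python
from collections import deque

# Toy issue-slot model; the issue width, per-thread ILP limit and
# fixed thread priority are simplifying assumptions, not real hardware.


def interleaved(threads):
    """Interleaved MT: each cycle, issue ONE instruction from one thread,
    rotating round-robin over threads that still have work."""
    queues = {name: deque(instrs) for name, instrs in threads.items()}
    order = list(queues)
    schedule, turn = [], 0
    while any(queues.values()):
        while not queues[order[turn % len(order)]]:  # skip finished threads
            turn += 1
        name = order[turn % len(order)]
        schedule.append([queues[name].popleft()])
        turn += 1
    return schedule


def simultaneous(threads, width=3, ilp=2):
    """SMT: each cycle, fill up to `width` issue slots with instructions
    from several threads; each thread supplies at most `ilp` independent
    instructions per cycle. Threads are scanned in a fixed order."""
    queues = {name: deque(instrs) for name, instrs in threads.items()}
    schedule = []
    while any(queues.values()):
        slots = []
        for q in queues.values():
            take = min(ilp, len(q), width - len(slots))
            slots.extend(q.popleft() for _ in range(take))
        schedule.append(slots)
    return schedule


if __name__ == "__main__":
    threads = {
        "A": ["A0", "A1", "A2", "A3"],
        "B": ["B0", "B1"],
        "C": ["C0", "C1", "C2"],
    }
    for policy in (interleaved, simultaneous):
        print(policy.__name__)
        for cycle, issued in enumerate(policy(threads)):
            print(f"  cycle {cycle}: {issued}")
```

With these inputs the interleaved policy takes 9 cycles (one thread's instruction per cycle), while the simultaneous policy packs instructions from A and B into the same cycles and finishes in 4. Concurrency is visible in both, but only the second issues instructions from different threads in parallel.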
@firebb: I posted a clarification of interleaved vs. simultaneous multi-threading on slide 58.
You also may wish to take a look at the review figures at the end of this lecture.
Question: Why is arithmetic intensity such an important term to know (and think about when optimizing programs)?
We can improve the efficiency of bandwidth-bound applications by increasing their arithmetic intensity.
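One common way to raise arithmetic intensity is to fuse passes over the data so intermediates stay in registers instead of going back to memory. The sketch below counts math operations and element accesses for two ways of computing `d = 2 * (a + b)`; it assumes the arrays are large enough that every element load and store goes to memory, and the function names are hypothetical.

```python
# Count math operations and element accesses for two ways of
# computing d = 2 * (a + b). Assumes arrays are too large to stay in
# cache, so every element load/store counts as a memory access.


def unfused(a, b):
    n = len(a)
    c, d = [0.0] * n, [0.0] * n
    math_ops = mem_ops = 0
    for i in range(n):  # pass 1: c = a + b
        c[i] = a[i] + b[i]
        math_ops += 1
        mem_ops += 3  # load a[i], load b[i], store c[i]
    for i in range(n):  # pass 2: d = 2 * c
        d[i] = 2.0 * c[i]
        math_ops += 1
        mem_ops += 2  # load c[i], store d[i]
    return d, math_ops / mem_ops


def fused(a, b):
    n = len(a)
    d = [0.0] * n
    math_ops = mem_ops = 0
    for i in range(n):  # one pass; a[i] + b[i] stays in a register
        d[i] = 2.0 * (a[i] + b[i])
        math_ops += 2
        mem_ops += 3  # load a[i], load b[i], store d[i]
    return d, math_ops / mem_ops


if __name__ == "__main__":
    a, b = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
    print(unfused(a, b))  # intensity 2 / 5 = 0.4
    print(fused(a, b))    # intensity 2 / 3
```

Both versions compute the same result, but the fused loop does the same math with 3 instead of 5 memory accesses per element, raising the intensity from 0.4 to about 0.67.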
Arithmetic intensity is the ratio of math operations to data access operations. A program gets faster by finishing its arithmetic sooner, but every operation needs operands, and fetching them does no useful computation by itself. Because loading data from memory (or disk) has high latency and the memory system has limited bandwidth, data accesses can become the bottleneck in data-intensive programs: the ALUs sit idle waiting for operands. Estimating a program's arithmetic intensity therefore tells us whether it is likely to be limited by compute or by memory bandwidth, which in turn tells us what to optimize.
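A simple way to turn this into a bottleneck estimate is a roofline-style bound: attainable throughput is the lower of the peak compute rate and intensity times bandwidth. The sketch below is illustrative only; the machine numbers are hypothetical, and here intensity is measured in FLOPs per byte moved rather than per access, so an access count must be multiplied by the element size (e.g. 4 bytes per float) to convert.

```python
# Roofline-style bound. The machine numbers below are hypothetical;
# intensity here is FLOPs per byte moved to/from memory.


def attainable_gflops(intensity, peak_gflops, bandwidth_gbs):
    """Throughput is capped by the ALUs (peak) or by how fast memory
    can deliver operands (intensity * bandwidth), whichever is lower."""
    return min(peak_gflops, intensity * bandwidth_gbs)


def bottleneck(intensity, peak_gflops, bandwidth_gbs):
    ridge = peak_gflops / bandwidth_gbs  # intensity where both caps meet
    return "memory bandwidth" if intensity < ridge else "compute"


if __name__ == "__main__":
    PEAK, BW = 1000.0, 100.0  # hypothetical: 1 TFLOP/s, 100 GB/s
    # 2 ops per 3 accesses of 4-byte floats -> 2 / 12 FLOP per byte
    kernels = [("2 ops / 3 float accesses", 2 / 12), ("high-reuse kernel", 20.0)]
    for name, ai in kernels:
        print(name, attainable_gflops(ai, PEAK, BW), bottleneck(ai, PEAK, BW))
```

On this hypothetical machine the ridge point is 10 FLOPs per byte: a kernel below it can only go faster by moving less data (raising its intensity), while a kernel above it is limited by the ALUs.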