Slide 64 of 65
kayvonf

Question: This is a good comment opportunity: write a definition for one of these terms!

Incanam

Coherent Execution: The same instruction sequence is applied to all elements that are operated on simultaneously, so the term is relevant when describing parallel execution using SIMD processing. Coherent execution is necessary to get good parallelization out of SIMD processing resources, because those resources execute the same instruction sequence on all of the separate data. It is not necessary to get parallelization out of multiple cores, since each core can do different things while still doing them in parallel. Coherent execution is contrasted with divergent execution (like the conditional example from the slides).

jedavis

Arithmetic intensity - The arithmetic intensity of a program, algorithm, or function is the ratio of arithmetic operations to memory operations which it performs. For example, the arithmetic intensity of a piece of code which loads two numbers from memory, adds them, and stores the result is 1/3; one addition per two loads and a store. All other things being equal, programs with high arithmetic intensities are generally more performant on parallel architectures, since they are less likely to be memory-bound and spend less time stalled on memory latency.
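A toy Python sketch of @jedavis's ratio (not from the lecture; the operation counts are the usual idealized ones, assuming each operand moves between the processor and memory exactly once):

```python
def arithmetic_intensity(arith_ops, mem_ops):
    """Ratio of arithmetic operations to memory operations."""
    return arith_ops / mem_ops

# The load-add-store example above: 1 add vs. 2 loads + 1 store.
print(arithmetic_intensity(1, 3))  # 1/3

# Idealized N x N matrix multiply: 2*N**3 flops vs. 3*N**2 element
# transfers (read A, read B, write C once each), so intensity grows
# linearly with N.
N = 1024
print(arithmetic_intensity(2 * N**3, 3 * N**2))  # ~683 flops per memory op
```

The matmul case shows why blocked algorithms that reuse data are a classic way to raise arithmetic intensity.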

asheng

Hardware multi-threading - basically a term for how hardware can facilitate the efficient execution of multi-threaded programs. The two types in the term list above are:

Interleaved multithreading - in this case, the processor switches back and forth between executing instructions from several different threads. For example, cycle 1 is for an instruction from thread A, cycle 2 is for B, cycle 3 is for C, and cycle 4 is back to A again.

Simultaneous multi-threading - this really only applies to superscalar processors. Since a superscalar processor can handle multiple instructions in one cycle, we speed up the parallel program by having the processor simultaneously execute instructions from several different threads each clock cycle.
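The two flavors @asheng describes can be sketched with a toy scheduler (a sketch with made-up helper names; real hardware does this with per-thread register state, not Python lists):

```python
from itertools import cycle, islice

def interleaved_schedule(threads, num_cycles):
    """Interleaved multithreading: one instruction per cycle,
    round-robin across threads."""
    source = cycle(threads)
    return [[next(source)] for _ in range(num_cycles)]

def simultaneous_schedule(threads, num_cycles, issue_width):
    """Simultaneous multithreading on a superscalar core: up to
    issue_width instructions per cycle, drawn from multiple threads."""
    source = cycle(threads)
    return [list(islice(source, issue_width)) for _ in range(num_cycles)]

# The example above: cycle 1 runs A, cycle 2 runs B, cycle 3 runs C,
# and cycle 4 is back to A again.
print(interleaved_schedule(["A", "B", "C"], 4))
# [['A'], ['B'], ['C'], ['A']]

# A 2-wide SMT core issues from both threads in the same cycle.
print(simultaneous_schedule(["A", "B"], 2, 2))
# [['A', 'B'], ['A', 'B']]
```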

kayvonf

I like @incanam's definition of coherent execution, but I want to point out that coherent execution is a property of a program. Instruction stream coherence is present when the same instruction can be used for each piece of data (I say "can" rather than "is"). When coherence is present, it is possible to execute the computation efficiently using SIMD approaches. It probably would have been more clear had I asked you to define "instruction stream coherence" rather than "coherent execution".

tpassaro

Memory Latency - The amount of time for a memory request from a processor to be serviced by the memory system. If a thread needs some information from main memory, it may take 10 cycles for that request to actually be fulfilled.

Memory Bandwidth - The rate at which the memory system can provide data to a processor. The higher the bandwidth, the more data you receive per time unit.

happyfeet

Bandwidth-bound application - An application whose performance is limited by memory bandwidth, which can be a bottleneck on modern parallel chips. Arithmetic operations are extremely fast, so a thread frequently ends up waiting on memory operations. When many threads run at the same time, many memory requests are in flight at once, but the chip can only service so many memory operations per unit time. Once requests are issued faster than the memory system can deliver data, performance is capped by bandwidth no matter how much compute capability is available. This is a major challenge for hardware designers to overcome.
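One way to make "bandwidth-bound" quantitative is a roofline-style check (not from the slides; the peak numbers below are hypothetical):

```python
def is_bandwidth_bound(intensity, peak_gflops, peak_gb_per_s):
    """An app is bandwidth-bound when its arithmetic intensity (flops per
    byte) falls below the machine balance: peak compute / peak bandwidth."""
    machine_balance = peak_gflops / peak_gb_per_s
    return intensity < machine_balance

# Hypothetical chip: 1000 GFLOP/s of compute and 100 GB/s of bandwidth,
# so code needs at least 10 flops per byte to keep the ALUs busy.
print(is_bandwidth_bound(0.33, 1000, 100))  # True: stalls on memory
print(is_bandwidth_bound(30.0, 1000, 100))  # False: compute-bound
```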

tpassaro

SMT (simultaneous multi-threading) is when the hardware provides extra space for threads to run: more registers to hold the state of multiple threads, and fetch/decode logic that can issue instructions from more than one thread in a single cycle. This is what Intel's Hyper-threading is. The processor still has the same number of physical cores, but each core has multiple execution units that can be used to run two separate threads per core.

Interleaved multi-threading issues instructions from different threads to the pipeline every clock cycle. Earlier, when we talked about task switching to hide latency, the task switch itself was an interleaving of threads, run in an order that hides stalls. Operating systems will sometimes have simple round-robin execution of threads, where each clock cycle a new thread is picked to run.

kayvonf

Simultaneous multi-threading involves executing instructions from two different threads at once on different execution resources. Interleaved multi-threading involves interleaving execution of instructions from two threads on a single set of execution resources.

What does Intel's Hyper-threading technology really do? Hyper-threading involves a mixture of ideas from simultaneous multi-threading (SMT) and interleaved multi-threading. Two hardware threads are managed by a core and share the same set of execution resources. This design is in the spirit of interleaved multi-threading. However, instead of interleaving execution of the threads each clock, an Intel CPU tries to find a mixture of operations from the two threads to fill all possible execution units. For example, one thread might be performing a load and an integer add, and the processor might also determine it's possible to schedule a floating-point multiply from the second thread.

This design is a natural evolution of a superscalar processor. Prior to Hyper-threading, Intel had several execution units in the chip that could be used in parallel when ILP was present in an instruction stream. As we observed in lecture 1, ILP does not always exist in programs, so in many cycles not all CPU core execution units could be used. Hyper-threading is technology that allows the core to maintain state for a second thread and to fill unused execution capacity with instructions from that second thread when doing so helps overall performance.
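The slot-filling idea described above can be sketched as a greedy, in-order issue step (a toy model with hypothetical names, not Intel's actual scheduler):

```python
def issue_cycle(exec_units, threads):
    """For one clock, fill each execution unit with the next in-order op
    from whichever hardware thread needs that unit, preferring the first
    thread when both could use it."""
    issued = {}
    for unit in exec_units:
        for name, pending in threads.items():
            if pending and pending[0] == unit:
                pending.pop(0)   # that thread's op occupies the unit
                issued[unit] = name
                break
    return issued

# The example above: thread A's next ops are a load and an integer add;
# thread B's floating-point multiply fills the otherwise-idle FP unit.
threads = {"A": ["load", "int_add"], "B": ["fp_mul"]}
print(issue_cycle(["load", "int_add", "fp_mul"], threads))
# {'load': 'A', 'int_add': 'A', 'fp_mul': 'B'}
```

With only thread A present, the fp_mul unit would have gone unused that cycle, which is exactly the capacity the second hardware thread recovers.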

This really old article about hyper-threading explains the idea pretty well. So does this article from AnandTech.

kayvonf

@tpassaro: Let's not confuse the interleaving of processes and threads onto a CPU by the OS with the fine-grained interleaving of hardware thread execution on a core. It takes thousands of clocks to perform a full OS-managed process context switch, making it untenable for this to be done at fine granularity.

apodolsk

Intel's own hyperthreading whitepaper is nice enough that it's probably the best bet for anyone curious about details. They talk about the role that HT was intended to fill, and also have their own exam-friendly summary of hardware multithreading concepts from 2002.

Hyper-Threading Technology Architecture and Microarchitecture