Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2016

potay

With so many layers of parallelism, how do we figure out which is the most efficient way to split the execution among the levels? Can this always be done at compile time? Is it ever pushed to runtime?

jrgallag

How does Intel's Hyper-Threading compare to these methods? I did some cursory research but couldn't find very detailed information. I believe Hyper-Threading allows data for multiple threads to be retrieved simultaneously on a single core, but I'm interested in something more specific.

xx420y0los4wGxx

On the topic of SIMD: consider two vectors, A = <1,1> and B = <2,2>, that we want to add together.

From what I understand, the speedup would be due to the difference between:

Sequential: 1. C[0] = A[0] + B[0] 2. C[1] = A[1] + B[1]

Parallel: 1. C<> = A<> + B<>

Here the SIMD unit basically does both adds for the vector on 2 ALUs in parallel. But consider how the vectors are stored in memory: say A = 0101 and B = 1010, where each element is a 2-bit unsigned int. We can then trivially calculate C = 1111 with 1 addition on 1 ALU. Is this also considered SIMD? (Also, is my original interpretation even correct?)
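
A minimal Python sketch of the packed case described above, a trick often called SWAR (SIMD within a register). The helper names (`pack`, `unpack`, `naive_add`, `swar_add`) and the lane layout (element 0 in the high bits) are assumptions made for illustration. It suggests that a single plain add only works while no lane overflows into its neighbor; otherwise the carries between lanes have to be masked off.

```python
LANE_BITS = 2          # each element is a 2-bit unsigned int
LANES = 2              # two elements per packed word
LANE_MASK = 0b11       # mask for one lane
WORD_MASK = 0b1111     # mask for the whole packed word
LOW = 0b0101           # low bit of every lane
HIGH = 0b1010          # high bit of every lane


def pack(values):
    """Pack a list of lane values into one integer, element 0 in the high bits."""
    word = 0
    for v in values:
        word = (word << LANE_BITS) | (v & LANE_MASK)
    return word


def unpack(word):
    """Split a packed integer back into its lane values."""
    return [(word >> (LANE_BITS * i)) & LANE_MASK for i in reversed(range(LANES))]


def naive_add(a, b):
    """One ordinary add on the packed words; carries can leak between lanes."""
    return (a + b) & WORD_MASK


def swar_add(a, b):
    """Lane-wise add (mod 4 per lane) that blocks carries between lanes."""
    # Add the low bits of each lane, then fold the high bits back in with XOR.
    return ((a & LOW) + (b & LOW)) ^ ((a ^ b) & HIGH)


# The example from the comment: <1,1> + <2,2> has no lane overflow,
# so the plain add is already correct.
print(unpack(naive_add(pack([1, 1]), pack([2, 2]))))

# <1,3> + <1,1>: the second lane overflows, so the plain add corrupts the first.
print(unpack(naive_add(pack([1, 3]), pack([1, 1]))))
print(unpack(swar_add(pack([1, 3]), pack([1, 1]))))
```

The difference from hardware SIMD is that the hardware keeps the lanes separate for you, while here the programmer has to guarantee (or enforce with masks) that lanes do not interfere.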

memebryant

@jrgallag, I believe Hyperthreading is the Intel implementation of what is discussed starting at slide 48.

vincom2

@potay it's often done at runtime; for instance, the programmer could read the number of available cores from the OS in order to decide how many worker threads to spawn.
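
A minimal sketch of that runtime decision in Python, assuming the work is a simple per-item function. The names `pick_worker_count`, `square`, and `run` are hypothetical; `os.cpu_count()` and `ThreadPoolExecutor` are from the standard library.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def pick_worker_count():
    """Ask the OS how many logical cores it reports; fall back to 1 if unknown."""
    return os.cpu_count() or 1


def square(x):
    """Stand-in for the real per-item work (hypothetical)."""
    return x * x


def run(items):
    """Spawn one worker thread per reported core and map the work over the items."""
    workers = pick_worker_count()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(square, items))


if __name__ == "__main__":
    print(pick_worker_count())
    print(run(range(5)))
```

Note that in CPython, threads do not run CPU-bound Python code in parallel because of the global interpreter lock, so for real compute work one would likely swap in `ProcessPoolExecutor`; the sizing logic stays the same.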

haboric

I'm curious which compute units carry out superscalar execution. That is, which compute units in a core process the different independent instructions from the same instruction stream?

Lotusword

I found a discussion of the differences between superscalar execution and pipelining: http://stackoverflow.com/questions/1656608/what-is-difference-between-superscaling-and-pipelining. I don't know if it is correct, but my understanding is that pipelining uses only one subunit per execution, while superscalar CPUs have multiple execution subunits able to do the same thing in parallel. Please point out if this is wrong :).
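
One rough way to compare the two is an idealized cycle count. The sketch below assumes every instruction is independent, every stage takes one cycle, and there are no stalls; the function names and the parameters `n` (instructions), `k` (pipeline stages), and `w` (issue width) are made up for illustration, and real processors will do worse than these numbers.

```python
import math


def cycles_unpipelined(n, k):
    """Each instruction goes through all k stages before the next one starts."""
    return n * k


def cycles_pipelined(n, k):
    """One instruction enters per cycle: k cycles to fill, then one finishes per cycle."""
    return k + (n - 1)


def cycles_superscalar(n, k, w):
    """A pipelined core that issues up to w independent instructions per cycle."""
    return k + math.ceil(n / w) - 1


n, k, w = 8, 4, 2
print(cycles_unpipelined(n, k))     # 32
print(cycles_pipelined(n, k))       # 11
print(cycles_superscalar(n, k, w))  # 7
```

In this model pipelining overlaps different stages of successive instructions, while superscalar execution additionally works on several instructions in the same stage at once, so the two techniques are complementary rather than alternatives.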

ote

Does the definition of MIMD outlined here include multi-core execution and the use of multiple processors, or is MIMD an example of multi-core parallel execution?