Slide 59 of 86
kayvonf

This picture puts all three of the major throughput processor design ideas from the lecture in the same place: multiple cores, multiple hardware threads per core, and SIMD execution.

Kaharjan

There are 64 total concurrent instruction streams; I understand that. But I don't understand the 512 independent pieces of work. How do we get that number?

kayvonf

There are 16 cores x 4 independent instruction streams ("hardware threads") per core = 64.

Each thread executes 8-wide SIMD instructions that manipulate 8 elements of data.

So I need 64 x 8 = 512 pieces of data to process if I want to use all 64 threads, and have all the threads make use of 8-wide SIMD instructions.
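
To make the arithmetic above concrete, here is a small Python sketch. The function names (`concurrent_streams`, `independent_work_items`, `lane_utilization`) are made up for illustration; only the three numbers (16 cores, 4 threads per core, 8-wide SIMD) come from this thread. The utilization helper is a simplifying assumption: it only counts how many SIMD lanes have data to work on at one time and ignores stalls, memory latency, and scheduling.

```python
# Numbers quoted in the thread; everything else here is illustrative.
CORES = 16             # cores on the chip
THREADS_PER_CORE = 4   # independent instruction streams ("hardware threads") per core
SIMD_WIDTH = 8         # data elements handled by one SIMD instruction


def concurrent_streams(cores, threads_per_core):
    """Number of instruction streams the chip can hold at once."""
    return cores * threads_per_core


def independent_work_items(cores, threads_per_core, simd_width):
    """Pieces of data needed so every thread has a full SIMD vector to work on."""
    return concurrent_streams(cores, threads_per_core) * simd_width


def lane_utilization(n_items, cores, threads_per_core, simd_width):
    """Fraction of SIMD lanes that have data, given n_items available at once.

    Simplified model (an assumption): one item per lane, no stalls considered.
    """
    capacity = independent_work_items(cores, threads_per_core, simd_width)
    return min(n_items, capacity) / capacity


if __name__ == "__main__":
    print(concurrent_streams(CORES, THREADS_PER_CORE))
    print(independent_work_items(CORES, THREADS_PER_CORE, SIMD_WIDTH))
    print(lane_utilization(256, CORES, THREADS_PER_CORE, SIMD_WIDTH))
```

Under this simplified model, a workload with only 256 independent pieces of data could fill at most half of the available SIMD lanes, which is why 512 is the number to aim for on this particular chip.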