There are 64 total concurrent instruction streams; I understand that. But I don't understand the 512 independent pieces of work. How do we get that number?
kayvonf
There are 16 cores x 4 independent instruction streams ("hardware threads") per core = 64.
Each thread executes 8-wide SIMD instructions that manipulate 8 elements of data.
So I need 64 x 8 = 512 pieces of data to process if I want to use all 64 threads, and have all the threads make use of 8-wide SIMD instructions.
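To make the arithmetic concrete, here is a minimal Python sketch. The constant and function names are made up for illustration, and the mapping from a work item to a (core, thread, lane) slot is just one possible assignment, not something the slide specifies.

```python
# Machine parameters taken from the slide discussion above.
CORES = 16
THREADS_PER_CORE = 4   # independent instruction streams ("hardware threads") per core
SIMD_WIDTH = 8         # data elements handled by one SIMD instruction


def concurrent_streams(cores, threads_per_core):
    """Number of instruction streams the chip can keep in flight."""
    return cores * threads_per_core


def independent_work_items(cores, threads_per_core, simd_width):
    """Data elements needed so every SIMD lane of every thread has work."""
    return concurrent_streams(cores, threads_per_core) * simd_width


def locate(item):
    """Map a work-item index to a (core, thread, lane) slot.

    Hypothetical assignment: consecutive items fill the lanes of one
    thread first, then the threads of one core, then the next core.
    """
    lane = item % SIMD_WIDTH
    stream = item // SIMD_WIDTH
    thread = stream % THREADS_PER_CORE
    core = stream // THREADS_PER_CORE
    return core, thread, lane


if __name__ == "__main__":
    print(concurrent_streams(CORES, THREADS_PER_CORE))
    print(independent_work_items(CORES, THREADS_PER_CORE, SIMD_WIDTH))
    print(locate(511))
```

With fewer than 512 independent elements, some lanes or some hardware threads would sit idle under this assignment.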
This picture puts all three of the major throughput processor design ideas from the lecture in the same place: multiple cores, multiple hardware threads per core, and SIMD execution.