I forgot what "Diminishing returns when trying to further exploit ILP" means. Does it mean that ILP yields less and less benefit on a single core, even as clock frequency keeps going up? If so, I can't remember why that happens.
This comment was marked helpful 0 times.
smklein
It really just means that there is only so much parallelism you can get from overlapping instructions from one instruction stream on a single core. Looking only for individual independent instructions gives you an extremely small playing field for parallelism, especially in comparison to multi-core potential. With multiple cores, your CPU doesn't need to be smart enough to pick independent instructions out of a single instruction stream and track the data dependencies between them.
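To make the "independent instructions" idea concrete, here is a minimal sketch in C (the function names `sum_serial` and `sum_four_chains` are made up for illustration). In the first loop every add depends on the result of the previous add, so there is one long dependency chain. In the second loop the four adds in each iteration do not depend on each other, so a superscalar core could issue several of them in the same cycle. Whether you see a speedup depends on the hardware and on what your compiler already does on its own, so treat this as an illustration of the dependency structure rather than a benchmark.

```c
#include <stddef.h>

/* One accumulator: each add needs the result of the previous add,
   so the adds form a single serial dependency chain. */
long sum_serial(const long *a, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Four accumulators: the four adds inside one iteration are independent
   of one another, which exposes instruction-level parallelism. */
long sum_four_chains(const long *a, size_t n) {
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++)  /* leftover elements when n is not a multiple of 4 */
        s0 += a[i];
    return (s0 + s1) + (s2 + s3);
}
```

Both functions return the same sum for the same input (assuming no overflow). Going from one chain to four is easy here, but most real code does not contain many more independent operations than that, which is where the returns start to diminish.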
This comment was marked helpful 1 times.
jhhardin
As @smklein mentions, there is only so much ILP you can find. I think "diminishing returns" refers to the fact that trying to find even more instructions that could run in parallel ends up costing more than is saved by executing those instructions in parallel.
This leads me to another question I've been wondering: it seems like we could benefit from having compilers do more in terms of finding ILP, somehow marking those instructions so the hardware can easily find them without having to do the extra work. Is there any reason this isn't the case?
This comment was marked helpful 0 times.
benchoi
Is ILP implemented by utilizing the multiple ALUs on a core? Would we lose ILP if we try to use SIMD?
This comment was marked helpful 0 times.
sbly
@benchoi Well, it uses multiple functional units at the same time, which may or may not be ALUs. They could also be FPUs, or any other execution unit in the CPU. As for whether that conflicts with SIMD, I haven't thought about that. But if you're getting the full parallelism/computing power out of your processor, it shouldn't matter what you use to get it.
This comment was marked helpful 0 times.
tianyih
@benchoi You might find lecture 2, slide 11 helpful. There are more decoders, and many other hardware units, that make ILP happen.
This comment was marked helpful 0 times.