It's interesting to see how multiple smaller contexts give the core more options for staying busy during stalls. I wonder how the number of hardware threads (i.e., the number of contexts, which is 16 in this case) was originally determined.
Is the storage for the execution contexts physically separated or pre-designated? Or is it an arbitrary partitioning of some memory decided by the chip?
I ask because, if it's the latter, would a single thread running alone be able to have a larger working set?
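
To make the question concrete, here's a tiny sketch of the arithmetic I have in mind. It assumes, purely hypothetically, a 256 KB pool of context storage that can be split evenly among however many threads are active. The pool size and the function name are made up for illustration and are not taken from the lecture or any real chip.

```python
# Hypothetical number: not taken from any real chip.
CONTEXT_POOL_BYTES = 256 * 1024  # assumed total storage for execution contexts


def per_thread_storage(pool_bytes: int, active_threads: int) -> int:
    """Bytes each thread gets if the pool is split evenly (an assumption)."""
    if active_threads < 1:
        raise ValueError("need at least one active thread")
    return pool_bytes // active_threads


for n in (16, 4, 1):
    kb = per_thread_storage(CONTEXT_POOL_BYTES, n) // 1024
    print(n, "threads ->", kb, "KB each")
```

If the storage were instead fixed as 16 pre-designated slots, a lone thread would presumably still be limited to its one slot, which is the case I'm trying to distinguish.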