Multi-threaded and Multi-core execution are separated here (which makes sense), however the methods used to achieve multi-core and multi-threaded programs are often the same, right? My understanding is that the way threads are assigned to cores is up to the scheduler and the only difference is that, in order to see benefits from multi-threaded execution per processor, there needs to be more than one thread per core.
From the programmer's perspective it seems like the methods are the same, but with multi-core there is the additional concern of cache coherence, communication overhead and work imbalance.
@arbitorOfTheFountain: When software creates threads, it is declaring that there is the potential for thread-level parallelism in the application. So yes, the same software mechanism for revealing independent execution may result in software threads being run in execution contexts on the same core (via hardware multi-threading) or in execution contexts on different cores (taking advantage of multi-core designs).
However, hardware multi-threading and multi-core parallelism are two very different hardware mechanisms for exploiting potential parallelism in the application. And since this slide is talking about the execution mechanisms, and not how independent threads are revealed by software, I don't want anyone walking away with the conclusion that they are essentially the same.
For multi-threaded execution, it says that programmers need to create more threads than cores. Would a more precise phrasing be that a programmer must create at least as many threads as there are execution contexts across all cores to saturate them? For example, on a quad-core processor with hyper-threading, you would need at least 8 threads to saturate the execution contexts provided by multi-core and multi-threading combined.
No, I think more threads than cores is accurate, because multithreading means MORE than one thread on at least one core. In practice it's often beneficial to have MANY more threads than cores especially if a dynamic work assignment scheme is used.
I do agree that breaking up your work into many more chunks of work than there are cores is good, especially if you expect different chunks to take different times, as dynamic scheduling would help balance that. I'm saying that these chunks of work should then be assigned to exactly as many threads as there are execution contexts. If your cores are multi-threaded, then there are more execution contexts than cores, and thus more threads than cores.
In the quad-core hyper-threaded example, 8 threads would be more threads than cores (8 > 4), but would be exactly as many threads as the cores could run simultaneously. There wouldn't be any extra overhead of swapping threads onto cores, because there would be at most the hardware limit of 2 threads per core, and dynamic work assignment could then hand each thread a new chunk as it finished its current one. It's possible that if you don't know how many threads can go on a core, having many more threads than cores would help make sure you saturate the cores. However, past that saturation point I think additional threads would just incur more overhead than they would produce increased parallelism.
Oh, sorry, I misread what you said before. If the work can be divided evenly, it should be best to launch the same number of threads as execution contexts.