For assigning threads to execution contexts, on one hand it would be better to place the threads on separate cores first and, only if more threads remain, place them on the same core but in a different execution context. That way each thread gets a core's execution units to itself, so the threads truly run in parallel and produce results faster.
On the other hand, one could argue that if two threads use the same data, putting them on the same core lets one thread reuse data the other has already brought into the cache, which could also produce faster results.
But I feel that, as a general case, the first approach would be better.
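To make the "separate cores first" policy concrete, here is a minimal Python sketch. The function `place_threads` and its parameters are hypothetical names made up for illustration; this is not how any real OS scheduler is implemented, just the ordering described above.

```python
def place_threads(num_threads, num_cores, contexts_per_core):
    """Return one (core, context) slot per thread, spreading across cores first.

    Context 0 of every core is filled before any core gets a second thread.
    Threads beyond the machine's capacity are left unplaced (they would wait).
    """
    capacity = num_cores * contexts_per_core
    placement = []
    for t in range(min(num_threads, capacity)):
        context = t // num_cores  # which pass over the cores we are on
        core = t % num_cores      # round-robin over cores within a pass
        placement.append((core, context))
    return placement


# 3 threads on 2 cores with 2 contexts each:
# the third thread only shares a core once both cores are busy.
print(place_threads(3, 2, 2))  # [(0, 0), (1, 0), (0, 1)]
```

A cache-sharing policy would instead fill both contexts of core 0 before touching core 1, which is the trade-off the counterargument above is about.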
I'm a little confused by the difference between hardware and software threads. So far, I have this much:
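1) Hardware threads are execution contexts on the chip (each with its own register state), and they allow instructions from several threads to be executed on the chip at the same time.
2) Software threads are the abstraction we use to write concurrent software (e.g. the pthreads we used in 213's proxy); the OS schedules them onto the hardware threads.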
Maybe? Not sure.
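One way to see the difference is that the number of software threads is not limited by the number of hardware threads. Below is a small Python sketch of that, assuming `os.cpu_count()` reports the number of logical CPUs (hardware threads) on the machine. The helper `run_software_threads` is a made-up name. Note that in standard CPython only one thread runs Python bytecode at a time, so this shows oversubscription, not parallel speedup.

```python
import os
import threading

# Logical CPUs visible to the OS, i.e. hardware threads (execution contexts).
# os.cpu_count() may return None, so fall back to 1.
hardware_threads = os.cpu_count() or 1


def run_software_threads(count):
    """Start `count` software threads and return the ids of those that ran."""
    ran = []
    lock = threading.Lock()

    def work(i):
        with lock:  # protect the shared list
            ran.append(i)

    threads = [threading.Thread(target=work, args=(i,)) for i in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(ran)


# Create twice as many software threads as there are hardware threads.
# The OS multiplexes them onto the available execution contexts.
finished = run_software_threads(2 * hardware_threads)
print(f"{hardware_threads} hardware threads ran {len(finished)} software threads")
```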
Advantage of scheduling 2 threads on the same core:
If the 2 threads share a lot of data, putting them on the same core makes better use of the L1 cache, since data fetched by one thread is already in the L1 for the other.
Question though: what is the usual order of magnitude of this advantage?
The thread switch overhead for hardware threads is much lower than for software threads, because the execution contexts of the hardware threads are already held in the core's registers. Hardware thread switches also happen much more frequently (at the level of clock ticks) than software thread switches, which the OS takes care of.
If I have two threads, I would like to assign them to different cores, because then two sets of ALUs can be working at once, which can give approximately a 2x speedup. If I have 5 threads, I will assign 4 of them to the 2 cores first (2 execution contexts per core). After one of the threads finishes, I will assign the last one to the idle execution context. The problem is that the workload assigned to the two cores will be unbalanced. To solve this problem, it's better to partition the work in a fine-grained way, which helps load balancing.
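Here is a rough Python simulation of the 5-thread case, as a sketch only. It assumes all 4 execution contexts run at the same speed (which ignores that two contexts on one core share ALUs) and that an idle context immediately picks up the next piece of work. The function name `makespan` and the cost values are made up for illustration.

```python
import heapq


def makespan(costs, contexts):
    """Time until all work is done when each idle context takes the next piece."""
    free_at = [0.0] * contexts  # time at which each context becomes idle
    heapq.heapify(free_at)
    for cost in costs:
        start = heapq.heappop(free_at)  # earliest idle context takes the piece
        heapq.heappush(free_at, start + cost)
    return max(free_at)


# Coarse: 5 equal threads of 1 time unit each on 4 contexts.
# The fifth thread has to wait for a context, so three contexts sit idle.
coarse = makespan([1.0] * 5, 4)

# Fine-grained: the same total work cut into 20 pieces of 0.25 units.
fine = makespan([0.25] * 20, 4)

print(coarse, fine)  # 2.0 1.25
```

With the same 5 units of total work, the finer partition finishes in 1.25 units instead of 2, which is the load-balancing benefit described above. In practice, finer pieces also add scheduling overhead, so the pieces should not be made arbitrarily small.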