
I wonder how multithreading works here. Given the hardware resources shown above, if there are four threads, we can assign two threads to each core. Of course, the core can then interleave the two threads. I'm just curious: is it possible for each of the two threads to run on its own ALU (i.e. the yellow boxes), even though the two ALUs in a core would normally execute two instructions from the same instruction stream?


In the case of Intel Hyper-Threading, one of the Exec units is a SIMD execution unit and the other is not, so if two threads both tried to execute SIMD instructions, they would have to share that one unit. In this particular case, though, I don't see any reason the core would choose to run both threads on one execution unit. Perhaps it would if the hardware were designed to make threads share a single execution unit rather than letting them run mostly independently of one another.

As a side note, I think this is what AMD's CMT (Clustered Multi-Threading) is about. As I understand it, each module pairs two integer clusters that share a front end (fetch and decode) and a floating-point unit, so the chip gets more execution units without duplicating a whole core. However, this has some major drawbacks, because memory bandwidth is finite and is often the constraining factor (as we saw in saxpy). As a result, those fancy 8-core AMD processors (Bulldozer, Piledriver) very often perform worse than 4-core Intel processors.
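
To make the bandwidth point concrete, here is a rough back-of-the-envelope sketch of why saxpy stops caring about extra execution units. It assumes 4-byte floats and no cache reuse, and the peak GFLOP/s and GB/s figures in the usage note are made-up placeholders, not measurements of any real AMD or Intel chip. The helper names are my own.

```python
# saxpy: y[i] = a * x[i] + y[i]
# Assumptions (mine, for illustration): 4-byte floats, every element
# is loaded from and stored to memory with no cache reuse.

def saxpy(a, x, y):
    """Elementwise a*x + y, written in pure Python only for illustration."""
    return [a * xi + yi for xi, yi in zip(x, y)]

BYTES_PER_FLOAT = 4
FLOPS_PER_ELEMENT = 2                      # one multiply, one add
BYTES_PER_ELEMENT = 3 * BYTES_PER_FLOAT    # load x[i], load y[i], store y[i]

def arithmetic_intensity():
    """Floating-point operations performed per byte of memory traffic."""
    return FLOPS_PER_ELEMENT / BYTES_PER_ELEMENT

def attainable_gflops(peak_gflops, bandwidth_gb_s):
    """Roofline-style bound: the lower of the compute and bandwidth limits."""
    return min(peak_gflops, bandwidth_gb_s * arithmetic_intensity())
```

With a hypothetical 24 GB/s of memory bandwidth, `attainable_gflops(100.0, 24.0)` and `attainable_gflops(200.0, 24.0)` give the same bound of about 4 GFLOP/s: doubling the compute peak changes nothing, because the 2 flops per 12 bytes of traffic keep the kernel limited by memory.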