There is a 4 way hyper threading in Xeon Phi, since the NUMA could cause more memory stall.
@cyl: Not really. It actually primarily helps with requests to the L1. The phi lacks a lot of the prediction logic the standard Xeons have for detecting prefetching or reordering instruction execution. It has hardware L2 prefetching, but not L1 prefetching. It's also not hyperthreaded in the typical way: it's a 4 wide barrel processor where the current thread switches every cycle regardless of whether a stall happens or not. The intent, then, is that the software (typically inserted by the compiler) will include the necessary prefetch instructions so 2 threads can saturate the pipelines while 2 other threads are stalled. After a thread issues an instruction, it'll take around 4 cycles before it can issue another. This is long enough for a request going out to L1, but definitely not enough for a request to L2, which is likely why intel still put in the hardware to do L2 prefetching.