Previous | Next --- Slide 35 of 65
Back to Lecture Thumbnails
dre

I missed the explanation for how much speedup we get for this. Can someone explain again?

Xiao

32x (4 cores x 8 SIMD) should be a theoretical limit for this example. Professor K's laptop achieved 44x performance because it had hyperthreading, which doubles the threads ran. Of course, communication and other overheads prevents it from hitting the theoretical limit of 64x when using hyperthreading.

kayvonf

@Xioa: The real-life hyper-threading example has a complex answer, but you should reconsider your claim that about the 64X speedup. As you correctly state, there is only 32 times more processing capability on this slide than on my original simple chip. So why would you expect to see a 64X speedup?

I also recommend that folks with modern laptops measure the speedup of Assignment 1 programs like Mandelbrot_ispc_with_tasks and sqrt on their own machines to see if you can replicate my 44X number. Note that you'll have to change your ISPC build settings in the Makefile (this is discussed in the assignment.)

Xiao

I think 64x would be achievable (in the sense of coming near to it) if the program is structured such that the memory fetching of each thread is perfectly in synch with the execution of the other thread. I'm thinking of something along the line of slide 47, but with 2 threads and without the gap between end of stall and resuming execution. In such a case, each core would finish running two threads when sequentially it could had only finished 1, hence hyperthreading alone would grant a 2x improvement. Adding this to every SIMD core, would lead to a theoretical upper bound of 64x (4 cores x 8 SIMD x 2 SMT) compared to the sequencial program. Of course this is essentially impossible to achieve in real life applications :P

choutiy

Can someone explain hyperthreading or make links to later slides just for reference?

kayvonf

@choutiy: There's a solid discussion about hyper-threading on slide 64. Hyper-threading is a particular implementation (by Intel) that combines ideas from simultaneous multi-threading and interleaved multi-threading.