Slide 29 of 79
Iamme

Is it actually faster to prioritize SIMD over using multiple cores (for a loop without conditionals, for example, like in the next slide) or is it simply more space-efficient and logical? In other words, if nothing else were competing for resources and there were say 8 iterations of a loop, would it take the same time to run each iteration on a different processor as on a different ALU of the same processor?

bob_sacamano

Is the only difference between the two cases the contention between cores when accessing the instruction buffer? If there is contention, I'd presume the SIMD solution would perform better.

tcm

This is a good line of discussion. I would be interested in hearing more people express their thoughts and hypotheses about this.

ehauser

@bob_sacamano Agreed, I would bet that SIMD is a lot faster than separate processors because of caching and the cost of propagating the changes to all of the processors. Assuming that the data is contiguous, the SIMD approach would take a single cache miss and only need to propagate the line to L3 once, while the multiple processors would invalidate each other's caches.

A real-world example could be the birthday addition we did in class. The SIMD scenario would have a single person add one to a list of birthdays, then spread the results to the other seven people. The multiple-processor scenario would have eight people who each know each other's birthdays, add one to their own birthday, and then share the result with the others. I would imagine this second scenario would involve a lot more communication.
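
To make the cache argument above a bit more concrete, here is a toy Python sketch. It is not a description of real hardware: the 64-byte line size, the 4-byte elements, the simple write-invalidate ownership rule, and the names `line_of` and `count_line_transfers` are all assumptions made up for illustration. It only counts how often a cache line has to move into a different core's private cache, and says nothing about how expensive each move is.

```python
# Toy model, under stated assumptions: a write needs exclusive ownership of
# the cache line, and taking ownership from another core counts as one transfer.

LINE_BYTES = 64   # assumed cache-line size
ELEM_BYTES = 4    # assumed 32-bit integers


def line_of(index):
    """Cache line holding element `index` of a contiguous array."""
    return (index * ELEM_BYTES) // LINE_BYTES


def count_line_transfers(writes):
    """Count line transfers for a sequence of (core_id, element_index) writes.

    If the writing core does not already own the line, the line is
    transferred to it and the previous owner's copy is invalidated.
    """
    owner = {}        # line -> core that currently owns it
    transfers = 0
    for core, index in writes:
        line = line_of(index)
        if owner.get(line) != core:
            transfers += 1
            owner[line] = core
    return transfers


N = 8
# SIMD case: one core updates all eight contiguous elements.
simd_writes = [(0, i) for i in range(N)]
# Multi-core case: core i updates element i of the same array.
multicore_writes = [(i, i) for i in range(N)]

simd_transfers = count_line_transfers(simd_writes)
multicore_transfers = count_line_transfers(multicore_writes)
print(simd_transfers, multicore_transfers)  # prints "1 8"
```

In this model the eight elements share one line, so the single-core case brings the line in once, while the eight-core case moves it once per core. If the cores keep rewriting their elements in turn, the line keeps bouncing between them (three rounds give 24 transfers in this model). Whether that matters in practice depends on how cheap a transfer through a shared L3 is, which the sketch does not try to model.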

tclarke

There are many variables that could affect the speed on a multi-core processor. But I'm not sure that memory caching would be the big slowdown, since L3 is shared between all the processors. Loads from L3 would have to happen sequentially, as far as I know, but the fill from main memory into L3 might only need to happen once, depending on how the cache is set up and how far ahead of the others some cores are in the instruction stream.