Slide 18 of 56
asinha

I'm not sure what the last bullet point means about placing unrelated threads on the same processor to use the machine efficiently. When it says compute-limited and bandwidth-limited, does it mean placing these threads together so that they divide up the computational and bandwidth resources in the most efficient way, maximizing the use of both? If so, wouldn't that already happen when the threads were on the same processor and had maximal locality / minimized communication cost? Can someone come up with a scenario where this is not the case?

retterermoore

I think it's suggesting that the benefit from fully utilizing the resources of a processor might outweigh the benefits of locality.

For instance, suppose you had 4 threads and two processors to run them on, and 2 of those threads ran a program full of memory accesses while the other 2 ran a program with a lot of computation. It would make sense to assign one thread of each type to each processor; if you put both threads of the same type on one processor, that processor could end up bandwidth-limited or compute-limited while its other resources sit idle. Even if the threads of the same type share some locality, the benefit of overlapping one thread's memory accesses with the other thread's computation could outweigh it.
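To make that pairing concrete, here's a minimal sketch (not from the slides) that pins one memory-bound and one compute-bound thread onto each of two cores using Linux pthread affinity. The core numbers, loop sizes, and function names are all assumptions chosen for illustration; on an SMT machine the logical-CPU-to-core mapping is machine-specific, so check it before reading much into timings.

```c
// Sketch: interleave a memory-bound and a compute-bound thread on each core,
// so neither memory bandwidth nor the ALUs are the lone bottleneck.
// Assumes Linux (pthread_setaffinity_np is a GNU extension) and >= 2 cores.
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

#define N (1 << 24)               // 16M floats (~64 MB): too big for cache
static float big_array[N];

// Memory-bound worker: streams through a large array, little arithmetic.
static void *memory_bound(void *arg) {
    (void)arg;
    float sum = 0.f;
    for (int pass = 0; pass < 10; pass++)
        for (int i = 0; i < N; i++)
            sum += big_array[i];
    printf("memory-bound sum: %f\n", sum);
    return NULL;
}

// Compute-bound worker: tight arithmetic loop on a few registers.
static void *compute_bound(void *arg) {
    (void)arg;
    double x = 1.0001;
    for (long i = 0; i < 400000000L; i++)
        x = x * 1.0000001 + 1e-9;
    printf("compute-bound result: %f\n", x);
    return NULL;
}

// Pin a thread to a specific logical CPU so placement is explicit.
static void pin_to_core(pthread_t t, int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    pthread_setaffinity_np(t, sizeof(set), &set);
}

int main(void) {
    pthread_t t[4];
    // Interleaved placement: each core gets one thread of each type.
    pthread_create(&t[0], NULL, memory_bound,  NULL); pin_to_core(t[0], 0);
    pthread_create(&t[1], NULL, compute_bound, NULL); pin_to_core(t[1], 0);
    pthread_create(&t[2], NULL, memory_bound,  NULL); pin_to_core(t[2], 1);
    pthread_create(&t[3], NULL, compute_bound, NULL); pin_to_core(t[3], 1);
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    return 0;
}
```

Timing this interleaved placement against a same-type placement (both memory-bound threads pinned to core 0, both compute-bound threads to core 1) is an easy way to see when the balanced assignment wins despite giving up some locality.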