Slide View : 15-418 Spring 2013

Slide 30 of 38

ajindia

Would the following be an apt example of a machine restriction? CUDA devices with compute capability 2.0 allow at most 1024 threads per block. Hence, with one subproblem per thread, the number of subproblems handled within a block is restricted to at most 1024. Beyond that, we would have to process the subproblems in chunks of 1024, which may add overhead, as in assignment 2.
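To make the chunking idea concrete, here is a minimal sketch in plain Python (not CUDA) of how a block limited to 1024 threads might walk over a larger set of subproblems. The function name `chunk_ranges` and its parameters are hypothetical, chosen only for illustration; the one-subproblem-per-thread mapping is an assumption.

```python
def chunk_ranges(num_subproblems, max_threads_per_block=1024):
    """Split indices 0..num_subproblems-1 into consecutive (start, end) ranges.

    Each range covers at most max_threads_per_block subproblems, assuming
    (hypothetically) that one thread handles one subproblem per pass.
    `end` is exclusive.
    """
    if max_threads_per_block <= 0:
        raise ValueError("max_threads_per_block must be positive")
    return [
        (start, min(start + max_threads_per_block, num_subproblems))
        for start in range(0, num_subproblems, max_threads_per_block)
    ]


# Example: 2500 subproblems need three passes, the last one only partly full.
for start, end in chunk_ranges(2500):
    print(f"pass over subproblems [{start}, {end}) uses {end - start} threads")
```

Each extra pass is where per-chunk overhead (loop bookkeeping, synchronization between passes) would come in, and the final partial chunk leaves some threads idle.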


kailuo

As the slide says, there should be many more tasks than processors (in this case, more than 1024 per block) so that dynamic assignment has a large enough work pool to draw from. But the number of tasks should not grow arbitrarily large: once the workload of each task gets small, the overhead of managing tasks dominates the useful work, and doing much more total work just to guarantee better load balance is a bad trade.
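The trade-off can be seen in a toy model. The sketch below is plain Python with made-up costs, not a measurement of any real machine: idle workers grab the next task from a shared queue, and every task pays a fixed management overhead. The names `dynamic_makespan`, `split_into_tasks`, and `overhead_per_task` are hypothetical.

```python
import heapq


def split_into_tasks(item_costs, items_per_task):
    """Group consecutive work items into tasks; return the cost of each task."""
    return [
        sum(item_costs[i:i + items_per_task])
        for i in range(0, len(item_costs), items_per_task)
    ]


def dynamic_makespan(task_costs, num_workers, overhead_per_task):
    """Finish time when the earliest-free worker always takes the next task.

    Every task costs its own work plus a fixed overhead_per_task, which
    stands in for the cost of managing the task (an assumed, simplified model).
    """
    finish_times = [0.0] * num_workers  # min-heap of each worker's free time
    heapq.heapify(finish_times)
    for cost in task_costs:
        earliest_free = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest_free + overhead_per_task + cost)
    return max(finish_times)


# Illustrative workload: one expensive item followed by seven cheap ones.
item_costs = [9, 1, 1, 1, 1, 1, 1, 1]
for items_per_task in (8, 4, 2, 1):
    tasks = split_into_tasks(item_costs, items_per_task)
    t = dynamic_makespan(tasks, num_workers=2, overhead_per_task=2)
    print(f"{len(tasks)} tasks -> finish time {t}")
```

In this toy setup, very coarse tasks suffer from imbalance and very fine tasks suffer from overhead, so an intermediate granularity finishes first. The numbers depend entirely on the assumed costs.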
