Slide 30 of 38
ajindia

Would the following be an apt example of a machine restriction? CUDA devices with compute capability 2.0 allow at most 1024 threads per block, so the number of subproblems a block can handle at once is capped at 1024. If there are more, we have to process the subproblems in chunks of 1024, which may add overhead, as in assignment 2.
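
To make the chunking concrete, here is a minimal host-side C++ sketch (plain C++, not CUDA device code) of how one block of 1024 threads could cover more than 1024 subproblems. The names `kMaxThreadsPerBlock`, `passes_needed`, and `indices_for_thread` are made up for illustration; the inner loop mirrors the common kernel pattern `for (i = threadIdx.x; i < n; i += blockDim.x)`.

```cpp
#include <cstddef>
#include <vector>

// Assumed limit taken from the comment above: 1024 threads per block
// on compute capability 2.0.
constexpr std::size_t kMaxThreadsPerBlock = 1024;

// Number of passes (chunks of 1024) one block needs to cover n subproblems.
std::size_t passes_needed(std::size_t n) {
    return (n + kMaxThreadsPerBlock - 1) / kMaxThreadsPerBlock;  // ceil(n / 1024)
}

// Subproblem indices that thread `tid` would handle if the block strides
// over the subproblems in steps of kMaxThreadsPerBlock.
std::vector<std::size_t> indices_for_thread(std::size_t tid, std::size_t n) {
    std::vector<std::size_t> out;
    for (std::size_t i = tid; i < n; i += kMaxThreadsPerBlock) {
        out.push_back(i);
    }
    return out;
}
```

For example, with 2500 subproblems `passes_needed(2500)` is 3, and thread 0 handles indices 0, 1024, and 2048. In the last pass only 452 of the 1024 threads have work, which is one source of the overhead mentioned above.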

kailuo

Just as the slide says, there should be many more tasks than processors (in this case, more than 1024 per block) so that dynamic assignment has a large enough work pool to choose from. But the number of tasks should not grow without bound: once the work per task gets small, the overhead of managing the tasks dominates the useful work, and doing MUCH MORE work just to guarantee better load balance is surely a bad trade.
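
The trade-off can be illustrated with a toy cost model, sketched below in C++. It is an assumption-laden simplification, not a measurement: every task pays a fixed `overhead`, workers pull tasks in order from a shared queue, and the function name `makespan` and its parameters are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Finish time when `workers` workers dynamically pull tasks, in order, from a
// shared queue. Each task groups `grain` consecutive items and pays a fixed
// per-task `overhead`. Assumes grain >= 1 and workers >= 1.
double makespan(const std::vector<double>& item_cost, std::size_t grain,
                std::size_t workers, double overhead) {
    // Min-heap of the times at which each worker becomes free.
    std::priority_queue<double, std::vector<double>, std::greater<double>> free_at;
    for (std::size_t w = 0; w < workers; ++w) free_at.push(0.0);

    double finish = 0.0;
    for (std::size_t start = 0; start < item_cost.size(); start += grain) {
        std::size_t end = std::min(start + grain, item_cost.size());
        double cost = overhead;  // task management cost
        for (std::size_t i = start; i < end; ++i) cost += item_cost[i];

        double t = free_at.top() + cost;  // earliest-free worker takes the task
        free_at.pop();
        free_at.push(t);
        finish = std::max(finish, t);
    }
    return finish;
}
```

Take twelve items of cost 1 followed by four items of cost 5, with 4 workers. With an overhead of 0.5 per task, 4 coarse tasks (`grain = 4`) finish at 20.5 while 16 fine tasks (`grain = 1`) finish at 10.0, so more tasks balance the load better. With an overhead of 8.0 per task, the coarse split finishes at 28.0 and the fine split at 40.0, so the management overhead dominates.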