Slide 39 of 56
abunch

Maybe I am seeing numbers like 8 and 32 and jumping to conclusions, but I thought the whole point was that you could do this without knowing anything about the underlying hardware and just say "compile this to GPU!"? These numbers make me think that when you decide how big to make the tiles, chunks, and vectors, you are assuming things such as an 8-wide SIMD unit, a certain number of GPU cores, or the availability of AVX instructions.

nslobody

To add on to @abunch's question: would it be possible for the compiler to detect the machine's SIMD width, number of cores, and other processor characteristics and use those automatically when determining how to schedule the algorithm? If so, they could easily be overridden with annotations whenever a schedule other than the one the compiler chooses is desired.

jrk

Yes, the ideal parameters are going to be platform specific. The key idea is that the schedule is fully decoupled from the description of the "algorithm." This is just one of an infinite number of possible schedules for this simple pipeline, and which is best will depend on the target architecture, the parameters and inputs (e.g., are the images small enough to fit in cache?), and the composition of the pipeline. The schedule can be specified explicitly by a human (as it is here), inferred by the compiler (e.g., using stochastic search aka "autotuning," as in the follow-on paper: http://people.csail.mit.edu/jrk/halide-pldi13.pdf), or any combination of the two (as @nslobody suggests).
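To make the algorithm/schedule decoupling concrete, here is a minimal pure-Python sketch (not real Halide code; the two-stage pipeline, the stencil, and the chunk size of 8 are all illustrative and not tied to any machine's SIMD width or core count). The same two-stage pipeline is evaluated under two different schedules, breadth-first and chunked, and both produce identical output; only locality and redundant work differ.

```python
# Toy version of the Halide idea: the *algorithm* (a 1D two-stage
# pipeline: pointwise producer, then a 3-point stencil consumer) is
# fixed, while two different *schedules* evaluate it.

def producer(x):
    # First pipeline stage: a pointwise function of the coordinate.
    return x * x + 1

def consumer(vals, i):
    # Second stage: a 3-point stencil over the producer's output.
    return vals[i] + vals[i + 1] + vals[i + 2]

N = 32  # output size (illustrative)

def schedule_breadth_first():
    # Compute ALL of the producer stage, then ALL of the consumer
    # stage. Simple, but the whole intermediate must fit in memory.
    prod = [producer(x) for x in range(N + 2)]
    return [consumer(prod, i) for i in range(N)]

def schedule_chunked(chunk=8):
    # Compute the producer in chunks, just before each output chunk
    # needs it: better locality, at the cost of a small amount of
    # redundant recomputation at chunk boundaries.
    out = []
    for start in range(0, N, chunk):
        stop = min(start + chunk, N)
        # Producer values covering this chunk's stencil footprint.
        prod = [producer(x) for x in range(start, stop + 2)]
        out.extend(consumer(prod, i - start) for i in range(start, stop))
    return out

# Different schedules, same algorithm => identical results.
assert schedule_breadth_first() == schedule_chunked()
```

Which schedule wins in practice depends on exactly the factors mentioned above: cache sizes, redundant-compute cost, and how the stage composes with the rest of the pipeline.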