Slide 45 of 58
moon

I'm confused about why Halide is 2x faster than CUDA. My understanding is that Halide (with the right schedule) and CUDA both use the same processors and threads as efficiently as possible. Where does the 2x speedup come from, then?

tomshen

One reason could be that it's theoretically possible to write CUDA code by hand that is as fast as Halide, but that it's impractical (e.g. would take thousands of lines of illegible code). Halide is specifically designed to generate machine code that is as efficient as possible for a very constrained domain, whereas CUDA is much more general-purpose.
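To make that concrete, here is a rough sketch in plain Python (not Halide's API; every function name below is made up for illustration). It takes one algorithm, a two-pass 3x3 box blur, and evaluates it under two different "schedules". Both produce the same output, but the second restructures the loops to avoid the intermediate buffer. Doing that kind of restructuring by hand for every stage of a real pipeline, for every target, is what makes hand-written code balloon in size.

```python
# Illustrative only: plain Python, not Halide's API. All names are hypothetical.

def blur_x_at(img, x, y):
    """Horizontal 3-tap box blur at one pixel (valid for 1 <= x <= width - 2)."""
    return (img[y][x - 1] + img[y][x] + img[y][x + 1]) / 3.0


def blur_breadth_first(img):
    """Schedule A: compute the whole horizontal pass, store it, then do the vertical pass."""
    h, w = len(img), len(img[0])
    # Full intermediate buffer: simple to write, but poor locality on large images.
    bx = [[blur_x_at(img, x, y) for x in range(1, w - 1)] for y in range(h)]
    return [
        [(bx[y - 1][x] + bx[y][x] + bx[y + 1][x]) / 3.0 for x in range(w - 2)]
        for y in range(1, h - 1)
    ]


def blur_fused(img):
    """Schedule B: recompute the horizontal pass on demand for each output pixel."""
    h, w = len(img), len(img[0])
    # No intermediate buffer: better locality, at the cost of redundant work.
    return [
        [
            (blur_x_at(img, x, y - 1) + blur_x_at(img, x, y) + blur_x_at(img, x, y + 1)) / 3.0
            for x in range(1, w - 1)
        ]
        for y in range(1, h - 1)
    ]
```

As I understand it, Halide lets you write the algorithm once and switch between strategies like these (plus tiling, vectorization, and so on) by changing only the schedule, so trying many variants is cheap.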

Another reason could be that Halide uses JIT compilation. This means that a program in Halide can generate new machine code while it's running, which isn't really possible just using CUDA. JIT compilation is what makes the JVM as fast as (and sometimes faster than) machine code compiled from C++.

bstan

I'm surprised by the camera RAW processing pipeline results. Was the hand-tuned ARM assembly generated from C++ and then tuned by hand? And what does 2.75x less code mean? 2.75x fewer lines of Halide code compared to the lines of assembly? It seems a bit magical that the result is 5% faster than a hand-tuned assembly implementation.

pinkertonpg

Another reason for the 2x speedup over hand-written CUDA is that Halide, from what I've read on it, doesn't just use the GPU. If all of the low-level code that the Halide compiler generated was in CUDA, I suspect the implementations would be similar in speed. However, Halide leverages more of the computer's resources by vectorizing code, multithreading, and using the GPU. Here's a quote from the Halide paper that the scheduling pictures from a few slides back came from: "The implementation is multithreaded and vectorized according to the schedule, internally manages the allocation of all intermediate storage, and optionally includes synthesized GPU kernels which it also manages automatically."

drayson

Also, in general, a good compiler is much better at optimization than almost any human hand-writing code, which could be responsible for at least some of the speedup.