Slide 55 of 66

Much earlier in the semester, when we were learning abstractions for parallel programming (like ISPC, OpenMP, CUDA, etc.), a major benefit was that we could "ignore" the low-level implementation details and let the compiler decide how to execute the code efficiently on the machine.

If we draw "parallels" from those programming abstractions to domain-specific languages, we realize that they are similar in that they hide some of the nitty-gritty details that make the code hard to write. However, unlike those programming abstractions (in my opinion), Halide also requires the programmer to understand what's going on behind the scenes. To summarize: if someone can write a Halide program, they can also write the raw code that Halide generates. However, being able to write an ISPC/OpenMP program doesn't necessarily mean the programmer knows how to write the raw code (using SIMD intrinsics/threads, respectively).


Yes, I agree with @trappedin418. And when we compare Halide with Liszt, Halide places more responsibility on the programmer: the scheduling policy needs to be specified, while Liszt only expects the mesh structure and operations to be defined. As a result, Halide requires a higher level of expertise from programmers to produce a good scheduling policy.


@trappedin418 @cc I think a good way to summarize what you have both said is that, in general, Liszt does not require the user (programmer) to have hardware-specific knowledge. It makes decisions on its own about how to make the application more efficient, while remaining mostly transparent to the user. Halide, it seems to me, is more a tool of convenience: it speeds up implementation by generating device-specific code using the hardware-specific knowledge provided by the programmer. It helps in the situation where the programmer knows the techniques but would spend a long time on implementation details. I don't think it necessarily means that the programmer must know how to write the raw code, only that the programmer understands which constructs are best for particular hardware.


It seems from this lecture that we are mostly utilizing the scheduling part of Halide and tweaking it to extract maximal performance from our programs. Are there ways we can use the operation-description part of the language to improve performance?


Is there any reason why Halide only works on feed-forward pipelines? I can understand that reduction/recursion would complicate the compiler's dependency analysis, but does that mean Halide cannot be used for tasks like backpropagation?