Domain-Specific Programming Systems
By choutiy, yongzuaw, and tliao
Due on 2013-04-23 00:00:00

As we have seen throughout the course, designing parallel systems that are both efficient and scalable is a challenging problem. Keep in mind that all the assignments in this class have been on homogeneous systems, where the hardware execution units are more or less identical; once we consider hardware with several different kinds of execution units, one can see how quickly the difficulty of designing parallel systems rises. Hence the topic of today's lecture: domain-specific programming systems.

Today, there are three key features that play a role in determining what language to use for a certain programming task:

  1. High Performance - How scalable and efficient can the code in the language be?
  2. Productivity - How fast/easy is it to develop in the language?
  3. Completeness - Is the language expressive enough to write any application the programmer wants?

A programming language that has all three features does not exist. Instead, most widely used programming languages trade off one of the features in order to excel at the other two: C/C++ loses out in productivity but offers high performance and completeness, while languages such as Python and JavaScript offer high productivity and completeness but do not perform as well. The remaining combination, high performance and high productivity at the cost of completeness, belongs to domain-specific languages (DSLs).

The idea behind domain-specific languages and programming frameworks revolves around one of the central ideas in this class: abstraction. By raising the level of abstraction, programmers can focus less on the hardware specifics of a machine during development. The domain-specific system knows what to do on different platforms and can thus run specialized, efficient, high-performing code geared to each environment (imagine a system that knows to run multithreaded code on one machine but SIMD code on another for data-parallel processing). To provide this level of abstraction, however, a tradeoff is made in completeness. As their name implies, domain-specific languages specialize in solving certain types of problems very well. Because such a system knows the nature of the problems it attempts to solve, it can make assumptions, optimizations, and restrictions in order to achieve high levels of productivity and performance.

Example DSLs

Liszt is a programming language in which the only objects are meshes: a graph topology through which information (a term used loosely, as it is an abstraction) flows between the nodes. Liszt restricts operations to act only on the topology of the mesh data type (vertices and edges) and the fields of the mesh (reading the "value" of a vertex). The language further restricts variable assignments to be static and disallows recursion. These restrictions allow the compiler to make assumptions and optimizations tailored to the system the code will run on. The access restrictions let the compiler know precisely when and where all data accesses occur and thus infer the dependencies. The recursion restriction allows the analysis to terminate (one can imagine a scenario where it cannot be determined whether a recursive function will terminate). Because of these optimizations, environment-specific code can be generated: in a distributed environment, for example, the mesh can be partitioned into largely independent sections that interact only at the boundaries, and the underlying code can use MPI. On a GPU, the compiler can build a graph representing the mesh and use a "coloring" algorithm on the nodes to determine the dependencies in the graph (nodes with the same color are independent).
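The coloring idea above can be sketched with a greedy graph coloring: assign each vertex the smallest color not used by any neighbor, so that vertices sharing a color touch disjoint data and can be updated in parallel (e.g., one GPU kernel launch per color class). This is an illustrative sketch, not the actual Liszt implementation or API.

```python
# Greedy graph coloring: vertices with the same color share no edge,
# so each color class can be processed in parallel without conflicts.
def greedy_color(adjacency):
    """adjacency: dict mapping vertex -> set of neighboring vertices."""
    colors = {}
    for v in sorted(adjacency):               # deterministic order for the sketch
        used = {colors[n] for n in adjacency[v] if n in colors}
        c = 0
        while c in used:                      # smallest color unused by neighbors
            c += 1
        colors[v] = c
    return colors

# Toy "mesh": four vertices connected in a cycle (a square).
mesh = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
coloring = greedy_color(mesh)
# Here vertices {0, 2} get one color and {1, 3} another; each class is
# an independent set the runtime could update concurrently.
```

Greedy coloring is not optimal in general, but for mesh-shaped graphs it typically produces few colors, which keeps the number of sequential parallel phases small.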

Halide is an image processing language that is actually a combination of two languages:

  1. The Algorithm - Functional image processing algorithms (no side effects, purely mathematical)
  2. The Schedule - Maps the above algorithms to the machine (how the high level algorithm specified actually runs on the target machine)

There are four primitive scheduling mechanisms: inline, chunk, root, and reuse. The schedule can also define and restrict the order in which stages execute. As with Liszt, these high-level abstractions enable the compiler to generate high-performance code, provided that the dependencies and pipeline structure can be inferred by the compiler.
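The algorithm/schedule separation can be illustrated with a toy two-stage pipeline (this is a hypothetical sketch in Python, not Halide's actual C++ API): the algorithm is a pure function, and the schedule decides whether the producer stage is recomputed at every use ("inline") or computed once into a buffer ("root").

```python
# Toy sketch of Halide's algorithm/schedule split. Both schedules compute
# identical values; they differ only in redundant work versus storage.
calls = {"count": 0}          # instrumentation to count producer evaluations

def producer(x):
    """Algorithm stage 1: a pure function of position x."""
    calls["count"] += 1
    return x * x

def consumer(x, prod):
    """Algorithm stage 2: a 3-point average of the producer stage."""
    return (prod(x - 1) + prod(x) + prod(x + 1)) / 3

def run_inline(n):
    # Schedule "inline": recompute the producer at every use (no storage).
    return [consumer(x, producer) for x in range(1, n - 1)]

def run_root(n):
    # Schedule "root": precompute the producer over its whole domain first.
    buf = {x: producer(x) for x in range(0, n)}
    return [consumer(x, buf.__getitem__) for x in range(1, n - 1)]

calls["count"] = 0
a = run_inline(10)            # producer evaluated 3 * 8 = 24 times
inline_calls = calls["count"]

calls["count"] = 0
b = run_root(10)              # producer evaluated only 10 times
root_calls = calls["count"]
```

The point is that the programmer states the math once and then explores schedules (recompute vs. store, ordering, tiling) without risking correctness, since every schedule realizes the same algorithm.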

DSL Development

There are two ways to implement a DSL system:

  1. Develop the DSL as a standalone language (e.g., MATLAB, SQL)
  2. "Embed" the DSL in a generic programming language (e.g., a C++ library, Liszt)

Facilitation: To facilitate embedding a DSL in a generic host language, the compiler can be structured using a "modular staging" approach: the DSL reuses the front-end stages of the host compiler (lexing, parsing, type checking) and customizes the intermediate representation along with the analysis/optimization and code-generation phases.
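A minimal sketch of this embedding idea (names and structure are illustrative, not a real staging framework): host-language operator overloading builds an intermediate representation instead of computing values immediately, so the DSL can analyze and optimize the whole program before generating code.

```python
# Staged embedded DSL sketch: expressions build an IR, which the DSL
# then optimizes (constant folding) and lowers to a source string.
class Expr:
    def __add__(self, other): return Add(self, lift(other))
    def __mul__(self, other): return Mul(self, lift(other))

class Const(Expr):
    def __init__(self, v): self.v = v

class Var(Expr):
    def __init__(self, name): self.name = name

class Add(Expr):
    def __init__(self, a, b): self.a, self.b = a, b

class Mul(Expr):
    def __init__(self, a, b): self.a, self.b = a, b

def lift(x):
    return x if isinstance(x, Expr) else Const(x)

def fold(e):
    """One DSL-level optimization: constant folding over the IR."""
    if isinstance(e, (Add, Mul)):
        a, b = fold(e.a), fold(e.b)
        if isinstance(a, Const) and isinstance(b, Const):
            return Const(a.v + b.v if isinstance(e, Add) else a.v * b.v)
        return type(e)(a, b)
    return e

def codegen(e):
    """'Code generation': emit host-language source text for the IR."""
    if isinstance(e, Const): return str(e.v)
    if isinstance(e, Var):   return e.name
    if isinstance(e, Add):   return f"({codegen(e.a)} + {codegen(e.b)})"
    if isinstance(e, Mul):   return f"({codegen(e.a)} * {codegen(e.b)})"

x = Var("x")
program = x * (Const(2) + Const(3))   # staged: builds IR, computes nothing
optimized = fold(program)             # DSL sees and rewrites the whole program
source = codegen(optimized)           # yields "(x * 5)"
```

Because the DSL program is data (an IR) rather than already-executed host code, the same front-end can target different back ends: here a source string, but equally well CUDA, MPI, or vectorized code.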

Pros and Cons of DSLs

  1. Utilize heterogeneous parallel machines to increase computational capability
  2. Exploit the peak capability of machines
  3. Trade off generality (completeness) for productivity, performance, and portability

Questions to ponder

  1. What are the possible challenges in terms of workload balance and memory/cache consistency in heterogeneous systems, for example, a CPU-GPU system?
  2. What are the advantages/disadvantages of embedded DSLs compared to standalone DSLs?
  3. Are DSLs going to be the future trend of programming language development? Is that healthy or unhealthy?