Your 15-418/15-618 final project gives you the opportunity to dive deeply into a parallel systems problem of your choosing for the final month of the course. Perhaps more importantly to some of you, it is your big chance to achieve fame, glory, and prizes at the parallelism competition. What you attempt for your project is completely up to you. There are only two requirements: (1) We want your project to be challenging (you should learn something relevant to the themes of this class) and (2) we want your project to be fun (you should be pumped to work on it)!
Choosing a Project
One common way to choose a project is to design a parallel solution to a problem in an application area that is interesting to you. Projects you have attempted in other classes are a good source of ideas. For example, projects in machine learning, AI, graphics, computational photography, and computer vision often stand to benefit greatly from parallelization. If you can convince the course staff that a parallel programming problem in one of these application domains is sufficiently challenging (that is, the solution to get good speedup is not obvious to you from the start), it's likely it will make a good project.
Other project ideas focus on system design or workload evaluation. For example, a project might compare the performance of CPU and GPU implementations of a parallel algorithm, and describe which platform is better suited for the job. Alternatively you could choose to evaluate different versions of an algorithm for different architectures. You could simulate the behavior of code on machines with different SIMD widths, add a feature to the ISPC compiler (its implementation is open source), or develop a parallel debugging tool that helps visualize bottlenecks and performance in parallel programs.
You may implement your project on any parallel platform. Gates machines, the latedays cluster, GPUs, Amazon EC2, iPhone/iPad/Android SoCs (we have NVIDIA Tegra K1 dev kit if you need one), FPGAs, Raspberry Pi, and architecture simulators are all possible and welcome platforms for projects. We also can give you access to a 32-core (64 hyper-thread) machine in the final week of the project for performance measurement if latedays doesn't meet your needs. Come talk to us if you have exotic computing needs --- sometimes the staff knows about a resource around campus that you might not be aware of.
- Application-oriented projects (parallelize an application):
- Implement a game playing system: Chess, Go, etc.
- Graphics: implement a parallel ray tracer; extend assignment 2 to achieve high performance under real workloads (render real triangle meshes and more complex shading functions) (those interested in graphics projects should see Kayvon)
- Physical simulation: implement a high-resolution fluid simulation, rigid body solver, cloth simulation
- Computer vision: many possibilities here: real-time object detection/tracking, image similarity search in a large image database
- Binary decision diagrams: Implement a binary decision diagram package. These are data structures that require constructing and managing very large (millions of node) graphs, with applications to verification, AI planning, and combinatorial optimization. There have been several attempts at parallelizing them, but the most commonly used BDD library is purely sequential. Getting these to run well on Xeon Phi would be of special interest to Intel, since they use BDDs extensively in designing and verifying their chips. The main effort would be to create a high quality lock-free hash table implementation.
- Solvers: Implement a parallel linear solver (using the conjugate-gradient or multi-grid method)
- Take a look at Guy Blelloch's problem-based parallel algorithm benchmark suite. Any solution that improves on the algorithms in the performance suite would be a great project.
- Implement an application on an FPGA (see CoRAM)
- Compare the performance of different parallel algorithms for the same task on different machines (often different algorithms are best for different platforms)
- Implement a lock-free data structure.
- Performance of Data compression (throughput and compression rate)
- Note to students focused on applications for distributed systems. You may want to consider implenting your work in Legion or Spark.
- Machine Learning Applications on big data. Parallelize a machine learning algorithm (e.g., parallel logistic regression, clustering, large scale classification, deep learning). Also see major software efforts like TensorFlow the Petuum project, MLBase, and the parameter server, and Caffe con troll, low-recision SGD (Buckwild),
- Implement parts of the Caffe deep learning framework to Xeon Phi or Intel Gen via OpenCL.
- Implement and schedule a set of modern deep convolutional networks (see list below) in Halide (see Kayvon if interested!)
- Google's Inception net (here or here)
- Fully Convolutional Networks (here)
- Spatial transformer networks (here)
- Deep Residual Learning for Image Recognition (here or here)
- 1-bit DNNS (here)
- Deep compression (here)
- Composing networks (here)
- Annotate the compiled ISPC code with calls to CPU performance monitor instructions and then gather interesting statistics about program execution: cache hits/misses, IPC (and scalar IPC and vector IPC separately). Create a visualization tool for the results. (from Matt Pharr)
- Do the previous for Contech
- For the really brave: Add polymorphic functions to ISPC: implement a function template mechanism in the compiler since it gets to be painful to write multiple versions of functions with both uniform and varying parameter types.
- Add a GPU backend for a subset of GraphLab, or simply improve the existing parallel runtime for clusters or CPUs.
- Extend your 15-411 compiler to generate parallel code.
- Add a new feature to the Halide image processing language.
- Design a mini domain-specific language (or API, or framework) for a problem domain you are interested in. If your interested in building a compiler for a domain specific language definitely take a look at Terra (Kayvon can give you more Terra references)
- Develop a task scheduler for Contech
- Systems and analysis projects:
- Study a workload's amenability to SIMD execution. Simulate behavior given multiple SIMD widths.
- Modify or analyze a workload using GPGPU Sim, PIN, Contech or talk to Li or Brian about other simulation tools.
- Modify your 15-410 kernel to utilize a parallel machine
- Investigate parallel implementations of
- Investigate lock implementations across thread count, hold time, architecture
- Measure the energy consumption of a parallel computer under various loads: (one example is here)
- Build a real elastic web server using Amazon's actual services.
- Measure/analyze the energy consumption on a mobile device while certain applications are running.
- Implement High performance parallel key-value store.
- Build a heterogeneous computing system to accelerate the performance of applications using FPGAs.
- Develop an API for Android applications to take advantage of SIMD on ARM processors. See http://developer.android.com/ndk/guides/cpu-arm-neon.html.