Previous | Next --- Slide 44 of 60
Back to Lecture Thumbnails
kayvonf

Question: What are other good examples of data-parallel programming systems (or array-based programming systems)?

coffee7

R supports array-based programming. It is possible to use R with Intel's Math Kernel Library, which "features highly optimized, threaded and vectorized functions to maximize performance."

pdp

Hadoop streaming (in python), Hive processing, are also data parallel frameworks?

Master

Numpy is a great example of usage of SIMD in scientific computing. See this to see how to use Numpy to make best use of multi cores:

Why are NumPy arrays efficient?

A NumPy array is basically described by metadata (number of dimensions, shape, data type, and so on) and the actual data. The data is stored in a homogeneous and contiguous block of memory, at a particular address in system memory (Random Access Memory, or RAM). This block of memory is called the data buffer. This is the main difference with a pure Python structure, like a list, where the items are scattered across the system memory. This aspect is the critical feature that makes NumPy arrays so efficient.

Why is this so important? Here are the main reasons:

Array computations can be written very efficiently in a low-level language like C (and a large part of NumPy is actually written in C). Knowing the address of the memory block and the data type, it is just simple arithmetic to loop over all items, for example. There would be a significant overhead to do that in Python with a list.

Spatial locality in memory access patterns results in significant performance gains, notably thanks to the CPU cache. Indeed, the cache loads bytes in chunks from RAM to the CPU registers. Adjacent items are then loaded very efficiently (sequential locality, or locality of reference).

Data elements are stored contiguously in memory, so that NumPy can take advantage of vectorized instructions on modern CPUs, like Intel's SSE and AVX, AMD's XOP, and so on. For example, multiple consecutive floating point numbers can be loaded in 128, 256 or 512 bits registers for vectorized arithmetical computations implemented as CPU instructions.

Additionally, let's mention the fact that NumPy can be linked to highly optimized linear algebra libraries like BLAS and LAPACK, for example through the Intel Math Kernel Library (MKL). A few specific matrix computations may also be multithreaded, taking advantage of the power of modern multicore processors.

In conclusion, storing data in a contiguous block of memory ensures that the architecture of modern CPUs is used optimally, in terms of memory access patterns, CPU cache, and vectorized instructions.

harlenVII

GraphLab would be a proper example of data-parallel programming system.

dmerigou

An interesting example of data-parallel programming is the Chaos system developed at EPFL. A presentation of the system which deals with large graphs computations can be found here.

The programming model is based on scatter-gather operations.

Basically their major concern is to achieve a very good balance between processing cores but also with I/O between the cores and the storage.

ShadoWalkeR

Using vector operations in Matlab instead of loops can give tremendous speed gains. A program with loops which takes several minutes, if written in Vector operations, runs in just a few seconds.

Firephinx

Google's Tensor Flow and Theano are two popular Machine Learning libraries that heavily utilize multi-dimensional array operations for efficient neural network data processing.

Both libraries can take advantage of CUDA to dramatically increase the processing speed of deep learning tasks such as training a new neural network on a data set.

rav

Data-parallel models are a key feature in Julia (a relatively new technical computing language ) that gives extremely fast performance. https://software.intel.com/en-us/articles/vectorization-in-julia .

The compiler does not always automatically vectorize operations, it is interesting to think from a compiler point of view of how to automatically recognize where/when to vectorize

cluo1

this map(function, collection) logic is reused in multiple functional programming language.