snshah

Question: When determining whether a parallel program scales, what are the main things to look at other than the three listed above? I imagine that splitting up the work effectively and minimizing communication overhead is part of the answer, but what other common considerations need to be made?

eatnow

Having just taken Distributed Systems last semester, the first thing that comes to mind is data: how the amount of input and intermediate data scales (or does not scale), where it is stored, how it is used, etc. Will this be an important theme in the scope of this course?

sbly

How does a parallel programming language work? Since all programming languages are just converted into assembly, does that mean assembly languages for computers with multiple cores have special instructions that allow parallelism?

arjunh

@sbly There are SIMD extensions to the x86 instruction set; one of the earliest was 3DNow!, introduced by AMD, and Intel responded with SSE. These instruction sets provide SIMD (Single Instruction, Multiple Data) instructions.

The idea behind SIMD is that one instruction is executed across a sequence of data elements (eg a mapping function is applied to a sequence of elements; the function doesn't change, only the data does).

These instruction sets provide vector registers, along with instructions that handle both scalar (ie single-value) and packed (ie sequence-of-values) types. Here's an example (taken from Wikipedia) where we want to add two 4-component vectors. The first code snippet is the operation we want to perform; the second is the x86 code generated for it (using SSE instructions):

C:

vec_res.x = v1.x + v2.x;
vec_res.y = v1.y + v2.y;
vec_res.z = v1.z + v2.z;
vec_res.w = v1.w + v2.w;

movaps xmm0, [v1]        ; xmm0 = v1.w | v1.z | v1.y | v1.x
addps  xmm0, [v2]        ; xmm0 = v1.w+v2.w | v1.z+v2.z | v1.y+v2.y | v1.x+v2.x
movaps [vec_res], xmm0   ; store the packed result back to vec_res
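For comparison, here is how the same operation can be written directly in C using SSE compiler intrinsics (a sketch: the add4 function and vec4 struct are made up for illustration; <xmmintrin.h> is the SSE intrinsics header provided by GCC/Clang/MSVC):

#include <xmmintrin.h>   /* SSE intrinsics */

struct vec4 { float x, y, z, w; };

void add4(const struct vec4 *v1, const struct vec4 *v2, struct vec4 *res) {
    __m128 a = _mm_loadu_ps((const float *)v1);  /* load 4 floats into a vector register */
    __m128 b = _mm_loadu_ps((const float *)v2);
    __m128 r = _mm_add_ps(a, b);                 /* a single addps adds all four lanes at once */
    _mm_storeu_ps((float *)res, r);              /* store the packed result */
}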

Most languages (eg C) aren't very 'parallel-code-generation' friendly; the compiler typically will not recognize on its own that SIMD can be exploited in the code. To get around this problem, we typically use parallel programming models such as CUDA, OpenMP, etc, where users can explicitly state that parallelism exists by means of language-specific keywords or directives (which the compiler then maps onto parallel hardware, eg the x86 SIMD extensions mentioned above, or GPU threads in CUDA's case).
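As a concrete example, in OpenMP a single directive is enough to assert that a loop's iterations are independent, after which the compiler and runtime are free to split them across cores and vectorize them (a minimal sketch; the vadd function is made up for illustration, and it assumes compiling with OpenMP support, eg gcc -fopenmp):

void vadd(const float *a, const float *b, float *out, int n) {
    /* The pragma asserts the iterations are independent; the compiler/runtime
       may then divide them among threads (and use SIMD within each thread). */
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}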

jmnash

One thing to consider when writing scalable code is that most programs only scale up to a certain point, because only certain sections are parallelizable. So instead of just decomposing the existing code into independent pieces, you might have to rewrite the code to be more parallelizable, even if that makes the code "ugly" and unintuitive.
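This limit is captured by Amdahl's law: if a fraction p of a program's execution can be parallelized and the rest is serial, then the speedup on n processors is at most 1 / ((1 - p) + p/n). Even with infinitely many processors the speedup can never exceed 1 / (1 - p); for example, a program that is 90% parallelizable tops out at a 10x speedup.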

analysiser

@arjunh So are there any languages that are automatically 'parallel-code-generation' friendly? And in what situations are these languages used? It seems to me that detecting 'parallel-capable' sections in higher-level code shouldn't be that hard for a compiler.

yrkumar

Furthermore, where do functional programming languages fit into this discussion of languages that are able to recognize parallel code? For example, does the SML compiler automatically assign expressions to run on different threads based on the dependencies in the program? I would imagine this is the case, since functional languages are referentially transparent and side-effect free, a perfect combination for parallel execution.

rokhinip

There are languages that are inherently parallel, like Haskell. While I do not know the specifics of how Haskell implements this, one can read more about it here: http://www.haskell.org/haskellwiki/Parallel

SML/NJ is not parallel at all, although I believe MLton is. It is difficult to make a blanket statement that functional languages are referentially transparent and side-effect free, since SML is a clear counterexample to both. However, Haskell does provide an interface for executing parallel code that distinguishes between IO and non-IO code, so presumably the implementation of this parallelism takes into account the lack of side effects in the code block.

DunkMaster

Divide and conquer is naturally a way of writing parallel programs: the subproblems are independent of one another, so they can be solved concurrently.
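A minimal sketch of that idea in C, using OpenMP tasks (the function names parallel_sum and sum_range are made up for illustration; compile with eg gcc -fopenmp):

long sum_range(const int *a, int lo, int hi) {
    if (hi - lo < 1024) {                 /* base case: small enough, solve serially */
        long s = 0;
        for (int i = lo; i < hi; i++)
            s += a[i];
        return s;
    }
    int mid = lo + (hi - lo) / 2;
    long left, right;
    #pragma omp task shared(left)         /* spawn one half as an independent task */
    left = sum_range(a, lo, mid);
    right = sum_range(a, mid, hi);        /* current thread handles the other half */
    #pragma omp taskwait                  /* wait for the spawned task to finish */
    return left + right;
}

long parallel_sum(const int *a, int n) {
    long result;
    #pragma omp parallel                  /* create a thread team */
    #pragma omp single                    /* one thread starts the recursion; the rest run tasks */
    result = sum_range(a, 0, n);
    return result;
}

Each level of the recursion exposes more independent tasks, so the available parallelism grows as the problem is split.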