Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2017

In-Memory Distributed Computing using Spark

Slide 3 of 43

hweetvpu

Partial results in program 2 can be kept in registers for fast subsequent access. In contrast, the arrays in program 1 can be very large, so they cannot be held in registers and must go through memory, which lowers program 1's arithmetic intensity.

ask

This is similar to question 1E of the midterm exam. Fusing the multiple operations into one function increases the arithmetic intensity, because more math operations are packed into the same number of memory operations.

MichaelJordan

In program 2, we fused the two additions and one multiplication together (so 3 math ops). We also only need to load A, B, C, and D and store E, for a total of 5 memory operations. In total, then, the arithmetic intensity is 3/5. In program 1, on the other hand, we did all of one add first, then all of the multiply, then all of the other add, so each math operation needed 3 memory operations (2 loads and 1 store), for an intensity of 1/3. (In program 1 we cannot keep all of A, B, C, and D in registers and do all 3 math operations in a single pass through the data.)
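To make the counting concrete, here is a minimal sketch of the two styles. The slide's actual code is not reproduced in these comments, so the expression `E = D + (A + B) * C`, the function names `unfused` and `fused`, and the scratch arrays `tmp1` and `tmp2` are all assumptions chosen only to match "two additions and one multiplication".

```c
#include <assert.h>
#include <stddef.h>

/* Program 1 style: each pass loads two arrays and stores one,
   i.e. 3 memory operations for every math operation. */
static void add(size_t n, const float *x, const float *y, float *out) {
    for (size_t i = 0; i < n; i++)
        out[i] = x[i] + y[i];
}

static void mul(size_t n, const float *x, const float *y, float *out) {
    for (size_t i = 0; i < n; i++)
        out[i] = x[i] * y[i];
}

/* Hypothetical expression: E = D + (A + B) * C.
   tmp1 and tmp2 are full-size scratch arrays (assumed), so every
   intermediate value is written to memory and read back. */
void unfused(size_t n, const float *A, const float *B, const float *C,
             const float *D, float *tmp1, float *tmp2, float *E) {
    assert(n == 0 || (A && B && C && D && tmp1 && tmp2 && E));
    add(n, A, B, tmp1);     /* tmp1 = A + B     */
    mul(n, tmp1, C, tmp2);  /* tmp2 = tmp1 * C  */
    add(n, tmp2, D, E);     /* E    = tmp2 + D  */
}

/* Program 2 style: 4 loads + 1 store per element for 3 math ops. */
void fused(size_t n, const float *A, const float *B, const float *C,
           const float *D, float *E) {
    assert(n == 0 || (A && B && C && D && E));
    for (size_t i = 0; i < n; i++) {
        float sum = A[i] + B[i];  /* intermediate stays in a register */
        float prod = sum * C[i];  /* so does this one */
        E[i] = prod + D[i];
    }
}
```

Both versions compute the same values. The only difference is whether the intermediates `sum` and `prod` travel through arrays in memory or stay in registers.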

bazinga

The optimization in program #2 results in a speedup because program #1 is inherently memory-bandwidth bound: it has relatively few math ops, and they must wait on the significantly slower memory system because memory bandwidth is limited. Program #2 has the exact same number of math ops, but we reduced the amount of work the memory system has to do, which results in the speedup!
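A rough way to put a number on this, assuming run time is determined purely by memory traffic and that none of program 1's intermediate values stay in cache (both are simplifying assumptions, not measurements): with the per-element counts from the thread, program 1 performs 3 passes of 3 memory operations each, and program 2 performs 5 memory operations.

$$
\frac{T_1}{T_2} \approx \frac{3 \times 3}{5} = \frac{9}{5} = 1.8
$$

Here $T_1$ and $T_2$ are the (hypothetical) run times of program 1 and program 2. Under these assumptions the fused version would be at most about 1.8 times faster; the real gain depends on cache behavior and on how close the machine actually is to its bandwidth limit.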

anonymous

The key difference between #1 and #2 is that #1 passes intermediate results between operations through memory, while #2 passes them through registers.