Previous | Next --- Slide 41 of 42
Back to Lecture Thumbnails

In the single-core case, graph compression did not improve performance because the program was not bandwidth bound. It actually decreased performance because the program had the increased overhead of decoding the compressed graph.


What is nibble and why is its runtime so much slower compared to the original?


@Lawliet I asked that too (and probably should have edited instead of deleting my comment), since "nibble" sounds like a trendy framework or something. But it seems like here it's just the term for 4 bits, considering that "byte" is another bar type. The asterisk in the bottom left clarifies that they're all compression schemes, so I'd guess that means "nibble" is more compressed relative to the others (since it's smaller), and the increased time is due to the overhead it takes to pack and unpack into such a small unit. If someone else knows more about it, though, by all means correct me on that.


I was wondering why this graph compression scheme would increase performance even for graphs that already fit in memory. Does anyone have any insight on this?


@chuangxuean: because it requires memory bandwidth requirements! Recall that many operations on graphs are bandwidth-bound, and compressing the graph means fewer bits needs to be transferred to the processor from memory.

The key point: In a bandwidth-bound scenario anything you can do to reduce the number of bits transferred from memory will be useful to improve overall program performance, because the processor is stalling waiting on bits from memory.


I am assuming, in case where graph fits in memory, the bandwidth-bound means moving data from disk to memory? and after compression less data would be moved and thus improve performance


@yangwu: No, it very much means memory bandwidth bound in that case. Consider all the situation in this class where a processor was waiting on data from DRAM. Just fitting your dataset in memory is not good enough these days! Recall Assignment 1, problem 5, as well as this slide.


Increasing the arithmetic intensity of the program (if possible) would also help alleviate a bandwidth bound application, because it would increase the amount of time between subsequent sets of memory accesses. While this doesn't reduce the total amount of data movement in the program, it does increase the amount of computation done on each piece of moved data, given a lower communication overhead.