This slide shows one possible way to compress the edge list. The edges are first sorted in numerical order. The first edge is then stored as an offset relative to the vertex id, and each remaining edge is stored as an offset (delta) from the edge before it. If the edges are close together, the offsets are small, so far fewer bytes are needed to store the whole list.
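In Python pseudocode, the scheme described above might look like this (a sketch; `delta_encode`/`delta_decode` are my names, not from the slide):

```python
def delta_encode(vertex, edges):
    # sort, then store the first edge relative to the vertex id and
    # each subsequent edge relative to its predecessor
    edges = sorted(edges)
    deltas = [edges[0] - vertex]
    for prev, cur in zip(edges, edges[1:]):
        deltas.append(cur - prev)
    return deltas

def delta_decode(vertex, deltas):
    # invert the encoding by accumulating the deltas
    edges = [vertex + deltas[0]]
    for d in deltas[1:]:
        edges.append(edges[-1] + d)
    return edges
```

For example, vertex 1000 with edges {1001, 1003, 1007} encodes as the small deltas [1, 2, 4].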
I think there's an even better encoding in some cases (less space for equal work). Instead of emitting the extra [ONE_BYTE, 4], [TWO_BYTE, 1], etc. headers, we can use variable-length encoding on top of the differences. The trick is to use a single bit in each byte to signal whether it starts a new number.
i.e., if we treat the first bit of each byte as the signal bit, then 10000001 00000001 10000001 represents the numbers 10000001 and 1 (in binary): the first two bytes contribute the 7-bit payloads 0000001 and 0000001, which concatenate to 10000001, and the third byte starts a new number.
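A minimal sketch of this scheme (function names are mine): the high bit of each byte marks the start of a number, and the remaining 7 bits carry the payload, most significant group first:

```python
def varint_encode(n):
    # split n into 7-bit groups, most significant group first
    groups = []
    while True:
        groups.append(n & 0x7F)
        n >>= 7
        if n == 0:
            break
    groups.reverse()
    out = [0x80 | groups[0]]  # signal bit set: start of a new number
    out += groups[1:]         # continuation bytes have the signal bit clear
    return bytes(out)

def varint_decode(data):
    nums = []
    for b in data:
        if b & 0x80:
            nums.append(b & 0x7F)            # start of a new number
        else:
            nums[-1] = (nums[-1] << 7) | b   # append 7 more payload bits
    return nums
```

Decoding the three example bytes above yields [0b10000001, 1], i.e. 129 and 1.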
This isn't better in all cases, but is definitely better in a situation where the deltas vary a lot.
Does anyone else think that allowing negative numbers here would halve the range of values that can be represented, since one bit goes to the sign?
@365sleeping. Correct. But one could allow negative numbers only in the first group (since the edges are sorted, only the first offset can be negative), so we'd keep the bit for expanded range in all the other groups.
Also, the header uses 2 bits to encode three possible modes (1-byte, 2-byte, 4-byte), so I have an extra mode that could be interpreted as "1-byte signed". Or I could use the extra mode to mean "treat the group size as 5 bits (instead of 6) and use the freed bit in the header to indicate whether the element encodings are signed".
@kayvonf Yes, both make sense. Thanks!
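For reference, a standard way to handle signed deltas without a dedicated signed mode is the zig-zag mapping that protobuf's `sint` types use: it interleaves negatives with non-negatives (0→0, -1→1, 1→2, -2→3, ...), so small-magnitude negative deltas still encode in few bytes:

```python
def zigzag(n):
    # map ...,-2,-1,0,1,2,... onto 3,1,0,2,4,... (small magnitude -> small value)
    return 2 * n if n >= 0 else -2 * n - 1

def unzigzag(z):
    # invert the mapping: even values were non-negative, odd were negative
    return z // 2 if z % 2 == 0 else -(z + 1) // 2
```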
Doesn't this encoding also require an additional header somewhere for each vertex, storing a pointer to its edge list? Otherwise it would be difficult to decode this and find the other vertices in parallel.
@bojianh I don't think there should be a header. It is too wasteful.
I doubt we need parallelism here, because I'd guess this encoding is only suitable for sparse graphs. That means we have many edge lists but only a few edges in each list, so parallelism across lists is enough.
Also, if we really want parallelism over one list, we can actually have a better encoding.
@365sleeping: what I mean is that we will probably need a lookup table to find where each vertex's edge list starts in memory, since we would want to process multiple vertices' edge lists in parallel.
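e.g. a CSR-style offsets array (my own sketch, not from the lecture): `offsets[v]` gives the byte position where vertex v's compressed list starts, so each list can be located and decoded independently, and hence in parallel:

```python
def build_offsets(compressed_lists):
    # compressed_lists[v] is the encoded byte string for vertex v's edges
    offsets = [0]
    for blob in compressed_lists:
        offsets.append(offsets[-1] + len(blob))
    return offsets

# vertex v's bytes are then data[offsets[v]:offsets[v + 1]]
```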
Note that this compression approach is very similar to the memory compression scheme from when we were discussing memory/cache: we keep a base value for the first word, and every word after it is stored as a $\delta$ and reconstructed as base $+\ \delta$.
However, in the edge list case we can afford to spend more CPU time, so we can group the deltas into segments that each use the same number of bits, whereas in the compressed cache case we could only use one or two bases.
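As a sketch of that extra work (my own illustration): for each group of deltas, pick the smallest of the 1/2/4-byte modes that fits the group's largest delta, instead of one width for the whole list:

```python
def group_widths(deltas, group_size=4):
    # choose 1, 2, or 4 bytes per element, separately for each group
    widths = []
    for i in range(0, len(deltas), group_size):
        largest = max(deltas[i:i + group_size])
        widths.append(1 if largest < 2**8 else 2 if largest < 2**16 else 4)
    return widths
```

A list with one large delta then only pays the 4-byte cost for that one group, not for every element.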