One possible explanation for the super-linear speedup on the right might be that the entire working set fits in the cache.
@ask, you're likely right. The Liszt paper says: "The FEM application experiences super-linear scaling from 256 to 512 cores, as the working set of the algorithm becomes small enough to fit entirely into L3 cache. ..." FEM is a different simulation that had a more pronounced super-linear speedup.
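To see why shrinking the per-core working set can push speedup past linear, here is a toy strong-scaling model. It is only a sketch: the working-set size, the per-core cache share, and the 1.5x cost of a cache miss are all made-up numbers chosen so the jump lands between 256 and 512 cores, and treating a shared L3 as a fixed slice per core is a simplification.

```python
# Toy model with HYPOTHETICAL parameters: the working set is split evenly
# across p cores, and the cost per unit of work drops once each core's
# slice fits in its (assumed) share of the cache.

def run_time(p, total_work=1.0, working_set_mb=2048.0,
             cache_per_core_mb=5.0, miss_cost=1.5, hit_cost=1.0):
    """Estimated run time on p cores under the toy cache model."""
    slice_mb = working_set_mb / p          # bytes each core must touch
    cost = hit_cost if slice_mb <= cache_per_core_mb else miss_cost
    return (total_work / p) * cost         # evenly divided work times access cost

def speedup(p, **kw):
    """Speedup relative to a single core: T(1) / T(p)."""
    return run_time(1, **kw) / run_time(p, **kw)

for p in (64, 128, 256, 512):
    print(p, speedup(p))
```

With these numbers the speedup tracks the core count exactly up to 256 cores, then jumps to 768 at 512 cores, because each core's 4 MB slice now fits in cache and every access gets cheaper. That extra factor on top of the p-fold split of work is what shows up as super-linear.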
Other observations from the same paper that relate to our course:
The gray line indicates ideal linear speedup (speedup equal to the number of cores), not the maximum achievable speedup. In the second graph, the Liszt program's curve rises above that line, so it is showing super-linear speedup due to temporal and spatial locality of data, as @ask mentioned.
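Here is a small sketch of how you could check a set of measurements against that gray line. The timings are hypothetical, not from the paper. Speedup is normalized to the smallest core count measured, a common choice when a single-core run is too large to fit, so that the ideal line is simply speedup == p. The 1% tolerance is an arbitrary assumption.

```python
def scaling_report(timings, tol=0.01):
    """timings: dict mapping core count -> wall-clock seconds (hypothetical).
    Returns core count -> (speedup, label), with speedup scaled so that
    ideal linear scaling gives speedup == core count."""
    base_p = min(timings)
    base_t = timings[base_p]
    report = {}
    for p in sorted(timings):
        s = base_p * base_t / timings[p]   # normalized speedup
        if s > p * (1 + tol):
            label = "super-linear"         # above the ideal line
        elif s < p * (1 - tol):
            label = "sub-linear"           # below the ideal line
        else:
            label = "linear"               # on the ideal line, within tol
        report[p] = (s, label)
    return report

# Made-up timings for illustration only.
for p, (s, label) in scaling_report({64: 80.0, 128: 41.0, 256: 20.0, 512: 8.0}).items():
    print(p, round(s, 1), label)
```

Here the 512-core point comes out at a speedup of 640, which lies above the ideal value of 512 and gets flagged as super-linear, while the 128-core point falls slightly below the line.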