The #1 supercomputer on this list doesn't use a GPU; it uses a bunch of another kind of many-core processor, the PEZY-SC.
Summary of what it is: 1024 MIMD cores per chip, with 2 ARM cores acting as head nodes.
It's interesting to see how often the way to get high performance at low power is to use more cores running at lower voltage and lower frequency. This is because dynamic power follows P = CV^2f. What that means is, if the voltage can be halved and the frequency halved (frequency usually has to drop when a processor runs at lower voltage), the power decreases 8-fold, and the energy per cycle (P/f = CV^2) decreases 4-fold.
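To make the arithmetic concrete, here is a minimal Python sketch of the P = CV^2f scaling. The capacitance, voltage, and frequency values are made-up illustrative numbers, not PEZY-SC figures, and the function names are just for this example. It also ignores static (leakage) power, so treat it as a rough model only.

```python
def dynamic_power(capacitance, voltage, frequency):
    """Dynamic switching power in watts: P = C * V^2 * f."""
    return capacitance * voltage ** 2 * frequency


def energy_per_cycle(capacitance, voltage):
    """Energy per clock cycle in joules: P / f = C * V^2."""
    return capacitance * voltage ** 2


# Illustrative values only (assumptions, not measurements of any real chip).
C = 1e-9   # switched capacitance, farads
V = 1.0    # supply voltage, volts
F = 1e9    # clock frequency, hertz

# Ratio of baseline to scaled power when both V and f are halved.
power_ratio = dynamic_power(C, V, F) / dynamic_power(C, V / 2, F / 2)

# Ratio of baseline to scaled energy per cycle when V is halved.
energy_ratio = energy_per_cycle(C, V) / energy_per_cycle(C, V / 2)

print(f"power drops by a factor of about {power_ratio:.1f}")
print(f"energy per cycle drops by a factor of about {energy_ratio:.1f}")
```

Under this model, two half-speed cores do the same number of cycles per second as one full-speed core while drawing about a quarter of the power, which is the basic argument for many slow cores.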
This fits into the broader theme that application-specific hardware is generally faster and more energy-efficient than general-purpose hardware (e.g. ASICs run at lower power than FPGAs, and even FPGAs run at lower power than CPUs for the same task).