So not quite the 100x-1000x speedup speculated about in the heterogeneous computing lecture, but they did see significantly more efficient computation.
I'm curious why they used relatively low-bandwidth DDR3 memory. I would have expected something like GDDR5.
googlebleh
@amg They acknowledge that in the Abstract:
Moreover, using the GPU’s GDDR5 memory in the TPU would triple achieved TOPS and raise TOPS/Watt to nearly 70X the GPU and 200X the CPU.
See Section 7 for their discussion on using GDDR5.
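A quick roofline-style sketch of why the memory choice matters: throughput is capped by min(peak compute, bandwidth x operational intensity), so low-intensity layers are bandwidth-bound. The peak-TOPS and bandwidth numbers below are approximate/illustrative assumptions, not the paper's exact figures.

```python
def attainable_tops(peak_tops, bandwidth_gbs, ops_per_byte):
    """Roofline model: attainable = min(peak compute, bandwidth * intensity).

    bandwidth_gbs is in GB/s, so divide by 1000 to get tera-ops/s.
    """
    return min(peak_tops, bandwidth_gbs * ops_per_byte / 1000.0)

PEAK_TOPS = 92.0    # TPU 8-bit peak (approx., from the paper)
DDR3_GBS = 34.0     # assumed DDR3-class bandwidth
GDDR5_GBS = 180.0   # assumed GDDR5-class bandwidth

# Low-intensity layers hit the bandwidth roof; GDDR5 lifts it substantially.
for oi in (100, 500, 1500):   # ops per byte (operational intensity)
    print(oi,
          attainable_tops(PEAK_TOPS, DDR3_GBS, oi),
          attainable_tops(PEAK_TOPS, GDDR5_GBS, oi))
```

At high intensity both hit the compute roof, but at the intensities typical of MLP inference the DDR3 configuration is bandwidth-bound, which is consistent with the abstract's claim that GDDR5 would roughly triple achieved TOPS.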
jedi
So many authors working in parallel on a single paper :-O
Do these circuits have the ability to run the whole backpropagation algorithm as a single instruction?
@sidzekrom I don't think that's feasible! One complete backpropagation iteration for a DNN will be a LOT of (normal) instructions!
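To make that concrete, here's one backprop step for a tiny two-layer MLP written out as the individual matrix and elementwise ops it decomposes into (a NumPy sketch with made-up shapes, not anything TPU-specific). Each line is at least one matmul or elementwise kernel, so a full iteration is many instructions, not one.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4))       # batch of 8, 4 features
y = rng.standard_normal((8, 2))
W1 = rng.standard_normal((4, 16))
W2 = rng.standard_normal((16, 2))

# Forward pass
h_pre = x @ W1                        # matmul 1
h = np.maximum(h_pre, 0.0)            # ReLU (elementwise)
y_hat = h @ W2                        # matmul 2
loss = ((y_hat - y) ** 2).mean()      # MSE loss

# Backward pass: still more matmuls and elementwise ops
d_yhat = 2.0 * (y_hat - y) / y.size
dW2 = h.T @ d_yhat                    # matmul 3
d_h = d_yhat @ W2.T                   # matmul 4
d_hpre = d_h * (h_pre > 0)            # ReLU gradient (elementwise)
dW1 = x.T @ d_hpre                    # matmul 5

print(dW1.shape, dW2.shape)           # (4, 16) (16, 2)
```

And that's just two layers; a real DNN repeats this per layer, plus the optimizer update. What the TPU's matrix unit accelerates is the individual matmul, not the whole training loop.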
Google released a draft copy of their upcoming paper on the TPU today!
"In-Datacenter Performance Analysis of a Tensor Processing Unit", ISCA 2017
https://drive.google.com/file/d/0Bx4hafXDDq2EMzRNcy1vSUxtcEk/view