Slide 38 of 52

Do these circuits have the ability to run the whole backpropagation algorithm as a single instruction?


@sidzekrom I don't think that's feasible! One complete backpropagation iteration for a DNN decomposes into a LOT of (normal) instructions!
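To make that concrete, here's a rough numpy sketch of a single backprop iteration for a tiny two-layer MLP (the layer sizes are made up). Every line of the backward pass is a separate tensor operation, and each tensor op is itself thousands of scalar multiply-adds, so even this toy network is far from "one instruction":

```python
import numpy as np

# Hypothetical tiny 2-layer MLP with made-up sizes; a real DNN has
# many more layers, so the op count per iteration is far larger.
rng = np.random.default_rng(0)
x = rng.standard_normal((64, 128))        # batch of inputs
y = rng.standard_normal((64, 10))         # targets
W1 = rng.standard_normal((128, 256)) * 0.01
W2 = rng.standard_normal((256, 10)) * 0.01

# Forward pass: two matmuls plus a nonlinearity
h = np.maximum(x @ W1, 0.0)               # ReLU hidden layer
pred = h @ W2
loss = ((pred - y) ** 2).mean()           # mean squared error

# Backward pass: each line below is a separate tensor op
dpred = 2.0 * (pred - y) / y.size         # gradient of the MSE loss
dW2 = h.T @ dpred                         # matmul
dh = dpred @ W2.T                         # matmul
dh *= (h > 0)                             # ReLU derivative (elementwise mask)
dW1 = x.T @ dh                            # matmul

print(loss, dW1.shape, dW2.shape)
```

Each matmul alone is on the order of batch x in x out scalar multiply-adds, which is why hardware like the TPU accelerates the individual tensor ops rather than trying to make "backprop" a single instruction.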


Google released a draft copy of their upcoming paper on the TPU today!

"In-Datacenter Performance Analysis of a Tensor Processing Unit", ISCA 2017


So not quite the 100x - 1000x speedup speculated in the heterogeneous computing lecture, but they were still able to achieve significantly more efficient computation.

I'm curious why they used relatively low-bandwidth memory (DDR3). I would have expected them to use something like GDDR5.


@amg They acknowledge that in the Abstract:

Moreover, using the GPU’s GDDR5 memory in the TPU would triple achieved TOPS and raise TOPS/Watt to nearly 70X the GPU and 200X the CPU.

See Section 7 for their discussion on using GDDR5.
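A toy roofline calculation shows why memory bandwidth matters so much here. The peak and bandwidth numbers below are the ones reported for the TPU (roughly 92 TOPS at 8 bits, ~34 GB/s from DDR3, ~180 GB/s if GDDR5 were used); the function itself is just my sketch of the standard roofline model, not anything from the paper:

```python
# Toy roofline model (my sketch). Attainable throughput is capped by
# either the compute peak or by bandwidth times operational intensity
# (ops performed per byte fetched from DRAM).
def attainable_tops(intensity_ops_per_byte, peak_tops=92.0, bw_gb_s=34.0):
    """min(compute peak, bandwidth * intensity), in TOPS."""
    return min(peak_tops, bw_gb_s * intensity_ops_per_byte / 1000.0)

# An MLP that streams its weights from DRAM on every inference has low
# operational intensity, so with DDR3 it is bandwidth-bound well below peak:
print(attainable_tops(100))               # memory-bound: 3.4 TOPS
print(attainable_tops(10_000))            # compute-bound: capped at 92 TOPS

# Swapping DDR3 (~34 GB/s) for GDDR5 (~180 GB/s) raises the bandwidth
# roof ~5x, consistent with the tripled achieved TOPS they project.
print(attainable_tops(100, bw_gb_s=180.0))
```

So the low-intensity MLP and LSTM workloads they benchmark sit on the sloped, bandwidth-limited part of the roofline, which is exactly where a faster memory system pays off.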


So many authors working in parallel on a single paper :-O