So not quite the 100x-1000x speedup speculated about in the heterogeneous computing lecture, but they did see significantly more efficient computation.
I'm curious why they used relatively low-bandwidth DDR3 memory. I would have expected something like GDDR5.
googlebleh
@amg They acknowledge that in the Abstract:
Moreover, using the GPU’s GDDR5 memory in the TPU would triple achieved TOPS and raise TOPS/Watt to nearly 70X the GPU and 200X the CPU.
See Section 7 for their discussion on using GDDR5.
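A quick roofline-style sketch of why the memory choice matters: throughput is capped by min(peak compute, bandwidth x operational intensity), so low-intensity layers are bandwidth-bound. The peak-TOPS and bandwidth numbers below are approximate/illustrative assumptions, not the paper's exact figures.

```python
def attainable_tops(peak_tops, bandwidth_gbs, ops_per_byte):
    """Roofline model: attainable = min(peak compute, bandwidth * intensity).

    bandwidth_gbs is in GB/s, so divide by 1000 to get tera-ops/s.
    """
    return min(peak_tops, bandwidth_gbs * ops_per_byte / 1000.0)

PEAK_TOPS = 92.0    # TPU 8-bit peak (approx., from the paper)
DDR3_GBS = 34.0     # assumed DDR3-class bandwidth
GDDR5_GBS = 180.0   # assumed GDDR5-class bandwidth

# Low-intensity layers hit the bandwidth roof; GDDR5 lifts it substantially.
for oi in (100, 500, 1500):   # ops per byte (operational intensity)
    print(oi,
          attainable_tops(PEAK_TOPS, DDR3_GBS, oi),
          attainable_tops(PEAK_TOPS, GDDR5_GBS, oi))
```

At high intensity both hit the compute roof, but at the intensities typical of MLP inference the DDR3 configuration is bandwidth-bound, which is consistent with the abstract's claim that GDDR5 would roughly triple achieved TOPS.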
jedi
So many authors working in parallel on a single paper :-O
Do these circuits have the ability to run the whole backpropagation algorithm as a single instruction?
@sidzekrom I don't think that's feasible! One complete backpropagation iteration for a DNN will be a LOT of (normal) instructions!
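To make that concrete, here's one backprop step for a tiny two-layer MLP written out as the individual matrix and elementwise ops it decomposes into (a NumPy sketch with made-up shapes, not anything TPU-specific). Each line is at least one matmul or elementwise kernel, so a full iteration is many instructions, not one.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4))       # batch of 8, 4 features
y = rng.standard_normal((8, 2))
W1 = rng.standard_normal((4, 16))
W2 = rng.standard_normal((16, 2))

# Forward pass
h_pre = x @ W1                        # matmul 1
h = np.maximum(h_pre, 0.0)            # ReLU (elementwise)
y_hat = h @ W2                        # matmul 2
loss = ((y_hat - y) ** 2).mean()      # MSE loss

# Backward pass: still more matmuls and elementwise ops
d_yhat = 2.0 * (y_hat - y) / y.size
dW2 = h.T @ d_yhat                    # matmul 3
d_h = d_yhat @ W2.T                   # matmul 4
d_hpre = d_h * (h_pre > 0)            # ReLU gradient (elementwise)
dW1 = x.T @ d_hpre                    # matmul 5

print(dW1.shape, dW2.shape)           # (4, 16) (16, 2)
```

And that's just two layers; a real DNN repeats this per layer, plus the optimizer update. What the TPU's matrix unit accelerates is the individual matmul, not the whole training loop.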
Google released a draft copy of their upcoming paper on the TPU today!
"In-Datacenter Performance Analysis of a Tensor Processing Unit", ISCA 2017
https://drive.google.com/file/d/0Bx4hafXDDq2EMzRNcy1vSUxtcEk/view