Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2016

Parallel Deep Network Training

Previous | Next --- Slide 42 of 44

Back to Lecture Thumbnails

Calloc

I like this idea, abides by "f**k it, ship it".

bojianh

So, I believe this may result in something similar (but not exactly the same) to Neural Network Dropout. If there are race conditions, not all the gradient updates will be applied precisely. I think it may encourage more distributed learning between the neural network units which generalize better.

365sleeping

@bojianh Dropout is intentionally dropping weights, but in this case it is unintentional. It still works just because the gradient descend is robust. It sometimes even outputs better results because we luckily get out of bad local optimum.

bojianh

@365sleeping: I am wondering if the final effects of both neural network dropout and no synchronization is similar.

365sleeping

@bojianh After a second thought, I think you are right. For dropout, we only need to know the dropout rate, and then make up for that. Therefore, If we can estimate the average dropout rate $p$ induced by asynchronization, which is not hard, we can treat it as a layer with dropout rate of $p$.

cyl

Both dropout and and asynchronous update are applying randomness to the weight during the training.