Question: do we need to sync the parameters between multiple parameter servers? If no, do we still have convergence guarantee? If yes, do we need to sync every iteration, or can we sync once for maybe, say 5 or 10, iterations?
apadwekar
I don't believe we have to sync the parameters between multiple servers, as each server is in charge of a separate chunk. Just as with a single parameter server, convergence is not guaranteed, since the local sub-gradients don't always reflect the global gradient. Sharding the parameters simply addresses the contention and memory issues.
Levy
@cwchang
We don't need to. Different parameter servers hold different shards of the parameters, so there's no need to keep consistency among them.
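The sharding idea in the answers above can be sketched as follows. This is a minimal single-process illustration, not a real distributed system; the class and routing scheme (hash-based key ownership) are assumptions for the sake of the example. The key point is that shards are disjoint, so a worker's update for a given parameter touches exactly one server and no server-to-server sync is needed.

```python
class ParameterServerShard:
    """One server owns a disjoint subset (shard) of the parameters.

    Because shards are disjoint, servers never exchange parameters
    with each other; each applies SGD updates only to keys it owns.
    """

    def __init__(self, shard_id, num_shards, lr=0.1):
        self.shard_id = shard_id
        self.num_shards = num_shards
        self.lr = lr
        self.params = {}  # key -> parameter value (scalars for simplicity)

    def owns(self, key):
        # Hypothetical routing rule: hash the key into a shard id.
        return hash(key) % self.num_shards == self.shard_id

    def pull(self, key):
        # Workers read the current value of a parameter this shard owns.
        return self.params.setdefault(key, 0.0)

    def push(self, key, grad):
        # Workers send a gradient; the owning shard applies the update.
        assert self.owns(key), "update routed to the wrong shard"
        self.params[key] = self.pull(key) - self.lr * grad


def route(key, servers):
    """A worker routes each parameter key to the single server that owns it."""
    return servers[hash(key) % len(servers)]


# Three shards; a worker pushes a gradient for "w1" -- only one server is touched.
servers = [ParameterServerShard(i, num_shards=3) for i in range(3)]
route("w1", servers).push("w1", grad=2.0)
```

Note that this only removes the need for server-to-server sync; workers can still be stale relative to the servers, which is why the convergence caveat in the first answer still applies.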