A tradeoff can be made in terms of update frequency. Reducing the frequency can reduce communication overheads.
I understand that it reduces communication overhead, but what's the flip side of reducing the frequency?
@captainFlint The flip side is that workers compute gradients using stale parameters, so the gradients may not match the current model and convergence can suffer.
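To make the staleness effect concrete, here is a minimal toy sketch, not something from the lecture. It assumes a single scalar parameter, the loss f(w) = 0.5 * w**2, and a hypothetical `train` function in which workers refresh their copy of the parameter only every `sync_every` steps. The names, learning rate, and step count are all illustrative assumptions.

```python
def train(sync_every, steps=60, lr=0.1):
    """Toy parameter-server loop minimizing f(w) = 0.5 * w**2.

    Workers pull the server's parameter only every `sync_every` steps
    and compute gradients at that (possibly stale) copy in between.
    Returns the final server parameter and the number of pulls.
    """
    server_w = 1.0      # parameter held by the server
    stale_w = server_w  # copy the workers compute gradients on
    pulls = 0           # number of parameter refreshes (communication)

    for step in range(steps):
        if step % sync_every == 0:
            stale_w = server_w  # workers refresh their copy
            pulls += 1
        grad = stale_w          # gradient of 0.5 * w**2 at the stale copy
        server_w -= lr * grad   # server applies the (stale) gradient

    return server_w, pulls


if __name__ == "__main__":
    for k in (1, 30):
        w, pulls = train(sync_every=k)
        print(f"sync_every={k}: final w={w:.4f}, pulls={pulls}")
```

With these particular settings, refreshing every step moves w steadily toward the minimum at 0, while refreshing every 30 steps uses far fewer pulls but keeps applying an outdated gradient and overshoots. How much staleness is tolerable depends on the learning rate and the problem, so this only illustrates the direction of the tradeoff.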
Could worker nodes communicate data with each other?