Could the overhead of partitioning the data to create narrow dependencies ever outweigh the performance gain from having them?
nemo
Partitioning the data does incur a cost up front, but if it makes the dependencies narrow for most of your computation, it will be worth the overhead. This is particularly true on large clusters, where node failures and heterogeneity hurt performance when dependencies are wide.
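To make the trade-off concrete, here is a rough sketch in plain Python (not Spark code). It assumes a toy cost model in which the only cost is the number of records that change nodes, and in which a dataset that is not pre-partitioned gets reshuffled at every stage that needs it by key. The names `partition_of`, `count_moves`, and `total_moves` are made up for illustration.

```python
def partition_of(key, num_partitions):
    # Hypothetical hash partitioner: integer keys map to key mod num_partitions.
    return key % num_partitions


def count_moves(placement, num_partitions):
    """Count records whose current node differs from the node their key maps to."""
    return sum(
        1 for node, key in placement
        if node != partition_of(key, num_partitions)
    )


def total_moves(placement, num_partitions, num_stages, prepartition):
    """Records moved over num_stages keyed stages under the toy cost model."""
    moves_per_shuffle = count_moves(placement, num_partitions)
    if prepartition:
        # Pay for one shuffle up front; later stages are narrow and move nothing.
        return moves_per_shuffle
    # Assumption: the unpartitioned dataset is reshuffled at every stage.
    return moves_per_shuffle * num_stages


num_partitions = 4
# 1000 records stored in contiguous blocks of 250, ignoring the key.
placement = [(i // 250, i) for i in range(1000)]

once_pre = total_moves(placement, num_partitions, 1, prepartition=True)
once_raw = total_moves(placement, num_partitions, 1, prepartition=False)
many_pre = total_moves(placement, num_partitions, 10, prepartition=True)
many_raw = total_moves(placement, num_partitions, 10, prepartition=False)
print(once_pre, once_raw, many_pre, many_raw)
```

Under these assumptions, partitioning buys nothing if the data is used by key only once, since the partitioning step is itself a shuffle. The gain shows up when the same partitioning is reused across many stages.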
hzxa21
But partitioning incorrectly may introduce workload imbalance.
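A small sketch of how that imbalance can arise, again in plain Python rather than Spark. It assumes a simple modulo partitioner and a made-up skewed key distribution in which one key accounts for 70% of the records; `partition_sizes` and `imbalance` are hypothetical helper names.

```python
from collections import Counter


def partition_sizes(keys, num_partitions):
    """Number of records landing in each partition under key mod num_partitions."""
    sizes = Counter(key % num_partitions for key in keys)
    return [sizes.get(p, 0) for p in range(num_partitions)]


def imbalance(sizes):
    """Largest partition size divided by the mean size; 1.0 is perfectly balanced."""
    return max(sizes) / (sum(sizes) / len(sizes))


num_partitions = 4
uniform_keys = list(range(1000))                 # every key appears once
skewed_keys = [0] * 700 + list(range(1, 301))    # key 0 dominates

uniform_sizes = partition_sizes(uniform_keys, num_partitions)
skewed_sizes = partition_sizes(skewed_keys, num_partitions)
print(uniform_sizes, imbalance(uniform_sizes))
print(skewed_sizes, imbalance(skewed_sizes))
```

With the skewed keys, one partition holds most of the records, so the task that processes it becomes the straggler even though every dependency is narrow.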