Even for work such as summing numbers, which appears not to be independent across threads, we can still find a way to split the work and achieve substantial speedup by using thread-local variables.
chenh1
We need N >> P to get near-P speedup because when N is small the overhead of threads offsets the benefit gained from parallelism. The speedup only becomes apparent when the time saved is large relative to that overhead.
pk267
This overhead includes not just computation cost but also communication cost, which can be non-trivial if the nodes are spread over large geographical distances.