If we measure a parallel program's speedup against the same algorithm running on one core, it is easy to show a speedup, because the parallel run has P times the resources (P is the number of cores). However, the goal of parallelism is to improve the performance of a task (e.g., reducing its running time). If we were limited to one core, we would use the fastest algorithm available for a single core. So the actual speedup of using P cores should be "time of the fastest algorithm running on a single core" / "time of the parallel algorithm running on P cores".
I agree with @Fantasy, because algorithms designed for parallelism usually do more work than a single-core algorithm, such as communication and synchronization. We should therefore use the fastest single-core algorithm as the benchmark.
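
To make the difference concrete, here is a minimal sketch in Python. The timings are hypothetical numbers made up for illustration (not measurements), and `speedup` is just a helper name chosen for this example. It compares the speedup measured against the parallel algorithm on one core with the speedup measured against the fastest single-core algorithm.

```python
def speedup(baseline_seconds: float, parallel_seconds: float) -> float:
    """Return how many times faster the parallel run is than the baseline."""
    if baseline_seconds <= 0 or parallel_seconds <= 0:
        raise ValueError("timings must be positive")
    return baseline_seconds / parallel_seconds


# Hypothetical timings in seconds (assumed values, not real measurements).
best_serial = 8.0          # fastest single-core algorithm
parallel_on_1_core = 12.0  # parallel algorithm on one core, paying sync/communication overhead
parallel_on_p_cores = 3.0  # parallel algorithm on P = 4 cores

# Baseline = the same parallel algorithm on one core (flattering).
relative = speedup(parallel_on_1_core, parallel_on_p_cores)

# Baseline = the fastest single-core algorithm (the comparison argued for above).
absolute = speedup(best_serial, parallel_on_p_cores)

print(f"speedup vs. parallel algorithm on 1 core: {relative:.2f}x")
print(f"speedup vs. fastest single-core algorithm: {absolute:.2f}x")
```

With these assumed numbers the first ratio is 4x, which looks like perfect scaling on 4 cores, while the second is only about 2.67x, because the parallel algorithm's extra work is hidden when it is also used as the baseline.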