Based on these slides, continuation stealing seems to be the preferable scheduling choice because, as slide 40 notes, child stealing may require up to O(N) space since the caller thread spawns all the work up front. This could be rather inefficient if N is large. I'd be interested to hear if there are any other significant pros and cons to the two approaches in practice.
I also wonder if there would be any benefit to doing a little bit of both -- say, randomly picking 50/50 whether to run the child first or to run the continuation first. Intuition says that this would be messy and not a great idea, however.
But with continuation stealing, you have the overhead of performing a steal, and this could be linear in N, if thread 2 steals from thread 1, then thread 3 steals from thread 2, and so on.
@lol I think the concern is that child stealing could require O(n) space in a single work queue. If the total space used is O(n) over n work queues then I don't it's a problem.