Note that the second case is less efficient than the other two, because the main thread isn't doing useful work while the other threads run their assigned functions. I think it makes sense for the earliest spawned threads to take the most expensive function calls, with each later spawned thread taking a cheaper task, so that the final call on the main thread does the least work. If the threads are assigned to different processors, this lets them all finish at roughly the same time. Does anyone know if this makes a difference in practice?
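Here is a minimal sketch of the proposed ordering. It is written in plain C++ so it compiles without a Cilk compiler: `std::thread` stands in for `cilk_spawn` and `join()` for `cilk_sync`. `simulated_work` and `run_most_expensive_first` are hypothetical names, and the workload is assumed to cost time proportional to its argument. Calls are spawned from most to least expensive, and the cheapest one runs on the calling thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical stand-in for an "expensive function": its cost grows with n.
std::uint64_t simulated_work(std::uint64_t n) {
    std::uint64_t acc = 0;
    for (std::uint64_t i = 0; i < n; ++i) acc += i % 7;
    return acc;
}

// Spawns the most expensive calls first and keeps the cheapest call for the
// calling ("main") thread. Results are returned in the original order of `costs`.
std::vector<std::uint64_t> run_most_expensive_first(const std::vector<std::uint64_t>& costs) {
    std::vector<std::uint64_t> results(costs.size());
    if (costs.empty()) return results;

    // Indices of the calls, sorted by descending cost.
    std::vector<std::size_t> order(costs.size());
    std::iota(order.begin(), order.end(), 0);
    std::sort(order.begin(), order.end(),
              [&](std::size_t a, std::size_t b) { return costs[a] > costs[b]; });

    // "Spawn" every call except the cheapest, most expensive first.
    // Each thread writes a distinct element of `results`, so there is no data race.
    std::vector<std::thread> workers;
    for (std::size_t k = 0; k + 1 < order.size(); ++k) {
        std::size_t idx = order[k];
        workers.emplace_back([&results, &costs, idx] { results[idx] = simulated_work(costs[idx]); });
    }
    assert(workers.size() == costs.size() - 1);

    // The cheapest call runs on the calling thread, like the final non-spawned call.
    std::size_t last = order.back();
    results[last] = simulated_work(costs[last]);

    // Wait for all spawned calls (the analogue of cilk_sync).
    for (auto& t : workers) t.join();
    return results;
}
```

Unlike this sketch, a Cilk runtime schedules spawned calls by work stealing and is free to run them serially, so whether this ordering helps depends on how the runtime actually distributes the work.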
I looked around, but I don't think spawns take enough clock cycles (unless you're doing something like a million of them) to justify ordering them from most to least work. Also, Cilk may well run them serially, in which case ordering them by amount of work doesn't matter at all.