Does OpenMP support loop fusion internally?
How exactly does OpenMP run? Every time there is a '#pragma omp parallel ...', does it create a new team of threads to run the relevant operation? Or does it keep one set of threads around and use a ton of barriers to control the work flow? If it's the first case (which I think it is), is it capable of fusing two consecutive parallel for loops so that only a barrier is needed between them, rather than letting all the threads terminate and then starting them all up again (which certainly has a performance cost)?
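To make the question concrete, here is a minimal sketch of the two forms I am comparing. The function names, array arguments, and loop bodies are made up for illustration. Variant A is what I normally write; variant B is the hand-fused form I am asking whether OpenMP could effectively produce on its own. I am not assuming anything about what a particular runtime does with threads between regions.

```c
#include <assert.h>
#include <stddef.h>

/* Variant A (hypothetical example): two separate parallel regions,
 * one per loop. Each "parallel for" opens and closes its own region. */
void scale_then_shift_two_regions(double *a, double *b, int n)
{
    assert(a != NULL && b != NULL);

    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        a[i] = 2.0 * a[i];
    /* First parallel region ends here (implicit barrier). */

    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        b[i] = a[i] + 1.0;
}

/* Variant B (hypothetical example): one parallel region containing
 * two worksharing loops. The same team executes both loops. */
void scale_then_shift_one_region(double *a, double *b, int n)
{
    assert(a != NULL && b != NULL);

    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < n; ++i)
            a[i] = 2.0 * a[i];
        /* Implicit barrier at the end of "omp for" (no nowait clause),
         * so every a[i] is written before the second loop starts. */

        #pragma omp for
        for (int i = 0; i < n; ++i)
            b[i] = a[i] + 1.0;
    }
}
```

Both functions should produce the same results; the only difference is whether the two loops sit in separate parallel regions or share one. Built without OpenMP enabled (e.g. without -fopenmp on GCC), the pragmas are ignored and both variants simply run serially.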