Could we also get improved performance by using some kind of transactional approach (or something along those lines, perhaps like load-linked/store-conditional or a compare-and-swap -- or maybe even just a hardware atomic increment) to improve the locking overhead? Seems to me like that would mostly get rid of it, assuming most of the computation time is spent in test_primality, at least.
kayvonf
@Olorin: basically you are suggesting that a better implementation of synchronization might reduce the overhead of making the dynamic assignment of work to threads. And that's certainly true.
Could we also get improved performance by using some kind of transactional approach (or something along those lines, perhaps like load-linked/store-conditional or a compare-and-swap -- or maybe even just a hardware atomic increment) to improve the locking overhead? Seems to me like that would mostly get rid of it, assuming most of the computation time is spent in test_primality, at least.
@Olorin: basically you are suggesting that a better implementation of synchronization might reduce the overhead of making the dynamic assignment of work to threads. And that's certainly true.