How is a barrier implemented, either at the software or hardware level? And how does the complexity change as the number of threads increase?
You would probably implement it using some other synchronisation primitive like a condition variable
There's one for processes here:
Basically: semaphores. The nth thread (the last one to finish) then signals one other thread, which then signals one other thread, etc, until there are no more threads to be signaled.
@lol There are many ways to implement a barrier. It usually depends on the primitives provided by the environment and runtime. For example, an OpenMP barrier can internally use the OpenMP runtime which may be completely different in implementation from a pthread barrier. There is also a separate body of research that has explored barrier support in hardware.
Barrier complexity can scale at O(log n) although most implementations use simpler O(n) designs given the low number of threads usually encountered.