If we are thinking about an architecture with a relaxed (weakly consistent) memory model, would we need memory fences to prevent reads and writes from being reordered?
rojo
Having many memory fences would probably be a bad idea, since each fence stalls program execution. This would push the code toward nearly serial execution. Perhaps there is a finer-grained form of control.
gryffolyon
Aren't fences already implemented inside atomic_cir_increment()? For example, from what I have learned, Intel provides atomic instructions like xadd, xchg, and cmpxchg which, when locked, inherently act as fences and so provide the ordering needed for consistency.
lament
Notice that we padded the elements of the status array so that updating one element does not generate cache-coherence traffic with other processors. In other words, if we did not pad the elements, multiple locks could fit in a single cache line, resulting in false sharing even though different threads of control on different processors are accessing different locks.
Kapteyn
I think we have to check that the circular increment doesn't cause any two processors to end up with the same index, which can be avoided by simply taking the index modulo the number of processors.
toothless
What about doing an ordinary atomic_increment(&l->head) and taking it mod P afterwards? If P is a power of two (say 16), the mod becomes a bitwise AND, which is really cheap.
VP7
Since the given piece of code is just C-like pseudocode, I would assume that "atomic increment" implicitly means the atomic read-modify-write will include whatever fences are needed to ensure consistency.