Why is the time not consistently growing, but fluctuates? For example, 7 processors are performing better than 5 processors.
arcticx
@KnightsLanding
I don't have an intelligent explanation to it. - Kayvon
I think probably this is just due to noise. The general trend is more important.
Fantasy
If I remembered correctly, professor said it might be due to measurement.
KnightsLanding
I'm curious about the scalability of locks.
The below experiment used pthread, and pthread_mutex for lock, simple add to the counter as critical section, pinned threads to cores, and the result look like this.
#cores: 1, 2.33s user 0.00s system 99% cpu 2.334 total
#cores: 2, 10.08s user 7.05s system 196% cpu 8.708 total
#cores: 3, 16.32s user 14.40s system 300% cpu 10.218 total
#cores: 4, 23.69s user 29.26s system 477% cpu 11.087 total
#cores: 5, 32.30s user 39.78s system 526% cpu 13.687 total
#cores: 6, 31.80s user 57.02s system 645% cpu 13.755 total
#cores: 7, 29.36s user 78.03s system 735% cpu 14.599 total
#cores: 8, 41.50s user 93.31s system 897% cpu 15.026 total
#cores: 9, 18.03s user 102.42s system 861% cpu 13.981 total
#cores: 10, 13.34s user 115.76s system 945% cpu 13.655 total
#cores: 11, 11.63s user 119.27s system 998% cpu 13.116 total
#cores: 12, 11.59s user 139.38s system 1130% cpu 13.355 total
#cores: 13, 12.07s user 161.91s system 1241% cpu 14.011 total
#cores: 14, 9.44s user 169.86s system 1327% cpu 13.510 total
#cores: 15, 9.66s user 186.17s system 1429% cpu 13.695 total
#cores: 16, 8.81s user 185.34s system 1495% cpu 12.978 total
The user time first increases then decreases, and the system time increases consistently, the total time grows then fluctuates but flattens.
(each thread do 100000000/cores number of iterations)
bpr
@KnightsLanding, a recent pthread_mutex implementation supports multiple spinning features. The mutex may maintain a moving average of wait times and then either backoff to a function of that average or decide that the wait time exceeds the context switch time, in which case it will wait on a kernel futex. If it waits on the futex, it is not spinning in user mode and instead adds some additional load to the kernel (i.e., system).
IF you are really curious, there is a project to be had here.
KnightsLanding
@bpr Thank you! This sounds interesting.
Lawliet
Would the wait time keep increasing as we add more processors or would there be some theoretical limit to how long we wait?
notanonymous
I think the wait time would keep increasing as the more processors that are waiting for the lock, the longer it will take for all the processors to gain access to the lock. This is evident in the diagram in the slide and there is no reason for a theoretical limit to be observed.
Why is the time not consistently growing, but fluctuates? For example, 7 processors are performing better than 5 processors.
@KnightsLanding
I think probably this is just due to noise. The general trend is more important.
If I remembered correctly, professor said it might be due to measurement.
I'm curious about the scalability of locks.
The below experiment used pthread, and pthread_mutex for lock, simple add to the counter as critical section, pinned threads to cores, and the result look like this.
The user time first increases then decreases, and the system time increases consistently, the total time grows then fluctuates but flattens.
(each thread do 100000000/cores number of iterations)
@KnightsLanding, a recent pthread_mutex implementation supports multiple spinning features. The mutex may maintain a moving average of wait times and then either backoff to a function of that average or decide that the wait time exceeds the context switch time, in which case it will wait on a kernel futex. If it waits on the futex, it is not spinning in user mode and instead adds some additional load to the kernel (i.e., system).
IF you are really curious, there is a project to be had here.
@bpr Thank you! This sounds interesting.
Would the wait time keep increasing as we add more processors or would there be some theoretical limit to how long we wait?
I think the wait time would keep increasing as the more processors that are waiting for the lock, the longer it will take for all the processors to gain access to the lock. This is evident in the diagram in the slide and there is no reason for a theoretical limit to be observed.