Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2016

KnightsLanding

Why is the time not consistently growing, but fluctuates? For example, 7 processors are performing better than 5 processors.

arcticx

@KnightsLanding

I don't have an intelligent explanation to it. - Kayvon

I think probably this is just due to noise. The general trend is more important.

Fantasy

If I remembered correctly, professor said it might be due to measurement.

KnightsLanding

I'm curious about the scalability of locks.

The below experiment used pthread, and pthread_mutex for lock, simple add to the counter as critical section, pinned threads to cores, and the result look like this.

#cores: 1,  2.33s  user 0.00s   system 99%   cpu 2.334 total
#cores: 2,  10.08s user 7.05s   system 196%  cpu 8.708 total
#cores: 3,  16.32s user 14.40s  system 300%  cpu 10.218 total
#cores: 4,  23.69s user 29.26s  system 477%  cpu 11.087 total
#cores: 5,  32.30s user 39.78s  system 526%  cpu 13.687 total
#cores: 6,  31.80s user 57.02s  system 645%  cpu 13.755 total
#cores: 7,  29.36s user 78.03s  system 735%  cpu 14.599 total
#cores: 8,  41.50s user 93.31s  system 897%  cpu 15.026 total
#cores: 9,  18.03s user 102.42s system 861%  cpu 13.981 total
#cores: 10, 13.34s user 115.76s system 945%  cpu 13.655 total
#cores: 11, 11.63s user 119.27s system 998%  cpu 13.116 total
#cores: 12, 11.59s user 139.38s system 1130% cpu 13.355 total
#cores: 13, 12.07s user 161.91s system 1241% cpu 14.011 total
#cores: 14, 9.44s  user 169.86s system 1327% cpu 13.510 total
#cores: 15, 9.66s  user 186.17s system 1429% cpu 13.695 total
#cores: 16, 8.81s  user 185.34s system 1495% cpu 12.978 total

The user time first increases then decreases, and the system time increases consistently, the total time grows then fluctuates but flattens.

(each thread do 100000000/cores number of iterations)

bpr

@KnightsLanding, a recent pthread_mutex implementation supports multiple spinning features. The mutex may maintain a moving average of wait times and then either backoff to a function of that average or decide that the wait time exceeds the context switch time, in which case it will wait on a kernel futex. If it waits on the futex, it is not spinning in user mode and instead adds some additional load to the kernel (i.e., system).

IF you are really curious, there is a project to be had here.

KnightsLanding

@bpr Thank you! This sounds interesting.

Lawliet

Would the wait time keep increasing as we add more processors or would there be some theoretical limit to how long we wait?

notanonymous

I think the wait time would keep increasing as the more processors that are waiting for the lock, the longer it will take for all the processors to gain access to the lock. This is evident in the diagram in the slide and there is no reason for a theoretical limit to be observed.