High traffic for the test and set lock comes from many processors invalidating the cache line of the lock. The poor scaling is caused by the increasing bus contention with the increasing number of processors.
To have any provision for fairness, there is a need for some kind of queue or ticketing system.
Would randomization not also provide some level of fairness? If each thread waited for some random amount of time before retrying to attain a lock after failure, I feel like over time this would be fair for the different threads.