How to implement spin lock detail with C++11 atomic weight

2020-06-07 04:53:57
OfStack

1. The spin lock

Spinlocks are a basic synchronization primitive used to ensure exclusive access to Shared data. In contrast to a mutex, failure to acquire a lock does not cause the thread to block but a 1 straight spin attempt to acquire the lock. While the thread waits for the spinlock, CPU cannot do anything else, but 1 is in a polling busy state, etc. Spinlocks are mainly used where the holding time is short and the thread does not want to spend too much time rescheduling. In fact, many other types of locks are implemented using spinlocks at the bottom level, such as most mutex locks that attempt to acquire a lock spin 1 for a short period of time before going to sleep. If a spin lock is used in a scenario where the lock is held for a long time, CPU will spend 1 minute of the thread's time on meaningless busywork and so on before running out of time, resulting in a waste of computing resources.

Note when using spin locks:

Since CPU is not released when spinning, the thread holding the spin lock should release the spin lock as soon as possible, otherwise the thread waiting for the spin lock will waste CPU time by spinning somewhere else.

The thread holding the spin lock should release the spin lock before sleep so that other threads can obtain the spin lock

2. CAS operation to achieve spin lock

CAS (Compare and Swap), a technique often used to compare and replace concurrent algorithms, provides hardware-level atomic manipulation (by locking the bus). The prototype of the CAS operation can be considered as:


bool CAS(V, A, B)

Where V represents the variable in memory, A represents the expected value, and B represents the new value. When the value of V is equal to that of A, the values of V and B are exchanged. Logically it can be represented by the following pseudocode:


bool CAS(V, A, B)
{
 if (V == A)
 {
 swap(V, B);
 return true;
 }
 
 return false;
}

It's important to emphasize that the above operations are atomic, either do nothing or complete.

So how do you implement a spin lock if you already have an CAS operation? First of all, remember that the purpose of a spinlock is essentially to allow a thread to keep busy and wait for polling until it is ready to run. Then, it might be natural to think of using an bool variable to indicate whether you can enter a critical section, such as pseudocode logic like this:


while(flag == true);
flag = true;
/*
do something ...
*/
flag = false;
 ...

The intuitive idea is that when flag is true, it means that there are already threads in the critical zone. Only when flag is fasle can it be entered. On entering, it immediately sets flag to true. However, there is obviously a problem with this. Judging flag as false and setting flag as true are not an indivisible whole. It is possible to have a sequence similar to the following.

step	thread 1	thread 2
1	while(flag == true);
2		while(flag == true);
3	flag = true
4		flag = true
5	do something	do something
6		flag = false
7	flag = false

step is a fictitious step, do something is a series of 1 instructions, written here at 1 to indicate concurrent execution. It can be seen here that reading and judging the value of flag by thread1 and modifying the value of flag are two independent operations, and the judgment operation of thread2 is inserted in the middle. Finally, two threads enter the critical section at the same time, which is contrary to our expectation. So what's the solution? If the operation of reading judgment and modification can be combined 2 to 1 and become an indivisible whole, then naturally such an interlaced scene cannot occur. For such a holistic operation, we want it to read the value of an in-memory variable and, when it is equal to a particular value, modify it to the other value we want. HMM... That's right, so we have the CAS operation.

Now we can modify our synchronization mode again and again. We can continue to operate CAS(flag, flase, b) with the expectation that flag is false (here b is true) until it returns successfully. Then we can operate in the critical section and set flag as false when leaving the critical section.


b = true;
while(!CAS(flag, false, b));
//do something
flag = false;

Now, the judgment operation and write operation have become a whole, when the CAS operation of one thread is successful, it will prevent other threads from entering the critical area and reach the purpose of mutual exclusive access.

We can now use the CAS operation to solve the mutex access problem in critical sections, but it would be too cumbersome to write it once each time, so we can do some encapsulation to make it easier to use, that is... Can be encapsulated as a spin lock. We can represent it as a class, with one bool value as a data member of the class, and CAS operation and assignment operation as its member functions. CAS operation is essentially a lock operation, followed by assignment operation which is an unlock operation.

3. C++ atomic weight

Along these lines, C++ 11 is then used to implement a spin lock with the atomic weight introduced into the standard library and tested.

First, we need a value of bool to indicate the state of the lock, using the atomic weight of atomic in the standard library < bool > (C + + 11 atomic weight can reference: https: / / www ofstack. com article / 141896 htm, in my platform atomic (Cygwin64, GCC7. 3) < bool > The member function is_lock_free () returns true and is a lock-free implementation (spinlock = = if locks are used internally). atomic is actually on most platforms < bool > Both are unlocked. If you are not sure, you can use the C++ standard to specify that atomic_flag must be implemented without locks.

Next, we need two atomic operations, CAS and assignment, which the C++11 standard library provides directly in the member functions of atomic weights.


//CAS
std::atomic::compare_exchange_weak( T& expected, T desired,
     std::memory_order order =
     std::memory_order_seq_cst ),
     
std::atomic::compare_exchange_strong( T& expected, T desired,
     std::memory_order order =
     std::memory_order_seq_cst )
// The assignment 
void store( T desired, std::memory_order order = std::memory_order_seq_cst )

The main difference between compare_exchange_weak and compare_exchange_strong is that if the value in memory is the same as expected, CAS operation 1 must succeed, compare_exchange_weak will return failure probability, while compare_exchange_strong 1 must succeed. Therefore, compare_exchange_weak must be used with loops to ensure that the CAS operation is retried if it fails. The benefit is that compare_exchange_weak performs better on some platforms. Based on the above model, we would have used while with compare_exchange_weak. Finally, the selection of memory order is done without special requirements using the default std::memory_order_seq_cst. The assignment is very straightforward, and the call 1 is guaranteed to succeed (just the assignment = =), with no return value.

The implementation code is very short. Here is the source code:


#include <atomic>

class SpinLock {

public:
 SpinLock() : flag_(false)
 {}

 void lock()
 {
 bool expect = false;
 while (!flag_.compare_exchange_weak(expect, true))
 {
  // Here, 1 Need to will expect Recovery, when execution fails expect The result is indeterminate 
  expect = false;
 }
 }

 void unlock()
 {
 flag_.store(false);
 }

private:
 std::atomic<bool> flag_;
};

As mentioned above, the lock operation keeps trying the CAS operation until it succeeds, and the unlock operation restores the bool flag. The usage is as follows:


SpinLock myLock;
myLock.lock();

//do something

myLock.unlock();

Next, we conduct correctness tests, taking the classic i++ problem as an example:


#include <atomic>
#include <thread>
#include <vector>

// Spinlock class definition 
class SpinLock {

public:
 SpinLock() : flag_(false)
 {}

 void lock()
 {
 bool expect = false;
 while (!flag_.compare_exchange_weak(expect, true))
 {
  expect = false;
 }
 }

 void unlock()
 {
 flag_.store(false);
 }

private:
 std::atomic<bool> flag_;
};

// Number of self-increments per thread 
const int kIncNum = 1000000;
// The number of threads 
const int kWorkerNum = 10;
// Self-increment counter 
int count = 0;
// spinlocks 
SpinLock spinLock;
// Working functions for each thread 
void IncCounter()
{
 for (int i = 0; i < kIncNum; ++i)
 {
 spinLock.lock();
 count++;
 spinLock.unlock();
 }
}

int main()
{
 std::vector<std::thread> workers;
 std::cout << "SpinLock inc MyTest start" << std::endl;
 count = 0;

 std::cout << "start " << kWorkerNum << " workers_" << "every worker inc " << kIncNum << std::endl;
 std::cout << "count_: " << count << std::endl;
 // create 10 Three worker threads perform a self - augmenting operation 
 for (int i = 0; i < kWorkerNum; ++i)
 workers.push_back(std::move(std::thread(IncCounter)));

 for (auto it = workers.begin(); it != workers.end(); it++)
 it->join();

 std::cout << "workers_ end" << std::endl;
 std::cout << "count_: " << count << std::endl;
 // The verification results 
 if (count == kIncNum * kWorkerNum)
 {
 std::cout << "SpinLock inc MyTest passed" << std::endl;
 return true;
 }
 else
 {
 std::cout << "SpinLock inc MyTest failed" << std::endl;
 return false;
 }

 return 0;
}

In the above code, 10 threads were created to perform 1 million ++ operations on the Shared global variable count respectively, and then verify whether the results are correct. The final output of execution is as follows:


SpinLock inc MyTest start
start 10 workers_every worker inc 1000000
count_: 0
workers_ end
count_: 10000000
SpinLock inc MyTest passed

The results show that the spin lock we implemented protects the critical section (in this case, i++), where the final value of count is equal to the sum of the number of self-increments per thread. For comparison, IncCounter lock unlock operation can be removed:


void IncCounter()
{
 for (int i = 0; i < kIncNum; ++i)
 {
 //spinLock.lock();
 count++;
 //spinLock.unlock();
 }
}

The output after execution is:


bool CAS(V, A, B)
{
 if (V == A)
 {
 swap(V, B);
 return true;
 }
 
 return false;
}

The result was an error due to multiple threads executing i++ at the same time.

So far, we have achieved a simple spin lock with the atomic weight of C++ 11. This is just one small use of C++, and there is much to explore both in terms of spin lock itself and atomic weight.

conclusion