Full analysis of shared_ptr thread safety

  • 2020-04-02 01:30:17
  • OfStack

As *The Annotated STL Sources* puts it, "there are no secrets in front of the source code." Based on the source code of shared_ptr, this article extracts the class diagram and object diagram of shared_ptr, then analyzes how shared_ptr delivers the thread safety its documentation claims. The analysis is based on boost 1.52, compiled with VC 2010.

Thread safety for shared_ptr
The boost documentation states the thread-safety guarantee of shared_ptr as follows: shared_ptr objects offer the same level of thread safety as built-in types. Specifically:

1. The same shared_ptr object can be read by multiple threads simultaneously. 【A shared_ptr instance can be "read" (accessed using only const operations) simultaneously by multiple threads.】

2. Different shared_ptr objects can be modified by multiple threads at the same time (even when these objects manage pointers to the same underlying object). 【Different shared_ptr instances can be "written to" (accessed using mutable operations such as operator= or reset) simultaneously by multiple threads (even when these instances are copies, and share the same reference count underneath).】

3. The result of any other concurrent access is undefined. 【Any other simultaneous accesses result in undefined behavior.】

The first case, concurrent reads of the same object, is naturally thread-safe.

In the second case, if two shared_ptr objects A and B manage pointers to different objects, they are completely unrelated, and support for concurrent writes is easy to understand. But if A and B manage pointers to the same object P, they must share a memory area that records the current reference count for P. Concurrent writes to A and B necessarily involve concurrent modification of this reference-count area, which requires extra work from boost and is the focus of this article's analysis.

In addition, weak_ptr is closely related to shared_ptr. Users can construct a shared_ptr from a weak_ptr and a weak_ptr from a shared_ptr, but a weak_ptr does not extend the lifetime of the managed object. Because the thread safety of shared_ptr is coupled with weak_ptr, the analysis in this article also covers weak_ptr.

Let's take a look at shared_ptr and weak_ptr implementations in general.

Structure diagram for shared_ptr
The following are the class diagrams of shared_ptr and weak_ptr, extracted from the boost source code.

(Figure: class diagram of shared_ptr, weak_ptr, shared_count, weak_count, and the sp_counted_base hierarchy)
Let's first ignore the weak_ptr part in the dotted box. The top-level shared_ptr is the class users work with directly. It provides construction, copying, reset, dereferencing, comparison, implicit conversion to bool, and so on. It contains a pointer to the managed object, used for dereferencing, and a shared_count subobject, used to manipulate the reference count.

However, shared_count is not the reference-count class itself; it merely holds a pointer to the reference-count class sp_counted_base, and its job is to encapsulate operations on sp_counted_base. Creating, copying, and destroying a shared_count object involves incrementing and decrementing the reference count inside sp_counted_base.

Finally, the sp_counted_base class stores the reference count and protects it with lock-free operations. It also holds a pointer to the managed object, which it uses to delete that object. sp_counted_base has three derived classes that handle the cases of user-specified Deleter and Allocator:

1. sp_counted_impl_p: the user specified neither Deleter nor Allocator

2. sp_counted_impl_pd: the user specified a Deleter but no Allocator

3. sp_counted_impl_pda: the user specified both a Deleter and an Allocator

When the first shared_ptr object for a pointer P is created, its shared_count subobject is created at the same time, and shared_count creates a concrete sp_counted_base derived-class object X according to the parameters the user supplied. All shared_ptr objects subsequently created to manage P point to this unique X.

Now look at the weak_ptr part inside the dotted box. weak_ptr is essentially the same as shared_ptr, except that it contains a weak_count subobject; both weak_count and shared_count point to the same sp_counted_base.

If the text above is not clear enough, the following code should make the structure concrete.


shared_ptr<SomeObject> SP1(new SomeObject());
shared_ptr<SomeObject> SP2=SP1;
weak_ptr<SomeObject> WP1=SP1;

After executing the above code, the following object instances exist in memory; the red arrows represent pointers to the reference-count object and the black arrows represent pointers to the managed object.

(Figure: object diagram showing SP1, SP2, and WP1 all pointing to a single sp_counted_impl_p object)
It is clear from the diagram that SP1, SP2, and WP1 all point to the same sp_counted_impl_p object, which holds the reference count and is the memory area that SP1, SP2, and WP1 share. When multiple threads modify SP1, SP2, and WP1 concurrently, only the sp_counted_impl_p object is modified concurrently, so its thread safety is the key issue for shared_ptr and weak_ptr. That thread safety is implemented in its base class, sp_counted_base, so I'll focus on the sp_counted_base code next.

The reference counting class sp_counted_base
Fortunately, sp_counted_base is a small class; its code is listed below with comments.


class sp_counted_base
{
private:
    // noncopyable
    sp_counted_base( sp_counted_base const & );
    sp_counted_base & operator= ( sp_counted_base const & );

    // number of shared_ptr objects
    long use_count_;
    // number of weak_ptr objects, plus 1
    long weak_count_;

public:
    // the only constructor; note that both counts start at 1
    sp_counted_base(): use_count_( 1 ), weak_count_( 1 ){    }

    // virtual destructor, so the class can serve as a base class
    virtual ~sp_counted_base(){    }

    // subclasses must override this to delete the managed object
    // with operator delete or the user-supplied Deleter
    virtual void dispose() = 0;

    // subclasses may override this to destroy the current object,
    // e.g. with the user-supplied Allocator
    virtual void destroy(){
        delete this;
    }

    virtual void * get_deleter( sp_typeinfo const & ti ) = 0;

    // used when copying a shared_count from another shared_count.
    // Since there is a source shared_count, call it A, as long as A is
    // not released, use_count_ cannot be brought to 0 by another thread.
    // If one thread uses A as the copy source while another thread
    // releases A, the result is undefined.
    void add_ref_copy(){
        _InterlockedIncrement( &use_count_ );
    }

    // used when constructing a shared_count from a weak_count.
    // This guards against the case where, while we try to increment the
    // reference count through the weak_count, another thread calls
    // release, drops use_count_ to zero, and frees the managed object.
    bool add_ref_lock(){
        for( ;; )
        {
            long tmp = static_cast< long const volatile& >( use_count_ );
            if( tmp == 0 ) return false;

            if( _InterlockedCompareExchange( &use_count_, tmp + 1, tmp ) == tmp ) return true;
        }
    }

    void release(){
        if( _InterlockedDecrement( &use_count_ ) == 0 )
        {
            // when use_count_ drops from 1 to 0:
            // 1. release the managed object;
            // 2. decrement weak_count_, because weak_count_ was
            //    initialized to 1 when use_count_ went from 0 to 1
            dispose();
            weak_release();
        }
    }

    void weak_add_ref(){
        _InterlockedIncrement( &weak_count_ );
    }

    // decrement weak_count_, and destroy this object when it reaches 0
    void weak_release(){
        if( _InterlockedDecrement( &weak_count_ ) == 0 )
        {
            destroy();
        }
    }

    // returns the reference count. Note that without additional locking
    // by the user, the count may be modified by other threads at any time.
    long use_count() const{
        return static_cast<long const volatile &>( use_count_ );
    }
};

The comments in the code already cover some of this, but to repeat: the use_count_ field equals the number of live shared_ptr objects, and the weak_count_ field equals the number of live weak_ptr objects plus 1.

First, set weak_ptr aside. From the code of the shared_ptr class (not listed here, but easy to find), all copying between shared_ptr objects is done by calling add_ref_copy and release. Suppose two threads operate on SP1 and SP2 respectively. There are only three possible cases:

1. Both SP1 and SP2 increment the reference count, so add_ref_copy is called concurrently, i.e., two _InterlockedIncrement( &use_count_ ) calls execute concurrently, which is thread-safe.

2. Both SP1 and SP2 decrement the reference count, so release is called concurrently, i.e., two _InterlockedDecrement( &use_count_ ) calls execute concurrently, which is also thread-safe; the thread that decrements last is responsible for deleting the object.

3. SP1 increments the reference count, calling add_ref_copy, while SP2 decrements it, calling release. Because SP1 still exists, SP2's release can never drive use_count_ to zero, so the body of the if statement in release is never executed. This case therefore reduces to concurrent execution of _InterlockedIncrement( &use_count_ ) and _InterlockedDecrement( &use_count_ ), which is still thread-safe.

Now consider weak_ptr. Operations between weak_ptr objects, or constructing a weak_ptr from a shared_ptr, never touch use_count_; only weak_add_ref and weak_release are called to operate on weak_count_. As in the analysis above, _InterlockedIncrement and _InterlockedDecrement make concurrent weak_add_ref and weak_release calls thread-safe. Constructing a shared_ptr from a weak_ptr, however, must account for the possibility that the managed object is released by another thread during the construction. If that construction were done through add_ref_copy, the following error could occur:


 

Step | Thread 1 (creates shared_ptr from weak_ptr)                | Thread 2 (releases the only existing shared_ptr)
1    | Sees use_count_ > 0; about to call add_ref_copy            |
2    |                                                            | Calls release; use_count_ drops to 0; deletes the managed object
3    | Executes add_ref_copy, incrementing use_count_. Error: use_count_ == 1, but the object has already been deleted |

 


A natural fix suggests itself: after step 3, thread 1 checks use_count_ again; if it is 1, the object has been deleted and the construction is judged to have failed. Unfortunately, that doesn't work. Here is a counterexample.

 

Step | Thread 1 (creates shared_ptr from weak_ptr)  | Thread 2 (releases the only existing shared_ptr)  | Thread 3 (creates shared_ptr from weak_ptr)
1    | Sees use_count_ > 0; about to call add_ref_copy |                                                |
2    |                                                 |                                                | Sees use_count_ > 0; about to call add_ref_copy
3    |                                                 | Calls release; use_count_ drops to 0; deletes the managed object |
4    | Executes add_ref_copy, incrementing use_count_ to 1 |                                            |
5    |                                                 |                                                | Executes add_ref_copy, incrementing use_count_ to 2
6    | Sees use_count_ != 1; judges the construction successful. Error: use_count_ == 2, but the object has been deleted |  | Sees use_count_ != 1; judges the construction successful. Error: use_count_ == 2, but the object has been deleted


In fact, boost constructs shared_ptr from weak_ptr not by calling add_ref_copy but by calling add_ref_lock. add_ref_lock is a typical piece of lock-free code for modifying a shared variable. Here is the code again, with comments that sketch a proof.


    bool add_ref_lock(){
        for( ;; )
        {
            // step 1: record the current value of use_count_
            long tmp = static_cast< long const volatile& >( use_count_ );
            // step 2: if another thread has already brought it to 0, the
            // managed object has been (or will be) released; return false
            if( tmp == 0 ) return false;
            // step 3: if the compare-exchange succeeds, then immediately
            // before the modification use_count_ was still tmp, i.e. > 0.
            // In other words, use_count_ never reached 0 between step 1
            // and step 3, because once use_count_ reaches 0 it can never
            // become greater than 0 again. Therefore the managed object
            // cannot have been released in the meantime, and we return true.
            if( _InterlockedCompareExchange( &use_count_, tmp + 1, tmp ) == tmp ) return true;
        }
    }

The comment above relies on an unproven claim: once use_count_ reaches 0, it can never become greater than 0 again. The following four points prove it.

1. use_count_ is a private member of sp_counted_base, and sp_counted_base has no friend functions, so use_count_ is never modified by code outside the class.

2. The member function add_ref_copy increments use_count_, but every call to add_ref_copy goes through a shared_ptr object. Since that shared_ptr object exists, use_count_ cannot be 0 before the increment.

3. The member function add_ref_lock increments use_count_, but as its code shows, tmp is greater than 0 when the increment happens, so add_ref_lock never raises use_count_ from 0 to 1.

4. No other member function ever increments use_count_.

So we can be confident that whenever add_ref_lock returns true, the reference count was incremented successfully. The behavior of constructing a shared_ptr from a weak_ptr is therefore fully determined: either add_ref_lock returns true and the construction succeeds, or it returns false and the construction fails.

In summary, concurrent modification of the same reference-count object sp_counted_base through different shared_ptr or weak_ptr objects is thread-safe. Since the sp_counted_base object is the only memory area these smart pointers share, the overall result is thread-safe.

Other operations
We showed above that different shared_ptr objects can be modified by multiple threads at the same time. What about the remaining question: can the same shared_ptr object be modified by multiple threads at the same time? Note that all of the synchronization so far lives in the reference-counting class sp_counted_base; shared_ptr itself has no synchronization protection. Consider the following non-thread-safe example from the boost documentation:


// thread A
p3.reset(new int(1));
// thread B
p3.reset(new int(2)); // undefined, multiple writes

Here is the relevant code from the shared_ptr class:

template<class Y>
void reset(Y * p)
{
     this_type(p).swap(*this);
}
void swap(shared_ptr<T> & other)
{
     std::swap(px, other.px);
     pn.swap(other.pn);
}

As you can see, reset performs two separate modifications of member variables (px and pn), so interleaved execution of threads A and B can leave p3 in an invalid state.

But, matching the semantics of built-in types, boost provides several atomic functions that do allow concurrent modification of the same shared_ptr object, including atomic_store, atomic_exchange, and atomic_compare_exchange. Their implementations follow; they need no further analysis.


template<class T>
void atomic_store( shared_ptr<T> * p, shared_ptr<T> r ){
    boost::detail::spinlock_pool<2>::scoped_lock lock( p );
    p->swap( r );
}

template<class T>
shared_ptr<T> atomic_exchange( shared_ptr<T> * p, shared_ptr<T> r ){
    boost::detail::spinlock & sp = boost::detail::spinlock_pool<2>::spinlock_for( p );

    sp.lock();
    p->swap( r );
    sp.unlock();

    return r;
}

template<class T>
bool atomic_compare_exchange( shared_ptr<T> * p, shared_ptr<T> * v, shared_ptr<T> w ){

    boost::detail::spinlock & sp = boost::detail::spinlock_pool<2>::spinlock_for( p );
    sp.lock();
    if( p->_internal_equiv( *v ) ){
        p->swap( w );
        sp.unlock();
        return true;
    }
    else{
        shared_ptr<T> tmp( *p );
        sp.unlock();
        tmp.swap( *v );
        return false;
    }
}

Conclusion
As the boost documentation claims, boost provides shared_ptr with the same level of thread safety as the built-in type. This includes:

1. The same shared_ptr object can be read by multiple threads simultaneously.

2. Different shared_ptr objects can be modified by multiple threads at the same time.

3. The same shared_ptr object cannot be modified directly by multiple threads, but this can be achieved through the atomic functions.

This is also true if you replace "shared_ptr" in the above statement with "built-in type".

Finally, while organizing this material I found some of the key points hard to articulate, since thread safety itself is hard to prove rigorously. If you want to understand it fully, I recommend reading the complete shared_ptr source code.

