In-depth multithreading: memory fences and the volatile keyword

  • 2020-05-10 18:46:50
  • OfStack

As we saw earlier, even simple operations such as assigning to or incrementing a field require thread synchronization.
While a lock can meet that need, a contended lock blocks, incurring the overhead of thread context switching and scheduling, which is intolerable where high concurrency and performance are critical.
The .NET Framework provides non-blocking synchronization constructs that can improve the performance of simple operations without ever blocking, pausing, or waiting on threads.

Memory Barriers and Volatility (memory fence and volatile fields)
Consider the following code:


int _answer;
bool _complete;

void A()
{
    _answer = 123;
    _complete = true;
}

void B()
{
    if (_complete)
        Console.WriteLine(_answer);
}

If methods A and B are executed concurrently in different threads, can method B output "0"?

The answer is "yes", for the following reasons:

  • The compiler, CLR, or CPU may reorder the program's instructions for performance, for example swapping the order of the two assignments in method A.
  • The compiler, CLR, or CPU may cache variable assignments (in a register, say) so that they are not immediately visible to other threads: the writes in method A may not be flushed to memory right away, so the thread running method B can read stale values.

C# and the runtime take great care to ensure that these optimizations do not break normal single-threaded code, or multithreaded code that uses locks correctly.
Beyond that, you must explicitly create memory barriers (memory fences) to limit the effects of instruction reordering and read/write caching on your program.

Full fences:

The easiest way to create a full fence is to use the Thread.MemoryBarrier method.

Here is how MSDN explains it:
Thread.MemoryBarrier synchronizes memory access as follows: the processor executing the current thread cannot reorder instructions in such a way that memory accesses issued before the call to MemoryBarrier execute after memory accesses issued following the call.
In my own understanding: calling MemoryBarrier after writing data causes the write to be flushed immediately, and calling MemoryBarrier before reading data ensures the value read is up to date; the processor and compiler will not move reads or writes across the barrier.


int _answer;
bool _complete;

void A()
{
    _answer = 123;
    Thread.MemoryBarrier(); // after writing, create a memory fence
    _complete = true;
    Thread.MemoryBarrier(); // after writing, create a memory fence
}

void B()
{
    Thread.MemoryBarrier(); // before reading, create a memory fence
    if (_complete)
    {
        Thread.MemoryBarrier(); // before reading, create a memory fence
        Console.WriteLine(_answer);
    }
}

On modern desktop hardware, a full fence costs around ten nanoseconds.
All of the following constructs implicitly generate a full fence:

  • the C# lock statement (Monitor.Enter/Monitor.Exit)
  • all methods of the Interlocked class
  • asynchronous callbacks that use the thread pool, including asynchronous delegates, APM callbacks, and Task continuations
  • setting and waiting on a signaling construct
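Because every Interlocked method implies a full fence, the flag in the earlier example can also be published and read through Interlocked instead of explicit barriers. The sketch below is an illustration under that assumption, not the article's original code; the int flag stands in for the bool because Interlocked operates on int:

```csharp
using System;
using System.Threading;

class InterlockedFlag
{
    int _answer;
    int _complete; // 0 = false, 1 = true; Interlocked requires an int

    void A()
    {
        _answer = 123;
        Interlocked.Exchange(ref _complete, 1); // full fence: publishes _answer before the flag
    }

    void B()
    {
        // CompareExchange with identical comparand/value is a fenced read that never writes
        if (Interlocked.CompareExchange(ref _complete, 0, 0) == 1)
            Console.WriteLine(_answer); // guaranteed to observe 123
    }
}
```

The fences built into Interlocked make the separate Thread.MemoryBarrier calls unnecessary here.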

You don't need a full fence for every read and write of a variable. Suppose there are three answer fields instead of one; four fences still suffice. For example:


int _answer1, _answer2, _answer3;
bool _complete;

void A()
{
    _answer1 = 1; _answer2 = 2; _answer3 = 3;
    Thread.MemoryBarrier(); // after writing, create a memory fence
    _complete = true;
    Thread.MemoryBarrier(); // after writing, create a memory fence
}

void B()
{
    Thread.MemoryBarrier(); // before reading, create a memory fence
    if (_complete)
    {
        Thread.MemoryBarrier(); // before reading, create a memory fence
        Console.WriteLine(_answer1 + _answer2 + _answer3);
    }
}

Do we really need locks and memory fences?
Working on a shared writable field without a lock or a fence is asking for trouble, and there are many threads on MSDN about this.
Consider the following code:

public static void Main()
{
    bool complete = false;
    var t = new Thread(() =>
    {
        bool toggle = false;
        while (!complete) toggle = !toggle;
    });
    t.Start();
    Thread.Sleep(1000);
    complete = true;
    t.Join();
}

If you build the application in Release mode in Visual Studio and then run it directly (outside the debugger), it never terminates,
because the value of the complete variable is cached in a CPU register, where complete remains false.
You can solve this by inserting Thread.MemoryBarrier into the while loop, or by taking a lock when reading complete.
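A minimal sketch of the MemoryBarrier fix, keeping the same Main shape as the example above:

```csharp
using System;
using System.Threading;

class Program
{
    public static void Main()
    {
        bool complete = false;
        var t = new Thread(() =>
        {
            bool toggle = false;
            while (true)
            {
                Thread.MemoryBarrier(); // forces a fresh read of complete on every iteration
                if (complete) break;
                toggle = !toggle;
            }
        });
        t.Start();
        Thread.Sleep(1000);
        complete = true;
        t.Join(); // now returns: the loop observes the updated value
        Console.WriteLine("stopped");
    }
}
```

The fence inside the loop prevents the JIT from hoisting the read of complete into a register, so the program terminates even in a Release build.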

volatile keyword
Adding the volatile keyword to the _complete field also solves this problem:
volatile bool _complete;

The volatile keyword instructs the compiler to generate a fence on every read and write of the field.
It indicates that a field may be modified by multiple threads executing concurrently. Fields declared volatile are exempt from compiler optimizations that assume single-threaded access. This ensures that the most up-to-date value of the field is observed at all times.

The behavior of a volatile field can be summarized in the following table:

First instruction | Second instruction | Can they be swapped?
Read              | Read               | No
Read              | Write              | No
Write             | Write              | No (the CLR guarantees that write-write operations are never swapped, even without the volatile keyword)
Write             | Read               | Yes!


Note that applying the volatile keyword does not guarantee that a write followed by a read will not be swapped, which can cause puzzling problems. For example:

volatile int x, y;

void Test1()
{
    x = 1;     // volatile write
    int a = y; // volatile read
}

void Test2()
{
    y = 1;     // volatile write
    int b = x; // volatile read
}

If Test1 and Test2 execute concurrently on different threads, it is possible for both a and b to end up with the value 0 (even though the volatile keyword is applied to x and y).

This is a good argument for avoiding the volatile keyword: even if you understand its semantics thoroughly, will the other people who work on your code understand them as well?

Using a full fence or a lock in the Test1 and Test2 methods solves this problem.
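A sketch of that fix using an explicit full fence between each write and the subsequent read (a lock around each method body would work similarly); the a and b fields are illustrative stand-ins for the locals above:

```csharp
using System.Threading;

class StoreLoadFix
{
    volatile int x, y;
    int a, b;

    void Test1()
    {
        x = 1;
        Thread.MemoryBarrier(); // full fence: the write to x cannot move past the read of y
        a = y;
    }

    void Test2()
    {
        y = 1;
        Thread.MemoryBarrier(); // full fence: the write to y cannot move past the read of x
        b = x;
    }
}
```

With the fences in place, at least one of the two methods must observe the other's write, so a and b can no longer both be 0.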

Another reason to avoid the volatile keyword is performance, because a memory fence is generated for every read and every write. For example:


volatile int m_amount;
m_amount = m_amount + m_amount; // fences are generated for both reads and for the write

The volatile keyword cannot be applied to local variables, and a volatile field passed by reference (ref or out) loses its volatile semantics. In such situations you must use the

VolatileRead and VolatileWrite methods. For example:


volatile int m_amount;
bool success = int.TryParse("123", out m_amount);
// Generates the following warning:
// CS0420: a reference to a volatile field will not be treated as volatile

VolatileRead and VolatileWrite

Technically, the static Thread.VolatileRead and Thread.VolatileWrite methods perform, for a single variable, the same function that the volatile keyword performs for a field.

Their implementations are somewhat inefficient, although they do create memory fences. Here are their implementations for the int type:


public static void VolatileWrite(ref int address, int value)
{
    Thread.MemoryBarrier();
    address = value;
}

public static int VolatileRead(ref int address)
{
    int num = address;
    Thread.MemoryBarrier();
    return num;
}

You can see that if you call VolatileRead after calling VolatileWrite, no fence is created between the two calls, which again allows the write-then-read reordering discussed above.
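A sketch illustrating the gap: the fence in VolatileWrite comes before the store, and the fence in VolatileRead comes after the load, so a store followed by a load still has no fence between them; an explicit MemoryBarrier restores the ordering. The method names Unsafe and Safe are hypothetical labels:

```csharp
using System.Threading;

class VolatilePairing
{
    int x, y;

    void Unsafe()
    {
        Thread.VolatileWrite(ref x, 1);     // fence, then store to x
        int r = Thread.VolatileRead(ref y); // load of y, then fence: nothing sits between store and load
    }

    void Safe()
    {
        Thread.VolatileWrite(ref x, 1);
        Thread.MemoryBarrier();             // explicit full fence between the store and the load
        int r = Thread.VolatileRead(ref y);
    }
}
```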
