Why Java volatile++ is not atomic: a detailed explanation

  • 2021-08-12 02:44:26
  • OfStack

Problem

When discussing atomic operations, we often hear the claim that reads and writes of any single volatile variable are atomic, with the exception of volatile++.

So the question is: why isn't volatile++ atomic?

Answer

Because it is actually a compound operation composed of three operations:

  • Read the current value of the volatile variable
  • Add 1 to that value
  • Write the new value back to the corresponding main memory address
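The three steps can be written out explicitly as a minimal sketch (the class and variable names here are ours, chosen for illustration):

```java
// Hypothetical sketch: volatile++ decomposed into its three steps.
public class VolatileIncrementSteps {
    static volatile int a = 1;

    public static void main(String[] args) {
        int tmp = a;    // Step 1: read the volatile variable
        tmp = tmp + 1;  // Step 2: increment the copy in a CPU register
        a = tmp;        // Step 3: write the result back (volatile write)
        System.out.println(a); // prints 2
    }
}
```

Each of the three statements is individually well-behaved; the problem is that another thread can interleave between them.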

A very simple example:

Suppose two threads both read a=1 in the volatile-read stage. The increment on each thread's CPU core then yields 2, and no matter how atomic each of the two final writes is, the end result is a=2. Each individual operation is fine, but taken as a whole the sequence is not thread-safe: two increments occurred, yet the final result is 2 rather than 3.
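The lost update is easy to observe. In this hypothetical demo (names are ours), two threads each perform N increments on a shared volatile counter; because the increment is a three-step compound operation, the final value usually falls short of 2 * N:

```java
// Demo: volatile++ loses updates under contention.
public class LostUpdateDemo {
    static volatile int a = 0;
    static final int N = 100_000;

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            for (int i = 0; i < N; i++) a++; // read, add 1, write: not atomic
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println("expected " + (2 * N) + ", observed " + a);
    }
}
```

The observed value varies from run to run, which is exactly the symptom of a race.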

Analysis

With the concept of memory barriers in hand, we can look more closely at volatile reads and writes:

Step 1: Read

Two memory barriers are inserted after the read instruction in Step 1:

  • A LoadLoad barrier is inserted after the volatile read, preventing the volatile read from being reordered with any subsequent normal read
  • A LoadStore barrier is inserted after the volatile read, preventing the volatile read from being reordered with any subsequent normal write

Therefore, the volatile read and the ordinary reads and writes that follow it are guaranteed not to be reordered. The value is usually read from main memory.

Which raises another question: why only "usually" from main memory?

A full answer could get very detailed, but it comes down to two key points:

  • The cache-invalidation mechanism triggered by a volatile write
  • The CPU that last wrote the volatile variable keeps the latest value in its own cache, so it does not need to fetch the value from main memory again

See the analysis in Step 3 below for details.

Step 2: Self-increasing

This step is nothing special: the increment happens entirely inside the CPU (registers and the L1-L3 caches). No interaction between cache and main memory is involved.

Step 3: Write

The volatile write is the key point.

According to the JMM's semantics for volatile variables, a volatile write compiles down to an instruction carrying the LOCK prefix (on x86). On a multi-core processor, this LOCK-prefixed instruction does two things:

  • It makes the current processor write the cache line holding the data back to system main memory
  • The write-back invalidates any copies of that memory address cached by other CPUs

In addition, memory barriers play a major role in the volatile write, ensuring the two points above hold:

  • A StoreStore barrier is inserted before the volatile write, preventing earlier normal writes from being reordered with the volatile write
  • A StoreLoad barrier is inserted after the volatile write, preventing the volatile write from being reordered with subsequent reads

Extension

So how can we make a compound operation such as volatile++ atomic? There are many schemes; two typical ones are:

  • Use the synchronized keyword
  • Use the AtomicInteger/AtomicLong atomic types

synchronized Keyword

synchronized is a relatively primitive synchronization mechanism. It is essentially an exclusive, reentrant lock: a thread that tries to acquire it may block, which causes performance problems in highly concurrent scenarios.
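A minimal sketch of the synchronized approach (class and method names are ours): the lock makes the read-increment-write sequence indivisible, so no updates are lost.

```java
// Counter made atomic with synchronized: only one thread can run
// increment() at a time, so the compound operation is indivisible.
public class SynchronizedCounter {
    private int count = 0; // guarded by the object's monitor; volatile not needed

    public synchronized void increment() { count++; }
    public synchronized int get() { return count; }

    public static void main(String[] args) throws InterruptedException {
        SynchronizedCounter c = new SynchronizedCounter();
        Runnable task = () -> { for (int i = 0; i < 100_000; i++) c.increment(); };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(c.get()); // always 200000
    }
}
```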

In some scenarios, volatile alone is equivalent to using the synchronized keyword, namely when all of the following hold:

  • Writes to the variable do not depend on its current value, or it can be guaranteed that only one thread ever modifies the value
  • The variable does not participate in invariants together with other variables
  • Locking is not required for any other reason while the variable is accessed

Locking guarantees both visibility and atomicity, whereas volatile only guarantees visibility of the variable's value.
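The first condition above covers a classic pattern: a shutdown flag. The written value does not depend on the current value, so a plain volatile write suffices and no lock is needed (a sketch with names of our choosing):

```java
// volatile alone is enough here: the write to `running` does not
// depend on its current value, and visibility is all we need.
public class ShutdownFlag {
    static volatile boolean running = true;

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            while (running) {
                // do work; each loop iteration re-reads the volatile flag
            }
        });
        worker.start();
        Thread.sleep(50);
        running = false; // a single plain write: atomic and visible to the worker
        worker.join();
        System.out.println("worker stopped");
    }
}
```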

AtomicInteger/AtomicLong

Atomic types like these are lighter-weight than locks: AtomicInteger and AtomicLong cover int and long variables, respectively.

Internally, each holds its real value in a volatile int or volatile long field, so volatile again guarantees that reads and writes of the single variable are atomic.

On top of that, they provide atomic increment and decrement operations such as incrementAndGet. Their advantage over synchronized is that they do not cause thread suspension and rescheduling, because internally they use the non-blocking CAS algorithm.
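Repeating the earlier two-thread counter with AtomicLong shows the difference: incrementAndGet retries internally with CAS, so no increments are lost (demo class name is ours):

```java
import java.util.concurrent.atomic.AtomicLong;

// Lock-free counter: incrementAndGet performs an atomic read-modify-write.
public class AtomicCounterDemo {
    public static void main(String[] args) throws InterruptedException {
        AtomicLong count = new AtomicLong();
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) count.incrementAndGet();
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(count.get()); // always 200000
    }
}
```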

What is CAS

CAS stands for Compare-And-Set (also Compare-And-Swap). The operation takes three parameters:

  • The memory location
  • The expected old value
  • The new value

The operation checks whether the value at the specified memory location matches the expected old value and, if it does, replaces it with the new value. It maps to an atomic instruction provided by the processor, CMPXCHG.
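The semantics can be seen in miniature with AtomicLong's public compareAndSet (demo class name is ours): the swap succeeds only while the expected old value still matches the value at the memory location.

```java
import java.util.concurrent.atomic.AtomicLong;

// CAS semantics: succeed when expected matches, fail when it is stale.
public class CasSemantics {
    public static void main(String[] args) {
        AtomicLong value = new AtomicLong(1);
        boolean first  = value.compareAndSet(1, 2); // expects 1, finds 1 -> true
        boolean second = value.compareAndSet(1, 3); // expects 1, finds 2 -> false
        System.out.println(first + " " + second + " " + value.get()); // true false 2
    }
}
```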

For example, AtomicLong's increment operation:


public final long incrementAndGet() {
    for (;;) {
        long current = get();             // Step 1
        long next = current + 1;          // Step 2
        if (compareAndSet(current, next)) // Step 3
            return next;
    }
}

public final boolean compareAndSet(long expect, long update) {
    return unsafe.compareAndSwapLong(this, valueOffset, expect, update);
}

Consider two threads, T1 and T2, reaching Step 1 above at the same time: both read a current value of 1. After Step 2, next is 2 in both threads.

Now comes Step 3. Suppose T1 executes first: the CompareAndSet condition holds (the memory value is still 1), so the value at the memory location is set to 2 and T1 succeeds. When T2 executes, it expects current to be 1, but the value has already become 2, so its CompareAndSet fails and it enters the next round of the for loop. This time it reads the latest value, 2, and if no other thread interferes, its CompareAndSet succeeds and the value is updated to 3.

It is therefore easy to see that CAS relies on two things:

  • A retry loop, which consumes some CPU time
  • The atomic CPU instruction behind CompareAndSet (CMPXCHG)

Although the spinning costs CPU cycles, CAS still has an advantage over locking: it avoids the context switching and rescheduling caused by blocking threads. These two kinds of cost differ by orders of magnitude, and CAS is clearly the lighter one.

Summary

Reads and writes of a single volatile variable are atomic; viewed through memory barriers, there is no ambiguity about a simple volatile read or write.

Once an internal CPU increment step is mixed in, however, the resulting compound operation no longer retains atomicity.

Finally, we discussed how to make compound operations such as volatile++ atomic, for example with synchronized or the AtomicInteger/AtomicLong atomic classes.

