Why data synchronization is needed in Java multithreaded programming

2020-04-01 02:50:11
OfStack

For the purposes of this discussion, Java variables fall into two kinds: local variables and class variables. Local variables are defined inside a method, such as those defined in the run method. Each thread gets its own copy of them, so they are not shared between threads and need no synchronization. Class variables are defined in a class and are scoped to the entire class. They can be shared by multiple threads, so access to them must be synchronized.
Data synchronization means that at any given moment only one thread can access the synchronized class variables; other threads can access them only after the current thread has finished. "Access" here means access that involves writes. If every thread only reads the class variables, synchronization is generally unnecessary. So what happens if shared class variables are not synchronized? Let's look at the following code:


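package test;

public class MyThread extends Thread
{
    // Shared by every thread that runs this class
    public static int n = 0;

    @Override
    public void run()
    {
        int m = n;       // read the shared value into a local copy
        Thread.yield();  // hint that other threads may run now
        m++;             // increment the local copy
        n = m;           // write the local copy back
    }

    public static void main(String[] args) throws Exception
    {
        MyThread myThread = new MyThread();
        Thread[] threads = new Thread[100];
        for (int i = 0; i < threads.length; i++)
            threads[i] = new Thread(myThread);
        for (int i = 0; i < threads.length; i++)
            threads[i].start();
        // Wait for all 100 threads to finish before reading n
        for (int i = 0; i < threads.length; i++)
            threads[i].join();
        System.out.println("n = " + MyThread.n);
    }
}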

One possible result of running the above code:


n = 59

This may surprise many readers. The program starts 100 threads, and each thread adds 1 to the static variable n. After the join method has waited for all 100 threads to finish, the value of n is printed. One would expect n to equal 100, but the result can be less than 100.
The culprit is what is often called "dirty data", and the yield() statement in the run method is what makes it show up. (Dirty data can also be produced without the yield statement, but it is less obvious.) The yield method hints that the calling thread is willing to give up the CPU for the moment, which gives the scheduler a chance to run another thread.
To illustrate how this program generates dirty data, assume that only two threads are created: thread1 and thread2. Because thread1's start method is called first, thread1's run method typically runs first. When thread1 reaches the first line of run (int m = n;), it assigns the value of n to m. When it reaches the yield call on the second line, thread1 pauses, and thread2 (assuming it is ready to run by then) gets the CPU and executes its own first line (int m = n;). Since n is still 0 when thread1 yields, m in thread2 is also 0. Both threads therefore hold m = 0. After yield returns, each of them increments its own m from 0 to 1 and writes it back, so no matter which one goes first, n ends up as 1. It has simply been assigned the value 1 twice, once by thread1 and once by thread2.
One might ask: if run contained only n++, would dirty data still be produced? The answer is yes. But n++ is a single statement, so how can the CPU be handed to another thread in the middle of it? It only looks like a single step. After the Java compiler compiles it into an intermediate language (also known as bytecode), n++ is no longer one instruction. Let's look at what the following Java code compiles into.


public void run()
{
    n++;
}

The compiled intermediate language code:


public void run()
{
 aload_0
 dup 
 getfield
 iconst_1  
 iadd
 putfield 
 return 
}

The run method contains nothing but n++, yet after compilation there are seven intermediate language instructions. We don't need to know what all of them do; just look at getfield, iadd, and putfield. getfield fetches the value of a field, and since n is the only field here, it fetches the value of n. iadd, as the name suggests, adds 1 to that value. putfield, as you may have guessed, writes the incremented value back to the variable n. (The listing shows the form used for an instance field; for a static field such as MyThread.n the compiler emits getstatic and putstatic instead, but the read, add, write sequence is the same.) You may now be wondering why simply adding 1 to n takes so many steps. The reason lies in the Java memory model.
The Java memory model distinguishes between main memory and working memory. Main memory holds all the instances in Java. That is, after we use new to create an object, the object and its internal methods, variables, and so on are stored in this area, and the n in the MyThread class is stored here too. Main memory is shared by all threads. Working memory is the thread stack we talked about earlier; it holds the variables defined in the run method and in the methods that run calls, which are called method variables. When a thread wants to modify a variable in main memory, it does not modify it directly. Instead, it copies the variable into its own working memory, modifies the copy, and then writes the result back over the value in main memory.
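The copy, update, and write-back sequence can be replayed deterministically on a single thread to show how two threads lose an update. The following is only a minimal sketch and not part of the original program: the class name LostUpdateReplay and the locals m1 and m2 (standing in for each thread's working copy) are made up for illustration.

```java
// Hypothetical illustration: replays one unlucky interleaving of two
// threads step by step on a single thread, so the outcome is deterministic.
class LostUpdateReplay {
    static int n = 0; // stands in for the shared variable in main memory

    static int replay() {
        n = 0;
        int m1 = n; // thread1 copies n into its working copy (reads 0)
        int m2 = n; // thread2 copies n before thread1 writes back (also reads 0)
        m1++;       // thread1 increments its copy to 1
        n = m1;     // thread1 writes back: n becomes 1
        m2++;       // thread2 increments its stale copy to 1
        n = m2;     // thread2 writes back: n is set to 1 again
        return n;
    }

    public static void main(String[] args) {
        System.out.println("n = " + replay()); // prints "n = 1"
    }
}
```

Two increments were performed, but n only went from 0 to 1, because the second write was based on a stale copy.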
Once you understand the Java memory model, it is not hard to see why n++ is not an atomic operation. It has to go through copying, adding one, and writing back, which is much like the process simulated in the MyThread class. As you can imagine, if thread1 is interrupted for some reason just after executing getfield, the result will be similar to what happened with MyThread. To solve this problem completely, access to n must be synchronized so that only one thread can operate on n at a time; in other words, operations on n must be made atomic.
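One common way to let only one thread operate on n at a time is a synchronized block. The sketch below is one possible fix rather than the only one. It assumes a class named SyncCounterDemo, a shared LOCK object, and a runAll helper, none of which appear in the original program.

```java
// A sketch of one possible fix: guard the read, add, write-back sequence
// with a lock so that only one thread can execute it at a time.
class SyncCounterDemo implements Runnable {
    static int n = 0;
    // Hypothetical lock object shared by all threads
    private static final Object LOCK = new Object();

    @Override
    public void run() {
        synchronized (LOCK) {   // other threads block here until the lock is free
            int m = n;
            Thread.yield();     // the lock is still held, so no other thread can read n here
            m++;
            n = m;
        }
    }

    // Starts threadCount threads, waits for them, and returns the final n.
    static int runAll(int threadCount) {
        n = 0;
        SyncCounterDemo task = new SyncCounterDemo();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threads.length; i++)
            threads[i] = new Thread(task);
        for (Thread t : threads)
            t.start();
        try {
            for (Thread t : threads)
                t.join();       // after join, this thread sees each worker's writes
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println("n = " + runAll(100)); // prints "n = 100"
    }
}
```

For a plain counter like this, java.util.concurrent.atomic.AtomicInteger and its incrementAndGet() method are another standard option that avoids an explicit lock.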