An analysis of Java's garbage collection (GC) mechanism from the perspective of JVM memory management

  • 2020-04-01 04:21:13
  • OfStack

A good Java programmer must understand how GC works, how to optimize its performance, and how to interact with it in the limited ways available. Some applications, such as embedded and real-time systems, have demanding performance requirements, and their overall performance can only be improved by managing memory more efficiently across the board. This article first briefly introduces how GC works, then discusses several key GC issues in depth, and finally offers some Java programming suggestions for improving program performance from the GC's point of view.
      The fundamentals of GC
      Memory management in Java is really the management of objects, including the allocation and release of objects.
      For programmers, objects are allocated with the new keyword. An object is released by assigning null to every reference to it so that the program can no longer access it; we then call the object "unreachable". The GC is responsible for reclaiming the memory of all "unreachable" objects.
      For the GC, once the programmer creates an object, the GC starts tracking the object's address, size, and usage. In general, the GC records and manages all objects in the heap as a directed graph (see resources 1), which is how it determines which objects are "reachable" and which are "unreachable". When the GC determines that some objects are "unreachable", it is responsible for reclaiming their memory. However, so that GC can be implemented on different platforms, the Java specification does not impose strict rules on many aspects of GC behavior. For example, it does not specify which collection algorithm to use or when collection takes place. Implementers of different JVMs therefore tend to use different algorithms, which makes GC behavior uncertain from the Java programmer's point of view. This article explores several GC-related issues in an effort to reduce the negative impact of this uncertainty on Java programs.
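The directed-graph view of reachability described above can be illustrated with a small model. This is only a sketch of the idea, not how any particular JVM implements it: the `Node` and `Reachability` classes are hypothetical, and "roots" here stands for whatever references the program can still use directly (for example local and static variables).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical model of a heap object: a name plus its outgoing references (graph edges).
class Node {
    final String name;
    final List<Node> refs = new ArrayList<>();

    Node(String name) {
        this.name = name;
    }
}

class Reachability {
    // Returns every node that can be reached from the roots by following references.
    // Anything not in the returned set would be "unreachable" and eligible for reclamation.
    static Set<Node> mark(List<Node> roots) {
        Set<Node> marked = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (marked.add(n)) {        // first visit: follow its outgoing references
                pending.addAll(n.refs);
            }
        }
        return marked;
    }
}
```

Note that two objects referring only to each other are still unreachable if no root leads to them, which is why a graph traversal handles cycles that simple reference counting would miss.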
      Incremental GC
      GC is typically carried out by one thread or a group of threads in the JVM. Like the user program, it consumes heap space and CPU time while it runs, and while a collection is in progress the application stops running. Therefore, when a collection takes a long time, the user can feel the pause in the Java program. On the other hand, if each collection is kept short, the reclamation rate may be too low: many objects that should have been reclaimed are not, and they continue to occupy a lot of memory. A GC design must therefore trade off pause time against reclamation rate. A good GC implementation lets users choose the settings they need. For example, some devices have limited memory and are very sensitive to memory usage, while some real-time online games cannot tolerate long pauses. Incremental GC uses a collection algorithm that splits one long pause into many short ones, reducing the impact of GC on the user program. Although incremental GC may be less efficient than ordinary GC in overall throughput, it reduces the program's maximum pause time.
      The HotSpot JVM shipped with the Sun JDK supports incremental GC. It is not used by default; to enable it, add the -Xincgc option when running the Java program (later JDK releases deprecated and then removed this option). HotSpot's incremental GC is implemented with the Train GC algorithm. The basic idea is to group (layer) all objects in the heap according to when they were created and how they are used, placing frequently used, related objects in the same group and adjusting the groups as the program runs. When the GC runs, it always collects the oldest (least recently accessed) objects first, and if an entire group is garbage, it reclaims the whole group. In this way each GC run reclaims only a certain proportion of the unreachable objects, which keeps the program running smoothly.
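Because the collector in use depends on the JVM and its options, it can help to ask the running JVM what it is actually doing. The sketch below uses the standard java.lang.management API; the `GcInfo` class name is made up for illustration, and the reported time is the JVM's approximate accumulated collection time, which is not necessarily the same as application pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class GcInfo {
    // Builds one line per collector the JVM exposes: its name, how many
    // collections it has run, and the approximate total time spent in them.
    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", total time ms=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        describeCollectors().forEach(System.out::println);
    }
}
```

The collector names and counts differ from one JVM and configuration to another, so the output is best read as a rough comparison between runs rather than an exact measurement.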
      The finalize method in detail
      finalize is a method of the Object class with the access modifier protected. Because every class is a subclass of Object, user classes can easily override it. finalize calls are not chained automatically, so we must chain them by hand: the last statement of a finalize method is usually super.finalize(). This way finalize runs from the bottom of the class hierarchy up, releasing the subclass's own resources first and then those of the parent class.
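A minimal sketch of the manual chaining described above follows. The `Parent`, `Child`, and `FinalizeChainDemo` classes are hypothetical, and finalize is called explicitly here only so that the order is visible without depending on when the GC runs; real code would never call it directly. Recent JDKs (since Java 9) mark finalize as deprecated, so compiling this may produce warnings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class Parent {
    // Shared log so the order of release can be inspected.
    static final List<String> LOG = Collections.synchronizedList(new ArrayList<>());

    @Override
    protected void finalize() throws Throwable {
        try {
            LOG.add("Parent released");
        } finally {
            super.finalize(); // chain upward even if our own cleanup fails
        }
    }
}

class Child extends Parent {
    @Override
    protected void finalize() throws Throwable {
        try {
            LOG.add("Child released"); // release the subclass's own resources first
        } finally {
            super.finalize();          // then let the parent release its resources
        }
    }
}

class FinalizeChainDemo {
    static List<String> run() {
        Child child = new Child();
        try {
            child.finalize(); // explicit call, for demonstration only
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
        return new ArrayList<>(Parent.LOG);
    }
}
```

Putting super.finalize() in a finally block is a common refinement of "last statement": it keeps the chain intact even when the subclass cleanup throws.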
      According to the Java Language Specification, the JVM guarantees that an object is unreachable before its finalize method is called, but it does not guarantee that the method will ever be called. The specification also guarantees that finalize runs at most once per object.
      Many Java beginners treat this method like a destructor in C++ and put a lot of object and resource cleanup in it. This is not a good approach, for three reasons. First, to support finalize, the GC has to do a lot of extra work for objects that override it. Second, after finalize completes, the object may have become reachable again, so the GC has to check its reachability once more; using finalize therefore degrades GC performance. Third, because the time at which the GC calls finalize is not determined, releasing resources this way is not dependable either.
      finalize is typically used to release important resources that are hard to control, such as I/O handles and data connections, whose release is critical to the whole application. In this case the programmer should manage (and release) these resources primarily through the program itself, with a finalize method that releases them as a supplement. This gives a two-layer safety mechanism instead of relying on finalize alone.
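The two-layer approach can be sketched as follows, assuming a hypothetical `ManagedResource` class: the program releases the resource explicitly through close(), and finalize only acts as a fallback in case close() was never called. The actual release logic is left as a comment because it depends on the resource.

```java
class ManagedResource implements AutoCloseable {
    private boolean closed = false;

    // Primary path: the program releases the resource itself.
    @Override
    public synchronized void close() {
        if (closed) {
            return; // already released; calling close() twice is harmless
        }
        closed = true;
        // Release the underlying handle here (file, socket, connection, ...).
    }

    synchronized boolean isClosed() {
        return closed;
    }

    // Fallback path: runs only if the GC finalizes an object that was never closed.
    @Override
    protected void finalize() throws Throwable {
        try {
            close();
        } finally {
            super.finalize();
        }
    }
}
```

With this shape, callers would normally write `try (ManagedResource r = new ManagedResource()) { ... }` so that close() runs deterministically, and the finalize method is never relied on in the normal case.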
      Here is an example showing that an object can become reachable again after its finalize method is called, and that an object's finalize runs only once.


  class MyObject {
      Test main; // the Test object, used to restore reachability in finalize

      public MyObject(Test t) {
          main = t; // save the Test object
      }

      @Override
      protected void finalize() {
          main.ref = this; // resurrect the object by making it reachable again
          System.out.println("This is finalize"); // printed at most once per object
      }
  }

  class Test {
      MyObject ref;

      public static void main(String[] args) {
          Test test = new Test();
          test.ref = new MyObject(test);
          test.ref = null; // the MyObject is now unreachable, so finalize may be called
          System.gc();              // request a collection (only a hint)
          System.runFinalization(); // ask the JVM to run pending finalizers (also a hint)
          if (test.ref != null) System.out.println("MyObject is still alive");
      }
  }

      Output:


  This is finalize
  MyObject is still alive
 
In this example, note that although the MyObject object becomes reachable again in finalize, finalize is not called when the object is reclaimed the next time, because finalize is called at most once.

How a program interacts with GC
Java 2 enhanced memory management by adding the java.lang.ref package, which defines three reference classes: SoftReference, WeakReference, and PhantomReference. By using these classes, programmers can interact with the GC to some extent and help it work more efficiently. The strength of these references lies between that of a reachable object and an unreachable object.
Creating a reference object is also easy. For example, to create a soft reference object, first create the object and refer to it in the normal way (a reachable object); then create a SoftReference that refers to it; finally, set the normal reference to null. The object is then referred to only by the soft reference, and we call it a softly referenced object.
The main feature of a soft reference is that it is a relatively strong reference. A softly referenced object is reclaimed only when memory runs short, so it is usually kept while memory is plentiful. In addition, all soft references are guaranteed to be cleared before the JVM throws OutOfMemoryError. Soft references can be used to implement a cache of frequently used images that makes the most of available memory without causing OutOfMemoryError.


  // Create an image object (Image stands for the application's own image class)
  Image image = new Image();
  ...
  // Use the image
  ...
  // When done, keep only a soft reference and release the strong one
  SoftReference<Image> sr = new SoftReference<Image>(image);
  image = null;
  ...
  // Next time the image is needed
  image = sr.get(); // null if the GC has cleared the soft reference
  if (image == null) {
      // The GC reclaimed the image because memory was low, so reload it
      image = new Image();
      sr = new SoftReference<Image>(image);
  }
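
The same pattern can be wrapped in a small reusable cache. This is a minimal sketch under stated assumptions: `SoftCache` is a hypothetical class, and the loader function stands for whatever code the application uses to load an image or other expensive value.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> map = new HashMap<>();
    private final Function<K, V> loader; // how to (re)load a missing value

    SoftCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // Returns the cached value, reloading it if it was never loaded
    // or if the GC has cleared its soft reference.
    synchronized V get(K key) {
        SoftReference<V> ref = map.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            value = loader.apply(key);
            map.put(key, new SoftReference<>(value));
        }
        return value;
    }
}
```

The value is copied into a local variable before the null check, so the object cannot be cleared between the check and the return. A fuller version would also remove map entries whose references have been cleared.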

 
The biggest difference between a weakly referenced object and a softly referenced object is that the GC uses an algorithm to decide whether to reclaim a softly referenced object, whereas it always reclaims a weakly referenced one. Weakly referenced objects are therefore collected more easily and more quickly. Although the GC always reclaims weakly referenced objects when it runs, a complex cluster of them often takes several GC runs to reclaim completely. Weak references are often used in Map structures to refer to objects holding large amounts of data, so that the GC can reclaim an object's space soon after the strong references to it are set to null.
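One standard Map built on weak references is java.util.WeakHashMap, which holds its keys weakly (its values are held strongly). The sketch below uses a hypothetical `WeakMetadata` class to attach notes to objects without keeping those objects alive; when an entry actually disappears depends on when the GC runs, so no particular output is promised.

```java
import java.util.Map;
import java.util.WeakHashMap;

class WeakMetadata {
    // Keys are held weakly: an entry can vanish once its key has no strong references left.
    private final Map<Object, String> notes = new WeakHashMap<>();

    void annotate(Object target, String note) {
        notes.put(target, note);
    }

    String noteFor(Object target) {
        return notes.get(target);
    }

    int size() {
        return notes.size();
    }

    public static void main(String[] args) {
        WeakMetadata meta = new WeakMetadata();
        Object big = new byte[1024 * 1024];
        meta.annotate(big, "large buffer");
        System.out.println("before: " + meta.size());
        big = null;   // drop the only strong reference to the key
        System.gc();  // only a hint; the entry may or may not be gone afterwards
        System.out.println("after: " + meta.size());
    }
}
```

Keys that are always strongly reachable elsewhere, such as string literals, are never removed this way, so this structure suits keys whose lifetime is controlled by the rest of the program.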
Phantom references are used less often; they mainly assist the finalize mechanism. A phantom-reachable object is one whose finalize method has completed and which is unreachable but has not yet been reclaimed by the GC. Such a reference can support later stages of cleanup after finalize, and overriding the clear() method of Reference makes the resource reclamation mechanism more flexible.
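In practice a phantom reference is usually paired with a ReferenceQueue, and the program learns that an object is about to be reclaimed when the reference appears on the queue. The following is only a sketch with a hypothetical `PhantomDemo` class: whether the reference is enqueued within the timeout depends on the GC, so the method may legitimately return false.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;

class PhantomDemo {
    // A phantom reference never hands back its referent: get() always returns null.
    static boolean referentIsHidden() {
        Object payload = new Object();
        PhantomReference<Object> phantom =
                new PhantomReference<>(payload, new ReferenceQueue<>());
        return phantom.get() == null;
    }

    // Waits up to timeoutMillis (must be > 0) for the GC to enqueue the phantom reference.
    static boolean awaitCollection(long timeoutMillis) throws InterruptedException {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Object payload = new Object();
        PhantomReference<Object> phantom = new PhantomReference<>(payload, queue);
        payload = null; // drop the only strong reference
        System.gc();    // only a hint to the JVM
        Reference<?> enqueued = queue.remove(timeoutMillis); // null on timeout
        return enqueued == phantom; // true if post-finalization cleanup could run now
    }
}
```

Cleanup code that runs after the reference is dequeued cannot resurrect the object, because the referent is never accessible through the phantom reference.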
Some suggestions for Java coding
Based on how GC works, a few techniques can make GC run more efficiently and fit the application's needs better. Here are some programming suggestions.
1. The most basic advice is to release references to useless objects as early as possible. Most temporary variables are local variables, whose references are dropped automatically when execution leaves their scope. For references that live longer, pay special attention to complex object graphs such as arrays, queues, trees, and graphs, whose elements refer to one another; GC is generally less efficient for such objects. If the program allows, assign null to references to objects that are no longer used as early as possible. This can speed up the GC's work.
2. Minimize the use of finalize methods. finalize is an opportunity that Java gives programmers to release objects or resources, but it increases the GC's workload, so rely on it for cleanup as little as possible.
3. If you need frequently used images, consider holding them through soft references. This keeps the images in memory for the program as long as possible without causing OutOfMemoryError.
4. Pay attention to collection data types, including arrays, trees, graphs, linked lists, and other data structures that are more complex for the GC to reclaim. Also watch global variables and static variables, which tend to hold lingering references to objects that are no longer needed and so waste memory.
5. When the program has some idle time, the programmer can call System.gc() to suggest that the GC run, but the Java Language Specification does not guarantee that it will. Using incremental GC can reduce a Java program's pause times.
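
As an illustration of suggestions 1 and 4, consider a simple array-backed stack. The `ObjectStack` class is a hypothetical example: without the line that clears the vacated slot, the array would keep a lingering reference to every popped object, and the GC could not reclaim it even though the program no longer uses it.

```java
import java.util.Arrays;

class ObjectStack {
    private Object[] elements = new Object[8];
    private int size = 0;

    void push(Object item) {
        if (size == elements.length) {
            elements = Arrays.copyOf(elements, size * 2); // grow when full
        }
        elements[size++] = item;
    }

    Object pop() {
        if (size == 0) {
            throw new IllegalStateException("stack is empty");
        }
        Object item = elements[--size];
        elements[size] = null; // drop the stale reference so the GC can reclaim the object
        return item;
    }

    // For demonstration only: reports whether a slot of the backing array is empty.
    boolean slotIsCleared(int index) {
        return elements[index] == null;
    }
}
```

Explicit nulling matters mainly in cases like this, where a long-lived container manages its own storage; ordinary local variables do not need it.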

