Java garbage collection mechanism details and example code

  • 2020-06-07 04:28:28
  • OfStack

Details of Java garbage collection mechanism

At first glance, garbage collection should do exactly what its name says: find and remove garbage. In fact, it works the other way round. The garbage collector tracks all objects that are still in use and treats everything else as garbage. With this in mind, let's take a closer look at how this automated memory reclamation, called "garbage collection", works in the JVM.

Manual memory management

Before we get into modern garbage collection, let's briefly review the days when you had to allocate and free memory manually and explicitly. If you forget to free a piece of memory, it cannot be reused: it stays occupied but unused. This situation is called a memory leak.

Here is a simple example of manual memory management written in C:


int send_request() {
  size_t n = read_size();
  int *elements = malloc(n * sizeof(int));

  if(read_elements(n, elements) < n) {
    // elements not freed!
    return -1;
  }

  //  ... 

  free(elements);
  return 0;
}

As you can see, it's easy to forget to free up memory. Memory leaks used to be a very common problem. You can only fight them by constantly fixing your own code. Therefore, there needs to be a more elegant way to automatically release unwanted memory in order to reduce the possibility of human error. This automated process is also known as garbage collection (GC for short).

Smart Pointers

An early approach to automatic garbage collection was reference counting. For each object you track how many times it is referenced, and when the counter drops to zero, the object can be safely reclaimed. The C++ shared_ptr is a well-known example:


int send_request() {
  size_t n = read_size();
  shared_ptr<vector<int>> elements
       = make_shared<vector<int>>();

  if(read_elements(n, elements) < n) {
    return -1;
  }

  return 0;
}

The shared_ptr we use keeps track of how many references to the object exist. Passing it to someone else increases the count by 1, and when a copy goes out of scope the count decreases by 1. Once the count reaches 0, shared_ptr automatically deletes the underlying vector. This is only an example, of course; as some readers have pointed out, code like this is unlikely to appear in real life, but it is enough for a demonstration.

Automatic memory management

In the C++ code above, we still have to state explicitly that we want memory management. What if all objects adopted this mechanism? That would be very convenient, since developers would no longer have to worry about cleaning up memory. The runtime automatically determines which memory is no longer in use and releases it. In other words, it automatically collects the garbage. The first garbage collector was introduced for Lisp in 1959, and the technology has been evolving ever since.
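For comparison, here is a rough Java counterpart of the C function shown earlier. It is only a sketch: `SendRequestSketch`, `sendRequest` and `readElements` are hypothetical names, and `readElements` is a stand-in that reads at most three values so that the early return can be triggered.

```java
// Hypothetical Java counterpart of the C send_request() example.
class SendRequestSketch {
    // Stand-in for read_elements(): fills the array and reports how many
    // values were actually read (at most 3 in this sketch).
    static int readElements(int[] elements) {
        int read = Math.min(elements.length, 3);
        for (int i = 0; i < read; i++) {
            elements[i] = i;
        }
        return read;
    }

    static int sendRequest(int n) {
        int[] elements = new int[n]; // heap allocation, like malloc in C
        if (readElements(elements) < n) {
            // No free() is needed: once this method returns, nothing refers
            // to the array any more, so the collector may reclaim it.
            return -1;
        }
        // ...
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(sendRequest(2));  // all values read, prints 0
        System.out.println(sendRequest(10)); // short read, prints -1
    }
}
```

The early return that leaked memory in the C version is harmless here, because the array becomes unreachable as soon as the method exits.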

Reference counting

The idea we just demonstrated with C++ shared pointers can be applied to all objects. Many languages such as Perl, Python and PHP use this approach. This can be easily illustrated by the following figure:

The green cloud represents the objects that are still in use by the program. Technically, these are things like a local variable in a currently executing method, or a static variable. The details vary from one programming language to another, so they are not our focus here.

The blue circles represent live objects in memory, and for each one you can see how many references point to it. The objects in grey circles are no longer referenced by anyone. They are therefore garbage and can be cleaned up by the garbage collector.
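The counting itself can be pictured with a small Java sketch. This is a toy model written for illustration only (the JVM does not manage memory this way), and the names `Counted`, `retain` and `release` are assumptions of the sketch.

```java
// Toy reference-counted object. The "reclaimed" flag stands in for the
// moment a real runtime would free the memory.
class Counted {
    private int refCount = 0;
    private boolean reclaimed = false;

    // Called whenever a new reference to this object is created.
    Counted retain() {
        refCount++;
        return this;
    }

    // Called whenever a reference goes away; reclaims the object at zero.
    void release() {
        if (refCount == 0) {
            throw new IllegalStateException("released more often than retained");
        }
        refCount--;
        if (refCount == 0) {
            reclaimed = true;
        }
    }

    int refCount() { return refCount; }
    boolean isReclaimed() { return reclaimed; }
}

class RefCountDemo {
    public static void main(String[] args) {
        Counted obj = new Counted().retain(); // first reference, count = 1
        obj.retain();                         // second holder, count = 2
        obj.release();                        // one holder done, count = 1
        System.out.println(obj.refCount() + " " + obj.isReclaimed()); // 1 false
        obj.release();                        // last holder done, count = 0
        System.out.println(obj.refCount() + " " + obj.isReclaimed()); // 0 true
    }
}
```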

It looks good, doesn't it? Yes, but there is one major drawback. It is easy to end up with detached cycles: objects that are no longer in any scope but refer to each other, so their reference counts never reach zero. Here is an example:

As you can see, the red objects are actually garbage that the application no longer uses. Because of this limitation of reference counting, they are never reclaimed, which results in a memory leak.
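The leak can be reproduced in a toy Java model of reference counting. This is a sketch for illustration; `RcNode`, `pointTo` and `release` are hypothetical names, and the manual `refCount++` plays the role of a reference held by the program itself.

```java
import java.util.ArrayList;
import java.util.List;

// Toy reference-counted node that can refer to other nodes.
class RcNode {
    final String name;
    int refCount = 0;
    boolean reclaimed = false;
    final List<RcNode> refs = new ArrayList<>();

    RcNode(String name) { this.name = name; }

    // Creating a reference increments the target's counter.
    void pointTo(RcNode target) {
        refs.add(target);
        target.refCount++;
    }

    // Dropping a reference decrements the counter; at zero the node is
    // reclaimed and gives up the references it holds.
    void release() {
        refCount--;
        if (refCount == 0) {
            reclaimed = true;
            List<RcNode> outgoing = new ArrayList<>(refs);
            refs.clear();
            for (RcNode target : outgoing) {
                target.release();
            }
        }
    }
}

class CycleLeakDemo {
    public static void main(String[] args) {
        RcNode a = new RcNode("a");
        RcNode b = new RcNode("b");
        a.refCount++;   // the program holds a reference to a
        a.pointTo(b);   // b.refCount = 1
        b.pointTo(a);   // a.refCount = 2, a and b now form a cycle
        a.release();    // the program lets go of a, a.refCount = 1

        // Nothing outside the cycle refers to a or b, yet neither is reclaimed.
        System.out.println(a.refCount + " " + b.refCount);   // 1 1
        System.out.println(a.reclaimed + " " + b.reclaimed); // false false
    }
}
```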

There are several ways to solve this problem, such as using special "weak" references, or using a separate algorithm to collect cycles. The aforementioned Perl, Python, and PHP each handle circular references in one way or another, but this is beyond the scope of this article. We are going to elaborate on the approach taken by the JVM.
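To give a rough idea of the "weak" reference approach, the sketch below extends a toy reference-counting model with a back-pointer that is not counted. It is an illustration under assumptions, not how any of the languages mentioned implements it; `WeakRcNode`, `addStrong` and `setWeak` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Toy reference-counted node with counted (strong) and uncounted (weak) links.
class WeakRcNode {
    final String name;
    int refCount = 0;
    boolean reclaimed = false;
    final List<WeakRcNode> strongRefs = new ArrayList<>();
    WeakRcNode weakRef; // does not contribute to the target's count

    WeakRcNode(String name) { this.name = name; }

    void addStrong(WeakRcNode target) {
        strongRefs.add(target);
        target.refCount++;
    }

    void setWeak(WeakRcNode target) {
        weakRef = target; // target.refCount is deliberately left unchanged
    }

    void release() {
        refCount--;
        if (refCount == 0) {
            reclaimed = true;
            weakRef = null;
            List<WeakRcNode> outgoing = new ArrayList<>(strongRefs);
            strongRefs.clear();
            for (WeakRcNode target : outgoing) {
                target.release();
            }
        }
    }
}

class WeakBackPointerDemo {
    public static void main(String[] args) {
        WeakRcNode parent = new WeakRcNode("parent");
        WeakRcNode child = new WeakRcNode("child");
        parent.refCount++;       // the program holds the parent
        parent.addStrong(child); // child.refCount = 1
        child.setWeak(parent);   // back-pointer, parent.refCount stays 1
        parent.release();        // parent reaches 0, which releases the child

        System.out.println(parent.reclaimed + " " + child.reclaimed); // true true
    }
}
```

Because the back-pointer is not counted, the pair no longer keeps itself alive once the program drops the parent.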

Mark and Sweep

First, the JVM is more specific about what it means for an object to be reachable. Instead of the vaguely defined green cloud, it has a very clear and specific set of objects called Garbage Collection Roots (GC roots):

  • Local variables
  • Active threads
  • Static fields
  • JNI references
  • Others (discussed later)
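Two of the root kinds listed above, static fields and local variables, can be observed from plain Java. The sketch below is an illustration under assumptions: `GcRootsSketch` and `demo` are hypothetical names, and `System.gc()` is only a request to the JVM, so the code shows that rooted objects survive a collection attempt, not that unrooted ones are collected.

```java
import java.lang.ref.WeakReference;
import java.util.Objects;

class GcRootsSketch {
    // Root kind: static field. The object is reachable through the class.
    static Object cache = new Object();

    // Returns {object behind static field still alive, object behind local still alive}.
    static boolean[] demo() {
        // Root kind: local variable of a method that is still executing.
        Object local = new Object();

        // Weak references let us observe the objects without keeping them alive.
        WeakReference<Object> viaStatic = new WeakReference<>(cache);
        WeakReference<Object> viaLocal = new WeakReference<>(local);

        System.gc(); // a hint only; the JVM may or may not collect now

        boolean staticAlive = viaStatic.get() != null;
        boolean localAlive = viaLocal.get() != null;

        // Use the local afterwards so it is clearly still in use above.
        Objects.requireNonNull(local);
        return new boolean[] { staticAlive, localAlive };
    }

    public static void main(String[] args) {
        boolean[] alive = demo();
        System.out.println("static field target alive: " + alive[0]);
        System.out.println("local variable target alive: " + alive[1]);
    }
}
```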

The JVM uses the mark-and-sweep algorithm to track all reachable (live) objects and to make sure that the memory of unreachable objects can be reused. This involves two steps:

  • Marking traverses all reachable objects, starting from the GC roots, and records them in native memory.
  • Sweeping ensures that the memory occupied by unreachable objects can be reused by the next allocation.
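The two steps above (mark, then sweep) can be sketched on a toy object graph. This is a simplified model for illustration and not the JVM's actual implementation; `ToyHeap`, `HeapObject`, `mark` and `sweep` are hypothetical names, and the `roots` list stands in for the GC roots.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class HeapObject {
    final String name;
    final List<HeapObject> refs = new ArrayList<>();
    HeapObject(String name) { this.name = name; }
}

class ToyHeap {
    final List<HeapObject> objects = new ArrayList<>(); // everything allocated
    final List<HeapObject> roots = new ArrayList<>();   // stand-in for GC roots

    HeapObject allocate(String name) {
        HeapObject o = new HeapObject(name);
        objects.add(o);
        return o;
    }

    // Mark: walk the graph from the roots and record every object reached.
    Set<HeapObject> mark() {
        Set<HeapObject> marked = new HashSet<>();
        Deque<HeapObject> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            HeapObject o = pending.pop();
            if (marked.add(o)) {       // first visit only, so cycles terminate
                pending.addAll(o.refs);
            }
        }
        return marked;
    }

    // Sweep: drop every unmarked object and report what was reclaimed.
    List<String> sweep(Set<HeapObject> marked) {
        List<String> reclaimed = new ArrayList<>();
        objects.removeIf(o -> {
            if (marked.contains(o)) {
                return false;
            }
            reclaimed.add(o.name);
            return true;
        });
        return reclaimed;
    }
}

class MarkSweepDemo {
    public static void main(String[] args) {
        ToyHeap heap = new ToyHeap();
        HeapObject a = heap.allocate("a");
        HeapObject b = heap.allocate("b");
        HeapObject c = heap.allocate("c");
        HeapObject d = heap.allocate("d");
        heap.roots.add(a);
        a.refs.add(b); // a -> b, reachable from the root
        c.refs.add(d); // c <-> d form a cycle that no root reaches
        d.refs.add(c);

        System.out.println(heap.sweep(heap.mark())); // [c, d]
    }
}
```

Since liveness is decided by reachability from the roots rather than by counters, the unreachable cycle `c`/`d` is reclaimed even though its members still refer to each other.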

Different GC algorithms in the JVM, such as Parallel Scavenge, Parallel Mark+Copy and CMS, implement these phases slightly differently, but conceptually they all follow the two steps described above.

The most important thing about this approach is that cycles of objects no longer leak:

The disadvantage is that the application threads need to be suspended for the collection to happen, because references cannot be counted reliably if they keep changing. The situation in which the application is suspended so that the JVM can do its housekeeping is called a Stop The World (STW) pause. Such pauses can be triggered for many reasons, but garbage collection is probably the most common one.
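One way to get a rough feel for how much collection work a running JVM has done is the standard `java.lang.management` API. The sketch below is an illustration: `GcPauseStats` and `totalCollections` are hypothetical names, and the reported time is an approximate accumulated collection time per collector, which for concurrent collectors is not necessarily all Stop The World time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcPauseStats {
    // Sums the collection counts of all collectors in this JVM.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if the collector does not report it
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Allocate some short-lived garbage so the collectors have work to do.
        for (int i = 0; i < 200_000; i++) {
            byte[] junk = new byte[1024];
        }

        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, ~%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.println("total collections: " + totalCollections());
    }
}
```

The collector names and numbers printed depend on the JVM version and on which collector is selected, so no particular output should be expected.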
