Several suggestions for reducing Java garbage collection overhead

  • 2020-06-03 06:26:32
  • OfStack

What are some tips for keeping GC overhead low?

Java 9 is on its way (its release has been delayed once more), and with it the G1 ("Garbage First") garbage collector is set to become the default collector of the HotSpot virtual machine. From the serial collector all the way to CMS, the JVM has seen many GC implementations, and G1 is next in line.

As garbage collectors evolved, each generation brought improvements over its predecessor. Compared with the serial GC, the parallel GC made garbage collection multithreaded, taking full advantage of multi-core machines. The CMS ("Concurrent Mark-Sweep") collector that followed divides collection into several phases, so that much of the work is done concurrently while application threads are running, which makes "stop-the-world" pauses far less frequent. G1 performs better on JVMs with very large heaps and offers more predictable, more uniform pauses.

Tip #1: Predict collection capacities

Most standard Java collections, as well as most custom and extended implementations (such as Trove and Google's Guava), are backed by arrays (of either primitives or objects). Because an array's size is fixed once it is allocated, adding elements to a collection often forces it to allocate a new, larger array to replace the old one.

Even when no initial size is provided, most collection implementations try to optimize this reallocation and keep its amortized cost low. Still, the best results come from providing the expected size when the collection is constructed.

Let's analyze the following code as a simple example.


// Needs java.util.ArrayList and java.util.List.
public static <T> List<T> reverse(List<? extends T> list) {

 List<T> result = new ArrayList<>();

 for (int i = list.size() - 1; i >= 0; i--) {
  result.add(list.get(i));
 }

 return result;
}

This method allocates a new list and fills it with the elements of another list, in reverse order.

This approach can be costly in terms of performance, and the part worth optimizing is the line that adds elements to the new list. With each addition, the list needs to ensure that its underlying array has enough free slots to hold the new element. If it does, the new element is simply stored in the next free slot. If not, a new underlying array is allocated, the old array's contents are copied into it, and then the new element is added. The result is multiple array allocations, with the discarded old arrays left for the GC to collect.
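To make the cost concrete, here is a minimal sketch of a growable buffer that counts how many backing arrays it throws away. The class name GrowthDemo and the growth policy (about 50% per step) are assumptions made for illustration; they are not taken from any particular ArrayList implementation.

```java
import java.util.Arrays;

/** Toy growable int buffer that counts how often its backing array is replaced. */
final class GrowthDemo {
    private int[] data;
    private int size;
    private int reallocations;

    GrowthDemo(int initialCapacity) {
        data = new int[initialCapacity];
    }

    void add(int value) {
        if (size == data.length) {
            // Assumed growth policy: grow by about 50%, and always by at least one slot.
            int newCapacity = Math.max(data.length + (data.length >> 1), data.length + 1);
            data = Arrays.copyOf(data, newCapacity); // allocates a new array and copies
            reallocations++;
        }
        data[size++] = value;
    }

    /** Adds the given number of values and reports how many arrays were discarded. */
    static int countReallocations(int elements, int initialCapacity) {
        GrowthDemo buffer = new GrowthDemo(initialCapacity);
        for (int i = 0; i < elements; i++) {
            buffer.add(i);
        }
        return buffer.reallocations;
    }

    public static void main(String[] args) {
        System.out.println("small initial capacity: " + countReallocations(1000, 10));
        System.out.println("presized:               " + countReallocations(1000, 1000));
    }
}
```

Under these assumptions, every reallocation leaves one dead array behind for the collector, while the presized buffer never reallocates at all.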

We can avoid these redundant allocations by telling the collection how many elements it is expected to hold when we construct it:


public static <T> List<T> reverse(List<? extends T> list) {

 // Size the backing array once, up front.
 List<T> result = new ArrayList<>(list.size());

 for (int i = list.size() - 1; i >= 0; i--) {
  result.add(list.get(i));
 }

 return result;

}

The code above passes list.size() to the ArrayList constructor, so the initial allocation is already large enough to hold every element, and the list never has to allocate memory again during the iteration.

Guava's collection classes go a step further, letting us initialize a collection with either an exact number of expected elements or an estimate.


// When the exact size is known:
List<T> result = Lists.newArrayListWithCapacity(list.size());
// Or, when the size is only an estimate:
List<T> result = Lists.newArrayListWithExpectedSize(list.size());

The former is for cases where we know exactly how many elements the collection will hold, while the latter allocates some padding to allow for estimation errors.
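The same idea carries over to hash-based collections in the plain JDK. The sketch below is an illustration only: the class PresizedMaps and the method presizedHashMap are hypothetical names, and the calculation assumes HashMap's default load factor of 0.75.

```java
import java.util.HashMap;
import java.util.Map;

final class PresizedMaps {

    /** Creates a HashMap intended to hold expectedSize entries without resizing. */
    static <K, V> Map<K, V> presizedHashMap(int expectedSize) {
        if (expectedSize < 0) {
            throw new IllegalArgumentException("expectedSize must not be negative");
        }
        // A HashMap grows once its size exceeds capacity * loadFactor (0.75 by default),
        // so the requested capacity has to be larger than the expected entry count.
        long capacity = (long) Math.ceil(expectedSize / 0.75);
        return new HashMap<>((int) Math.min(capacity, 1 << 30));
    }

    public static void main(String[] args) {
        Map<Integer, String> map = presizedHashMap(100);
        for (int i = 0; i < 100; i++) {
            map.put(i, "value-" + i);
        }
        System.out.println(map.size());
    }
}
```

Passing the raw element count straight to the HashMap constructor would still trigger a resize before the map is full, which is why the sketch divides by the load factor.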

Tip #2: Process streams directly

When dealing with data streams, such as reading data from a file or downloading data from a network, the following code is very common:


byte[] fileData = readFileToByteArray(new File("myfile.txt"));

The resulting byte array could then be parsed into an XML document, a JSON object, or a Protocol Buffer message, to name a few popular options.

This is unwise when dealing with large files or files of unpredictable size, as the JVM may throw an OutOfMemoryError when it cannot allocate a buffer large enough to hold the whole file.

Even when the size of the data is manageable, this pattern still adds significant garbage collection overhead, because it allocates a relatively large chunk of the heap just to hold the file data.

A better approach is to pass the appropriate InputStream (a FileInputStream in this case) directly to the parser, instead of reading the entire file into a byte array first. All major libraries expose APIs that parse an input stream directly, for example:


// try-with-resources closes the stream once parsing is done
try (FileInputStream fis = new FileInputStream(fileName)) {
 MyProtoBufMessage msg = MyProtoBufMessage.parseFrom(fis);
}
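The same pattern works with parsers that ship with the JDK. As a rough sketch, the StAX reader below pulls events straight from an InputStream and never holds the whole document in memory. The class name StreamingXmlDemo is hypothetical, and the in-memory stream in main only stands in for a FileInputStream to keep the example self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class StreamingXmlDemo {

    /** Counts element start tags while pulling events straight from the stream. */
    static int countElements(InputStream in) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        try {
            int count = 0;
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    count++;
                }
            }
            return count;
        } finally {
            reader.close(); // frees parser resources; does not close the stream itself
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        // Stand-in for a real file stream, used only to keep the demo self-contained.
        byte[] xml = "<a><b/><b/></a>".getBytes(StandardCharsets.UTF_8);
        System.out.println(countElements(new ByteArrayInputStream(xml)));
    }
}
```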

Tip #3: Use immutable objects

Immutability has many, many advantages; don't even get me started. One that rarely gets the attention it deserves, however, is its effect on garbage collection.

An immutable object is an object whose fields (specifically its non-primitive fields, in our case) cannot be modified after the object has been constructed. For example:


public class ObjectPair {
 
 private final Object first;
 private final Object second;
 
 public ObjectPair(Object first, Object second) {
  this.first = first;
  this.second = second;
 }
 
 public Object getFirst() {
  return first;
 }
 
 public Object getSecond() {
  return second;
 }
 
}

Instantiating the class above produces an immutable object: all of its fields are marked final and cannot be changed once construction is complete.

Immutability implies that all objects referenced by an immutable container were created before the container's construction completed. In GC terms: the container is at least as young as the youngest reference it holds. This means that when collecting the young generation, the GC can skip immutable objects that live in the old generation, because it knows they cannot reference anything in the generation being collected.

Fewer objects to scan means fewer memory pages to scan, and fewer memory pages to scan means shorter GC cycles, which in turn means shorter GC pauses and better overall throughput.

Tip #4: Be careful with string concatenation

Strings are probably the most common non-primitive data structure in any JVM-based application. However, their implicit weight and ease of use make them easy culprits for a large memory footprint.

The problem obviously does not lie with string literals, but rather with strings that are allocated and constructed at run time. Let's take a quick look at an example of dynamic string construction:


public static <T> String toString(T[] array) {

 String result = "[";

 for (int i = 0; i < array.length; i++) {
  // Print "this" if the array contains itself, to avoid infinite recursion.
  result += (array[i] == array ? "this" : array[i]);
  if (i < array.length - 1) {
   result += ", ";
  }
 }

 result += "]";

 return result;
}

This looks like a nice method: it takes an array and returns a string representation of it. But it is disastrous in terms of object allocation.

It is hard to see past the syntactic sugar, but here is what actually happens behind the scenes:


// Roughly what the compiler generates for the += version above.
public static <T> String toString(T[] array) {

 String result = "[";

 for (int i = 0; i < array.length; i++) {

  StringBuilder sb1 = new StringBuilder(result);
  sb1.append(array[i] == array ? "this" : array[i]);
  result = sb1.toString();

  if (i < array.length - 1) {
   StringBuilder sb2 = new StringBuilder(result);
   sb2.append(", ");
   result = sb2.toString();
  }
 }

 StringBuilder sb3 = new StringBuilder(result);
 sb3.append("]");
 result = sb3.toString();

 return result;
}

Strings are immutable, which means that a concatenation never modifies them in place; a new string is allocated each time instead. In addition, the compiler uses the standard StringBuilder class to perform these concatenations (javac from JDK 9 onward emits a different bytecode strategy, but each += in the loop still produces a new intermediate string). This is a problem because every iteration implicitly allocates both a temporary string and a temporary StringBuilder just to help build the final result.

The best way to avoid this is to use a StringBuilder explicitly and append to it directly, instead of using the built-in concatenation operator ("+"). Here is an example:


public static <T> String toString(T[] array) {

 // A single builder is reused for the whole method.
 StringBuilder sb = new StringBuilder("[");

 for (int i = 0; i < array.length; i++) {
  sb.append(array[i] == array ? "this" : array[i]);
  if (i < array.length - 1) {
   sb.append(", ");
  }
 }

 sb.append("]");
 return sb.toString();
}

Here, only one StringBuilder is allocated, at the beginning of the method. From then on, all strings and array elements are appended to that single StringBuilder, which is converted to a string only once, at the end, via its toString() method.
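Tip #1 applies here as well, since a StringBuilder is itself backed by an array that grows on demand. The variant below is a sketch under assumptions: the class name PresizedToString and the figure of 8 characters per element are made up for illustration, and a poor estimate only costs an extra reallocation or some unused capacity.

```java
final class PresizedToString {

    // Hypothetical guess at the average rendered length of one element.
    private static final int ESTIMATED_CHARS_PER_ELEMENT = 8;

    static String toString(Object[] array) {
        // Two brackets, plus the estimate and a ", " separator for each element.
        int capacity = 2 + array.length * (ESTIMATED_CHARS_PER_ELEMENT + 2);
        StringBuilder sb = new StringBuilder(capacity);

        sb.append('[');
        for (int i = 0; i < array.length; i++) {
            // Guard against an array that contains itself, as in the method above.
            sb.append(array[i] == array ? "this" : array[i]);
            if (i < array.length - 1) {
                sb.append(", ");
            }
        }
        sb.append(']');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toString(new Object[] {1, 2, 3})); // prints [1, 2, 3]
    }
}
```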

Tip #5: Use specialized primitive collections

Java's standard collection library is convenient and generic, allowing us to use collections with semi-static type binding. This is great if we want, for example, a set of strings (Set<String>), or a map from a pair to a list of strings (Map<Pair, List<String>>).

The real problem begins when we want a list of ints, or a map whose values are doubles. Because generic types cannot be used with primitives, the alternative is to use the boxed types instead, so in place of List<int> we have to use List<Integer>.

This is very wasteful, because an Integer is a full-fledged object: a 12-byte object header plus the internal 4-byte int field, for a total of 16 bytes per Integer (on a typical 64-bit HotSpot JVM with compressed references). That is four times the size of the same number of primitive ints! The bigger problem, however, is that every one of these Integers is a real object instance that the garbage collector has to account for during collection.

To tackle this problem, we at Takipi use the excellent Trove collection library. Trove gives up some (but not all) generics in favor of specialized, memory-efficient primitive collections. For example, instead of the wasteful Map<Integer, Double>, there is a specialized alternative in the form of TIntDoubleMap:


TIntDoubleMap map = new TIntDoubleHashMap();
map.put(5, 7.0);
map.put(-1, 9.999);
...

Trove's underlying implementation uses primitive arrays, so no boxing (int -> Integer) or unboxing (Integer -> int) takes place while manipulating the collection, and no objects are stored in place of the primitives.
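To illustrate the principle without pulling in a dependency, here is a minimal sketch of a primitive-backed list. IntList is a hypothetical class written for this example; it is not Trove's implementation, and the doubling growth policy is an arbitrary choice.

```java
import java.util.Arrays;

/** Minimal growable list of primitive ints; no Integer objects are ever created. */
final class IntList {
    private int[] values;
    private int size;

    IntList(int expectedSize) {
        values = new int[Math.max(expectedSize, 1)];
    }

    void add(int value) {
        if (size == values.length) {
            values = Arrays.copyOf(values, values.length * 2); // doubling is an arbitrary choice
        }
        values[size++] = value;
    }

    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return values[index];
    }

    int size() {
        return size;
    }

    long sum() {
        long total = 0;
        for (int i = 0; i < size; i++) {
            total += values[i]; // plain int reads, no unboxing
        }
        return total;
    }

    public static void main(String[] args) {
        IntList list = new IntList(4);
        for (int i = 1; i <= 10; i++) {
            list.add(i);
        }
        System.out.println(list.size() + " values, sum = " + list.sum());
    }
}
```

The only heap objects involved are the IntList itself and its backing int[] array, so the collector has two objects to track regardless of how many values are stored.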

Final thoughts

As garbage collectors continue to advance, and as runtime optimizations and JIT compilers get smarter, we as developers will find ourselves thinking less and less about how to write GC-friendly code. For the time being, however, no matter how advanced G1 may be, there is still a lot we can do to help the JVM.

That is all for this article. I hope it helps you in studying or working with Java; if you have any questions, feel free to leave a comment.

