Java Virtual Machine (JVM) performance optimization (part 1): a summary of JVM fundamentals

  • 2020-04-01 03:28:21
  • OfStack

Java applications run on the JVM, but how much do you know about JVM technology? This article (the first in this series) describes how the classic Java virtual machine works: the pros and cons of Java's write-once, run-anywhere engine, garbage collection basics, classic GC algorithms, and compilation optimizations. Later articles will turn to JVM performance tuning, including newer JVM designs that support the performance and scalability of today's highly concurrent Java applications.

If you're a developer, you've probably experienced that moment of epiphany when all of your thoughts suddenly connect and you can see an old problem from a new perspective. I personally love the feeling of learning something new, and I've had that experience many times while working with JVM technology, particularly with garbage collection and JVM performance optimization. I hope to share some of those insights with you in this series. I hope you're as excited to learn about JVM performance as I am to write about it.

This series is for any Java developer interested in learning more about the underlying JVM and what it actually does. At a high level, I'll discuss garbage collection and the never-ending quest to free and allocate memory safely and quickly without impacting running applications. You'll learn about the key parts of the JVM: garbage collection and GC algorithms, compilation optimizations, and some common tuning options. I'll also discuss why benchmarking Java is so difficult and offer advice on what to consider when measuring performance. Finally, I'll touch on some of the newer innovations in JVM and GC technology, including highlights from Azul's Zing JVM, IBM JVMs, and Oracle's Garbage-First (G1) garbage collector.

I hope you'll come away from this series with a deeper understanding of the factors that limit Java's scalability, and of how those same limitations force us to structure Java deployments in less-than-optimal ways. Hopefully, you'll also have a feeling of clarity and some good Java inspiration: stop accepting those limitations and work to change them! If you're not an open source contributor already, perhaps this series will encourage you to become one.

JVM performance and the "write once, run anywhere" challenge

I have news for those who stubbornly believe that the Java platform is inherently slow. The Java performance problems for which the JVM is notorious date back more than a decade, to when Java was first used for enterprise applications, and that conclusion is now outdated. It's true that if you run simple, static, deterministic tasks on different development platforms today, you will most likely find that statically compiled, machine-optimized code outperforms code run in any virtual environment, the JVM included. But Java performance has improved enormously over the past decade. Market demand and growth in the Java industry have driven a handful of garbage collection algorithms, new compilation innovations, and a host of heuristics and optimizations that have advanced JVM technology. I'll cover some of these later in this series.

The technical beauty of the JVM is also its biggest challenge: nothing can be assumed about a "write once, run anywhere" application. Rather than optimizing for one use case, one application, and one particular user load, the JVM continuously tracks what a Java application is doing and optimizes accordingly. This dynamic runtime leads to a dynamic problem set. Developers working on the JVM can't rely on static compilation and predictable allocation rates when designing for innovation, at least not when we demand performance in production environments.

A career in JVM performance

Early in my career I realized that garbage collection is very difficult to "solve," and I've been fascinated by JVMs and middleware technology ever since. My passion for JVMs began when I was on the JRockit team, coding a novel approach to self-learning, self-tuning garbage collection algorithms (see Resources). That project, which became an experimental feature of JRockit and laid the groundwork for the Deterministic Garbage Collection algorithm, started my journey through JVM technology. I've worked for BEA Systems, Intel, Sun, and Oracle (briefly, as a result of Oracle's acquisition of BEA Systems). I later joined the team at Azul Systems to manage the Zing JVM, and today I work for Cloudera.

Machine-optimized code may deliver better performance, but it comes at the cost of flexibility, which is not a worthwhile tradeoff for dynamically loaded and rapidly changing enterprise applications. Most enterprises are willing to sacrifice the narrowly perfect performance of machine-optimized code for the benefits of Java:

1. Ease of coding and feature development (meaning faster time to market)
2. Access to knowledgeable programmers
3. Faster development using Java APIs and standard libraries
4. Portability -- no need to rewrite Java applications for every new platform

From Java code to bytecode

As a Java programmer, you're probably familiar with coding, compiling, and executing Java applications. For example, let's say you have a program, MyApp.java, that you want to run. To execute this program you first need to compile it with javac, the JDK's built-in static Java-to-bytecode compiler. Based on the Java source, javac generates the corresponding executable bytecode and saves it in a class file of the same name: MyApp.class. After compiling the Java code into bytecode, you use the java command (from the command line or a startup script, with or without startup options) to launch the executable class file and run your application. Your class is then loaded into the runtime (meaning the running Java virtual machine), and your program begins to execute.
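To make this concrete, here is a minimal sketch; the class name MyApp and its contents are purely illustrative:

    // MyApp.java -- a minimal example program.
    // Compile it to bytecode:   javac MyApp.java   (produces MyApp.class)
    // Run it on the JVM:        java MyApp
    public class MyApp {
        public static void main(String[] args) {
            System.out.println("Hello from the JVM!");
        }
    }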

This is the execution scenario for every Java application, but now let's explore what actually happens when you issue that java command. What is a Java virtual machine? Most developers interact with the JVM through continuous tuning -- that is, selecting and assigning values to startup options to make their Java programs run faster, while avoiding the infamous "out of memory" error. But have you ever wondered why we need a JVM to run Java applications in the first place?

What is a Java virtual machine?

Simply put, a JVM is a software module that executes Java application bytecode, converting the bytecode into hardware- and operating-system-specific instructions. In doing so, the JVM allows Java programs to be executed in different environments after they're first written, without any changes to the original code. Java's portability is key to its status as an enterprise application language: developers don't have to rewrite application code for every platform, because the JVM handles the translation and platform-specific optimization.

A JVM is basically a virtual execution environment that acts as a machine for bytecode instructions, assigning execution tasks and performing memory operations through its interaction with the underlying layers.
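You can inspect the bytecode that the JVM executes by disassembling a compiled class with the JDK's javap tool. As a sketch, here is a trivial method and what its disassembly looks like (the class and method names are just for illustration):

    // Adder.java -- a trivial class used only to inspect its bytecode.
    public class Adder {
        static int add(int a, int b) {
            return a + b;
        }
    }

    // After "javac Adder.java", running "javap -c Adder" prints the
    // bytecode for add(int, int), which looks roughly like this:
    //   0: iload_0     // push the first int argument
    //   1: iload_1     // push the second int argument
    //   2: iadd        // add the two top stack values
    //   3: ireturn     // return the int result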

A JVM also takes care of dynamic resource management for running Java applications: it knows how to allocate and free memory, it maintains a consistent threading model on each platform, and it organizes executable instructions in a way that suits the CPU architecture where the application executes. The JVM frees the developer from tracking references between objects and deciding how long they need to stay in the system. It also doesn't require us to manage when to free memory -- a well-known pain point of a non-dynamic language like C.

You can think of the JVM as an operating system that runs exclusively for Java; its job is to manage the runtime environment for Java applications.

Overview of JVM components

Many articles have been written about JVM internals and performance optimization. As a basis for this series, I'll summarize the main JVM components. This short review will be especially helpful if you're new to the JVM, and it should whet your appetite for the deeper discussion to come.

From one language to another -- about the Java compiler

A compiler takes one language as input and outputs executable statements in another. The Java compiler has two main tasks:

1. Make the Java language portable, so that it isn't fixed to any particular platform when first written;

2. Ensure that efficient executable code is produced for the target platform.

Compilers can be static or dynamic. Javac is an example of a static compiler: it takes Java source code as input and turns it into bytecode, the language executed by the Java virtual machine. A static compiler interprets the input code once and outputs an executable form that is used every time the program runs. Because the input is static, you will always see the same result; only when you modify the source code and recompile will you see different output.

Dynamic compilers, such as just-in-time (JIT) compilers, translate one language to another dynamically, meaning they do so while the code is executing. A JIT compiler lets you collect runtime profiling data (by inserting performance counters), which the compiler uses to make decisions based on the environment at hand. A dynamic compiler can build better sequences of instructions as it compiles, replace sequences of instructions with more efficient ones, and even eliminate redundant operations. Over time, more profiling data is collected and more, and better, compilation decisions can be made; the whole process is what we usually call code optimization and recompilation.

Dynamic compilation gives you the advantage of being able to adapt to dynamic changes in behavior, or to new optimization opportunities as application load increases. This is why dynamic compilers are a great fit for Java. Note, though, that dynamic compilers require extra data structures, thread resources, and CPU cycles for profiling and optimization. The deeper the optimization, the more resources it needs. In most environments, however, that overhead is very small relative to the performance gained -- execution can be 5 to 10 times faster than pure interpretation.
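One easy way to watch the JIT compiler at work is HotSpot's -XX:+PrintCompilation flag, which logs each method as it is compiled. A minimal sketch (the class name HotLoop is just for illustration):

    // HotLoop.java -- run with:  java -XX:+PrintCompilation HotLoop
    // As square() gets hot, HotSpot logs a line when it JIT-compiles it.
    public class HotLoop {
        static long square(long n) {
            return n * n;
        }
        public static void main(String[] args) {
            long sum = 0;
            for (long i = 0; i < 100_000_000L; i++) {
                sum += square(i);   // hot call site, eventually compiled
            }
            System.out.println(sum);
        }
    }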

Allocation leads to garbage collection

Allocation is done per thread within each Java process's dedicated memory address space, known as the Java heap, or simply the heap. Single-threaded allocation is common in client applications in the Java world. However, single-threaded allocation offers no benefit for enterprise applications and workloads, because it doesn't take advantage of the parallelism of today's multicore environments.

Parallel application design also forces the JVM to ensure that multiple threads don't allocate the same address space at the same time. You could control this by placing a lock on the entire allocation space, but this technique (commonly known as a heap lock) is very costly, because holding or queuing threads hurts resource utilization and application performance. The good thing about multicore systems is that they have created demand for all kinds of new approaches that prevent single-threaded bottlenecks and serialization during allocation.

A common approach is to divide the heap into partitions, each sized appropriately for the application -- they obviously need tuning, since allocation rates and object sizes vary significantly from application to application, as does the number of threads. A Thread Local Allocation Buffer (TLAB), sometimes called a Thread Local Area (TLA), is a dedicated partition within which a thread can allocate freely without taking a full heap lock. When the area is full, the thread is assigned a new one, until the heap runs out of areas to assign, which means there isn't enough free space on the heap for the objects that need allocating. When the heap is full, garbage collection kicks in.
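As a hedged sketch of how TLAB behavior can be observed and adjusted in HotSpot (the flags below are HotSpot-specific, and the logging syntax applies to JDK 9 and later):

    # TLABs are on by default in HotSpot; these flags adjust/observe them.
    java -XX:+UseTLAB MyApp              # enable TLABs (the default)
    java -XX:TLABSize=256k MyApp         # suggest an initial TLAB size
    java -XX:-ResizeTLAB MyApp           # disable adaptive TLAB resizing
    java -Xlog:gc+tlab=debug MyApp       # TLAB logging on JDK 9+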

Fragmentation

The catch with using TLABs is the risk of reducing memory efficiency by fragmenting the heap. If an application's object sizes don't add up to, or fully fill, a TLAB when allocating, tiny bits of free space may be left over, each too small to host a new object. These leftover bits of free space are referred to as "fragments." If the application also keeps references to objects allocated next to these leftover spaces, the space can remain unused for a long time.

Fragmentation is what you get when fragments are scattered across the heap -- heap space wasted in small chunks of unused memory. Configuring the "wrong" TLAB size for your application (with respect to object sizes, the mix of object sizes, and the rate at which references are held) is a sure way to increase fragmentation in the heap. As the application keeps running, the fragments come to occupy a growing share of heap space. Fragmentation degrades performance, as the system becomes unable to allocate space for new threads and objects. The garbage collector then struggles harder and harder to stave off out-of-memory exceptions.

TLAB waste is an artifact of how TLABs work. One way to avoid fragmentation, completely or temporarily, is to tune the TLAB size for each allocation scenario, but this approach typically needs re-tuning whenever the application's allocation behavior changes. It can also be addressed with sophisticated JVM algorithms, or by organizing the heap partitions for more efficient memory allocation. For example, the JVM can implement free-lists: linked lists of free memory blocks of specific sizes. A contiguous free memory block is linked to other contiguous blocks of similar size, creating a small set of linked lists, each with its own size bounds. In some cases free-lists lead to better-fitting memory allocation: a thread can allocate an object in a block of roughly the right size, potentially generating less fragmentation than relying only on fixed-size TLABs.
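To make the idea concrete, here is a toy sketch of a segregated free-list -- not how any real JVM implements it. Freed blocks are bucketed by size class, and an allocation takes the first block from the smallest class that fits:

    import java.util.ArrayDeque;
    import java.util.Deque;

    // Toy segregated free-list: one list of free block sizes per size class.
    // Purely illustrative -- a real JVM tracks actual memory addresses.
    public class FreeLists {
        // Size classes in bytes: blocks of 16, 32, 64, and 128 bytes.
        private static final int[] CLASSES = {16, 32, 64, 128};
        private final Deque<Integer>[] lists;

        @SuppressWarnings("unchecked")
        public FreeLists() {
            lists = new Deque[CLASSES.length];
            for (int i = 0; i < lists.length; i++) {
                lists[i] = new ArrayDeque<>();
            }
        }

        // Return a freed block (identified here just by its size) to its list.
        public void free(int blockSize) {
            lists[classIndex(blockSize)].push(blockSize);
        }

        // Allocate from the smallest size class that can satisfy the request;
        // returns -1 when no free block fits (a real JVM would trigger GC).
        public int allocate(int requestSize) {
            for (int i = classIndex(requestSize); i < CLASSES.length; i++) {
                if (!lists[i].isEmpty()) {
                    return lists[i].pop();
                }
            }
            return -1;
        }

        private int classIndex(int size) {
            for (int i = 0; i < CLASSES.length; i++) {
                if (size <= CLASSES[i]) return i;
            }
            throw new IllegalArgumentException("size too large: " + size);
        }
    }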

A word about generations

Another way to optimize allocation and reduce fragmentation is to create what's called a young generation, a dedicated heap space reserved for allocating new objects. The rest of the heap becomes the so-called old generation, used for allocating longer-lived objects -- objects assumed to exist for a long time, either because they have survived garbage collections or because they are very large. (Some early garbage collectors even had several old generations, but it turned out that more than two caused the overhead to exceed the value.) To better understand this method of allocation, though, we need to talk about garbage collection.
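In HotSpot, the overall heap and the relative sizes of the generations can be tuned with startup flags; a minimal sketch (the values and the MyApp class name are illustrative only, not recommendations):

    # Illustrative values only:
    java -Xms512m -Xmx512m -Xmn128m MyApp   # 512 MB heap, 128 MB young gen
    java -XX:NewRatio=2 MyApp               # old gen sized at 2x the young gen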

Garbage collection and application performance

Garbage collection is the JVM's mechanism for freeing heap memory occupied by objects that are no longer referenced. When a garbage collection is triggered, objects that are still referenced are kept, while the space occupied by no-longer-referenced objects is freed and reclaimed. Once all reclaimable memory has been collected, that space is available to be allocated to new objects.

A garbage collector may never reclaim a referenced object; doing so would violate the JVM specification. The exception to this rule is softly or weakly referenced objects, which may be collected if the garbage collector is close to running out of memory. I strongly recommend that you avoid weak references as much as possible, however, because ambiguity in the Java specification leads to misinterpretation and errors in use. Besides, Java is designed for dynamic memory management: you shouldn't have to think about when and where to free memory.
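As a small illustration of the difference, the sketch below uses java.lang.ref.WeakReference; note that whether and when get() returns null depends on the garbage collector, so the output isn't guaranteed:

    import java.lang.ref.WeakReference;

    // Demonstrates that a weakly referenced object may be collected
    // once no strong reference to it remains. Output is GC-dependent.
    public class WeakDemo {
        public static void main(String[] args) {
            Object strong = new Object();
            WeakReference<Object> weak = new WeakReference<>(strong);

            strong = null;    // drop the only strong reference
            System.gc();      // a hint only -- the JVM may ignore it

            // Likely prints null after the GC has run, but not guaranteed.
            System.out.println(weak.get());
        }
    }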

One of the challenges for a garbage collector is to reclaim memory without affecting the running application. If you don't garbage-collect enough, your application will run out of memory; if you collect too frequently, you lose throughput and response time, which also harms the running application.

GC algorithms

There are many different garbage collection algorithms. Several will be discussed in more detail later in this series. At the highest level, the two main approaches to garbage collection are reference counting and tracing collectors.

A reference-counting collector keeps track of how many references point to each object. When an object's reference count reaches zero, its memory can be reclaimed immediately, which is one of the advantages of this approach. The difficulties with reference counting lie in circular data structures and in keeping all the counts up to date.
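The circular-structure problem is easy to see in a sketch: below, two objects reference each other, so a pure reference counter would never see either count drop to zero, even after the program lets go of both (the class is illustrative only):

    // Two nodes forming a cycle: each keeps the other's count above zero,
    // so a pure reference-counting collector could never reclaim them.
    public class CycleDemo {
        static class Node {
            Node next;
        }

        public static void main(String[] args) {
            Node a = new Node();
            Node b = new Node();
            a.next = b;   // b's count: 2 (from 'b' and from a.next)
            b.next = a;   // a's count: 2 (from 'a' and from b.next)

            a = null;     // a's count drops to 1 -- never 0
            b = null;     // b's count drops to 1 -- never 0
            // A tracing collector (like the JVM's) still reclaims both,
            // because neither is reachable from any root.
        }
    }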

A tracing collector starts from objects that are known to be referenced, then repeatedly follows and marks all objects reachable from the marked ones. When every still-referenced object has been marked "live," all unmarked space is reclaimed. This approach handles circular data structures, but in many cases the collector has to wait until all marking is complete before it can reclaim unreferenced memory.
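Here's a toy sketch of the mark phase, assuming a simplified object graph; it's a depth-first traversal from the roots, not how any production collector is implemented:

    import java.util.ArrayList;
    import java.util.List;

    // Toy mark phase of a tracing collector: everything reachable from
    // the roots gets marked; everything else would be reclaimable.
    public class MarkDemo {
        static class Obj {
            boolean marked;
            List<Obj> references = new ArrayList<>();
        }

        static void mark(Obj obj) {
            if (obj == null || obj.marked) return;  // already visited
            obj.marked = true;
            for (Obj ref : obj.references) {
                mark(ref);                          // follow outgoing refs
            }
        }

        // After marking from every root, unmarked objects are garbage.
        static void markFromRoots(List<Obj> roots) {
            for (Obj root : roots) {
                mark(root);
            }
        }
    }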

There are different ways to implement these approaches. The best-known algorithms are mark-and-sweep, copying, parallel, and concurrent algorithms. I'll discuss these later in the series.

In general, garbage collection is about dedicating address space in the heap to new and old objects, where "old objects" are objects that have survived many garbage collections. Using a young generation for new objects and an old generation for old objects reduces fragmentation by quickly reclaiming the memory occupied by short-lived objects, and by grouping long-lived objects together in the old-generation address space. All of this reduces fragmentation among long-lived objects and keeps heap memory free of fragments. A positive side effect of the young generation is that it delays the more costly collection of the old generation, since the same space can be reused over and over for short-lived objects. (Old-space collections are more expensive because long-lived objects contain more references and require more traversal.)

A final algorithm worth mentioning is compaction, which is a way of managing memory fragmentation. Compaction basically moves objects together to free up larger contiguous blocks of memory. If you're familiar with disk defragmentation tools, you'll find compaction similar, the difference being that it runs on Java heap memory. I'll discuss compaction in detail later in the series.
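As a toy sketch of the idea, assume a heap modeled as an array of slots where null marks a dead object's slot; compaction slides the live objects to one end, leaving a single contiguous free region (purely illustrative):

    // Toy compaction: slide live objects (non-null slots) to the front
    // of the heap, leaving one contiguous free region at the end.
    public class CompactDemo {
        static void compact(Object[] heap) {
            int free = 0;                       // next free slot at the front
            for (int i = 0; i < heap.length; i++) {
                if (heap[i] != null) {          // live object: move it down
                    heap[free++] = heap[i];
                }
            }
            // Everything from 'free' onward is now one contiguous free block.
            for (int i = free; i < heap.length; i++) {
                heap[i] = null;
            }
            // A real collector would also update every reference to each
            // moved object -- the expensive part of compaction.
        }
    }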

Summary: review and highlights

The JVM enables portability (write once, run anywhere) and dynamic memory management, two major features of the Java platform that account for its popularity and productivity gains.

In this first article of the JVM performance optimization series, I explained how a compiler translates bytecode into the instruction language of the target platform and helps dynamically optimize the execution of a Java program. Different applications call for different compilers.

I also outlined memory allocation and garbage collection, and how both relate to Java application performance. Basically, the faster you fill the heap, the more often garbage collection is triggered, and the larger its share of the application's time. One of the challenges for a garbage collector is to reclaim memory with as little impact on the running application as possible, but before the application runs out of memory. We will discuss both traditional and newer garbage collection approaches and JVM performance optimizations in more detail in future articles.
