A detailed analysis of string concatenation in Java

  • 2020-04-01 03:40:52
  • OfStack

During the workweek I was busy implementing the logic of a project. On Saturday I had some free time, so I took the thick English edition of Thinking in Java from the bookcase and read the part on concatenating String objects. This article is a translation based on that book, plus some thoughts of my own, written down as a record.

Immutable String objects

In Java, String objects are immutable. In code you can create multiple aliases for one String object, and all of these aliases hold the same reference.
For example, s1 and s2 below are both aliases for the "droidyue.com" object. Each holds a reference to the same underlying object, so s1 == s2 is true.


String s1 = "droidyue.com";
String s2 = s1;
System.out.println("s1 and s2 have the same reference = " + (s1 == s2));
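Immutability also means that a String method never modifies its receiver. Methods like toUpperCase return a new String, so every alias of the original keeps seeing the old value. The sketch below shows this using only the standard library. The class name ImmutableStringDemo and the helper upperCopy are hypothetical. It passes Locale.ROOT so the result does not depend on the default locale (under a Turkish locale, for example, a lowercase i would not map to a plain I).

```java
import java.util.Locale;

// Hypothetical demo class: String methods return new objects and leave the receiver unchanged.
final class ImmutableStringDemo {

    // Returns an upper-cased copy; the argument itself is never modified.
    static String upperCopy(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String s1 = "droidyue.com";
        String s2 = s1;               // alias: same reference as s1
        String s3 = upperCopy(s1);    // a different String object

        System.out.println(s1 == s2); // true: both aliases point at one object
        System.out.println(s1);       // droidyue.com (unchanged)
        System.out.println(s3);       // DROIDYUE.COM
        System.out.println(s1 == s3); // false: toUpperCase produced a new object
    }
}
```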

The only overloaded operator in Java

In Java, the only overloaded operators are the string concatenation operators + and +=. Apart from these, Java does not allow programmers to overload any other operator.
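A small sketch of how the two overloaded operators behave. The class StringOperatorDemo and the method describe are hypothetical names. When either operand of + is a String, the other operand is converted to text and concatenated. Because + is evaluated left to right, the position of the first String operand matters.

```java
// Hypothetical demo class for the String '+' and '+=' operators.
final class StringOperatorDemo {

    static String describe(String name, int age) {
        // '+' with a String operand concatenates; the int is converted to text.
        String s = name + " is " + age;
        // '+=' is the compound form: s = s + " years old"
        s += " years old";
        return s;
    }

    public static void main(String[] args) {
        System.out.println(describe("Andy", 24)); // Andy is 24 years old
        // Left to right: 1 + 2 is integer addition (3), then 3 + "a" is concatenation.
        System.out.println(1 + 2 + "a");          // 3a
        // Once the left operand is a String, every later '+' is concatenation.
        System.out.println("a" + 1 + 2);          // a12
    }
}
```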

Concatenation analysis

There really is a performance cost

With the two points above in mind, you might think that because String objects are immutable, concatenating several (three or more) strings must produce redundant intermediate String objects.


String userName = "Andy";
String age = "24";
String job = "Developer";
String info = userName + age + job;

To get info, userName and age would first be concatenated into a temporary String object t1 with the content "Andy24", and then t1 and job would be concatenated to produce the final info object. An intermediate t1 is created along the way, and it is not reclaimed until the garbage collector runs, so it takes up some space. For a concatenation of many strings (say hundreds, which is common in objects' toString() methods), the cost would be even higher and the performance much worse.
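The naive evaluation just described can be modeled explicitly with String.concat, which really does return a new String on each call (when its argument is non-empty). This is a sketch of the mental model, not of what javac generates. NaiveConcatModel and naiveInfo are hypothetical names.

```java
// Hypothetical model of pairwise concatenation that creates an intermediate String.
final class NaiveConcatModel {

    static String naiveInfo(String userName, String age, String job) {
        String t1 = userName.concat(age); // intermediate object, e.g. "Andy24"
        return t1.concat(job);            // final result; t1 is now garbage
    }

    public static void main(String[] args) {
        System.out.println(naiveInfo("Andy", "24", "Developer")); // Andy24Developer
    }
}
```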

Compiler optimization

Is there really a performance cost? String concatenation is so common; is there really no special optimization for it? There is, and it happens when the compiler compiles .java files into bytecode.

To run, a Java program goes through two phases: compile time and run time. At compile time, the Java compiler (javac) converts Java source files into bytecode. At run time, the Java virtual machine (JVM) executes the bytecode produced at compile time. This two-phase design is what lets Java "compile once, run anywhere".

Let's test the compile-time optimization with a piece of code that looks like it should have a performance cost.


public class Concatenation {
  public static void main(String[] args) {
      String userName = "Andy";
      String age = "24";
      String job = "Developer";
      String info = userName + age + job;
      System.out.println(info);
  }
}

Compile Concatenation.java to get Concatenation.class:


javac Concatenation.java

Then use javap to disassemble the compiled Concatenation.class file: javap -c Concatenation. If the javap command is not found, add the directory that contains javap (the JDK's bin directory) to your PATH environment variable, or call javap by its full path.


17:22:04-androidyue~/workspace_adt/strings/src$ javap -c Concatenation
Compiled from "Concatenation.java"
public class Concatenation {
  public Concatenation();
    Code:
       0: aload_0
       1: invokespecial #1                  // Method java/lang/Object."<init>":()V
       4: return

  public static void main(java.lang.String[]);
    Code:
       0: ldc           #2                  // String Andy
       2: astore_1
       3: ldc           #3                  // String 24
       5: astore_2
       6: ldc           #4                  // String Developer
       8: astore_3
       9: new           #5                  // class java/lang/StringBuilder
      12: dup
      13: invokespecial #6                  // Method java/lang/StringBuilder."<init>":()V
      16: aload_1
      17: invokevirtual #7                  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      20: aload_2
      21: invokevirtual #7                  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      24: aload_3
      25: invokevirtual #7                  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      28: invokevirtual #8                  // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
      31: astore        4
      33: getstatic     #9                  // Field java/lang/System.out:Ljava/io/PrintStream;
      36: aload         4
      38: invokevirtual #10                 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      41: return
}

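Here, ldc, astore and so on are Java bytecode instructions, similar to assembly instructions, and the comments after them show the constants, classes and methods they refer to. Notice the many references to StringBuilder, even though our Java code never uses it explicitly. This is the optimization performed by the Java compiler: when it encounters string concatenation, it creates a StringBuilder object, each piece of the concatenation becomes a call to that object's append method, and a single toString() call produces the result. So the intermediate t1 object described above is never created. (This listing comes from javac in JDK 8 or earlier. Since JDK 9, javac by default compiles such an expression into a single invokedynamic call backed by java.lang.invoke.StringConcatFactory instead of a StringBuilder chain, but the pieces are still combined in one operation without t1-style intermediates.)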
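For comparison, here is a hand-written equivalent of the StringBuilder chain in the JDK 8 listing above. The class DesugaredConcatenation and the method info are hypothetical names. It is a sketch of what the bytecode does, not code you need to write yourself, because javac produces it for you.

```java
// Hypothetical hand-written equivalent of `userName + age + job` as compiled by JDK 8 javac.
final class DesugaredConcatenation {

    // One StringBuilder, three append calls, one toString().
    static String info(String userName, String age, String job) {
        return new StringBuilder()
                .append(userName)
                .append(age)
                .append(job)
                .toString();
    }

    public static void main(String[] args) {
        String userName = "Andy";
        String age = "24";
        String job = "Developer";
        String viaBuilder = info(userName, age, job);
        String viaPlus = userName + age + job;
        System.out.println(viaBuilder);                 // Andy24Developer
        System.out.println(viaBuilder.equals(viaPlus)); // true
    }
}
```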

Is compiler optimization enough?

Since the compiler optimizes for us, can we simply rely on it? Of course not.
Let's look at a piece of code where the optimization does not help and performance is poor:


public void implicitUseStringBuilder(String[] values) {
  String result = "";
  for (int i = 0; i < values.length; i++) {
      result += values[i];
  }
  System.out.println(result);
}

Compile it with javac and view the result with javap:


public void implicitUseStringBuilder(java.lang.String[]);
    Code:
       0: ldc           #11                 // String
       2: astore_2
       3: iconst_0
       4: istore_3
       5: iload_3
       6: aload_1
       7: arraylength
       8: if_icmpge     38
      11: new           #5                  // class java/lang/StringBuilder
      14: dup
      15: invokespecial #6                  // Method java/lang/StringBuilder."<init>":()V
      18: aload_2
      19: invokevirtual #7                  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      22: aload_1
      23: iload_3
      24: aaload
      25: invokevirtual #7                  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      28: invokevirtual #8                  // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
      31: astore_2
      32: iinc          3, 1
      35: goto          5
      38: getstatic     #9                  // Field java/lang/System.out:Ljava/io/PrintStream;
      41: aload_2
      42: invokevirtual #10                 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      45: return

Here, 8: if_icmpge 38 and 35: goto 5 form the loop. if_icmpge 38 compares the two ints on top of the JVM operand stack (i and values.length) and, if the first is greater than or equal to the second (the negation of i < values.length), jumps to offset 38 (the System.out call). 35: goto 5 jumps back to offset 5.

The important point is that the StringBuilder object is created inside the loop body, so every iteration creates a new StringBuilder (and a new String from its toString() call). That is clearly wasteful; it is naively inefficient code.
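To make the per-iteration cost visible, here is a hand-written sketch of roughly what the bytecode above does for result += values[i]. The names DesugaredLoop and implicitEquivalent are hypothetical. Each iteration copies everything accumulated so far into a fresh StringBuilder, so the total amount of copying grows roughly quadratically with the number of elements.

```java
// Hypothetical sketch of what JDK 8 javac effectively generates for `result += values[i]` in a loop.
final class DesugaredLoop {

    static String implicitEquivalent(String[] values) {
        String result = "";
        for (int i = 0; i < values.length; i++) {
            // A new StringBuilder is allocated on every iteration...
            result = new StringBuilder()
                    .append(result)      // ...which copies all text accumulated so far
                    .append(values[i])
                    .toString();         // ...and a new String is produced every iteration
        }
        return result;
    }

    public static void main(String[] args) {
        String[] values = {"Andy", "24", "Developer"};
        System.out.println(implicitEquivalent(values)); // Andy24Developer
    }
}
```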

Let's optimize it a little, and the code instantly looks more professional:


public void explicitUseStringBuider(String[] values) {
  StringBuilder result = new StringBuilder();
  for (int i = 0; i < values.length; i ++) {
      result.append(values[i]);
  }
}
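The method above builds result but never uses it. A variant that returns the text might look like the following sketch. ExplicitBuilder and joinAll are hypothetical names. It uses an enhanced for loop, which over an array compiles to essentially the same index-based loop.

```java
// Hypothetical variant of explicitUseStringBuider that returns the accumulated text.
final class ExplicitBuilder {

    static String joinAll(String[] values) {
        StringBuilder result = new StringBuilder(); // created once, outside the loop
        for (String value : values) {
            result.append(value);                   // appends in place, no new builder
        }
        return result.toString();                   // one final String
    }

    public static void main(String[] args) {
        System.out.println(joinAll(new String[] {"Andy", "24", "Developer"})); // Andy24Developer
    }
}
```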

The corresponding compiled bytecode:


public void explicitUseStringBuider(java.lang.String[]);
    Code:
       0: new           #5                  // class java/lang/StringBuilder
       3: dup
       4: invokespecial #6                  // Method java/lang/StringBuilder."<init>":()V
       7: astore_2
       8: iconst_0
       9: istore_3
      10: iload_3
      11: aload_1
      12: arraylength
      13: if_icmpge     30
      16: aload_2
      17: aload_1
      18: iload_3
      19: aaload
      20: invokevirtual #7                  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      23: pop
      24: iinc          3, 1
      27: goto          10
      30: return

As you can see, 13: if_icmpge 30 and 27: goto 10 form the loop, and 0: new #5 sits outside it, so the StringBuilder is created only once.

In general, avoid creating StringBuilder objects in the body of a loop, whether implicitly (through + or +=) or explicitly. Developers who understand how their code is compiled and executed tend to write better code.
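To check this advice on your own machine, here is a rough single-run timing sketch. It is not a rigorous benchmark: there is no JIT warm-up and only one run, and a harness such as JMH is the proper tool. The class and method names are hypothetical, and the numbers depend on the JVM and hardware, so no output is shown. As the array grows, you should expect the += version to fall further behind, because each iteration copies the accumulated string.

```java
import java.util.Arrays;

// Hypothetical, unscientific timing comparison of += in a loop versus one StringBuilder.
final class ConcatTiming {

    static String withPlusEquals(String[] values) {
        String result = "";
        for (String v : values) {
            result += v;            // new intermediate String every iteration
        }
        return result;
    }

    static String withBuilder(String[] values) {
        StringBuilder sb = new StringBuilder();
        for (String v : values) {
            sb.append(v);           // single builder reused for all iterations
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] values = new String[20_000];
        Arrays.fill(values, "x");

        long t0 = System.nanoTime();
        String a = withPlusEquals(values);
        long t1 = System.nanoTime();
        String b = withBuilder(values);
        long t2 = System.nanoTime();

        System.out.println("same result: " + a.equals(b));
        System.out.printf("+= in loop:    %d ms%n", (t1 - t0) / 1_000_000);
        System.out.printf("StringBuilder: %d ms%n", (t2 - t1) / 1_000_000);
    }
}
```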

If there are any mistakes in this article, please point them out.

