Some detailed analysis of string concatenation in Java
- 2020-04-01 03:40:52
- OfStack
I was busy implementing project logic during the work week, but on Saturday I had time to take the thick English edition of Thinking in Java off the shelf and read its section on concatenating String objects. This article is a record of that section, partly translated from the book and partly my own thinking.
Immutable String object
In Java, String objects are immutable. In code you can create multiple aliases for a String object, but all of those aliases refer to the same object.
For example, s1 and s2 below are both aliases for the "droidyue.com" object. Each holds a reference to the same real object, so s1 == s2 is true.
String s1 = "droidyue.com";
String s2 = s1;
System.out.println("s1 and s2 have the same reference = " + (s1 == s2));
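Immutability also means that an operation that appears to change a string through one alias actually creates a new object and leaves the other alias untouched. A minimal sketch of this (the class and method names are hypothetical, chosen only for illustration):

```java
// Illustrative sketch; ImmutableStringDemo and aliasUnaffected are hypothetical names.
class ImmutableStringDemo {

    // Returns true if "changing" the string through s2 leaves s1 untouched.
    static boolean aliasUnaffected() {
        String s1 = "droidyue.com";
        String s2 = s1;            // s2 is an alias: same reference as s1
        s2 = s2.concat("/blog");   // concat builds a NEW String; s2 now refers to it
        // s1 still refers to the original, unmodified object
        return s1.equals("droidyue.com") && s1 != s2;
    }

    public static void main(String[] args) {
        System.out.println(aliasUnaffected()); // prints "true"
    }
}
```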
The only overloaded operator in Java
In Java, the only overloaded operators are the string concatenation operators, + and +=. Beyond these, Java's designers do not allow programmers to overload any other operators.
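A short sketch of how + and += behave with strings. Non-string operands are converted to strings, and evaluation runs left to right, so the position of the first string operand matters. The class and method names here are hypothetical:

```java
// Illustrative sketch; PlusOperatorDemo and describe are hypothetical names.
class PlusOperatorDemo {

    // Builds a sentence using both + and +=.
    static String describe(String name, int age) {
        String s = name + " is ";  // String + String
        s += age;                  // same as s = s + age; the int is converted to text
        return s;
    }

    public static void main(String[] args) {
        System.out.println(describe("Andy", 24)); // prints "Andy is 24"
        System.out.println(1 + 2 + "3");          // prints "33": 1 + 2 is added as ints first
        System.out.println("1" + 2 + 3);          // prints "123": concatenation from the start
    }
}
```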
Splicing analysis
Is there really a performance cost?
With the above two points in mind, you might think that since the String object is immutable, the concatenation of multiple (three or more) strings must produce redundant intermediate String objects.
String userName = "Andy";
String age = "24";
String job = "Developer";
String info = userName + age + job;
On this reasoning, to get info, userName and age are first concatenated into a temporary String object t1 with the content Andy24, and then t1 and job are concatenated into the final info object. The intermediate t1 is not reclaimed immediately after it is created; it occupies memory until the garbage collector collects it. If many strings are concatenated (say hundreds, which mostly happens in toString() implementations), the cost would be even higher and the performance much lower.
Compiler optimization
So is there really a performance cost? String concatenation is very common, so is there no special optimization for it? There is: the compiler applies it when compiling .java source to bytecode.
A Java program goes through two phases before it runs: compile time and run time. At compile time, the Java compiler (javac) converts .java files into bytecode. At run time, the Java virtual machine (JVM) executes that bytecode. This two-phase design is what makes Java's "compile once, run anywhere" possible.
Let's experiment with compile-time optimizations, and let's create a piece of code that might have a performance cost.
public class Concatenation {
    public static void main(String[] args) {
        String userName = "Andy";
        String age = "24";
        String job = "Developer";
        String info = userName + age + job;
        System.out.println(info);
    }
}
Compile Concatenation.java to get Concatenation.class:
javac Concatenation.java
Then use javap to disassemble the compiled Concatenation.class file by running javap -c Concatenation. If the javap command is not found, add the directory that contains javap to your PATH environment variable, or invoke javap by its full path.
17:22:04-androidyue~/workspace_adt/strings/src$ javap -c Concatenation
Compiled from "Concatenation.java"
public class Concatenation {
public Concatenation();
Code:
0: aload_0
1: invokespecial #1 // Method java/lang/Object."<init>":()V
4: return
public static void main(java.lang.String[]);
Code:
0: ldc #2 // String Andy
2: astore_1
3: ldc #3 // String 24
5: astore_2
6: ldc #4 // String Developer
8: astore_3
9: new #5 // class java/lang/StringBuilder
12: dup
13: invokespecial #6 // Method java/lang/StringBuilder."<init>":()V
16: aload_1
17: invokevirtual #7 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
20: aload_2
21: invokevirtual #7 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
24: aload_3
25: invokevirtual #7 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
28: invokevirtual #8 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
31: astore 4
33: getstatic #9 // Field java/lang/System.out:Ljava/io/PrintStream;
36: aload 4
38: invokevirtual #10 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
41: return
}
Here ldc, astore, and so on are Java bytecode instructions, similar to assembly instructions. The comments after // describe each instruction's operand in Java terms. StringBuilder appears several times in the listing even though the Java source never uses it explicitly. This is the compiler's optimization: when it encounters string concatenation, it creates a StringBuilder object, and each operand is added by calling that object's append method. No intermediate String objects are created, so the problem described above does not arise. (This output comes from an older javac. Since JDK 9, javac by default compiles concatenation to an invokedynamic call instead, so your javap output may look different.)
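Read back into source form, the bytecode for the info line corresponds roughly to the hand-written code below. This is only a sketch of the equivalence; the class and method names are hypothetical:

```java
// Illustrative sketch; ConcatEquivalent and concat are hypothetical names.
class ConcatEquivalent {

    // Roughly what "userName + age + job" compiles to in the listing above:
    // one StringBuilder, three append calls, one toString call.
    static String concat(String userName, String age, String job) {
        return new StringBuilder()
                .append(userName)
                .append(age)
                .append(job)
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(concat("Andy", "24", "Developer")); // prints "Andy24Developer"
    }
}
```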
Just compiler optimization?
Since the compiler optimizes for us, is it enough to just rely on compiler optimizations? Of course not.
Now let's look at some code that the compiler's optimization does not rescue and that performs poorly.
public void implicitUseStringBuilder(String[] values) {
    String result = "";
    for (int i = 0; i < values.length; i++) {
        result += values[i];
    }
    System.out.println(result);
}
Compile with javac and view with javap
public void implicitUseStringBuilder(java.lang.String[]);
Code:
0: ldc #11 // String
2: astore_2
3: iconst_0
4: istore_3
5: iload_3
6: aload_1
7: arraylength
8: if_icmpge 38
11: new #5 // class java/lang/StringBuilder
14: dup
15: invokespecial #6 // Method java/lang/StringBuilder."<init>":()V
18: aload_2
19: invokevirtual #7 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
22: aload_1
23: iload_3
24: aaload
25: invokevirtual #7 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
28: invokevirtual #8 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
31: astore_2
32: iinc 3, 1
35: goto 5
38: getstatic #9 // Field java/lang/System.out:Ljava/io/PrintStream;
41: aload_2
42: invokevirtual #10 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
45: return
Here 8: if_icmpge 38 and 35: goto 5 form the loop. if_icmpge 38 compares the two integers on top of the JVM operand stack (i and values.length); if the first is greater than or equal to the second, that is, if the loop condition i < values.length no longer holds, execution jumps to offset 38 (the System.out call). 35: goto 5 jumps back to offset 5.
The important point is that the StringBuilder is created inside the loop (offset 11), so the loop creates one StringBuilder per iteration. Each iteration also copies the accumulated result into the new builder and back out through toString(). That is clearly wasteful, and it is plainly poor code.
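In source form, the loop bytecode above behaves roughly like the following sketch. The class name, the tmp variable, and the buildersCreated counter are hypothetical additions, included only to make the per-iteration allocation visible:

```java
// Illustrative sketch; LoopConcatEquivalent, tmp and buildersCreated are hypothetical.
class LoopConcatEquivalent {

    // Number of StringBuilder objects created by the most recent join call.
    static int buildersCreated;

    // Roughly what "result += values[i]" inside a loop compiles to in the listing above.
    static String join(String[] values) {
        buildersCreated = 0;
        String result = "";
        for (int i = 0; i < values.length; i++) {
            StringBuilder tmp = new StringBuilder(); // a fresh builder on every iteration
            buildersCreated++;
            // copy the accumulated text in, append one element, copy everything out again
            result = tmp.append(result).append(values[i]).toString();
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(join(new String[] {"a", "b", "c"})); // prints "abc"
        System.out.println(buildersCreated);                    // prints "3"
    }
}
```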
Let's optimize it a little by creating the StringBuilder explicitly, once, before the loop.
public void explicitUseStringBuider(String[] values) {
    StringBuilder result = new StringBuilder();
    for (int i = 0; i < values.length; i++) {
        result.append(values[i]);
    }
}
The corresponding compiled information
public void explicitUseStringBuider(java.lang.String[]);
Code:
0: new #5 // class java/lang/StringBuilder
3: dup
4: invokespecial #6 // Method java/lang/StringBuilder."<init>":()V
7: astore_2
8: iconst_0
9: istore_3
10: iload_3
11: aload_1
12: arraylength
13: if_icmpge 30
16: aload_2
17: aload_1
18: iload_3
19: aaload
20: invokevirtual #7 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
23: pop
24: iinc 3, 1
27: goto 10
30: return
As shown above, 13: if_icmpge 30 and 27: goto 10 form the loop, and 0: new #5 lies outside it, so the StringBuilder is created only once.
In general, avoid creating StringBuilder objects, implicitly or explicitly, inside a loop body. People who understand how their code is compiled and executed tend to write better code.
If there are mistakes in this article, corrections are welcome.
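As a closing sketch, the single-builder pattern can be wrapped in a small helper. The class name and the pre-sizing step are my own additions rather than something from the book. Pre-sizing only avoids internal buffer growth and is optional:

```java
// Illustrative sketch; SingleBuilderJoin and its pre-sizing step are hypothetical additions.
class SingleBuilderJoin {

    // Concatenates all elements using exactly one StringBuilder.
    static String join(String[] values) {
        // First pass: estimate the final length so the builder's buffer need not grow.
        int total = 0;
        for (String v : values) {
            total += (v == null) ? 4 : v.length(); // append(null) writes the 4 characters "null"
        }
        // Second pass: append everything to the single builder created outside the loop.
        StringBuilder sb = new StringBuilder(total);
        for (String v : values) {
            sb.append(v);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(new String[] {"Andy", "24", "Developer"})); // prints "Andy24Developer"
    }
}
```

On Java 8 and later, the standard library call String.join("", values) produces the same result for this case.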