In-depth understanding of string types in Java

  • 2020-04-01 02:58:22
  • OfStack

1. Java's built-in support for strings
Built-in support means that a string is not implemented through a char pointer as in C. Java strings are also Unicode-compliant, so there is no need for two separate classes, as with string and wstring in C++, to stay compatible with C while also supporting Unicode. Java implements its string type through the String class.
Because a string literal is itself a String object, every method of String can be called directly on it:


// All methods of String can be called directly on the literal "ABC"
int length = "ABC".length();
As well as:
String abc = new String("ABC");
int length = abc.length();

2. String values in Java are constant

Once a String object has been created, its value cannot be changed. The member methods of String confirm this: none of them modifies the value in place. String literals such as "ABC", as well as the "def" in new String("def"), are stored in the constant pool of the Java virtual machine.

The "abc" in the following code is stored in the constant pool, so the variables a and ab refer to the same "abc" object in the pool, whereas new String("abc") creates a separate object on the heap.


public class StringTest {
    public static void main(String[] args) {
        String a="abc";
        String ab="abc";
        String abc=new String("abc");
        System.out.println(ab==a);   // true: both refer to the pooled "abc"
        System.out.println(a==abc);  // false: abc is a separate heap object
    }
}
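
As a complement to StringTest, the following minimal sketch illustrates both points of this section: methods of String return new objects instead of changing the original, and intern() returns the pooled instance of a string. The class name ImmutabilityDemo and the helper sameObject are hypothetical names chosen for this illustration.

```java
// Illustrative sketch; ImmutabilityDemo and sameObject are hypothetical names.
class ImmutabilityDemo {

    // Returns true only when both references point to the very same object.
    static boolean sameObject(String x, String y) {
        return x == y;
    }

    public static void main(String[] args) {
        String a = "abc";
        String replaced = a.replace('a', 'x'); // builds a new String
        String joined = a.concat("def");       // likewise returns a new object

        System.out.println(a);        // abc (the original is untouched)
        System.out.println(replaced); // xbc
        System.out.println(joined);   // abcdef

        String heap = new String("abc");
        System.out.println(sameObject(a, heap));          // false
        System.out.println(sameObject(a, heap.intern())); // true
        System.out.println(a.equals(heap));               // true
    }
}
```

Comparing contents with equals() rather than == is the usual choice, since == only tells whether two references share one object.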

So how do dynamically generated, mutable strings work? Java provides the StringBuffer and StringBuilder classes for this purpose. Strings can also be concatenated with the "+" operator, for example "ABC" + "def". A concatenation of two literals like this one is folded into a single constant at compile time; when the operands are not all compile-time constants, the compiler may implement the concatenation with a StringBuilder (older compilers used StringBuffer).

How are StringBuilder and StringBuffer implemented internally? They store the string in an array of characters. The following snippet, from the source code shipped with the JDK, shows that StringBuffer stores its contents in a char array declared in AbstractStringBuilder, the parent class of StringBuffer (since JDK 9 the backing array is a byte array, but the principle is the same):

[Image: JDK source of AbstractStringBuilder showing the char array field, //files.jb51.net/file_images/article/201402/2014215154727860.jpg]
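
To make the contrast with the immutable String concrete, here is a small sketch of how a mutable buffer is typically used. The class BuilderDemo and its join method are hypothetical names for this example. StringBuilder is used because it is unsynchronized; StringBuffer offers the same methods with synchronization.

```java
// Illustrative sketch; BuilderDemo and join are hypothetical names.
class BuilderDemo {

    // Joins the parts with commas, reusing one mutable buffer throughout.
    static String join(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(parts[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(new String[] {"ABC", "def", "ghi"})); // ABC,def,ghi

        StringBuilder sb = new StringBuilder("ABC");
        sb.append("def");     // same object, contents are now "ABCdef"
        sb.setCharAt(0, 'x'); // modifies the buffer in place
        System.out.println(sb); // xBCdef
    }
}
```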

3. Encoding in strings
There are two questions to answer here: how is the string encoding in a source file handled, and what encoding do strings use while the Java virtual machine is running?
First, the encoding of strings in the source code depends on your IDE or text editor. The following code was saved in GBK encoding and then opened with both GBK and UTF-8 decoding.
// Saved as GBK, opened as GBK

[Image: source file saved as GBK and opened as GBK, //files.jb51.net/file_images/article/201402/2014215154846172.jpg]

// Saved as GBK, opened as UTF-8: the text is garbled. If the system default encoding is not GBK, add the option "-encoding GBK" to javac when compiling this file.

[Image: source file saved as GBK and opened as UTF-8, //files.jb51.net/file_images/article/201402/2014215154939193.jpg]

So how should this source-encoding problem be handled? The answer is the -encoding option of the javac compiler, which defaults to the system default encoding. On Simplified Chinese Windows the default encoding is generally GBK (it can be read with System.getProperty("file.encoding")); note that on JDK 18 and later the default charset is UTF-8 on every platform unless configured otherwise. If the system default encoding is GBK but the source file is saved as UTF-8, compile it with javac -encoding UTF-8.
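
To check which default applies on a given machine, a short program along these lines can help. It is only a sketch: the class name DefaultEncodingDemo and the helper defaultCharsetName are hypothetical, and the printed values depend on the JDK version and system settings, so no expected output is shown.

```java
import java.nio.charset.Charset;

// Illustrative sketch; DefaultEncodingDemo and defaultCharsetName are hypothetical names.
class DefaultEncodingDemo {

    // Name of the charset the JVM uses when none is specified explicitly.
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // The system property mentioned in the text above.
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        // The charset object the runtime actually resolved.
        System.out.println("default charset = " + defaultCharsetName());
    }
}
```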

What encoding is used once the code is compiled to a class file and when the Java virtual machine (JVM) is running? The String type in Java is implemented in UTF-16, so strings inside the running JVM are UTF-16 regardless of the encoding of the source file (the class file itself stores string constants in a modified UTF-8 form). As long as javac correctly interprets the encoding of the source file, the strings in the class file and at run time are independent of that encoding. The same applies to the primitive type char and the Character class: both are based on UTF-16, so a char is always 16 bits, whether it holds 'a', '1' or a Chinese character. Characters outside the Basic Multilingual Plane, including some rare Chinese characters, are represented by a pair of chars (a surrogate pair).
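
The 16-bit claim can be checked with a few lines of code. The sketch below uses Unicode escapes so that it compiles identically whatever encoding the source file is saved in; the class name CharWidthDemo and its two helper methods are hypothetical.

```java
// Illustrative sketch; CharWidthDemo, utf16Units and codePoints are hypothetical names.
class CharWidthDemo {

    // Number of 16-bit char values (UTF-16 code units) in the string.
    static int utf16Units(String s) {
        return s.length();
    }

    // Number of Unicode characters (code points) in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        System.out.println(Character.SIZE);       // 16
        System.out.println(utf16Units("a"));      // 1
        System.out.println(utf16Units("\u4e2d")); // 1 (a common Chinese character)

        // U+20000, a CJK Extension B ideograph, written as a surrogate pair.
        String rare = "\uD840\uDC00";
        System.out.println(utf16Units(rare));     // 2
        System.out.println(codePoints(rare));     // 1
    }
}
```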

The String type also supports conversion between a string and its underlying byte representation using an explicitly specified character encoding. This lets us read a text file or other input stream encoded in GBK, UTF-8 or another encoding and convert it to the correct String in memory.

For example, the String class has the following methods:
public String(byte[] bytes, Charset charset): constructs a string from a byte array (each byte is 8 bits) decoded with the specified character set.
public byte[] getBytes(Charset charset): encodes the string into a byte array, that is, its binary representation, using the specified character set.

One more member method of String deserves attention:

public byte[] getBytes(): returns a byte array encoded with the platform's default character set, which is not necessarily UTF-16.
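
The conversions described above can be sketched as a round trip. The class EncodingRoundTrip and its encode and decode helpers are hypothetical names. The example uses constants from StandardCharsets, which every Java platform must support; a GBK charset could be obtained with Charset.forName("GBK") on runtimes that provide it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative sketch; EncodingRoundTrip, encode and decode are hypothetical names.
class EncodingRoundTrip {

    // String to bytes with an explicit charset.
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    // Bytes back to a String with an explicit charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587"; // two Chinese characters, escaped

        byte[] utf8 = encode(text, StandardCharsets.UTF_8);
        byte[] utf16 = encode(text, StandardCharsets.UTF_16BE);
        System.out.println(utf8.length);  // 6 (three bytes per character)
        System.out.println(utf16.length); // 4 (two bytes per character)

        // Decoding with the charset used for encoding restores the text.
        System.out.println(decode(utf8, StandardCharsets.UTF_8).equals(text));      // true
        // Decoding with a different charset produces garbled text.
        System.out.println(decode(utf8, StandardCharsets.ISO_8859_1).equals(text)); // false
    }
}
```

Passing the charset explicitly, instead of calling getBytes() without arguments, keeps the result independent of the platform default.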

