In-depth analysis of the Java serialization mechanism and principles

  • 2020-04-01 01:46:25
  • OfStack

The Java serialization algorithm dissected

      Serialization is the process of describing an object as a sequence of bytes. Deserialization is the process of reconstructing an object from those bytes. The Java serialization API provides a standard mechanism for handling object serialization. Here you can learn how to serialize an object, when to serialize it, and how the Java serialization algorithm works. We use an example to show how the serialized bytes describe an object.
Necessity of serialization

      In Java, everything is an Object, and in a distributed environment it is often necessary to pass objects from one end of the network or device to the other. This requires a protocol that transfers data from one end to the other. The Java serialization mechanism was created to solve this problem.
How do I serialize an object

An object can be serialized only if its class implements the Serializable interface. The interface declares no methods and acts as a marker: classes that carry it can be handled by the serialization mechanism.


import java.io.Serializable;       
class TestSerial implements Serializable {       
           public byte version = 100;     
           public byte count = 0;       
} 
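To see the marker in action, the following sketch tries to serialize one object whose class implements Serializable and one whose class does not. The names MarkerDemo, Tagged, Untagged, and canSerialize are made up for this illustration and are not part of the original example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class MarkerDemo {
    // Hypothetical classes, used only for this illustration.
    static class Tagged implements Serializable {
        byte version = 100;
    }

    static class Untagged {
        byte version = 100;
    }

    /** Returns true if ObjectOutputStream accepts the object, false if its class lacks the marker. */
    static boolean canSerialize(Object obj) {
        try (ObjectOutputStream oos = new ObjectOutputStream(new ByteArrayOutputStream())) {
            oos.writeObject(obj);
            return true;
        } catch (NotSerializableException e) {
            // Thrown when the object's class does not implement Serializable.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Tagged:   " + canSerialize(new Tagged()));
        System.out.println("Untagged: " + canSerialize(new Untagged()));
    }
}
```

The two classes have identical fields, so the only thing that decides the outcome is the presence of the marker interface.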

      Next we write a program that serializes the object and writes it out. ObjectOutputStream writes the object as a byte stream, which we store temporarily in the file temp.out.

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

public static void main(String args[]) throws IOException {
    FileOutputStream fos = new FileOutputStream("temp.out");
    ObjectOutputStream oos = new ObjectOutputStream(fos);
    TestSerial ts = new TestSerial();
    oos.writeObject(ts);   // writes the class description followed by the field data
    oos.flush();
    oos.close();
}

      To read the bytes back from the file and reconstruct the object, we can use an ObjectInputStream.

import java.io.FileInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;

public static void main(String args[]) throws IOException, ClassNotFoundException {
    FileInputStream fis = new FileInputStream("temp.out");
    ObjectInputStream oin = new ObjectInputStream(fis);
    TestSerial ts = (TestSerial) oin.readObject();   // may throw ClassNotFoundException
    oin.close();
    System.out.println("version=" + ts.version);
}

The execution result is:

version=100
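The write and read steps can also be combined into a single in-memory round trip, which avoids the temporary file. This is a minimal sketch: RoundTripDemo, serialize, and deserialize are illustrative names, and TestSerial is redeclared as a nested class only so that the block is self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class RoundTripDemo {
    static class TestSerial implements Serializable {
        public byte version = 100;
        public byte count = 0;
    }

    /** Serializes an object into a byte array instead of a file. */
    static byte[] serialize(Object obj) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    /** Rebuilds an object from bytes produced by serialize(). */
    static Object deserialize(byte[] bytes) {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        TestSerial original = new TestSerial();
        original.count = 7;
        TestSerial copy = (TestSerial) deserialize(serialize(original));
        // The copy is a distinct object carrying the same field values.
        System.out.println("version=" + copy.version + ", count=" + copy.count);
    }
}
```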
The serialization format of the object

What does a serialized object look like? Open the temp.out file we just wrote and display it in hexadecimal. It should read:


AC ED 00 05 73 72 00 0A 53 65 72 69 61 6C 54 65
73 74 A0 0C 34 00 FE B1 DD F9 02 00 02 42 00 05
63 6F 75 6E 74 42 00 07 76 65 72 73 69 6F 6E 78
70 00 64

These bytes describe the serialized TestSerial object. (The class-name bytes shown, 53 65 72 69 61 6C 54 65 73 74, spell SerialTest; a class named TestSerial produces the same layout with its own name and serialVersionUID.) Note that the TestSerial class has only two fields:

      public byte version = 100;

      public byte count = 0;

In theory, 2 bytes are enough to store these two fields, yet temp.out occupies 51 bytes. Besides the data, the file also contains a description of the serialized object's class.
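If no hex viewer is at hand, a few lines of Java can produce the same kind of listing. The sketch below is only one possible way to do it; the class name HexDump and the helper toHex are assumptions of this example, and the file name defaults to the temp.out used above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class HexDump {
    /** Formats bytes as two-digit uppercase hex values, 16 per line. */
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < data.length; i++) {
            if (i > 0) {
                // Start a new line after every 16 bytes, otherwise separate with a space.
                sb.append(i % 16 == 0 ? '\n' : ' ');
            }
            sb.append(String.format("%02X", data[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String file = args.length > 0 ? args[0] : "temp.out";
        byte[] data = Files.readAllBytes(Paths.get(file));
        System.out.println(toHex(data));
        System.out.println(data.length + " bytes");
    }
}
```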
Java serialization algorithm

Serialization algorithms generally follow these steps:

  • Output the class metadata related to the object instance.

  • Recursively output the description of each superclass until there are no more superclasses.

  • Once the class metadata is complete, start outputting the actual data values of the object instance, beginning with the topmost superclass.

  • Output the instance data recursively from top to bottom.
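The ordering in these steps can be observed on a real stream. The sketch below serializes a small two-level hierarchy, then checks where each class name appears and which values come last. Parent, Child, and the helper names are hypothetical, and the trailing-bytes trick relies on the assumption that the hierarchy contains only int fields and no custom writeObject methods.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class OrderSketch {
    // Hypothetical hierarchy, used only for this illustration.
    static class Parent implements Serializable { int parentVersion = 10; }
    static class Child extends Parent { int version = 66; }

    static byte[] serialize(Object obj) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    /** Offset of a class name inside the stream, found by a plain text search. */
    static int nameOffset(byte[] stream, String simpleName) {
        return new String(stream, StandardCharsets.ISO_8859_1).indexOf(simpleName);
    }

    /** Reads the last count big-endian ints of the stream (the instance data in this example). */
    static int[] trailingInts(byte[] stream, int count) {
        ByteBuffer buf = ByteBuffer.wrap(stream, stream.length - 4 * count, 4 * count);
        int[] values = new int[count];
        for (int i = 0; i < count; i++) {
            values[i] = buf.getInt();
        }
        return values;
    }

    public static void main(String[] args) {
        byte[] stream = serialize(new Child());
        // Class descriptions: the object's own class first, then its superclass.
        System.out.println("Child described at offset  " + nameOffset(stream, "Child"));
        System.out.println("Parent described at offset " + nameOffset(stream, "Parent"));
        // Instance data: superclass fields first, then the subclass fields.
        int[] data = trailingInts(stream, 2);
        System.out.println("data order: " + data[0] + ", " + data[1]);
    }
}
```

The class descriptions run from the object's class upward, while the field values run from the topmost superclass downward.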

Let's use another example that covers all possible scenarios more completely:


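    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.ObjectOutputStream;
    import java.io.Serializable;

    class parent implements Serializable {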
           int parentVersion = 10;       
    }  

    class contain implements Serializable{       
           int containVersion = 11;       
    }  

    public class SerialTest extends parent implements Serializable {       
           int version = 66;       
           contain con = new contain();                   
           public int getVersion() {       
                  return version;       
           }       
           public static void main(String args[]) throws IOException {       
                  FileOutputStream fos = new FileOutputStream("temp.out");       
                  ObjectOutputStream oos = new ObjectOutputStream(fos);       
                  SerialTest st = new SerialTest();       
                  oos.writeObject(st);       
                  oos.flush();       
                  oos.close();       
           }       
    } 

This example is straightforward. The SerialTest class extends the parent superclass and holds a contain object internally.

The format after serialization is as follows:

AC ED 00 05 73 72 00 0A 53 65 72 69 61 6C 54 65
73 74 05 52 81 5A AC 66 02 F6 02 00 02 49 00 07
76 65 72 73 69 6F 6E 4C 00 03 63 6F 6E 74 00 09
4C 63 6F 6E 74 61 69 6E 3B 78 72 00 06 70 61 72
65 6E 74 0E DB D2 BD 85 EE 63 7A 02 00 01 49 00
0D 70 61 72 65 6E 74 56 65 72 73 69 6F 6E 78 70
00 00 00 0A 00 00 00 42 73 72 00 07 63 6F 6E 74
61 69 6E FC BB E6 0E FB CB 60 C7 02 00 01 49 00
0E 63 6F 6E 74 61 69 6E 56 65 72 73 69 6F 6E 78
70 00 00 00 0B

Let's take a closer look at what these bytes represent. The stream begins with:

  • AC ED: STREAM_MAGIC. Declares that the serialization protocol is in use.
  • 00 05: STREAM_VERSION. The serialization protocol version.
  • 0x73: TC_OBJECT. Declares that this is a new object.
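These three values are available as constants in java.io.ObjectStreamConstants, so the start of a stream can be checked in code. The following is a minimal sketch; StreamHeaderCheck and describeHeader are names invented for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.ObjectStreamConstants;
import java.io.UncheckedIOException;

public class StreamHeaderCheck {
    /** Reads the first five bytes and compares them with the protocol constants. */
    static String describeHeader(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            short magic = in.readShort();    // AC ED
            short version = in.readShort();  // 00 05
            byte tag = in.readByte();        // 73
            boolean valid = magic == ObjectStreamConstants.STREAM_MAGIC
                    && version == ObjectStreamConstants.STREAM_VERSION
                    && tag == ObjectStreamConstants.TC_OBJECT;
            return String.format("magic=%04X version=%d tag=%02X valid=%b",
                    magic & 0xFFFF, version, tag & 0xFF, valid);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The first five bytes of the dump shown above.
        byte[] header = {(byte) 0xAC, (byte) 0xED, 0x00, 0x05, 0x73};
        System.out.println(describeHeader(header));
    }
}
```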

The first step of the serialization algorithm is to output a description of the class associated with the object. The object in the example is an instance of the SerialTest class, so the description of SerialTest comes next:

  • 0x72: TC_CLASSDESC. Declares that a new class description starts here.
  • 00 0A: the length of the class name (10).
  • 53 65 72 69 61 6C 54 65 73 74: SerialTest, the class name.
  • 05 52 81 5A AC 66 02 F6: the serialVersionUID. If the class does not declare one, this 8-byte ID is not random: it is computed by hashing details of the class such as its name, interfaces, fields, and methods.
  • 0x02: flags. This value declares that the class supports serialization.
  • 00 02: the number of fields in this class.
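The serialVersionUID that would be written for a class can be inspected with java.io.ObjectStreamClass. The sketch below compares a class that declares the ID with one that leaves it to the default computation; SuidDemo, Implicit, Explicit, and suidOf are hypothetical names, and the value 42 is arbitrary.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

public class SuidDemo {
    // No serialVersionUID declared: the ID is computed from the class itself.
    static class Implicit implements Serializable {
        int version = 66;
    }

    // Declared explicitly: this value is written to the stream as-is.
    static class Explicit implements Serializable {
        private static final long serialVersionUID = 42L;
        int version = 66;
    }

    /** Returns the serialVersionUID that serialization uses for the given class. */
    static long suidOf(Class<?> cls) {
        ObjectStreamClass desc = ObjectStreamClass.lookup(cls);
        if (desc == null) {
            throw new IllegalArgumentException(cls + " is not serializable");
        }
        return desc.getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("Implicit: " + Long.toHexString(suidOf(Implicit.class)));
        System.out.println("Explicit: " + Long.toHexString(suidOf(Explicit.class)));
    }
}
```

Because the default ID depends on the class's structure, changing a field or method can change it, which is the usual reason for declaring the ID explicitly.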

Next, the algorithm outputs the first of these fields, int version = 66:

  • 0x49: the field type code. 0x49 is "I", which stands for int.
  • 00 07: the length of the field name.
  • 76 65 72 73 69 6F 6E: version, the field name.

Then the algorithm outputs the next field, contain con = new contain(). This one is a little special because it is an object. An object-typed field is described using the JVM's standard object signature notation:

  • 0x4C: the field type code. 0x4C is "L", which stands for an object reference.
  • 00 03: the length of the field name.
  • 63 6F 6E: con, the field name.
  • 0x74: TC_STRING. Marks a new string, which names the type of the referenced object.
  • 00 09: the length of the string.
  • 4C 63 6F 6E 74 61 69 6E 3B: Lcontain;, the JVM's standard object signature notation.
  • 0x78: TC_ENDBLOCKDATA. Marks the end of the block data for the SerialTest class description.
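The type codes and signatures of a class's serializable fields can be queried through ObjectStreamClass and ObjectStreamField. The sketch below is illustrative only: FieldDescriptorDemo, Holder, Contain, and describe are assumed names, and because the classes are nested the printed signature includes the outer class name rather than the plain Lcontain; of the dump.

```java
import java.io.ObjectStreamClass;
import java.io.ObjectStreamField;
import java.io.Serializable;

public class FieldDescriptorDemo {
    // Hypothetical classes mirroring the shape of the example above.
    static class Contain implements Serializable {
        int containVersion = 11;
    }

    static class Holder implements Serializable {
        int version = 66;
        Contain con = new Contain();
    }

    /** Describes one serializable field: type code, name, and signature for object types. */
    static String describe(Class<?> cls, String fieldName) {
        ObjectStreamField f = ObjectStreamClass.lookup(cls).getField(fieldName);
        // Primitive fields have no type string; object fields carry a JVM signature.
        String signature = f.isPrimitive() ? "" : " signature=" + f.getTypeString();
        return String.format("type=%c (0x%02X) name=%s%s",
                f.getTypeCode(), (int) f.getTypeCode(), f.getName(), signature);
    }

    public static void main(String[] args) {
        System.out.println(describe(Holder.class, "version"));
        System.out.println(describe(Holder.class, "con"));
    }
}
```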

Next, the algorithm outputs the description of the superclass, parent:

  • 0x72: TC_CLASSDESC. Declares that this is a new class description.
  • 00 06: the length of the class name.
  • 70 61 72 65 6E 74: parent, the class name.
  • 0E DB D2 BD 85 EE 63 7A: the serialVersionUID.
  • 0x02: flags. This value declares that the class supports serialization.
  • 00 01: the number of fields in the class.

Next comes the field description of the parent class, int parentVersion = 10:

  • 0x49: the field type code. 0x49 is "I", which stands for int.
  • 00 0D: the length of the field name (13).
  • 70 61 72 65 6E 74 56 65 72 73 69 6F 6E: parentVersion, the field name.
  • 0x78: TC_ENDBLOCKDATA. Marks the end of the block data for the parent class description.
  • 0x70: TC_NULL. Indicates that there are no more superclasses.

At this point the algorithm has output all the class descriptions. The next step is to output the actual values of the object instance, starting with the fields of the parent class:

  • 00 00 00 0A: 10, the value of the parentVersion field.

Then come the fields of the SerialTest class:

  • 00 00 00 42: 66, the value of the version field.
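Each int value occupies four bytes in big-endian order, which is why 10 appears as 00 00 00 0A and 66 as 00 00 00 42. A short sketch of the decoding, with IntBytes and decode as made-up names:

```java
import java.nio.ByteBuffer;

public class IntBytes {
    /** Interprets four bytes as a big-endian int, the byte order ByteBuffer uses by default. */
    static int decode(byte[] fourBytes) {
        return ByteBuffer.wrap(fourBytes).getInt();
    }

    public static void main(String[] args) {
        System.out.println(decode(new byte[]{0x00, 0x00, 0x00, 0x0A})); // parentVersion
        System.out.println(decode(new byte[]{0x00, 0x00, 0x00, 0x42})); // version
    }
}
```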

The following bytes are interesting. The algorithm now has to write the con field, which refers to a contain object. Recall that the contain class itself has not been described yet, so its description is written here:

  • 0x73: TC_OBJECT. Declares that this is a new object.
  • 0x72: TC_CLASSDESC. Declares that a new class description starts here.
  • 00 07: the length of the class name.
  • 63 6F 6E 74 61 69 6E: contain, the class name.
  • FC BB E6 0E FB CB 60 C7: the serialVersionUID.
  • 0x02: flags. This value declares that the class supports serialization.
  • 00 01: the number of fields in the class.

Next comes the description of the only field of contain, int containVersion = 11:

  • 0x49: the field type code. 0x49 is "I", which stands for int.
  • 00 0E: the length of the field name (14).
  • 63 6F 6E 74 61 69 6E 56 65 72 73 69 6F 6E: containVersion, the field name.
  • 0x78: TC_ENDBLOCKDATA. Marks the end of the block data for the contain class description.

At this point, the serialization algorithm checks whether contain has a serializable superclass. If it had one, that superclass's description would follow here.

  • 0x70: TC_NULL. There are no more superclasses.

Finally, the actual field value of the contain object is output:

  • 00 00 00 0B: 11, the value of the containVersion field.
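The class-description part of this walkthrough can be retraced in code with a DataInputStream. The following is a simplified sketch, not a full parser: it reads only the header and the chain of class descriptions for the top-level object, and it assumes that every object-typed field signature is written as a fresh TC_STRING. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamConstants;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ClassDescWalker {
    // Hypothetical classes with the same shape as the example above.
    static class Parent implements Serializable { int parentVersion = 10; }
    static class Contain implements Serializable { int containVersion = 11; }
    static class Child extends Parent {
        int version = 66;
        Contain con = new Contain();
    }

    static byte[] serialize(Object obj) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    private static void expect(boolean condition, String what) throws IOException {
        if (!condition) {
            throw new IOException("unexpected stream content, wanted " + what);
        }
    }

    /** Returns one line per class description: the class name followed by type:name for each field. */
    static List<String> walk(byte[] bytes) {
        List<String> result = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            expect(in.readShort() == ObjectStreamConstants.STREAM_MAGIC, "STREAM_MAGIC");
            expect(in.readShort() == ObjectStreamConstants.STREAM_VERSION, "STREAM_VERSION");
            expect(in.readByte() == ObjectStreamConstants.TC_OBJECT, "TC_OBJECT");
            byte tag = in.readByte();
            while (tag == ObjectStreamConstants.TC_CLASSDESC) {
                StringBuilder line = new StringBuilder(in.readUTF()); // length + class name
                in.readLong();                                        // serialVersionUID
                in.readByte();                                        // flags
                int fieldCount = in.readUnsignedShort();
                for (int i = 0; i < fieldCount; i++) {
                    char type = (char) in.readByte();
                    String name = in.readUTF();
                    if (type == 'L' || type == '[') {
                        // Object and array fields carry a type signature string.
                        expect(in.readByte() == ObjectStreamConstants.TC_STRING, "TC_STRING");
                        in.readUTF();
                    }
                    line.append(' ').append(type).append(':').append(name);
                }
                expect(in.readByte() == ObjectStreamConstants.TC_ENDBLOCKDATA, "TC_ENDBLOCKDATA");
                result.add(line.toString());
                tag = in.readByte(); // next superclass description, or TC_NULL
            }
            expect(tag == ObjectStreamConstants.TC_NULL, "TC_NULL");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        walk(serialize(new Child())).forEach(System.out::println);
    }
}
```

The walker stops where the instance data begins, so the field values and the nested Contain description are left unread.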

That concludes our discussion of the mechanism and principles of Java serialization. We hope it helps.

