Explain the principle of PHP serialization and deserialization in detail

  • 2021-09-04 23:37:38
  • OfStack

0. Preface

This article does not re-explain what object serialization and deserialization are. In PHP, the result of serialization is a PHP-specific string format that is somewhat similar to JSON.

Designing object serialization and deserialization in any language means solving several problems:

Self-description: the specific type of the object must be recoverable from the serialized result alone.

Knowing the type is not enough, of course; the result must also carry the specific values that belong to that type.

Field control during serialization: you should be able to choose which fields are serialized. Golang, for example, makes this very convenient.

Time performance: in performance-sensitive scenarios such as high-performance services, object serialization must not become a bottleneck (I often use protobuf for serialization there).

Space performance: the serialized result must not be too long. If an int in memory grows to 10 times its size after serialization, the serialization algorithm is problematic.

This article only explains the serialization and deserialization process in PHP from the perspective of PHP code. Remember that serialization and deserialization operate only on an object's data, not on its methods, which should be easy to understand if you have experience in object-oriented development.

1. The serialization method serialize and the deserialization method unserialize

PHP provides object serialization natively, unlike C++ … ^_^. It is also very simple to use, with only two functions.


class fobnn
{
 public $hack_id;
 private $hack_name;
 public function __construct($name,$id)
 {
  $this->hack_name = $name;
  $this->hack_id = $id;
 }
 public function print()
 {
  echo $this->hack_name.PHP_EOL;
 }
}
$obj = new fobnn('fobnn',1);
$obj->print();
$serializedstr = serialize($obj); // serialize the object with serialize()
echo $serializedstr.PHP_EOL;
$toobj = unserialize($serializedstr); // restore the object with unserialize()
$toobj->print();

fobnn
O:5:"fobnn":2:{s:7:"hack_id";i:1;s:16:"fobnnhack_name";s:5:"fobnn";}
fobnn

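Look at the second line of the output: this string is the result of serialization. The structure is quite readable, mapping the class name and the member names to their values. Members with different access levels get slightly different names after serialization. A private member's name is stored as the class name wrapped in NUL bytes followed by the member name, which is why hack_name appears as fobnnhack_name with a declared length of 16: the two NUL bytes are invisible when printed.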
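The naming difference between access levels is easier to see once the NUL bytes are made visible. The following is a minimal sketch; the class Visibility is hypothetical and exists only for this illustration.

```php
<?php
// Hypothetical class with one member per access level.
class Visibility
{
    public $a = 1;
    protected $b = 2;
    private $c = 3;
}

$raw = serialize(new Visibility());

// Replace each NUL byte with the two visible characters \0 for display.
$visible = str_replace("\0", '\0', $raw);
echo $visible . PHP_EOL;
// O:10:"Visibility":3:{s:1:"a";i:1;s:4:"\0*\0b";i:2;s:13:"\0Visibility\0c";i:3;}
```

Public names are stored as-is, protected names are prefixed with NUL * NUL, and private names are prefixed with NUL, the class name, and NUL.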

With the problems listed above in mind, let's look at how PHP handles them.

1. Self-descriptive functionality

In O:5:"fobnn":2, O denotes the object type, 5 is the length of the class name, and "fobnn" is the class name. The trailing 2 means that the object has two members.

Members are described with the same scheme, so the definition is recursive.

Self-description is achieved mainly by recording the class name and the member names as strings in the output.
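The recursive scheme can be seen by serializing a few plain values. This is a small sketch using only the built-in serialize function; the sample values are arbitrary.

```php
<?php
// Scalars: a one-letter type tag, then the value (strings also carry a length).
echo serialize(1) . PHP_EOL;     // i:1;
echo serialize('ab') . PHP_EOL;  // s:2:"ab";
echo serialize(true) . PHP_EOL;  // b:1;
echo serialize(null) . PHP_EOL;  // N;

// Containers reuse the same scheme for every key and every value.
$nested = serialize(['id' => 1, 'tags' => ['x', 'y']]);
echo $nested . PHP_EOL;
// a:2:{s:2:"id";i:1;s:4:"tags";a:2:{i:0;s:1:"x";i:1;s:1:"y";}}
```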

2. Performance issues

The time performance of PHP serialization is not analyzed in this article. The serialized result, however, resembles the protocols defined by JSON/BSON: a protocol header describes the type, and a protocol body describes the value for that type. The serialized result is not compressed.
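To get a rough feel for the space cost, the sketch below serializes the same small array with serialize and with json_encode and prints both lengths. The sample data is arbitrary, and a single comparison like this says nothing about other data shapes.

```php
<?php
$data = ['id' => 1, 'name' => 'fobnn'];

$native = serialize($data);   // a:2:{s:2:"id";i:1;s:4:"name";s:5:"fobnn";}
$json   = json_encode($data); // {"id":1,"name":"fobnn"}

// Type tags and length prefixes are stored as plain text, uncompressed.
printf("serialize:   %d bytes\n", strlen($native));
printf("json_encode: %d bytes\n", strlen($json));
```

For this input the native format is the longer of the two, because every key and value carries its own type tag and, for strings, a length prefix.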

2. Magic methods in serialization and deserialization

For the field-control problem mentioned above, PHP offers two solutions: magic methods and a custom serialization interface. First, the magic methods __sleep and __wakeup.


class fobnn
{
 public $hack_id;
 private $hack_name;
 public function __construct($name,$id)
 {
  $this->hack_name = $name;
  $this->hack_id = $id;
 }
 public function print()
 {
  echo $this->hack_name.PHP_EOL;
 }
 public function __sleep()
 {
  return array("hack_name");
 }
 public function __wakeup()
 {
  $this->hack_name = 'haha';
 }
}
$obj = new fobnn('fobnn',1);
$obj->print();
$serializedstr = serialize($obj);
echo $serializedstr.PHP_EOL;
$toobj = unserialize($serializedstr);
$toobj->print();

fobnn
O:5:"fobnn":1:{s:16:"fobnnhack_name";s:5:"fobnn";}
haha

Before serialization, __sleep is called and returns an array of the member names to serialize, which lets us control what data is serialized. In this case I returned only hack_name, and the result shows that only the hack_name member was serialized.

After deserialization is complete, __wakeup is called, where we can do follow-up work such as reconnecting to a database. Here it overwrites hack_name, which is why the last line of the output is haha.
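The reconnect use case can be sketched as follows. The class MemoryLog is hypothetical, and a php://memory stream stands in for a database connection or any other handle that cannot be serialized.

```php
<?php
// Hypothetical class; the stream handle plays the role of a connection.
class MemoryLog
{
    public $name;
    private $handle;

    public function __construct($name)
    {
        $this->name = $name;
        $this->open();
    }

    private function open()
    {
        $this->handle = fopen('php://memory', 'w+');
    }

    public function isOpen()
    {
        return is_resource($this->handle);
    }

    public function __sleep()
    {
        // Serialize only the name; the stream handle is left out.
        return ['name'];
    }

    public function __wakeup()
    {
        // Re-create the handle once the object has been restored.
        $this->open();
    }
}

$copy = unserialize(serialize(new MemoryLog('app')));
var_dump($copy->name);     // string(3) "app"
var_dump($copy->isOpen()); // bool(true)
```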

3. Custom serialization with the Serializable interface


interface Serializable {
 // returns the string representation of the object
 public function serialize();
 // rebuilds the object's state from that string
 public function unserialize($serialized);
}

Through this interface we can customize the serialization and deserialization behavior, and even define our own serialization format.


class fobnn implements Serializable
{
 public $hack_id;
 private $hack_name;
 public function __construct($name,$id)
 {
  $this->hack_name = $name;
  $this->hack_id = $id;
 }
 public function print()
 {
  echo $this->hack_name.PHP_EOL;
 }

 public function __sleep()
 {
  return array('hack_name');
 }

 public function __wakeup()
 {
  $this->hack_name = 'haha';
 }

 public function serialize()
 {
  return json_encode(array('id' => $this->hack_id ,'name'=>$this->hack_name ));
 }

 public function unserialize($var)
 {
  $array = json_decode($var,true);
  $this->hack_name = $array['name'];
  $this->hack_id = $array['id'];
 }
}
$obj = new fobnn('fobnn',1);
$obj->print();
$serializedstr = serialize($obj);
echo $serializedstr.PHP_EOL;
$toobj = unserialize($serializedstr);
$toobj->print();

fobnn
C:5:"fobnn":23:{{"id":1,"name":"fobnn"}}
fobnn

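When a class implements the custom serialization interface, the magic methods __sleep and __wakeup are no longer called: the result starts with C instead of O, contains both members, and the last line of the output is fobnn rather than haha.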
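As a side note that goes beyond the original text: PHP 7.4 added the magic methods __serialize and __unserialize, and from PHP 8.1 a class that implements only Serializable triggers a deprecation notice. The sketch below assumes PHP 7.4 or later; the class Hacker and its getName method are hypothetical.

```php
<?php
// Hypothetical class; requires PHP 7.4 or later.
class Hacker
{
    public $hack_id;
    private $hack_name;

    public function __construct($name, $id)
    {
        $this->hack_name = $name;
        $this->hack_id = $id;
    }

    public function getName()
    {
        return $this->hack_name;
    }

    public function __serialize(): array
    {
        // Return the data to store as a plain array.
        return ['id' => $this->hack_id, 'name' => $this->hack_name];
    }

    public function __unserialize(array $data): void
    {
        // Restore the state from the array produced by __serialize.
        $this->hack_id = $data['id'];
        $this->hack_name = $data['name'];
    }
}

$str = serialize(new Hacker('fobnn', 1));
echo $str . PHP_EOL;
$copy = unserialize($str);
echo $copy->getName() . PHP_EOL;
```

Unlike the Serializable interface, this approach keeps the standard O: format and only changes which keys and values are written.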

4. PHP dynamic types and PHP deserialization

As described under self-descriptive functionality, the type of every value is saved in the serialized result. Because PHP is a dynamically typed language, we can do a simple experiment.


class fobnn
{
 public $hack_id;
 public $hack_name;
 public function __construct($name,$id)
 {
  $this->hack_name = $name;
  $this->hack_id = $id;
 }
 public function print()
 {
  var_dump($this->hack_name);
 }
}
$obj = new fobnn('fobnn',1);
$obj->print();
$serializedstr = serialize($obj);
echo $serializedstr.PHP_EOL;
$toobj = unserialize($serializedstr);
$toobj->print();
$toobj2 = unserialize("O:5:\"fobnn\":2:{s:7:\"hack_id\";i:1;s:9:\"hack_name\";i:12345;}");
$toobj2->print();

In the hand-written string, we changed the hack_name value to an int: i:12345.


string(5) "fobnn"
O:5:"fobnn":2:{s:7:"hack_id";i:1;s:9:"hack_name";s:5:"fobnn";}
string(5) "fobnn"
int(12345)

You can see that the object was deserialized successfully and works normally, even though hack_name now holds an int instead of a string. This mechanism gives PHP flexible syntax, but it also introduces security risks. We will continue to analyze the security problems caused by PHP's serialization and deserialization features.
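One way to reduce the risk is to restrict which classes unserialize may create and to check member types afterwards. The sketch below assumes PHP 7.0 or later, where unserialize accepts an allowed_classes option. The validation rule is an illustrative assumption, not a complete defense.

```php
<?php
class fobnn
{
    public $hack_id;
    public $hack_name;
}

// Same tampered payload as in the experiment: hack_name is an int.
$payload = 'O:5:"fobnn":2:{s:7:"hack_id";i:1;s:9:"hack_name";i:12345;}';

// Option 1: refuse to instantiate any class.
$blocked = unserialize($payload, ['allowed_classes' => false]);
var_dump(get_class($blocked)); // string(22) "__PHP_Incomplete_Class"

// Option 2: allow only the expected class, then validate member types.
$obj = unserialize($payload, ['allowed_classes' => ['fobnn']]);
$valid = $obj instanceof fobnn
    && is_int($obj->hack_id)
    && is_string($obj->hack_name);
var_dump($valid); // bool(false), because hack_name is not a string
```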


