In-depth understanding of the new zval container and reference counting mechanism in PHP 7

  • 2021-11-10 08:59:28
  • OfStack

While looking up material on PHP 7 garbage collection recently, I was puzzled to find that some code samples from the Internet produced different results when run locally. The cause turned out to be simple: most of those articles date from the PHP 5.x era, while PHP 7 adopted a new zval structure, and material about it is still scarce. So I put together this summary from several sources, focusing on the reference counting mechanism in the new zval container. If you spot any mistakes, corrections are welcome.

New zval Structure in PHP7

Let's be plain and look at the code first:


struct _zval_struct {
  union {
    zend_long         lval;       /* long value */
    double            dval;       /* double value */
    zend_refcounted  *counted;
    zend_string      *str;
    zend_array       *arr;
    zend_object      *obj;
    zend_resource    *res;
    zend_reference   *ref;
    zend_ast_ref     *ast;
    zval             *zv;
    void             *ptr;
    zend_class_entry *ce;
    zend_function    *func;
    struct {
      uint32_t w1;
      uint32_t w2;
    } ww;
  } value;
  union {
    struct {
      ZEND_ENDIAN_LOHI_4(
        zend_uchar  type,     /* active type */
        zend_uchar  type_flags,
        zend_uchar  const_flags,
        zend_uchar  reserved)   /* call info for EX(This) */
    } v;
    uint32_t type_info;
  } u1;
  union {
    uint32_t   var_flags;
    uint32_t   next;         /* hash collision chain */
    uint32_t   cache_slot;      /* literal cache slot */
    uint32_t   lineno;        /* line number (for ast nodes) */
    uint32_t   num_args;       /* arguments number for EX(This) */
    uint32_t   fe_pos;        /* foreach position */
    uint32_t   fe_iter_idx;     /* foreach iterator index */
  } u2;
};

For a detailed description of this structure, please refer to the article by Brother Bird (Laruence) listed at the end of this article. It is very thorough, so I won't try to outdo an expert here. I will only highlight a few key points:

1. A variable in PHP 7 consists of two parts, the variable name and the variable value, which correspond to zval_struct and the value declared inside it, respectively.
2. zend_long and double in zval_struct.value are simple data types that store the actual value directly, while the other, complex data types store a pointer to another data structure.
3. In PHP 7, the reference counter is stored in the value, not in zval_struct.
4. NULL and Boolean are data types that have no value (Boolean is marked by the two type constants IS_FALSE and IS_TRUE), so they have no reference count.
5. A reference (REFERENCE) becomes a data structure of its own instead of just a flag bit. Its structure is as follows:

struct _zend_reference {
  zend_refcounted_h gc;
  zval       val;
};

6. zend_reference, as one of the value types that zval_struct.value can point to, also has its own val, which in turn refers to a value of the kind stored in zval_struct.value. The zend_reference and the value it wraps each have their own reference counter.

The reference counter records how many zvals currently point to the same zend_value.
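To make the "counter lives in the value" idea concrete, here is a minimal C sketch. It is a toy model, not the Zend source: `toy_string`, `toy_zval`, and the helper functions are hypothetical stand-ins for `zend_string` and `zval_struct`, kept just large enough to show two containers sharing one counted value.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for zend_string: the counter is part of the value itself. */
typedef struct {
    uint32_t refcount;
    size_t   len;
    char     data[32];
} toy_string;

/* Toy stand-in for zval_struct: it only holds a pointer, no counter. */
typedef struct {
    toy_string *str;
} toy_zval;

/* Allocate a string value that no container points to yet (refcount 0). */
toy_string *toy_string_new(const char *s) {
    toy_string *str = calloc(1, sizeof *str);
    assert(str != NULL);
    size_t n = strlen(s);
    str->len = n < sizeof str->data ? n : sizeof str->data - 1;
    memcpy(str->data, s, str->len);
    return str;
}

/* Point a container at a value and bump the value's counter. */
void toy_zval_set(toy_zval *zv, toy_string *str) {
    zv->str = str;
    str->refcount++;
}

/* Two containers share one value: returns the value's counter. */
uint32_t toy_shared_count(void) {
    toy_string *s = toy_string_new("foo");
    toy_zval a, b;
    toy_zval_set(&a, s);
    toy_zval_set(&b, s);   /* no characters are copied, only a pointer */
    uint32_t n = s->refcount;
    free(s);
    return n;
}
```

Calling `toy_shared_count()` in this model yields 2: the single counter on the value reflects both containers, and neither container carries a counter of its own.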

For point 6, look at the following code:


$a = 'foo';
$b = &$a;
$c = $a;

The data structure at this point is as follows:

$a and $b each have their own zval_struct container, and the value of each points to the same zend_reference structure. The zend_reference embeds a val structure, which points to a zend_string that holds the contents of the string.

$c also has its own zval_struct. When it is initialized, its value can point directly to that same zend_string, so the assignment does not copy the string.
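The layout described above can be sketched in C as follows. This is a simplified model under stated assumptions: the `toy_*` types are hypothetical, and the string is treated as an ordinary counted string (the model ignores the special handling of static strings discussed later in this article).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct { uint32_t refcount; char data[16]; } toy_string;
typedef struct toy_reference toy_reference;
typedef enum { TOY_STRING, TOY_REFERENCE } toy_type;

/* Toy zval: a type tag plus a pointer to either a string or a reference. */
typedef struct {
    toy_type type;
    union { toy_string *str; toy_reference *ref; } value;
} toy_zval;

/* Toy zend_reference: its own counter plus an embedded zval. */
struct toy_reference { uint32_t refcount; toy_zval val; };

typedef struct { uint32_t on_reference; uint32_t on_string; } toy_counts;

toy_counts toy_reference_demo(void) {
    toy_string *foo = calloc(1, sizeof *foo);
    toy_reference *ref = calloc(1, sizeof *ref);
    assert(foo != NULL && ref != NULL);
    strcpy(foo->data, "foo");

    /* $a = 'foo'; */
    toy_zval a = { .type = TOY_STRING, .value.str = foo };
    foo->refcount = 1;

    /* $b = &$a;  the reference takes over $a's hold on the string,
       then both $a and $b point at the reference. */
    ref->val = a;
    ref->refcount = 1;
    a.type = TOY_REFERENCE;
    a.value.ref = ref;
    toy_zval b = a;
    ref->refcount++;

    /* $c = $a;  follow the reference and share the string directly. */
    toy_zval c = a.value.ref->val;
    c.value.str->refcount++;

    toy_counts out = { ref->refcount, foo->refcount };
    (void)b;
    free(ref);
    free(foo);
    return out;
}
```

In this model the reference ends up with a count of 2 (held by $a and $b) and the string with a count of 2 (held by the reference's val and by $c), and the characters of the string are never duplicated.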

Let's talk about the phenomena that will appear in this brand-new zval structure and the reasons behind these phenomena.

Problem

1. Why do some variables have a reference counter with an initial value of 0?

Phenomenon


$var_int = 233;
$var_float = 233.3;
$var_str = '233';
xdebug_debug_zval('var_int');
xdebug_debug_zval('var_float');
xdebug_debug_zval('var_str');
/**  Output  **
var_int:
(refcount=0, is_ref=0)int 233
var_float:
(refcount=0, is_ref=0)float 233.3
var_str:
(refcount=0, is_ref=0)string '233' (length=3)
**********/

Cause

In PHP 7, assigning a value to a variable involves two operations:

1. Allocate a zval_struct for the symbol (that is, the variable name).
2. Store the variable's value in zval_struct.value.

Values that a zval can hold directly in its value field are not reference counted; they are simply copied on assignment. These types are:

IS_LONG
IS_DOUBLE

That is, the integer and floating-point types in PHP.
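As a rough illustration of why scalars need no counter, the following C sketch copies a toy zval that holds an integer. The `toy_zval` type and both functions are hypothetical simplifications, not engine code.

```c
#include <assert.h>

typedef enum { TOY_LONG, TOY_DOUBLE } toy_type;

/* Toy zval for scalars: the value sits inside the container itself. */
typedef struct {
    union { long lval; double dval; } value;
    toy_type type;
} toy_zval;

/* Copy by value: for scalars a plain struct copy is the whole job. */
toy_zval toy_copy(const toy_zval *src) {
    return *src;
}

/* Mirrors "$a = 233; $b = $a; $b = 1;" and returns what $a holds afterwards. */
long toy_scalar_copy_demo(void) {
    toy_zval a = { .value.lval = 233, .type = TOY_LONG };
    toy_zval b = toy_copy(&a);
    b.value.lval = 1;       /* changing the copy ... */
    assert(b.type == TOY_LONG);
    return a.value.lval;    /* ... leaves the original untouched */
}
```

Because nothing is shared between the two containers, there is nothing to count, which matches the refcount=0 that xdebug prints for integers and floats.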

Then why is refcount of var_str also 0?

This involves two types of strings in PHP:

1. Interned strings (function names, class names, variable names, static strings):

$str = '233';    // static string

2. Normal string:

$str = '233' . time();

For interned strings, the content is unique and never changes, much like a string defined in the static data area in C. They live for the whole duration of the request and are destroyed and released once the request is finished, so there is naturally no need to manage their memory through reference counting.

2. Why does the counter jump straight to 2 when a reference is taken to an integer, floating-point, or static string variable?

Phenomenon


$var_int_1 = 233;
$var_int_2 = &$var_int_1;
xdebug_debug_zval('var_int_1');
/**  Output  **
var_int_1:
(refcount=2, is_ref=1)int 233
**********/

Cause

Recall the value data structure in zval_struct discussed at the beginning. When a variable is assigned an integer, a float, or a static string, the data type of value is zend_long, double, or zend_string respectively. Integers and floats are stored directly in value, and a static string is an interned zend_string that is not counted. When such a variable is copied by value, a new zval_struct is created and the value is stored in its value in the same way, with the same data type, so refcount always stays 0.

But when the & operator is used, the situation is different:

1. PHP allocates a zend_reference structure for the variable that the & operator is applied to.
2. zend_reference.val is set to the original zval_struct.value.
3. The data type of zval_struct.value is changed to zend_reference.
4. zval_struct.value is pointed at the zend_reference that was just allocated and initialized.
5. A zval_struct is allocated for the new variable, and its value is pointed at the newly created zend_reference.

At this point, $var_int_1 and $var_int_2 each have a zval_struct, and the value of both points to the same zend_reference structure, so the reference counter of that structure is 2.

Digression: here the zend_reference in turn holds an integer or floating-point value. If the value it points to is a zend_string instead, that value's own reference counter is 1, while the refcount shown by xdebug is the counter of the zend_reference (i.e. 2).
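The sequence that the & operator triggers, as described above, can be sketched in C. This is a simplified model under assumptions: `toy_make_ref` and `toy_bind_ref` are hypothetical helpers, not engine functions, and only the integer case is modeled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct toy_reference toy_reference;
typedef enum { TOY_LONG, TOY_REFERENCE } toy_type;

typedef struct {
    toy_type type;
    union { long lval; toy_reference *ref; } value;
} toy_zval;

/* Toy zend_reference: a counter plus the wrapped value. */
struct toy_reference { uint32_t refcount; toy_zval val; };

/* Wrap the variable's current value in a new reference and
   retarget the variable at that reference. */
toy_reference *toy_make_ref(toy_zval *var) {
    toy_reference *ref = calloc(1, sizeof *ref);
    assert(ref != NULL);
    ref->val = *var;              /* the reference now holds the old value */
    ref->refcount = 1;            /* held by the original variable */
    var->type = TOY_REFERENCE;    /* the variable's type becomes "reference" */
    var->value.ref = ref;         /* and its value points at the reference */
    return ref;
}

/* Create the container for the new variable, sharing the reference. */
toy_zval toy_bind_ref(toy_reference *ref) {
    toy_zval zv;
    zv.type = TOY_REFERENCE;
    zv.value.ref = ref;
    ref->refcount++;
    return zv;
}

/* Mirrors "$var_int_1 = 233; $var_int_2 = &$var_int_1;". */
uint32_t toy_ref_demo(void) {
    toy_zval var_int_1 = { .type = TOY_LONG, .value.lval = 233 };
    toy_reference *ref = toy_make_ref(&var_int_1);
    toy_zval var_int_2 = toy_bind_ref(ref);
    assert(var_int_2.value.ref->val.value.lval == 233);
    uint32_t n = ref->refcount;
    free(ref);
    return n;
}
```

In this model the counter that reaches 2 belongs to the reference wrapper, not to the integer, which is consistent with xdebug reporting refcount=2 together with is_ref=1.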

3. Why does the reference counter of a newly initialized array have a value of 2?

Phenomenon


$var_empty_arr = [1, 2, '3'];
xdebug_debug_zval('var_empty_arr');
/**  Output  **
var_empty_arr:
(refcount=3, is_ref=0)
array (size=3)
 0 => (refcount=0, is_ref=0)int 1
 1 => (refcount=0, is_ref=0)int 2
 2 => (refcount=1, is_ref=0)string '3' (length=1)
**********/

Cause

This involves another concept in PHP 7: immutable arrays. I'll cover immutable arrays in detail in the next article; for now we only need to know that an array defined in this way is called an immutable array.

For arrays the not-refcounted variant is called an "immutable array". If you use opcache, then constant array literals in your code will be converted into immutable arrays. Once again, these live in shared memory and as such must not use refcounting. Immutable arrays have a dummy refcount of 2, as it allows us to optimize certain separation paths.

Immutable arrays, like the interned strings mentioned above, do not use reference counting. The difference is that the count of an interned string is always 0, while an immutable array uses a dummy count of 2.

Summarize

