Why PHP references are a pitfall and should be used with caution

  • 2021-09-16 06:26:06
  • OfStack

Preface

Last year I attended many conferences and spoke at eight of them. In several of those talks I discussed PHP references, because many people misunderstand them. Before digging into the issue, let's first review the basic concept of a reference and clarify what "passing by reference" means.

A reference in PHP means accessing the contents of the same variable under different names. Whichever name you use to modify the variable, the contents seen through the other names change as well.

Let's deepen our understanding with some code. First, a few simple statements: assign one variable to another, then change the second variable:


<?php
$a = 23;
$b = $a;
$b = 42;
var_dump($a); // int(23)
var_dump($b); // int(42)

This script shows that $a is still 23, while $b equals 42. The reason is that $b received a copy (what actually happens internally is discussed later). Now let's do the same thing with a reference:


<?php
$a = 23;
$b = &$a;
$b = 42;
var_dump($a); // int(42)
var_dump($b); // int(42)
?>

Now $a has changed to 42 as well. In fact, there is no difference between $a and $b: both use the same variable container (also known as a zval). The only way to separate the two is to break the link, for example by destroying one of them with unset().
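A minimal sketch of what unset() does in this situation: it removes only the name, not the shared value, so the remaining variable keeps its contents and the unset name can then be reused as an independent variable.

```php
<?php
$a = 23;
$b = &$a;   // $a and $b now name the same value
unset($b);  // removes only the name $b; the value survives under $a
$b = 42;    // $b is now a brand-new, independent variable
var_dump($a); // int(23)
var_dump($b); // int(42)
```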

In PHP, references can be used not only in ordinary statements, but also in function parameters and return values:


<?php
function &foo(&$param) {
 $param = 42;
 return $param;
}

$a = 23;
echo "\$a before calling foo(): $a\n";
$b = foo($a);
echo "\$a after the call to foo(): $a\n";
$b = 23;
echo "\$a after touching the returned variable: $a\n";
?>

What do you think the output is? Here it is:


$a before calling foo(): 23
$a after the call to foo(): 42
$a after touching the returned variable: 42

Here we initialize a variable and pass it to a function as a reference parameter. The function modifies it, so it now has a new value. The function returns that same variable, and we then modify the returned variable, so the original value should change too... Wait! It didn't change, did it? Yes, that is how references behave. Here's what happens: the function returns a reference to the variable container (zval) of $a, but the plain = assignment operator then creates a copy of it.

To fix this, we need to add an extra & operator at the call site:


$b = &foo($a);

Now the result is what we expected:


$a before calling foo(): 23
$a after the call to foo(): 42
$a after touching the returned variable: 23

Summary 1: PHP references are aliases for the same variable, and they can be hard to use correctly. For background on reference counting, see "Reference Counting Basics" in the PHP manual.

The biggest change in PHP 5 was the new object handling. It is commonly summarized like this:

In PHP 4, objects were treated like ordinary values, so they were copied when passed as function arguments. In PHP 5, they are always passed "by reference".

This understanding is not entirely correct. The main purpose of the change was to support the object-oriented pattern: after an object is passed to a function or method, that code sends messages to the object (for example, calls a method) which change the object's state (for example, its properties). For this to work, the object passed in must be the same object, not a copy. Object-oriented developers in PHP 4 used references to achieve this, but it was hard to get right.

PHP 5 introduced an object store that is independent of variable containers. When an object is assigned to a variable, the variable no longer holds the whole object (its property table and other class information) but only a handle pointing to where the object lives in memory. When we copy an object variable, we copy this handle. This is easily mistaken for a "reference", but an object handle and a PHP reference are completely different concepts. The following example helps distinguish them:


<?php
// Create an object, a copy of its handle ($b) and a reference to $a ($c)
$a = new stdClass;
$b = $a;
$c = &$a;

// Operate on the object itself
$a->foo = 42;
var_dump($a->foo); // int(42)
var_dump($b->foo); // int(42)
var_dump($c->foo); // int(42)

// Now assign a value of a different type to $a directly
$a = 42;
var_dump($a); // int(42)
var_dump($b); // object(stdClass)#1 (1) {
              //   ["foo"]=>
              //   int(42)
              // }
var_dump($c); // int(42)
?>

In the code above, modifying the object's property affects both the copied variable $b and the reference $c. In the last block, however, when we assign a value of a different type to $a, the reference $c changes while the copy $b does not. This is what most engineers with object-oriented experience expect.
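The same distinction shows up with function parameters. The sketch below uses hypothetical function names and assumes PHP 7.1+ for the void return type. A parameter that receives an object handle by value can change the object's state, but assigning a new object to it only rebinds the local variable; only an explicit reference parameter makes the reassignment visible to the caller.

```php
<?php
// The handle is copied, so the function reaches the same object and can change it.
function rename_user(stdClass $user): void {
    $user->name = 'Bob';
}

// Reassigning the parameter only rebinds the local copy of the handle.
function replace_user(stdClass $user): void {
    $user = new stdClass;
    $user->name = 'Carol';
}

// With a reference parameter, the caller's variable itself is rebound.
function replace_user_by_ref(stdClass &$user): void {
    $user = new stdClass;
    $user->name = 'Dave';
}

$u = new stdClass;
$u->name = 'Alice';
rename_user($u);
echo $u->name, "\n"; // Bob
replace_user($u);
echo $u->name, "\n"; // Bob
replace_user_by_ref($u);
echo $u->name, "\n"; // Dave
```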

So object-oriented code in PHP 4 was the only good reason to use references, and now that PHP 4 is dead, you can give up this practice.

Another reason people use references is the belief that they make code faster. This is wrong: references do not make your code run faster. Worse, they often make it run slower.

I must stress this once more: yes, references often make your code less efficient.

Engineers coming from other languages have read coding guidelines that recommend using pointers to reduce memory consumption and improve performance when handling large data structures or strings. They mistakenly map this advice onto references, but pointers and references are completely different technical models. The PHP engine differs from those languages: PHP uses a copy-on-write model.

In the copy-on-write model, assignment and passing an argument to a function do not trigger a copy. You can think of it as several variables pointing to the same variable container; a copy is made only when one of them is written to. So even if a variable looks like a copy, it is not really one until it is modified, and passing a huge variable to a function has little impact on performance. If you pass it by reference instead, the reference switches off copy-on-write for that variable (this is how the PHP 5 engine behaves), so the next time the variable is used without a reference, for example passed by value to another function, it has to be copied immediately. That might not sound like the end of the world; you might think you can simply use references everywhere. That is not the case: PHP's internals rely on the copy-on-write model, and many internal functions take their parameters by value, which you cannot change.
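Copy-on-write can be observed with memory_get_usage(). This is a rough sketch: the exact byte counts depend on the PHP version and build, and it only shows that assignment shares the data while the first write makes the real copy. It does not reproduce the PHP 5 behavior with references described above.

```php
<?php
// Build a large string (about 10 MB) so the allocation is easy to see.
$big = str_repeat('x', 10000000);

$before = memory_get_usage();

// Assignment: $copy shares the same string; only a counter is incremented.
$copy = $big;
$afterAssign = memory_get_usage();

// First write to $copy: PHP must now separate the two and really copy the data.
$copy[0] = 'y';
$afterWrite = memory_get_usage();

printf("assignment: %d bytes, first write: %d bytes\n",
    $afterAssign - $before, $afterWrite - $afterAssign);
```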

I've seen code like this somewhere:


<?php
function foo(&$data) {
 for ($i = 0; $i < strlen($data); $i++) {
  do_something($data[$i]); // do_something() stands for some per-character work
 }
}

$string = "... looooong string with lots of data .....";
foo($string);
?>

The first obvious problem with this code is that it calls strlen() on every loop iteration instead of using a precomputed length: calling strlen($data) once would be enough, yet it is called again and again. Unlike in languages such as C, however, a PHP string stores its own length, so strlen() does not need to count characters, and as far as strlen() alone is concerned this is not too bad. The bigger problem is that the developer passed the argument by reference to save time and show off their cleverness. But strlen() expects a value, that is, a copy. Copy-on-write cannot be used with a reference, so $data is copied when strlen() is called; strlen() then performs an extremely simple operation (it is one of the simplest functions in PHP), and the copy is destroyed right away.

Without the reference, no copy is needed and the code runs faster. And even if strlen() accepted a reference, you would gain nothing from it.
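For comparison, here is one way to write the example without the reference and with the length computed once. It is a sketch: do_something() is a hypothetical placeholder (the original never shows it), and the counter returned by foo() exists only so the result can be checked.

```php
<?php
// Hypothetical placeholder for the per-character work.
function do_something(string $char): void {
    // ... process one character ...
}

// Plain by-value parameter: copy-on-write means $data is not duplicated
// on the call, and the length is computed once instead of on every iteration.
function foo(string $data): int {
    $length = strlen($data);
    $visited = 0;
    for ($i = 0; $i < $length; $i++) {
        do_something($data[$i]);
        $visited++;
    }
    return $visited; // number of characters processed
}

$string = "... looooong string with lots of data .....";
$processed = foo($string);
```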

In general:

Do not use references in object-oriented (OO) code, except in legacy PHP 4 code. Do not use references to improve performance.

The third problem with references is poor API design: returning data through reference parameters. This, too, comes from developers not realizing that PHP is PHP and not some other language.

In PHP, the same function can return values of different types. You can return a string on success and the boolean false on failure, and you can also return compound types such as arrays and objects, so when you need to return several things, you can pack them into one value. Exceptions are another way for a function to report its outcome.
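A minimal sketch of these alternatives, assuming PHP 8.0+ for the int|false union type. parse_port() and split_address() are hypothetical helpers, not library functions: the first returns a value or false, the second packs two results into one array, and input without a port raises an exception.

```php
<?php
// Return an int on success and false on failure instead of using an out-parameter.
function parse_port(string $value): int|false
{
    // Accept only non-empty strings made entirely of digits.
    if ($value === '' || strspn($value, '0123456789') !== strlen($value)) {
        return false;
    }
    $port = (int) $value;
    return ($port >= 1 && $port <= 65535) ? $port : false;
}

// Pack several results into one array; signal bad input with an exception.
function split_address(string $address): array
{
    $pos = strrpos($address, ':');
    if ($pos === false) {
        throw new InvalidArgumentException("missing port in '$address'");
    }
    return [
        'host' => substr($address, 0, $pos),
        'port' => parse_port(substr($address, $pos + 1)),
    ];
}

var_dump(split_address('example.com:8080'));
```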

Returning data through references is a bad idea. Besides the fact that references are problematic in themselves and can degrade performance, using them this way makes code hard to maintain. Consider a function call like this:


do_something($var);

Would you expect $var to change? Of course not. But if do_something() takes its parameter by reference, it may.
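A minimal sketch of the problem, with a hypothetical do_something() that takes its parameter by reference: the call site looks exactly like a normal by-value call, yet the caller's variable is rewritten.

```php
<?php
// Hypothetical function with a reference parameter.
function do_something(&$param): int {
    $param = strtoupper($param); // silently rewrites the caller's variable
    return strlen($param);
}

$var = "hello";
$len = do_something($var); // nothing here hints that $var will change
var_dump($len); // int(5)
var_dump($var); // string(5) "HELLO"
```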

Another problem with this kind of API is that such functions cannot be chained or nested, so you keep running into cases where a temporary variable is required. Chaining can hurt readability, but in many cases it makes the code more concise.
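For example, because sort() takes its argument by reference, the result of array_unique() cannot be sorted inline and has to be stored in a temporary variable first, while functions that take and return values nest freely. A sketch using only built-in functions:

```php
<?php
$words = ['pear', 'apple', 'pear', 'fig'];

// sort(array_unique($words)) would only trigger an
// "Only variables should be passed by reference" notice and discard the result,
// so a temporary variable is needed.
$unique = array_unique($words);
sort($unique);
echo implode(', ', $unique), "\n"; // apple, fig, pear

// By-value functions return their result, so the calls nest directly.
echo implode(', ', array_map('strtoupper', array_unique($words))), "\n"; // PEAR, APPLE, FIG
```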

My personal favorite example of a bad design decision involving references is PHP's built-in sort() function. sort() takes the array as a reference parameter and hands the sorted array back through that same reference (its own return value is only a boolean). It would arguably be better to return the sorted array by value, as usual. Of course, there are historical reasons for this: sort() predates copy-on-write. Copy-on-write originated in PHP 4, and sort() is even older; it existed back when PHP was just a handy tool for the Web rather than a real language in its own right.
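A by-value alternative is easy to build on top of sort(). The sorted() function below is a hypothetical wrapper, not part of PHP: because its parameter is an ordinary copy-on-write value, sort() works on a private copy, the caller's array is left untouched, and the call can be nested.

```php
<?php
// Hypothetical by-value wrapper around sort().
function sorted(array $values, int $flags = SORT_REGULAR): array {
    sort($values, $flags); // the write separates $values from the caller's array
    return $values;
}

$words = ['pear', 'apple', 'pear', 'fig'];
echo implode(', ', sorted(array_unique($words))), "\n"; // apple, fig, pear
var_dump($words === ['pear', 'apple', 'pear', 'fig']); // bool(true): original unchanged
```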

In short: in PHP, references are bad. Do not use them; they only cause trouble. And don't expect references to make the engine run faster.
