A detailed look at the changes brought by the Abstract Syntax Tree (AST), a new feature of PHP7

  • 2021-10-24 19:16:58
  • OfStack

This article analyzes the changes brought about by the Abstract Syntax Tree (AST), a new feature of PHP7. The details are as follows:

Much of this is based on the AST RFC (https://wiki.php.net/rfc/abstract_syntax_tree), with excerpts from that document for ease of understanding.

This article does not explain what an abstract syntax tree is; you will need to learn that on your own. It only describes some of the changes the AST brings to PHP.

New implementation process

One important change in the PHP7 core is the addition of the AST. In PHP5, the steps from PHP script to executable opcodes are:

  • Lexing: lexical analysis, which converts the source file into a token stream;
  • Parsing: syntax analysis, which generates the op arrays directly.

In PHP7, op arrays are no longer generated directly in the parsing stage. An AST is generated first, so the process has one more step:

  • Lexing: lexical analysis, which converts the source file into a token stream;
  • Parsing: syntax analysis, which generates an abstract syntax tree from the token stream;
  • Compilation: generates op arrays from the abstract syntax tree.
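The lexing stage can be observed from userland. The following is a minimal sketch, assuming the tokenizer extension is available (it is enabled by default); it shows only the token stream, not the AST or the op arrays, and the sample source string is made up for illustration.

```php
<?php
// Lexing only: turn a small source string into a token stream.
$source = '<?php $a = 1 + 2;';

$names = [];
foreach (token_get_all($source) as $token) {
    // Multi-character tokens are arrays [id, text, line];
    // single-character tokens such as "=" or ";" are plain strings.
    $names[] = is_array($token) ? token_name($token[0]) : $token;
}

echo implode(' ', $names), PHP_EOL;
```

Inspecting the AST itself needs an extension outside the core distribution, so it is not shown here.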

Execution time and memory consumption

This is one step more than before, so intuitively it should increase both the execution time and the memory usage of the program. In practice, memory usage did increase, but execution time decreased.

The following results were obtained with a small (about 100 lines of code), a medium (about 700 lines) and a large (about 2800 lines) script: https://gist.github.com/nikic/289b0c7538b46c2220bc.

Execution time for 100 compilations of each file (note that these measurements date from 2014, when PHP7 was still called PHP-NG):

        php-ng   php-ast  diff
SMALL   0.180s   0.160s   -12.5%
MEDIUM  1.492s   1.268s   -17.7%
LARGE   6.703s   5.736s   -16.9%
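The diff column for the timings appears to be computed relative to the php-ast time rather than the php-ng time. That is an inference from the numbers, not something the source states. A short sketch that reproduces the column under this assumption:

```php
<?php
// Timings copied from the table above: [php-ng seconds, php-ast seconds].
$timings = [
    'SMALL'  => [0.180, 0.160],
    'MEDIUM' => [1.492, 1.268],
    'LARGE'  => [6.703, 5.736],
];

$diffs = [];
foreach ($timings as $size => list($ng, $ast)) {
    // Assumption: the extra time of php-ng is expressed relative to the php-ast time.
    $diffs[$size] = -round(($ng / $ast - 1) * 100, 1);
    printf("%-6s %+.1f%%\n", $size, $diffs[$size]);
}
```

Measured against the php-ng time instead, the same data would give somewhat smaller percentages.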

Peak memory in a single compilation:

        php-ng   php-ast  diff
SMALL   378kB    414kB    +9.5%
MEDIUM  507kB    643kB    +26.8%
LARGE   1084kB   1857kB   +71.3%

Results for single compilations may not be representative of real-world use. The following results are from testing a complete project, PhpParser:

        php-ng   php-ast  diff
TIME    25.5ms   22.8ms   -11.8%
MEMORY  2360kB   2482kB   +5.1%

The tests show that with the AST, execution time improves by roughly 10% to 15% overall, while memory consumption increases. The increase is pronounced when compiling a single large file, but it is not a serious problem over the execution of a whole project.

Note also that all of the above results were obtained without Opcache. With Opcache enabled in production, the increase in memory consumption is not a big problem.

Semantic changes

If it were only a speed optimization, that would hardly be reason enough to adopt the AST. In fact, the AST was not introduced for performance but to solve syntax problems. Let's look at some of the semantic changes.

yield does not require parentheses

In PHP5, if you use yield in an expression context (for example, on the right-hand side of an assignment), you must wrap the yield expression in parentheses:


<?php
$result = yield foo();   // Illegal in PHP5
$result = (yield foo()); // Legal

This behavior was merely a limitation of the PHP5 implementation; in PHP7 the parentheses are no longer necessary. The following forms are therefore also legal:


<?php
$result = yield;
$result = yield $v;
$result = yield $k => $v;

Of course, yield must still be used only where it is valid.
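As a runnable illustration of yield in an expression context without parentheses, here is a minimal sketch for PHP7 or later. The function name collector() and the sent values are hypothetical and chosen only for the example.

```php
<?php
// Hypothetical generator: collects two values sent in by the caller.
function collector()
{
    $received = [];
    while (count($received) < 2) {
        // yield on the right-hand side of an assignment, no parentheses needed in PHP7
        $received[] = yield count($received);
    }
    return $received;
}

$gen    = collector();
$first  = $gen->current();   // runs to the first yield
$second = $gen->send('a');   // 'a' becomes the value of the first yield expression
$gen->send('b');             // 'b' becomes the value of the second one; the generator returns
$result = $gen->getReturn(); // Generator::getReturn() exists since PHP7

var_dump($first, $second, $result);
```

In PHP5 the assignment inside the loop would have to be written as $received[] = (yield count($received)); and getReturn() would not be available.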

Parentheses do not affect behavior

In PHP5, ($foo)['bar'] = 'baz' and $foo['bar'] = 'baz' mean different things. In fact, the former is illegal, and you get the following error:


<?php
($foo)['bar'] = 'baz';
# PHP Parse error: Syntax error, unexpected '[' on line 1

However, in PHP 7, the two ways of writing mean the same thing.

Similarly, wrapping a function call argument in parentheses used to bypass the by-reference check. This has also been fixed in PHP7:


<?php
function func() {
 return [];
}
function byRef(array &$a) {
}
byRef((func()));

In PHP5 the code above raises no warning; only the form without the extra parentheses, byRef(func()), does. In PHP7, the following error is raised whether or not func() is wrapped in parentheses:

PHP Strict standards: Only variables should be passed by reference ...
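The following sketch, intended for PHP7 or later, makes the behavior observable by collecting the diagnostics in an array instead of printing them. The error handler and the $messages array are additions for illustration; func() and byRef() are the functions from the example above.

```php
<?php
// Collect diagnostics instead of printing them, so the messages can be inspected.
$messages = [];
set_error_handler(function ($severity, $message) use (&$messages) {
    $messages[] = $message;
    return true; // handled: suppress PHP's default output
});

function func()
{
    return [];
}

function byRef(array &$a)
{
}

byRef(func());   // plain call result passed by reference
byRef((func())); // the same call result wrapped in extra parentheses

restore_error_handler();
print_r($messages);
```

On PHP7 both calls should be reported, which is the point of the change: the extra parentheses no longer hide the problem.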

Changes to list()

The behavior of the list keyword has changed considerably. list() used to assign its variables from right to left; it now assigns them from left to right:


<?php
list($array[], $array[], $array[]) = [1, 2, 3];
var_dump($array);
// PHP5: $array = [3, 2, 1]
// PHP7: $array = [1, 2, 3]
# Note that the order here is the order in which the individual assignments are performed.
# In a use such as list($a, $b) = [1, 2], $a == 1 and $b == 2 in both versions.

The reason is that PHP5 performed the assignments in reverse, so 3 was appended to the array first and 1 last; PHP7 appends 1 first.

The same change affects this case:


<?php
$a = [1, 2];
list($a, $b) = $a;
// PHP5: $a = 1, $b = 2
// PHP7: $a = 1, $b = null + "Undefined index 1"

This is because PHP5 assigned $b first (giving it 2) and only then set $a to 1. Now $a is assigned first, becomes 1 and is no longer an array, so $b becomes null.

list now accesses each offset only once:


<?php
list(list($a, $b)) = $array;
// PHP5:
$b = $array[0][1];
$a = $array[0][0];
// PHP7:
// An intermediate variable is created to hold the value of $array[0]
$_tmp = $array[0];
$a = $_tmp[0];
$b = $_tmp[1];
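The single access per offset can be observed with an object that counts its reads. This is a sketch for PHP7 or later; the class CountingArray is a hypothetical helper written for this example, not part of PHP or of the RFC.

```php
<?php
// Hypothetical helper: an ArrayAccess wrapper that counts reads per offset.
class CountingArray implements ArrayAccess
{
    public $reads = [];
    private $data;

    public function __construct(array $data)
    {
        $this->data = $data;
    }

    // The attribute lines below are plain comments on PHP7; on PHP 8.1+ they
    // silence the notice about the missing return types.
    #[\ReturnTypeWillChange]
    public function offsetExists($offset)
    {
        return isset($this->data[$offset]);
    }

    #[\ReturnTypeWillChange]
    public function offsetGet($offset)
    {
        $this->reads[$offset] = ($this->reads[$offset] ?? 0) + 1;
        return $this->data[$offset];
    }

    #[\ReturnTypeWillChange]
    public function offsetSet($offset, $value)
    {
        $this->data[$offset] = $value;
    }

    #[\ReturnTypeWillChange]
    public function offsetUnset($offset)
    {
        unset($this->data[$offset]);
    }
}

$array = new CountingArray([[10, 20]]);
list(list($a, $b)) = $array;

var_dump($a, $b, $array->reads);
```

With the PHP7 behavior, offset 0 of the outer container is read once and the nested list() works on that intermediate value.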

Empty list() constructs are now always forbidden; previously they were forbidden only in some contexts:


<?php
list() = $a;           // Illegal
list($b, list()) = $a; // Illegal
foreach ($a as list()) // Illegal (also illegal in PHP5)

Order of reference assignment

Reference assignments were evaluated from right to left in PHP5 and are now evaluated from left to right:


<?php
$obj = new stdClass;
$obj->a = &$obj->b;
$obj->b = 1;
var_dump($obj);
// PHP5:
object(stdClass)#1 (2) {
 ["b"] => &int(1)
 ["a"] => &int(1)
}
// PHP7:
object(stdClass)#1 (2) {
 ["a"] => &int(1)
 ["b"] => &int(1)
}

The __clone method can be called directly

You can now call the __clone method directly with $obj->__clone(). __clone was the only magic method that could not be called directly; previously you would get an error like this:

Fatal error: Cannot call __clone() method on objects - use 'clone $obj' instead in ...
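A minimal sketch for PHP7 or later of the difference between calling __clone() directly and using the clone operator. The class Counter and its property are hypothetical names used only for this example.

```php
<?php
// Hypothetical class: records how many times __clone() has run on an instance.
class Counter
{
    public $cloneCalls = 0;

    public function __clone()
    {
        $this->cloneCalls++;
    }
}

$obj = new Counter();
$obj->__clone();    // direct call: runs the method body on $obj itself, no copy is made
$copy = clone $obj; // clone operator: copies $obj first, then runs __clone() on the copy

echo $obj->cloneCalls, ' ', $copy->cloneCalls, PHP_EOL; // prints "1 2"
```

A direct call is therefore an ordinary method call and does not create a new object; clone $obj is still what produces a copy.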

Uniform variable syntax

The AST also resolves a number of syntax consistency issues that were raised in another RFC: https://wiki.php.net/rfc/uniform_variable_syntax.

In the new implementation, some existing syntax is interpreted differently than before. For details, see the following table:

Expression            PHP5                    PHP7
$$foo['bar']['baz']   ${$foo['bar']['baz']}   ($$foo)['bar']['baz']
$foo->$bar['baz']     $foo->{$bar['baz']}     ($foo->$bar)['baz']
$foo->$bar['baz']()   $foo->{$bar['baz']}()   ($foo->$bar)['baz']()
Foo::$bar['baz']()    Foo::{$bar['baz']}()    (Foo::$bar)['baz']()

On the whole, these expressions used to be evaluated from right to left and are now evaluated from left to right, and the rule that parentheses do not affect behavior holds here too. Take care with these complex variable expressions in real development.
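The first row of the table can be tried out as follows. This is a sketch for PHP7 or later; all variable names and values are made up for illustration.

```php
<?php
$data = ['bar' => ['baz' => 'value from $data']];
$foo  = 'data';

// PHP7 reads $$foo['bar']['baz'] left to right: resolve $$foo first, then index it.
$implicit = $$foo['bar']['baz'];
$explicit = ($$foo)['bar']['baz'];

// The PHP5 reading has to be spelled out with braces:
// index the array first, then use the result as a variable name.
$names  = ['bar' => ['baz' => 'target']];
$target = 'value from $target';
$old    = ${$names['bar']['baz']};

var_dump($implicit === $explicit, $old);
```

Code that must behave the same on both versions can always use the explicit forms with braces or parentheses.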

I hope this article is helpful to everyone's PHP programming.

