Calculating string length in PHP: an introduction to the strlen() function

  • 2020-10-07 18:37:23
  • OfStack

The strlen() function and the mb_strlen() function

In PHP, the function strlen() returns the length of the string. The function prototype is as follows:
 
int strlen(string string_input); 

The parameter string_input is the string to be processed.

The strlen() function returns the length of a string in bytes. Each ASCII letter, digit, and symbol occupies 1 byte, so each has a length of 1. A Chinese character is commonly said to take up two bytes (as it does in GB2312), which would give it a length of 2. For example:

<?php 
echo strlen("www.sunchis.com"); 
echo strlen("三知开发网"); 
?> 

The result of echo strlen("www.sunchis.com"); is 15.

The result of echo strlen("三知开发网"); is 15.

This raises a question: isn't one Chinese character 2 bytes? "三知开发网" is clearly five Chinese characters, so how can the result be 15?

Here is why: in a UTF-8 string, strlen() counts each Chinese character as 3, because each one is encoded in three bytes. So how can the length of a string that mixes Chinese and English be calculated exactly? This calls for another function, mb_strlen(). It is used almost exactly like strlen(), but takes one more argument that specifies the character encoding. The function prototype is:
 
int mb_strlen(string string_input, string encode); 
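
As a minimal sketch of the difference between the two functions, the snippet below compares the byte count and the character count of an ASCII string and of a five-character Chinese string. It assumes the source file is saved as UTF-8 and that the mbstring extension is available; the helper name describe_length() is made up for this illustration.

```php
<?php
// Assumption: this file is saved as UTF-8 and the mbstring extension is loaded.

// Hypothetical helper: returns the byte count and the character count of a string.
function describe_length(string $s, string $encoding = "UTF-8"): array
{
    return [
        "bytes" => strlen($s),               // number of bytes
        "chars" => mb_strlen($s, $encoding), // number of characters in $encoding
    ];
}

$ascii   = describe_length("www.sunchis.com");
$chinese = describe_length("三知开发网");

echo "ASCII:   {$ascii['bytes']} bytes, {$ascii['chars']} characters\n";
echo "Chinese: {$chinese['bytes']} bytes, {$chinese['chars']} characters\n";
```

For the ASCII string the two counts agree; for the Chinese string the byte count is three times the character count.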

PHP's built-in string length function, strlen, doesn't handle Chinese strings properly; it just gets the number of bytes in the string. For the Chinese encoding of GB2312, the value obtained by strlen is twice the number of Chinese characters, while for the Chinese encoding of UTF-8, it is three times the difference (under the UTF-8 encoding, one Chinese character accounts for three bytes). Therefore, the following code can accurately calculate the length of the Chinese string:
 
<?php 
$str = "3 know sunchis Development of network "; 
echo strlen($str)."<br>"; // Results: 22 
echo mb_strlen($str,"UTF8")."<br>"; // Results: 12 
$strlen = (strlen($str)+mb_strlen($str,"UTF8"))/2; 
echo $strlen; // Results: 17 
?> 

Principle analysis:

When CALCULATING strlen(), the Length of Chinese character for ES53en-8 is 3, so the length of "3 knows sunchis development network" is 5×3+7×1=22
In the calculation of mb_strlen, when the inner code is selected as UTF8, one Chinese character will be calculated as length 1, so the length of "3 knows sunchis development network" is 5×1+7×1=12

All that remains is pure mathematics, and without further ado...

Note: in mb_strlen($str, 'UTF-8'), if the second parameter is omitted, PHP's internal encoding is used. The internal encoding can be obtained with the mb_internal_encoding() function. Note also that mb_strlen() is not a core PHP function; it is provided by the mbstring extension. Before using it, make sure that php_mbstring.dll is loaded in php.ini, that is, that the line "extension=php_mbstring.dll" exists and is not commented out; otherwise the call fails with an undefined function error.
