The difference between the strlen and mb_strlen functions in PHP

  • 2021-01-25 07:13:52
  • OfStack

PHP has two functions for measuring the length of a string:
one is strlen, the other is mb_strlen.
Let's first look at their definitions in the manual.
strlen
strlen -- Gets the string length
int strlen ( string $string )
Returns the length of the given string $string, measured in bytes.

mb_strlen
int mb_strlen ( string $str [, string $encoding ] )
Returns the number of characters in $str; a multi-byte character is counted as 1.
The encoding parameter is the character encoding. If omitted, internal character encoding is used.

Unlike strlen, mb_strlen accepts a character encoding as its second argument. The following example shows how the two functions differ:

 
<?php
// Save this test file with UTF-8 encoding
$str = '中文a字1符';                      // 4 Chinese characters plus 'a' and '1'
echo strlen($str) . '<br>';              // 14
echo mb_strlen($str, 'utf8') . '<br>';   // 6
echo mb_strlen($str, 'gbk') . '<br>';    // 8
echo mb_strlen($str, 'gb2312') . '<br>'; // 10
?>


Result analysis: strlen counts bytes. In UTF-8 each Chinese character takes 3 bytes and each ASCII character takes 1, so the test string (four Chinese characters plus "a" and "1") has a length of 3*4 + 2 = 14. mb_strlen with the UTF-8 encoding counts each Chinese character as 1, so the same string has a length of 6. The "gbk" and "gb2312" results are not real character counts: the bytes are UTF-8, and reading them as another encoding produces arbitrary numbers.
Combined, these two functions can compute the display width of a mixed Chinese and English string, where a Chinese character occupies 2 columns and an English character occupies 1:
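To see where the 14 comes from, the sketch below splits the test string into characters and prints the byte count and hex bytes of each one. It assumes the file is saved as UTF-8 and uses preg_split with the 'u' modifier to split by UTF-8 character; this splitting approach is an illustration, not part of the original article.

```php
<?php
// Assumption: this file is saved as UTF-8, like the example above.
$str = '中文a字1符'; // 4 Chinese characters plus 'a' and '1'

// Split into UTF-8 characters ('u' modifier) and show what strlen sees for each.
foreach (preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY) as $ch) {
    echo $ch, ' => ', strlen($ch), ' byte(s), hex ', bin2hex($ch), "\n";
}

echo strlen($str), "\n";             // 14 = 4 * 3 + 2 * 1
echo mb_strlen($str, 'UTF-8'), "\n"; // 6
```

Each Chinese character prints 3 bytes (for example, 中 is e4b8ad), while "a" and "1" print 1 byte each.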
echo (strlen($str) + mb_strlen($str,'UTF8')) / 2;  


For the test string, strlen($str) is 14 and mb_strlen($str, 'UTF8') is 6, so its display width is (14 + 6) / 2 = 10.
To check which encoding mb_strlen falls back to when the second argument is omitted, print the internal encoding:
echo mb_internal_encoding();
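The formula can be wrapped in a small helper. This is a sketch under the article's assumption that every non-ASCII character is a 3-byte Chinese character in UTF-8; the name display_width is hypothetical. For comparison it also calls mb_strwidth, a separate mbstring function that measures width directly from a character width table instead of using this byte trick.

```php
<?php
// Hypothetical helper. Assumes UTF-8 input in which every non-ASCII
// character is a 3-byte Chinese character (the article's assumption).
function display_width(string $s): int
{
    // Chinese character: (3 bytes + 1 character) / 2 = 2 columns
    // ASCII character:   (1 byte  + 1 character) / 2 = 1 column
    return intdiv(strlen($s) + mb_strlen($s, 'UTF-8'), 2);
}

$str = '中文a字1符';
echo display_width($str), "\n";         // 10
echo mb_strwidth($str, 'UTF-8'), "\n";  // 10, computed from a width table
```

The byte trick only holds while the 3-byte assumption does; for text with other scripts or emoji, mb_strwidth is the more direct tool.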


The built-in function strlen does not count Chinese characters correctly; it only returns the number of bytes the string occupies. For a string made only of Chinese characters, strlen returns 2 times the number of characters under GB2312 and 3 times under UTF-8 (in UTF-8, each Chinese character takes 3 bytes).
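A quick way to check the 2x and 3x ratios is to re-encode the same text with mb_convert_encoding and compare byte counts. This sketch assumes the source file is UTF-8 and that the mbstring extension is loaded; 'GB2312' is accepted by mbstring as an alias of EUC-CN.

```php
<?php
// Assumptions: source file saved as UTF-8, mbstring extension loaded.
$utf8 = '中文';                                       // 2 Chinese characters
$gb   = mb_convert_encoding($utf8, 'GB2312', 'UTF-8'); // same text in GB2312

echo strlen($utf8), "\n";             // 6: 3 bytes per character
echo strlen($gb), "\n";               // 4: 2 bytes per character
echo mb_strlen($utf8, 'UTF-8'), "\n"; // 2
echo mb_strlen($gb, 'GB2312'), "\n";  // 2
```

strlen changes with the encoding, while mb_strlen with the matching encoding gives 2 characters in both cases.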

The mb_strlen function solves this problem. It is used like strlen, except that it takes an optional second argument specifying the character encoding. For example, to get the length of a UTF-8 string $str, use mb_strlen($str, 'UTF-8'). If the second argument is omitted, PHP's internal encoding is used; the current internal encoding can be read with mb_internal_encoding().
Note that mb_strlen belongs to the mbstring extension, which is not part of core PHP. On Windows, make sure the line "extension=php_mbstring.dll" exists in php.ini and is not commented out; otherwise calls fail with an undefined function error. On Linux, PHP must be compiled with the extension enabled (the --enable-mbstring configure option).
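The sketch below shows the fallback behaviour: it prints the current internal encoding, sets it explicitly to UTF-8, and then calls mb_strlen without a second argument. The initial value printed depends on your configuration (it is derived from default_charset), so no output is assumed for that line.

```php
<?php
$str = '中文a字1符'; // assumes this file is saved as UTF-8

// Current internal encoding; the value depends on php.ini settings.
echo mb_internal_encoding(), "\n";

// Set it explicitly so that mb_strlen without a second argument counts UTF-8.
mb_internal_encoding('UTF-8');
echo mb_strlen($str), "\n";          // 6
echo mb_strlen($str, 'UTF-8'), "\n"; // 6, same result with the explicit argument
```

Passing the encoding explicitly, as the earlier examples do, avoids depending on this global setting.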
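If code may run on a server without mbstring, one option is to check for the extension before calling mb_strlen. The helper below is a hypothetical sketch: it uses mb_strlen when extension_loaded('mbstring') is true, and otherwise counts UTF-8 characters with preg_match_all and the 'u' modifier. It handles UTF-8 only, and the choice to throw on invalid UTF-8 is an assumption of this sketch.

```php
<?php
// Hypothetical helper: character count for UTF-8 strings, with or without mbstring.
function utf8_length(string $s): int
{
    if (extension_loaded('mbstring')) {
        return mb_strlen($s, 'UTF-8');
    }
    // 'u' = treat pattern and subject as UTF-8, 's' = let '.' also match newlines
    $count = preg_match_all('/./us', $s);
    if ($count === false) {
        // preg_match_all returns false for invalid UTF-8 input
        throw new InvalidArgumentException('Invalid UTF-8 string');
    }
    return $count;
}

echo utf8_length('中文a字1符'), "\n"; // 6
```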

