Introduction to the Differences between String Length Functions strlen and mb_strlen

  • 2021-07-18 07:17:22
  • OfStack

PHP has two common string length functions: strlen and mb_strlen. For a string made up entirely of English (ASCII) characters, both return the same value. This article compares their results on strings that mix Chinese and English.

Both strlen and mb_strlen return the length of a string, but beginners who have not read the manual may not know how they differ.
The following example shows the difference between the two:


<?php
// Assumes this file is saved as UTF-8
$str = '中文a字1符';
echo strlen($str) . '<br>';              // 14 (bytes)
echo mb_strlen($str, 'utf8') . '<br>';   // 6 (characters)
echo mb_strlen($str, 'gbk') . '<br>';    // 8 (UTF-8 bytes read as GBK)
echo mb_strlen($str, 'gb2312') . '<br>'; // 10 (UTF-8 bytes read as GB2312)
?>

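Analysis of the results: strlen counts bytes, and each Chinese character takes 3 bytes in UTF-8, so the sample string (four Chinese characters plus the ASCII characters a and 1) has length 3*4+2=14. mb_strlen counts characters: when the encoding is given as UTF-8, each Chinese character counts as 1, so the length of the same string is 6. With 'gbk' or 'gb2312', the UTF-8 bytes are interpreted in the wrong encoding, so the results 8 and 10 are not meaningful character counts.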
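To make the byte arithmetic visible, here is a minimal sketch that prints how many bytes each character occupies. It assumes the sample string is the UTF-8 string '中文a字1符', that the file is saved as UTF-8, and that mb_str_split() is available (PHP 7.4 or later with mbstring loaded).

```php
<?php
// Assumption: this file is saved as UTF-8 and the mbstring extension is loaded.
$str = '中文a字1符';

// mb_str_split() splits the string into characters rather than bytes.
$chars = mb_str_split($str, 1, 'UTF-8');

$totalBytes = 0;
foreach ($chars as $ch) {
    $bytes = strlen($ch);      // number of bytes used by this one character
    $totalBytes += $bytes;
    echo $ch, ' => ', $bytes, " byte(s)\n";
}

echo 'characters: ', count($chars), "\n"; // matches mb_strlen($str, 'UTF-8')
echo 'bytes: ', $totalBytes, "\n";        // matches strlen($str)
```

Each Chinese character should report 3 bytes and each ASCII character 1 byte, which is where 3*4+2 comes from.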

Used together, these two functions can calculate the display width of a mixed Chinese-English string, counting each Chinese character as 2 and each English character as 1:
echo (strlen($str) + mb_strlen($str,'UTF8')) / 2;

For example, the sample string has a strlen($str) value of 14 and an mb_strlen($str, 'UTF8') value of 6, so its display width works out to (14 + 6) / 2 = 10.
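The width formula can be wrapped in a small function, sketched below. The name mixed_width is hypothetical and not from the article, and the formula only holds when every non-ASCII character takes exactly 3 bytes in UTF-8, as common Chinese characters do. For comparison, the sketch also calls mbstring's built-in mb_strwidth(), which counts full-width characters as 2.

```php
<?php
// Hypothetical helper (not from the original article).
// Assumption: the input is UTF-8 and every non-ASCII character is 3 bytes long.
function mixed_width(string $str): int
{
    // bytes = 3c + a and characters = c + a, so (bytes + characters) / 2 = 2c + a
    return intdiv(strlen($str) + mb_strlen($str, 'UTF-8'), 2);
}

$str = '中文a字1符';
echo mixed_width($str), "\n";          // (14 + 6) / 2 = 10
echo mb_strwidth($str, 'UTF-8'), "\n"; // built-in width calculation
```

For text that may contain 2-byte or 4-byte UTF-8 characters, mb_strwidth() is probably the safer choice, since the averaging formula no longer applies.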


echo mb_internal_encoding(); // prints the current internal encoding

PHP's built-in strlen function does not count Chinese characters correctly; it only returns the number of bytes the string occupies. For Chinese text encoded in GB2312, strlen returns twice the number of Chinese characters, while for UTF-8 it returns three times the number (in UTF-8, one Chinese character occupies three bytes).
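The two-bytes versus three-bytes claim can be checked by converting the same text between encodings. This is a sketch that assumes the file is saved as UTF-8, that mbstring is loaded, and that the sample string is '中文a字1符'.

```php
<?php
// Assumption: this file is saved as UTF-8 and the mbstring extension is loaded.
$utf8 = '中文a字1符';

// Re-encode the same text as GB2312 (2 bytes per Chinese character).
$gb = mb_convert_encoding($utf8, 'GB2312', 'UTF-8');

echo strlen($utf8), "\n";            // 3*4 + 2 = 14 bytes in UTF-8
echo strlen($gb), "\n";              // 2*4 + 2 = 10 bytes in GB2312
echo mb_strlen($gb, 'GB2312'), "\n"; // 6 characters when the encoding matches
```

The character count stays at 6 as long as the encoding passed to mb_strlen matches the actual encoding of the bytes.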

The mb_strlen function solves this problem. It is used much like strlen, except that it takes an optional second parameter that specifies the character encoding. For example, to get the length of a UTF-8 string $str, call mb_strlen($str, 'UTF-8'). If the second parameter is omitted, PHP's internal encoding is used. The internal encoding can be read with the mb_internal_encoding() function.
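As a short illustration of the optional parameter, the sketch below sets the internal encoding explicitly and then calls mb_strlen without a second argument. It assumes the file is saved as UTF-8 and that the sample string is '中文a字1符'. The variable names are illustrative only.

```php
<?php
// Assumption: this file is saved as UTF-8 and the mbstring extension is loaded.
$str = '中文a字1符';

$previous = mb_internal_encoding(); // remember the current setting
mb_internal_encoding('UTF-8');      // make UTF-8 the default for mb_* functions

$length = mb_strlen($str);          // no second argument: internal encoding is used
echo $length, "\n";                 // 6

mb_internal_encoding($previous);    // restore the earlier setting
```

Passing the encoding explicitly on each call avoids depending on a global setting, which may matter in code shared across projects.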

Note that mb_strlen is not a core PHP function; it is provided by the mbstring extension. Before using it on Windows, make sure php_mbstring.dll is loaded in php.ini, that is, that the line "extension=php_mbstring.dll" exists and is not commented out (on PHP 7.2 and later, "extension=mbstring" also works). Otherwise, calls to mb_strlen fail with an undefined function error.
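Code that may run where mbstring is missing can check for the function first. The following is a sketch only: utf8_length is a hypothetical helper name, and the fallback assumes the input is valid UTF-8 and that the PCRE extension is available.

```php
<?php
// Hypothetical helper (not from the original article): counts characters in a
// UTF-8 string, using mb_strlen when the mbstring extension is loaded.
function utf8_length(string $str): int
{
    if (function_exists('mb_strlen')) {
        return mb_strlen($str, 'UTF-8');
    }

    // Fallback: with the /u modifier PCRE treats the subject as UTF-8, so "."
    // matches one code point; /s lets "." match newlines as well.
    $count = preg_match_all('/./su', $str);
    if ($count === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }
    return $count;
}

echo utf8_length('中文a字1符'), "\n"; // 6
```

Note that the two branches treat invalid UTF-8 differently: the fallback throws, while mb_strlen still returns a number.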

