Differences between strlen and mb_strlen in PHP

  • 2021-07-01 06:52:12
  • OfStack

PHP has two common string length functions: strlen and mb_strlen. When a string contains only English (ASCII) characters, the two return the same value. Here we mainly compare their results when Chinese and English are mixed.

Let's look at an example first:


<?php
// Assumes this file is saved as UTF-8
$str = '中文a字1符';
echo strlen($str) . '<br>';               // 14
echo mb_strlen($str, 'utf8') . '<br>';    // 6
echo mb_strlen($str, 'gbk') . '<br>';     // 8
echo mb_strlen($str, 'gb2312') . '<br>';  // 10
?>

Results analysis: strlen counts bytes, and a Chinese character in UTF-8 takes 3 bytes, so the example string (four Chinese characters plus "a" and "1") has length 3*4+2=14. mb_strlen with the encoding set to UTF8 counts each Chinese character as length 1, so the same string has length 6. The gbk and gb2312 results (8 and 10) come from reading the UTF-8 bytes under the wrong encoding, so they do not correspond to a real character count.
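To make the arithmetic above concrete, here is a minimal sketch that prints the byte count of each character in the example string. It assumes the file is saved as UTF-8 and that the mbstring extension is loaded; the variable names are only illustrative.

```php
<?php
// Assumes this file is saved as UTF-8 and the mbstring extension is loaded.
$str = '中文a字1符';

$byteTotal = 0;
$count = mb_strlen($str, 'UTF-8');              // number of characters
for ($i = 0; $i < $count; $i++) {
    $ch = mb_substr($str, $i, 1, 'UTF-8');      // take one character
    $bytes = strlen($ch);                       // bytes used by that character
    $byteTotal += $bytes;
    echo $ch . ': ' . $bytes . "\n";
}
echo 'characters: ' . $count . ', bytes: ' . $byteTotal . "\n";
```

Each Chinese character should report 3 bytes and each ASCII character 1 byte, which adds up to the strlen result.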

Combining the two functions gives the display width of a mixed Chinese-English string, where a Chinese character counts as 2 and an English character counts as 1:


echo (strlen($str) + mb_strlen($str,'UTF8')) / 2;

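For the example string, strlen($str) is 14 and mb_strlen($str, 'UTF8') is 6, so the display width is (14 + 6) / 2 = 10. This works because each Chinese character adds 3 + 1 = 4 to the sum and each English character adds 1 + 1 = 2; halving gives 2 and 1 respectively. The formula only holds for UTF-8 strings made up of English characters and 3-byte characters.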
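The formula can be wrapped in a small function, sketched below. display_width() is a hypothetical helper name, not a PHP built-in, and it assumes UTF-8 input containing only ASCII and 3-byte characters. The mbstring function mb_strwidth is shown next to it for comparison, since it also counts Chinese characters as width 2.

```php
<?php
// Assumes this file is saved as UTF-8 and the mbstring extension is loaded.
// display_width() is a hypothetical helper, valid only for UTF-8 strings
// made up of ASCII and 3-byte characters (such as common Chinese characters).
function display_width(string $str): int
{
    // Each 3-byte character adds 3 + 1 = 4 to the sum, each ASCII character 1 + 1 = 2.
    return intdiv(strlen($str) + mb_strlen($str, 'UTF-8'), 2);
}

$str = '中文a字1符';
echo display_width($str) . "\n";          // 10
echo mb_strwidth($str, 'UTF-8') . "\n";   // 10
```

If the string may contain 2-byte or 4-byte characters, the hand-written formula no longer applies, and mb_strwidth is probably the safer choice.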


PHP's built-in strlen function does not handle Chinese strings correctly: it returns the number of bytes the string occupies, not the number of characters.

For a Chinese string encoded in GB2312, strlen returns twice the number of Chinese characters; for UTF-8 it returns three times the number, because under UTF-8 one Chinese character occupies three bytes.
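The difference between the two encodings can be checked with a short sketch. It assumes the file is saved as UTF-8 and uses mb_convert_encoding to produce a GB2312 copy of the same text.

```php
<?php
// Assumes this file is saved as UTF-8 and the mbstring extension is loaded.
$utf8 = '中文';                                        // two Chinese characters
$gb = mb_convert_encoding($utf8, 'GB2312', 'UTF-8');  // same text re-encoded as GB2312

echo strlen($utf8) . "\n";              // 6 (3 bytes per character)
echo strlen($gb) . "\n";                // 4 (2 bytes per character)
echo mb_strlen($utf8, 'UTF-8') . "\n";  // 2
echo mb_strlen($gb, 'GB2312') . "\n";   // 2
```

strlen changes with the encoding, while mb_strlen returns 2 in both cases as long as it is told the encoding the bytes are really in.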

The mb_strlen function solves this problem.

mb_strlen is used in the same way as strlen, except that it takes an optional second parameter that specifies the character encoding.

For example, to get the length of a UTF-8 string $str, use mb_strlen($str, 'UTF-8'). If the second parameter is omitted, PHP's internal encoding is used. The internal encoding can be obtained with the mb_internal_encoding() function.
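As a rough illustration of the default parameter, the sketch below sets the internal encoding to UTF-8 and compares the implicit and explicit calls. It assumes the file is saved as UTF-8; the variable names are only illustrative.

```php
<?php
// Assumes this file is saved as UTF-8 and the mbstring extension is loaded.
$str = '中文a字1符';

$previous = mb_internal_encoding();    // remember the current internal encoding
mb_internal_encoding('UTF-8');         // set the internal encoding to UTF-8

$implicit = mb_strlen($str);           // uses the internal encoding
$explicit = mb_strlen($str, 'UTF-8');  // encoding passed explicitly

echo $implicit . "\n";                 // 6
echo $explicit . "\n";                 // 6

mb_internal_encoding($previous);       // restore the earlier setting
```

Passing the encoding explicitly keeps the result independent of the server configuration, which may be preferable in shared code.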

Note: mb_strlen is not part of the PHP core; it is provided by the mbstring extension. On Windows, make sure php_mbstring.dll is loaded in php.ini before use.
That is, make sure the line "extension=php_mbstring.dll" exists and is not commented out; otherwise calling mb_strlen fails with an undefined function error.
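One way to guard against a missing extension at run time is sketched below. utf8_length() is a hypothetical helper name, and the regex fallback is an assumption of this sketch that only gives a meaningful count for valid UTF-8 input.

```php
<?php
// Assumes this file is saved as UTF-8.
// utf8_length() is a hypothetical helper, not a PHP built-in.
function utf8_length(string $str): int
{
    if (function_exists('mb_strlen')) {
        return mb_strlen($str, 'UTF-8');
    }
    // Fallback without mbstring: count code points with a Unicode-aware regex.
    // preg_match_all returns false on invalid UTF-8, which the cast turns into 0.
    return (int) preg_match_all('/./us', $str);
}

var_dump(extension_loaded('mbstring'));   // true when mbstring is available
echo utf8_length('中文a字1符') . "\n";     // 6
```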

