Analysis of the difference between strlen and mb_strlen in PHP

  • 2021-07-16 02:02:42
  • OfStack

In PHP, strlen and mb_strlen both return the length of a string, but beginners who have not read the manual may not know how they differ.
The following example explains the difference between the two.

Look at the example first:


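<?php
// Assumes this file is saved as UTF-8 during the test.
$str = '中文a字1符'; // four Chinese characters plus "a" and "1"
echo strlen($str) . '<br>';              // 14: number of bytes
echo mb_strlen($str, 'UTF-8') . '<br>';  // 6: number of characters
echo mb_strlen($str, 'gbk') . '<br>';    // 8 in the original test: UTF-8 bytes misread as GBK
echo mb_strlen($str, 'gb2312') . '<br>'; // 10 in the original test: UTF-8 bytes misread as GB2312
?>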

Results analysis: strlen counts bytes, and each Chinese character in UTF-8 takes 3 bytes, so the length of "中文a字1符" (four Chinese characters plus "a" and "1") is 3*4+2=14. mb_strlen counts characters: when the encoding is given as UTF-8, each Chinese character counts as length 1, so the length of the same string is 6. The results for gbk and gb2312 are not meaningful, because the UTF-8 bytes are interpreted under the wrong encoding.
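To make the 3*4+2 arithmetic visible, here is a minimal sketch that prints how many bytes each character occupies. It assumes the source file is saved as UTF-8, and it splits the string with preg_split and the "u" modifier rather than with any function from the original example.

```php
<?php
// Assumes this source file is saved as UTF-8.
$str = '中文a字1符'; // four Chinese characters plus "a" and "1"

// Split into individual characters; the "u" modifier makes PCRE read the string as UTF-8.
$chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);

$totalBytes = 0;
foreach ($chars as $char) {
    $bytes = strlen($char); // bytes used by this single character
    $totalBytes += $bytes;
    echo $char, ' => ', $bytes, " byte(s)\n";
}

echo 'characters: ', count($chars), "\n"; // same value as mb_strlen($str, 'UTF-8')
echo 'bytes: ', $totalBytes, "\n";        // same value as strlen($str)
```

Each Chinese character reports 3 bytes and each ASCII character reports 1, which is where the totals of 14 bytes and 6 characters come from.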

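Used together, these two functions can calculate the display width of a mixed Chinese-English string, where a Chinese character counts as 2 and an English character counts as 1. In UTF-8, a Chinese character adds 3 + 1 = 4 to the sum of the two lengths and an English character adds 1 + 1 = 2, so dividing the sum by 2 gives 2 and 1 respectively: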


echo (strlen($str) + mb_strlen($str,'UTF8')) / 2; 

For example, for "中文a字1符" (four Chinese characters plus "a" and "1"), strlen($str) is 14 and mb_strlen($str, 'UTF8') is 6, so the display width is (14 + 6) / 2 = 10.
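The formula can be wrapped in a small helper, sketched below. The name display_width is hypothetical, not a PHP built-in, and the sketch assumes the input is valid UTF-8 containing only 1-byte (ASCII) and 3-byte (such as common Chinese) characters; characters of 2 or 4 bytes would make the sum odd and the result wrong.

```php
<?php
// display_width() is a hypothetical helper name, not part of PHP.
// Assumption: $str is valid UTF-8 made up only of 1-byte and 3-byte characters.
function display_width(string $str): int
{
    // Chinese character: (3 + 1) / 2 = 2; ASCII character: (1 + 1) / 2 = 1.
    return intdiv(strlen($str) + mb_strlen($str, 'UTF-8'), 2);
}

echo display_width('中文a字1符'), "\n"; // 10
echo display_width('abc'), "\n";        // 3

// mbstring also provides mb_strwidth, which counts East Asian wide characters as 2.
echo mb_strwidth('中文a字1符', 'UTF-8'), "\n";
```

For text that may contain other kinds of characters, mb_strwidth is probably the safer choice, since it does not depend on the byte length of each character.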


echo mb_internal_encoding(); // prints the internal encoding that mb_strlen uses when its second parameter is omitted

PHP's built-in string length function strlen does not count Chinese characters correctly; it returns only the number of bytes the string occupies. For a Chinese string encoded in GB2312, the value returned by strlen is twice the number of Chinese characters, while for one encoded in UTF-8 it is three times the number (in UTF-8, one Chinese character occupies three bytes).
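The two-bytes versus three-bytes behaviour can be checked with a short sketch. It assumes the source file is saved as UTF-8 and that the mbstring extension is enabled; mb_convert_encoding is used here only to produce a GB2312 copy of the string and is not part of the original example.

```php
<?php
// Assumes this source file is saved as UTF-8 and mbstring is enabled.
$utf8 = '中文'; // two Chinese characters
$gb   = mb_convert_encoding($utf8, 'GB2312', 'UTF-8'); // same text, GB2312 bytes

echo strlen($gb), "\n";               // 4: two bytes per character in GB2312
echo strlen($utf8), "\n";             // 6: three bytes per character in UTF-8
echo mb_strlen($gb, 'GB2312'), "\n";  // 2: characters, with the matching encoding
echo mb_strlen($utf8, 'UTF-8'), "\n"; // 2: characters, with the matching encoding
```

The character count is only correct when the encoding passed to mb_strlen matches the actual encoding of the bytes.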

The mb_strlen function solves this problem. It is used like strlen, except that it takes an optional second parameter that specifies the character encoding. For example, to get the length of a UTF-8 string $str, use mb_strlen($str, 'UTF-8'). If the second parameter is omitted, PHP's internal encoding is used. The internal encoding can be obtained with the mb_internal_encoding() function.
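As a brief illustration of the optional parameter, the sketch below sets the internal encoding to UTF-8 and compares the implicit and explicit calls. Setting the encoding for the whole script is an assumption made for the demonstration; the previous value is restored afterwards.

```php
<?php
// Assumes this source file is saved as UTF-8 and mbstring is enabled.
$str = '中文a字1符'; // four Chinese characters plus "a" and "1"

$previous = mb_internal_encoding(); // remember the current internal encoding
mb_internal_encoding('UTF-8');      // set it explicitly for this demonstration

$implicit = mb_strlen($str);          // second parameter omitted: internal encoding is used
$explicit = mb_strlen($str, 'UTF-8'); // encoding named explicitly

var_dump($implicit === $explicit); // bool(true)

mb_internal_encoding($previous); // restore the earlier setting
```

Passing the encoding explicitly on each call avoids depending on a setting that other code or php.ini may change.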

Note that mb_strlen is not a PHP core function; it is provided by the mbstring extension. Before using it on Windows, make sure that php_mbstring.dll is loaded in php.ini, that is, that the line "extension=php_mbstring.dll" exists and has not been commented out; otherwise an undefined function error will occur. On other systems, PHP must be built or installed with mbstring support.
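Code that may run where mbstring is not enabled can check for the function first. The following is a minimal sketch, assuming UTF-8 input; utf8_length is a hypothetical helper name, and the PCRE fallback is one possible substitute rather than something the original article describes.

```php
<?php
// utf8_length() is a hypothetical helper name, not part of PHP.
// Assumption: $str is encoded as UTF-8.
function utf8_length(string $str): int
{
    if (function_exists('mb_strlen')) {
        return mb_strlen($str, 'UTF-8');
    }

    // Fallback without mbstring: count characters with PCRE.
    // "u" reads the string as UTF-8, "s" lets "." match newlines too.
    $count = preg_match_all('/./us', $str);
    if ($count === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }

    return $count;
}

echo utf8_length('中文a字1符'), "\n"; // 6
```

The check with function_exists avoids the undefined function error and lets the script degrade gracefully when the extension is missing.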

