PHP: detecting a string's encoding automatically with mb_detect_encoding

  • 2020-05-07 19:23:10
  • OfStack

When mb_detect_encoding is used to detect encodings in PHP, many people run into misidentification, for example GB2312 confused with UTF-8, or UTF-8 confused with GBK (mainly CP936). The usual explanation online is that mb_detect_encoding misjudges strings that are too short.
For example:

 
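// Detect the encoding, trying each candidate in the listed order.
$encode = mb_detect_encoding($keytitle, array("ASCII", "UTF-8", "GB2312", "GBK", "BIG5"));
if ($encode == "UTF-8") {
    // Detected as UTF-8, so convert to GBK.
    $keytitle = iconv("UTF-8", "GBK", $keytitle);
}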

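The purpose of this code is to detect whether the string is encoded as UTF-8 and, if so, to convert it to GBK.
But when $keytitle = "%D0%BE%C6%AC"; (presumably meaning the raw bytes D0 BE C6 AC, written here in URL-encoded form), the detected result is UTF-8. Strictly speaking this is not a bug: those four bytes are valid GBK text, but D0 BE and C6 AC also happen to be two well-formed two-byte UTF-8 sequences, so the bytes alone cannot settle the question. A program should therefore not depend too heavily on mb_detect_encoding.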
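The ambiguity can be checked directly. The following is a minimal sketch, assuming the mbstring extension is loaded and that the example string stands for the raw bytes D0 BE C6 AC. It uses mb_check_encoding to ask whether those bytes are well-formed in each encoding; the variable names are only illustrative.

```php
<?php
// Assumption: the article's "%D0%BE%C6%AC" stands for these four raw bytes.
$keytitle = urldecode('%D0%BE%C6%AC');

// The same bytes are well-formed in both encodings, so no detector can
// tell from the bytes alone which encoding the author intended.
$validUtf8 = mb_check_encoding($keytitle, 'UTF-8'); // D0 BE and C6 AC are two-byte UTF-8 sequences
$validGbk  = mb_check_encoding($keytitle, 'GBK');   // two double-byte GBK characters

var_dump($validUtf8, $validGbk);
```

Since both checks succeed, whichever candidate the detector considers first (or scores highest) wins, which is why the result looks wrong for GBK input.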
How to solve it? My solution is:
 
$encode = mb_detect_encoding($keytitle, array('ASCII', 'GB2312', 'GBK', 'UTF-8'));

mb_detect_encoding takes three parameters: the input string to examine, the order in which encodings are tried (once one matches, the ones after it are ignored), and a strict-mode flag, which is omitted in the call above.
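As a rough illustration of the third parameter, the sketch below passes true for strict mode. It assumes the mbstring extension is available; the sample strings and variable names are made up for the example.

```php
<?php
$order = ['ASCII', 'UTF-8'];

// The byte FF is not ASCII and can never occur in well-formed UTF-8.
$invalid = "caf\xFF";

// With strict mode on, a string that fits none of the candidates
// yields false instead of a best guess.
$strictResult = mb_detect_encoding($invalid, $order, true);

// A plain ASCII string matches the first candidate in the list.
$asciiResult = mb_detect_encoding('hello', $order, true);

var_dump($strictResult, $asciiResult);
```

Checking for a false return value before calling iconv avoids converting a string whose encoding could not be established at all.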
Adjust the detection order so that the most likely encoding comes first; this reduces the chance of a wrong conversion.
Generally, GB2312 should be placed first, and when both GBK and UTF-8 are in the list, the one you expect more often should be placed ahead of the other.
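The ordering idea can also be made explicit without relying on the detector's internals. The sketch below is not the author's code: first_valid_encoding is a hypothetical helper that walks the candidate list and returns the first encoding in which the string is well-formed, using mb_check_encoding. It assumes the mbstring extension is loaded.

```php
<?php
/**
 * Hypothetical helper (not part of mbstring): return the first candidate
 * encoding in which $text is well-formed, or null if none fits.
 */
function first_valid_encoding(string $text, array $candidates): ?string
{
    foreach ($candidates as $encoding) {
        if (mb_check_encoding($text, $encoding)) {
            return $encoding;
        }
    }
    return null;
}

// Assumption: the article's example stands for the raw bytes D0 BE C6 AC.
$bytes = urldecode('%D0%BE%C6%AC');

// The same bytes, two different priority orders.
$gbkFirst  = first_valid_encoding($bytes, ['ASCII', 'GBK', 'UTF-8']);
$utf8First = first_valid_encoding($bytes, ['ASCII', 'UTF-8', 'GBK']);

var_dump($gbkFirst, $utf8First);
```

Here the outcome depends only on the order of the list, so it encodes your expectation about the input rather than discovering the truth. For bytes that are valid in several encodings, no ordering can recover what the sender actually meant.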

