Converting GB2312 to UTF-8 in PHP: character encoding conversion

  • 2020-10-23 20:04:14
  • OfStack

In PHP, character encoding conversion is usually done with iconv or mb_convert_encoding. Of the two, mb_convert_encoding is reported to be considerably slower than iconv.
string iconv ( string in_charset, string out_charset, string str )
Note: besides the target encoding, the second parameter accepts two suffixes, //TRANSLIT and //IGNORE. //TRANSLIT transliterates characters that cannot be represented in the target encoding into one or more similar-looking characters; //IGNORE silently drops characters that cannot be converted. Without a suffix, conversion stops at the first illegal character: older PHP versions return the string truncated at that point, while PHP 5.4 and later raise an E_NOTICE and return FALSE.
Returns the converted string or FALSE on failure.
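As a minimal sketch of the call described above, the snippet below converts a GBK byte string to UTF-8, once without a suffix and once with //IGNORE, and checks for the FALSE return value. The helper name to_utf8_or_null and the sample bytes (the GBK encoding of 中文) are assumptions made for illustration, not part of the original article.

```php
<?php
// Hypothetical helper: wraps iconv() and maps failure (false) to null.
function to_utf8_or_null(string $bytes, string $from, string $suffix = ''): ?string
{
    // @ suppresses the E_NOTICE that iconv() raises on illegal input.
    $out = @iconv($from, 'UTF-8' . $suffix, $bytes);
    return $out === false ? null : $out;
}

// "\xD6\xD0\xCE\xC4" is the GBK byte sequence for the two characters 中文.
$gbk = "\xD6\xD0\xCE\xC4";

$plain  = to_utf8_or_null($gbk, 'GBK');             // no suffix
$ignore = to_utf8_or_null($gbk, 'GBK', '//IGNORE'); // drop unconvertible characters

// Compare against the UTF-8 bytes of 中文.
var_dump($plain === "\xE4\xB8\xAD\xE6\x96\x87");
var_dump($ignore === $plain);
```

How //TRANSLIT and //IGNORE behave on invalid input depends on the iconv implementation PHP was built against, so it is worth testing them on the target system.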
string mb_convert_encoding ( string str, string to_encoding [, mixed from_encoding] )
The mbstring extension must be enabled first: in php.ini, remove the leading semicolon from ;extension=php_mbstring.dll.
mb_convert_encoding accepts several candidate input encodings and picks one automatically based on the content, but it runs considerably slower than iconv.
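To illustrate the automatic recognition of the input encoding, here is a small sketch that passes a list of candidate encodings as from_encoding. The sample bytes and the candidate order are assumptions chosen for the example.

```php
<?php
// GBK bytes for 中文 (illustrative sample); they are not valid UTF-8,
// so the detection step is expected to settle on GBK.
$input = "\xD6\xD0\xCE\xC4";

// from_encoding may be an array or a comma-separated string of candidates.
$utf8 = mb_convert_encoding($input, 'UTF-8', ['UTF-8', 'GBK']);

// Print the result as hex so it is readable regardless of terminal encoding.
echo bin2hex($utf8), "\n";
```

Listing UTF-8 first means input that is already valid UTF-8 passes through unchanged, while other input is tried against the remaining candidates.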

Usage notes:
iconv was found to fail when converting the character "-" to gb2312. Without the //IGNORE suffix, everything after that character is lost, and even with it the "-" itself is never converted or output. mb_convert_encoding does not have this bug.
In general, use iconv. Fall back to mb_convert_encoding only when the source encoding cannot be determined, or when the output of iconv does not display correctly.
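The rule of thumb above can be sketched as a small wrapper that tries iconv first and only falls back to mb_convert_encoding when iconv fails. The function name convert_with_fallback is hypothetical, and the sample bytes are an assumption for illustration.

```php
<?php
// Hypothetical helper: try iconv() first, fall back to mb_convert_encoding().
function convert_with_fallback(string $str, string $from, string $to = 'UTF-8'): string
{
    $out = @iconv($from, $to, $str); // @ hides the notice raised on bad input
    if ($out !== false) {
        return $out;
    }
    // mb_convert_encoding() replaces invalid bytes with a substitute
    // character (by default "?") instead of failing.
    return mb_convert_encoding($str, $to, $from);
}

// GBK bytes for 中文, printed as hex after conversion to UTF-8.
echo bin2hex(convert_with_fallback("\xD6\xD0\xCE\xC4", 'GBK')), "\n";
```

This keeps the faster function on the common path and pays the cost of mbstring only for input that iconv rejects.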


/**
 * Convert a GBK or GB2312 encoded string to UTF-8, detecting the encoding automatically.
 * If the input is already UTF-8 it is returned unchanged; otherwise it is converted to UTF-8.
 * Supported input encodings: utf-8, gbk, gb2312.
 * @param string $str the input string
 * @return string
 */
function yang_gbk2utf8($str){
    $charset = mb_detect_encoding($str, array('UTF-8','GBK','GB2312'));
    if($charset === false){
        // Encoding could not be detected; leave the input untouched.
        return $str;
    }
    $charset = strtolower($charset);
    // mb_detect_encoding may report GBK under its alias CP936.
    if('cp936' == $charset){
        $charset='GBK';
    }
    if("utf-8" != $charset){
        $str = iconv($charset,"UTF-8//IGNORE",$str);
    }
    return $str;
}

Next, let's look at some problems that come up when converting character encodings.
To use mb_detect_encoding($str), the mbstring extension must be enabled in PHP (extension=php_mbstring.dll).

<?php
$str=" test ing";
$cha=mb_detect_encoding($str);
$s = iconv($cha,"UTF-8",$str);
var_dump($s);
?> 

The result is:
string(0) ""
It is not obvious why this happens.

<?php
$str=" test ing";
$cha=mb_detect_encoding($str);
$s = iconv("GB2312","UTF-8",$str);
var_dump($s);
?>

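This time the result is correct, so the culprit is mb_detect_encoding($str), which did not detect the encoding accurately. I don't know exactly why; a likely cause is that, when no candidate list is passed, only the encodings in the configured detection order (by default ASCII and UTF-8) are considered.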
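One way to make detection more predictable is to pass an explicit candidate list and enable strict mode. The following is a minimal sketch; the sample bytes (GBK for 中文) and the candidate list are assumptions for illustration.

```php
<?php
// GBK bytes for 中文 (illustrative sample).
$str = "\xD6\xD0\xCE\xC4";

// Explicit candidates with strict mode on: UTF-8 is rejected because the
// bytes are not valid UTF-8, leaving the GBK candidate.
$detected = mb_detect_encoding($str, ['UTF-8', 'GBK'], true);

// PHP may report GBK under its alias CP936, so accept both names.
$isGbk = in_array(strtoupper((string) $detected), ['GBK', 'CP936'], true);

var_dump($detected, $isGbk);
```

Detection remains a heuristic: many byte sequences are valid in several encodings, so the order and size of the candidate list affect the answer.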
string mb_convert_encoding ( string $str, string $to_encoding [, mixed $from_encoding] )
converts a string to the specified encoding. Here is an example I wrote:

<?php
$a=" I'm fine ";
echo mb_convert_encoding ($a,'UTF-8');
?> 

The result is garbled output (runs of "?" mixed with wrong characters), because no from_encoding was given and the string was therefore assumed to be in the internal encoding.
So the question is: to convert strings in various encodings uniformly to UTF-8, I can use iconv if I know the source encoding in advance, but what if I don't?
Problem 3: an iconv problem. If the first byte of the string being converted has a value above a certain threshold, the call returns an empty result.
For example:

<?php
$str=chr(254)." test ing".chr(254);
$s = iconv("GB2312","UTF-8",$str);
var_dump($s);
?> 

This returns:
string(0) ""
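The empty result is consistent with the input not being valid in the source encoding: byte 254 (0xFE) followed by a space does not form a legal double-byte character. As a sketch of one way to catch this before calling iconv, the snippet below validates the bytes with mb_check_encoding. The helper name iconv_if_valid and the use of 'GBK' as the encoding name are assumptions for illustration.

```php
<?php
$bad  = chr(254) . " test ing" . chr(254); // same bytes as the example above
$good = "\xD6\xD0\xCE\xC4";                // GBK bytes for 中文 (illustrative)

// mb_check_encoding() reports whether the bytes are valid in the given encoding.
var_dump(mb_check_encoding($bad, 'GBK'));
var_dump(mb_check_encoding($good, 'GBK'));

// Hypothetical guard: only call iconv() on input that passed validation,
// and return false otherwise so the caller can decide what to do.
function iconv_if_valid(string $str, string $from, string $to = 'UTF-8')
{
    return mb_check_encoding($str, $from) ? iconv($from, $to, $str) : false;
}
```

Validating first separates "the input is malformed" from "the conversion itself failed", which is easier to handle than an empty string.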

For mb_convert_encoding, see the official usage in the PHP manual:

http://cn.php.net/manual/en/function.mb-convert-encoding.php

Another function in PHP, iconv, is also used to convert string encodings, similar to the above function.

Here are some more detailed examples:
iconv -- Convert string to requested character encoding
(PHP 4 >= 4.0.5, PHP 5)
mb_convert_encoding -- Convert character encoding
(PHP 4 >= 4.0.6, PHP 5)
Usage:
string mb_convert_encoding ( string str, string to_encoding [, mixed from_encoding] )
from_encoding specifies the character encoding name(s) before conversion. It can be an array or a comma-separated string. If from_encoding is not specified, the internal encoding is used.
/* Auto detect encoding from JIS, eucjp-win, sjis-win, then convert str to UCS-2LE */
$str = mb_convert_encoding($str, "UCS-2LE", "JIS, eucjp-win, sjis-win");
/* "auto" is expanded to "ASCII,JIS,UTF-8,EUC-JP,SJIS" */
$str = mb_convert_encoding($str, "EUC-JP", "auto");
Example:


<?php  
 $content = iconv("GBK", "UTF-8", $content);  
 $content = mb_convert_encoding($content, "UTF-8", "GBK");  
?>

The following function detects the input encoding and converts the data, including nested arrays, to the requested output encoding:

<?php
function phpcharset($data, $to) {
 if(is_array($data)) {
  foreach($data as $key => $val) {
   $data[$key] = phpcharset($val, $to);
  }
 } else {
  $encode_array = array('ASCII', 'UTF-8', 'GBK', 'GB2312', 'BIG5');
  $encoded = mb_detect_encoding($data, $encode_array);
  $to = strtoupper($to);
  if($encoded != $to) {
   $data = mb_convert_encoding($data, $to, $encoded);
  }
 }
 return $data;
}
?>

