Converting GB2312 to UTF-8: character encoding conversion in PHP
- 2020-10-23 20:04:14
- OfStack
In PHP, character encoding conversion is usually done with iconv or mb_convert_encoding. Of the two, mb_convert_encoding performs considerably worse than iconv.
string iconv ( string in_charset, string out_charset, string str )
Note: besides naming the target encoding, the second parameter accepts two suffixes, //TRANSLIT and //IGNORE. //TRANSLIT replaces characters that cannot be converted directly with one or more approximate characters; //IGNORE drops characters that cannot be converted. Without either suffix, conversion stops at the first illegal character: the string is cut off there, and current PHP versions raise an E_NOTICE and return FALSE.
Returns the converted string or FALSE on failure.
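The following is a minimal sketch of the three forms of the out_charset argument, not a definitive reference. The sample byte strings are assumptions chosen for illustration, and the exact behavior of //TRANSLIT and //IGNORE depends on the iconv implementation PHP was built against, so the comments on those two calls only describe what is commonly seen.

```php
<?php
// Assumption for illustration: these four bytes are the GB2312 encoding of U+4F60 U+597D.
$gb = "\xC4\xE3\xBA\xC3";

// Plain conversion: returns the converted string, or false on failure.
$plain = iconv('GB2312', 'UTF-8', $gb);
var_dump($plain === "\u{4F60}\u{597D}");

// A UTF-8 string containing a character (U+20AC, the euro sign)
// that ISO-8859-1 cannot represent.
$utf8 = "price: 5\u{20AC}";

// @ suppresses the E_NOTICE that iconv raises on unconvertible characters.
$strict   = @iconv('UTF-8', 'ISO-8859-1', $utf8);           // typically false
$translit = @iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $utf8); // approximation, e.g. "EUR"
$ignore   = @iconv('UTF-8', 'ISO-8859-1//IGNORE', $utf8);   // character dropped, where supported

var_dump($strict, $translit, $ignore);
```

Checking the return value against false before using it is the safer habit, since a failed conversion otherwise looks like an empty string when echoed.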
string mb_convert_encoding ( string str, string to_encoding [, mixed from_encoding] )
The mbstring extension must be enabled first: in php.ini, remove the leading semicolon from ;extension=php_mbstring.dll
mb_convert_encoding accepts several candidate input encodings and picks one automatically based on the content, but it runs much slower than iconv.
Usage notes:
iconv was found to fail when converting the character "-" to gb2312. Without the //IGNORE suffix, everything after that character is lost. Either way, the "--" itself is not converted and does not appear in the output. mb_convert_encoding does not have this bug.
As a rule, use iconv. Use mb_convert_encoding only when the source encoding cannot be determined, or when the output of iconv does not display correctly.
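The rule above can be sketched as a small helper. The function name convert_with_fallback and the sample bytes are hypothetical, introduced only for this example; the sketch assumes both the iconv and mbstring extensions are loaded.

```php
<?php
/**
 * Hypothetical helper: try iconv first, fall back to mb_convert_encoding.
 */
function convert_with_fallback(string $str, string $from, string $to = 'UTF-8'): string
{
    // @ hides the E_NOTICE iconv raises on illegal input; the result is checked instead.
    $out = @iconv($from, $to, $str);
    if ($out !== false) {
        return $out;
    }
    // mb_convert_encoding substitutes characters it cannot convert instead of failing.
    return mb_convert_encoding($str, $to, $from);
}

// Assumed sample: GBK bytes for U+4F60 U+597D.
$gbk = "\xC4\xE3\xBA\xC3";
echo convert_with_fallback($gbk, 'GBK'), PHP_EOL;
```

The fallback trades strictness for robustness: the caller always gets a string back, but characters that could not be converted may have been replaced.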
/**
 * Detects whether a string is GBK or GB2312 encoded and converts it to UTF-8.
 * A string that is already UTF-8 is returned unchanged; anything else is converted to UTF-8.
 * Supported character encodings: UTF-8, GBK, GB2312.
 * @param string $str the input string
 * @return string
 */
function yang_gbk2utf8($str){
    $charset = mb_detect_encoding($str, array('UTF-8','GBK','GB2312'));
    if (false === $charset) {
        // Encoding could not be detected: return the input unchanged.
        return $str;
    }
    $charset = strtolower($charset);
    // mbstring reports GBK under its code page name.
    if ('cp936' == $charset) {
        $charset = 'GBK';
    }
    if ("utf-8" != $charset) {
        $str = iconv($charset, "UTF-8//IGNORE", $str);
    }
    return $str;
}
Next, let's look at some problems that come up when converting character encodings.
Problem 1: mb_detect_encoding($str). To use it, the extension=php_mbstring.dll extension must be enabled in PHP.
<?php
$str=" test ing";
$cha=mb_detect_encoding($str);
$s = iconv($cha,"UTF-8",$str);
var_dump($s);
?>
Result:
string(0) ""
It is not obvious why this happens.
<?php
$str=" test ing";
$cha=mb_detect_encoding($str);
$s = iconv("GB2312","UTF-8",$str);
var_dump($s);
?>
This time the result is correct. So mb_detect_encoding($str) is not reliable here, and I don't know why.
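One likely explanation, offered as an assumption rather than a confirmed diagnosis: when no candidate list is passed, mb_detect_encoding only tries the configured detect order, which by default does not contain GB2312 or GBK. The sketch below passes the candidates explicitly and turns on strict mode; the sample bytes are assumed for illustration.

```php
<?php
// Assumed sample: GB2312 bytes for U+4F60 U+597D.
$gb = "\xC4\xE3\xBA\xC3";

// Inspect which encodings detection tries when no candidates are given.
var_dump(mb_detect_order());

// Pass the candidates explicitly and enable strict mode (third argument),
// so encodings in which the bytes are invalid are rejected.
$detected = mb_detect_encoding($gb, ['UTF-8', 'GBK'], true);
var_dump($detected);

if ($detected !== false) {
    $utf8 = mb_convert_encoding($gb, 'UTF-8', $detected);
    var_dump($utf8);
}
```

Detection remains a guess even in strict mode: many byte sequences are valid in more than one encoding, so a known source encoding should always be preferred over a detected one.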
string mb_convert_encoding (string $str, string $to_encoding [, mixed $from_encoding])
Problem 2: mb_convert_encoding converts a string to the specified encoding. Here is an example I wrote:
<?php
$a=" I'm fine ";
echo mb_convert_encoding ($a,'UTF-8');
?>
The result is garbled, because no source encoding was given and the internal encoding was assumed:
?? A brand? A brand?
The question now is how to convert strings in different encodings uniformly to UTF-8. If the source encoding is known in advance, iconv can be used, but what if it is not known?
Problem 3: iconv returns an empty result if the first byte of the input string is greater than a certain value.
For example:
<?php
$str=chr(254)." test ing".chr(254);
$s = iconv("GB2312","UTF-8",$str);
var_dump($s);
?>
Result:
string(0) ""
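The byte 0xFE followed by a space is not a valid GB2312 sequence, which is why iconv rejects the input. A minimal sketch of one way to guard against this, assuming the mbstring extension is available: validate the bytes with mb_check_encoding first and choose a conversion path accordingly.

```php
<?php
// Same shape as the example above: an invalid byte at each end.
$str = chr(254) . " test ing" . chr(254);

// mb_check_encoding reports whether the bytes are valid in the given encoding.
if (mb_check_encoding($str, 'GB2312')) {
    $out = iconv('GB2312', 'UTF-8', $str);
} else {
    // Invalid input: let mbstring convert what it can. Bytes it cannot
    // interpret are replaced by the substitute character ("?" by default).
    $out = mb_convert_encoding($str, 'UTF-8', 'GB2312');
}
var_dump($out);
```

Whether to substitute, drop, or reject invalid bytes is a design choice; for user-supplied data, rejecting the input outright may be the better option.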
For the official usage of mb_convert_encoding, see:
http://cn.php.net/manual/en/function.mb-convert-encoding.php
Another function in PHP, iconv, is also used to convert string encodings, similar to the above function.
Here are some more detailed examples:
iconv -- Convert string to requested character encoding
(PHP 4 >= 4.0.5, PHP 5)
mb_convert_encoding -- Convert character encoding
(PHP 4 >= 4.0.6, PHP 5)
Usage:
string mb_convert_encoding ( string str, string to_encoding [, mixed from_encoding] )
from_encoding names the encoding (or encodings) of the string before conversion. It can be an array or a comma-separated string. If it is not specified, the internal encoding is used.
/* Auto detect encoding from JIS, eucjp-win, sjis-win, then convert str to UCS-2LE */
$str = mb_convert_encoding($str, "UCS-2LE", "JIS, eucjp-win, sjis-win");
/* "auto" is expanded to "ASCII,JIS,UTF-8,EUC-JP,SJIS" */
$str = mb_convert_encoding($str, "EUC-JP", "auto");
Example:
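<?php
// Two alternative ways to convert GBK content to UTF-8. Use one of them;
// running both in sequence would convert the already converted string again.
$utf8 = iconv("GBK", "UTF-8", $content);
$utf8 = mb_convert_encoding($content, "UTF-8", "GBK");
?>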
The following function detects the encoding of its input and converts it to the requested output encoding:
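<?php
/**
 * Converts a string, or every string in a (nested) array, to the encoding $to.
 */
function phpcharset($data, $to) {
    if (is_array($data)) {
        foreach ($data as $key => $val) {
            $data[$key] = phpcharset($val, $to);
        }
    } else {
        // Candidate source encodings passed to mb_detect_encoding.
        $encode_array = array('ASCII', 'UTF-8', 'GBK', 'GB2312', 'BIG5');
        $encoded = mb_detect_encoding($data, $encode_array);
        $to = strtoupper($to);
        // Convert only when detection succeeded and the encodings differ.
        if ($encoded !== false && $encoded != $to) {
            $data = mb_convert_encoding($data, $to, $encoded);
        }
    }
    return $data;
}
?>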