In-depth analysis of the mb_convert_encoding and iconv functions in PHP

  • 2020-06-15 07:58:39
  • OfStack

The mb_convert_encoding function converts a string from one character encoding to another. I used to find character encodings confusing, but I am now starting to get the hang of them.

English text generally does not run into encoding problems; they mostly show up with Chinese data. For example, suppose you write a program in Zend Studio or EditPlus and save it as GBK. If the data has to be stored in a database that uses utf8, it must be converted first, otherwise it will be garbled once it is in the database.
The official documentation for mb_convert_encoding is at:
http://php.net/manual/zh/function.mb-convert-encoding.php
Converting GBK to UTF-8:


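<?php
// Tell the browser that the response body is UTF-8.
header("Content-Type: text/html; charset=utf-8");
// Argument order: the string, the target encoding, the source encoding.
echo mb_convert_encoding("Dear friend", "UTF-8", "GBK");
?>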

Converting GB2312 to Big5:

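<?php
// Tell the browser that the response body is Big5.
header("Content-Type: text/html; charset=big5");
// Argument order: the string, the target encoding, the source encoding.
echo mb_convert_encoding("You're my friend", "big5", "GB2312");
?>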

Note that the function above is available only when the mbstring extension is enabled.
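Before relying on either conversion function, it can help to check that the extensions are actually loaded. The following is a minimal sketch; the helper name conversion_support is made up for this example, while extension_loaded and function_exists are standard PHP functions.

```php
<?php
// Hypothetical helper: report whether the two conversion extensions are usable.
function conversion_support(): array
{
    return [
        'mbstring' => extension_loaded('mbstring') && function_exists('mb_convert_encoding'),
        'iconv'    => extension_loaded('iconv') && function_exists('iconv'),
    ];
}

// Print one status line per extension.
foreach (conversion_support() as $name => $available) {
    echo $name, ': ', $available ? 'enabled' : 'missing', PHP_EOL;
}
```

If mbstring is reported as missing, enable it in the PHP configuration before calling mb_convert_encoding.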
iconv is another PHP function for converting string encodings, and it works much like mb_convert_encoding.
Here are some more detailed notes and examples:

iconv  -  Convert string to requested character encoding
(PHP 4 >= 4.0.5, PHP 5)
mb_convert_encoding  -  Convert character encoding
(PHP 4 >= 4.0.6, PHP 5)

Usage:
string mb_convert_encoding ( string str, string to_encoding [, mixed from_encoding] )
This requires the mbstring extension: in php.ini, remove the leading semicolon from ;extension=php_mbstring.dll to enable it.
mb_convert_encoding accepts several candidate input encodings and picks among them automatically based on the content, but it runs much more slowly than iconv.
string iconv ( string in_charset, string out_charset, string str )
Note: the second parameter (out_charset) names the target encoding and may carry one of two suffixes, //TRANSLIT or //IGNORE. //TRANSLIT replaces characters that cannot be represented in the target encoding with one or more similar-looking characters; //IGNORE drops them. With neither suffix, conversion stops at the first illegal character: the string is cut off there and an E_NOTICE is raised (newer PHP versions return FALSE in this case instead).
Returns the converted string or FALSE on failure.
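The effect of the two suffixes can be seen by converting a UTF-8 string that contains characters ASCII cannot represent. This is a rough sketch: convert_or_null is a hypothetical wrapper introduced here, and the exact output of //TRANSLIT and //IGNORE depends on the iconv library PHP was built against, so no expected output is shown.

```php
<?php
// Hypothetical wrapper: convert UTF-8 text to $target, returning null on failure.
function convert_or_null(string $text, string $target): ?string
{
    // @ silences the E_NOTICE that iconv raises for unconvertible characters.
    $result = @iconv('UTF-8', $target, $text);
    return $result === false ? null : $result;
}

// UTF-8 text with an accented letter and a euro sign, written as escapes so the
// example does not depend on the encoding this file is saved in.
$input = "caf\u{00E9} \u{20AC}5";

// Compare the plain target with the two suffixed variants.
foreach (['ASCII', 'ASCII//TRANSLIT', 'ASCII//IGNORE'] as $target) {
    $out = convert_or_null($input, $target);
    echo str_pad($target, 16), ' => ', $out === null ? '(failed)' : $out, PHP_EOL;
}
```

Checking the return value against false matters here, because a failed conversion would otherwise be mistaken for an empty string.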
Use:
iconv was found to fail when converting the character "-" to gb2312. Without the //IGNORE suffix, everything after that character is lost. Either way, the "--" itself is not converted and does not appear in the output. mb_convert_encoding does not have this bug.

In general, use iconv. Fall back to mb_convert_encoding only when the source encoding cannot be determined, or when text converted by iconv does not display properly.
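The rule of thumb above could be wrapped in a small helper. This is only a sketch under stated assumptions: to_utf8 is a hypothetical function, not part of PHP, and the candidate list used when the source encoding is unknown (UTF-8, GBK, BIG-5) is an assumption that should be adjusted to the data at hand.

```php
<?php
// Hypothetical helper (not from the original article): prefer iconv, and fall
// back to mb_convert_encoding when iconv fails or the source encoding is unknown.
function to_utf8(string $text, ?string $from = null): string
{
    if ($from !== null) {
        // @ suppresses the E_NOTICE iconv raises on characters it cannot convert.
        $converted = @iconv($from, 'UTF-8', $text);
        if ($converted !== false) {
            return $converted;
        }
        return (string) mb_convert_encoding($text, 'UTF-8', $from);
    }
    // Unknown source: let mbstring choose from an assumed candidate list.
    return (string) mb_convert_encoding($text, 'UTF-8', ['UTF-8', 'GBK', 'BIG-5']);
}

$gbk = "\xC4\xE3\xBA\xC3";                 // GBK bytes for U+4F60 U+597D
echo bin2hex(to_utf8($gbk, 'GBK')), PHP_EOL;
```

Automatic detection is a guess, since the same bytes can be valid in more than one encoding, so passing the source encoding explicitly is safer whenever it is known.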


from_encoding names the character encoding of the string before conversion. It can be an array or a comma-separated list of encoding names, in which case the source encoding is detected from among those candidates. If it is omitted, the internal encoding is used.
/* Auto detect encoding from JIS, eucjp-win, sjis-win, then convert str to UCS-2LE */
$str = mb_convert_encoding($str, "UCS-2LE", "JIS, eucjp-win, sjis-win");
/* "auto" is expanded to "ASCII,JIS,UTF-8,EUC-JP,SJIS" */
$str = mb_convert_encoding($str, "EUC-JP", "auto");

Example:

$content = iconv("GBK", "UTF-8", $content);
$content = mb_convert_encoding($content, "UTF-8", "GBK");
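The two calls take their arguments in a different order, which is easy to get wrong. The sketch below converts the same GBK bytes both ways and compares the results; the sample bytes and variable names are chosen for illustration only.

```php
<?php
// GBK bytes for the two characters U+4F60 U+597D, written as escapes so the
// example does not depend on the encoding this file is saved in.
$gbk = "\xC4\xE3\xBA\xC3";

$viaIconv = iconv('GBK', 'UTF-8', $gbk);                // order: from, to, string
$viaMb    = mb_convert_encoding($gbk, 'UTF-8', 'GBK');  // order: string, to, from

// Both routes should produce the same UTF-8 byte sequence.
var_dump($viaIconv === $viaMb);
echo bin2hex($viaMb), PHP_EOL;   // e4bda0e5a5bd
```

Note that only one of the two lines is needed in real code. Running both in sequence on the same variable would convert text that is already UTF-8 a second time.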

