PHP encoding conversion functions: mb_convert_encoding and iconv

  • 2020-03-31 20:05:09
  • OfStack

English text generally has no encoding problems; the problem shows up with Chinese data. For example, a program written in Zend Studio or EditPlus may be saved in GBK. If its data has to go into a database whose encoding is utf8, the data must be converted first.

For the usage of mb_convert_encoding, see the official manual:
(link: http://cn.php.net/manual/zh/function.mb-convert-encoding.php)

Converting GBK to UTF-8:

<?php
// The source file must be saved as GBK so the string literal holds GBK bytes.
header("Content-Type: text/html; charset=utf-8");
echo mb_convert_encoding(" My friend boy ", "UTF-8", "GBK");
?>

Another example, GB2312 to Big5:

<?php
// The source file must be saved as GB2312 so the string literal holds GB2312 bytes.
header("Content-Type: text/html; charset=big5");
echo mb_convert_encoding(" You're my friend ", "big5", "GB2312");
?>
To use the function above, the mbstring extension must be enabled first.
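Before calling either function it can help to check that the extension behind it is actually loaded. The following is a minimal sketch, not part of the original article; the $available array is a name chosen here for illustration.

```php
<?php
// Report which of the two conversion extensions this PHP build provides.
$available = [
    'mbstring' => extension_loaded('mbstring') && function_exists('mb_convert_encoding'),
    'iconv'    => extension_loaded('iconv') && function_exists('iconv'),
];

foreach ($available as $name => $ok) {
    echo $name, ': ', $ok ? 'enabled' : 'not enabled', PHP_EOL;
}
```

If mbstring is reported as not enabled, mb_convert_encoding is undefined and calling it is a fatal error.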

iconv, another PHP function, also converts string encodings and works much like the function above.

Here are some more details:
iconv -- Convert string to requested character encoding
(PHP 4 >= 4.0.5, PHP 5)
mb_convert_encoding -- Convert character encoding
(PHP 4 >= 4.0.6, PHP 5)

Usage:
string mb_convert_encoding ( string $str, string $to_encoding [, mixed $from_encoding ] )
The mbstring extension must be enabled first: in php.ini, remove the leading semicolon from the line ;extension=php_mbstring.dll
mb_convert_encoding accepts several candidate input encodings and picks one automatically based on the content, but it runs much slower than iconv.


string iconv ( string $in_charset, string $out_charset, string $str )
Note: the second parameter names the target encoding and may also carry one of two suffixes, //TRANSLIT or //IGNORE. With //TRANSLIT, characters that cannot be converted directly are replaced by one or more similar characters. With //IGNORE, characters that cannot be converted are dropped. With neither suffix, the string is cut at the first illegal character.
Returns the converted string, or FALSE on failure.
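The three modes can be compared side by side. This is only a sketch: the sample string and variable names are chosen here for illustration, and the exact results depend on the iconv implementation PHP was built against (glibc, libiconv or another), so no output is shown.

```php
<?php
// UTF-8 input: "a", EURO SIGN (U+20AC, written as raw bytes), "b".
$utf8 = "a\xE2\x82\xACb";

// No suffix: ASCII cannot represent the euro sign, so this call is expected
// to fail or cut the string. '@' hides the notice that iconv raises.
$plain = @iconv('UTF-8', 'ASCII', $utf8);

// //TRANSLIT asks iconv to substitute a similar character or sequence.
$translit = @iconv('UTF-8', 'ASCII//TRANSLIT', $utf8);

// //IGNORE asks iconv to drop characters it cannot convert.
$ignore = @iconv('UTF-8', 'ASCII//IGNORE', $utf8);

// Inspect what this particular build returns for each mode.
var_dump($plain, $translit, $ignore);
```

Because each call can return FALSE, compare the result with === false before using it.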


Notes on use:

iconv was found to fail when converting the character "--" to gb2312. Without the //IGNORE suffix, everything in the string after that character is lost. Either way, the "--" itself is not converted and cannot be output. mb_convert_encoding does not have this bug.

In general, use iconv, and fall back to mb_convert_encoding only when the original encoding cannot be determined or the text does not display properly after an iconv conversion.
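One way to apply this rule of thumb is a small wrapper that tries iconv first and falls back to mb_convert_encoding. The sketch below is an illustration, not the article's own code: to_utf8 is a hypothetical helper, and it only covers the case where iconv is missing or returns FALSE.

```php
<?php
/**
 * Hypothetical helper: convert $text from $from to UTF-8, trying iconv first
 * and falling back to mb_convert_encoding when iconv is missing or fails.
 * Returns the converted string, or false if neither function is available.
 */
function to_utf8(string $text, string $from)
{
    if (function_exists('iconv')) {
        $converted = @iconv($from, 'UTF-8', $text); // '@' hides conversion notices
        if ($converted !== false) {
            return $converted;
        }
    }
    if (function_exists('mb_convert_encoding')) {
        return mb_convert_encoding($text, 'UTF-8', $from);
    }
    return false; // neither extension is available
}

// "\xC4\xE3\xBA\xC3" is the GBK byte sequence for the characters U+4F60 U+597D.
$utf8 = to_utf8("\xC4\xE3\xBA\xC3", 'GBK');
echo bin2hex($utf8), PHP_EOL;
```

Writing the input as escaped bytes keeps the example independent of the encoding the source file is saved in.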

from_encoding names the character encoding of the string before conversion. It can be an array or a comma-separated list of encoding names. If it is not specified, the internal encoding is used.

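// Detect the source encoding among JIS, eucjp-win and sjis-win, then convert to UCS-2LE.
$str = mb_convert_encoding($str, "UCS-2LE", "JIS, eucjp-win, sjis-win");

// "auto" is expanded to a candidate list according to mbstring.language.
$str = mb_convert_encoding($str, "EUC-JP", "auto");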
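The array form of from_encoding is not shown above. As a rough sketch, the two calls below pass the same candidates once as a comma-separated string and once as an array; the sample bytes and variable names are chosen here for illustration, and which candidate gets detected can vary between PHP versions.

```php
<?php
// GBK bytes for the two Chinese characters U+4F60 U+597D.
$bytes = "\xC4\xE3\xBA\xC3";

// Candidate source encodings given as a comma-separated string ...
$fromList = @mb_convert_encoding($bytes, 'UTF-8', 'ASCII, UTF-8, GBK');

// ... and the same candidates given as an array.
$fromArray = @mb_convert_encoding($bytes, 'UTF-8', ['ASCII', 'UTF-8', 'GBK']);

// Both calls use the same candidate list, so the results should agree.
var_dump($fromList === $fromArray);
```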

Example:

$content = iconv("GBK", "UTF-8", $content); 
$content = mb_convert_encoding($content, "UTF-8","GBK"); 
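To check that the two calls agree, the input can be written as explicit bytes, so the result does not depend on how the PHP file itself is saved. A small sketch, assuming both the iconv and mbstring extensions are enabled; the variable names are chosen here for illustration.

```php
<?php
// GBK bytes for the two Chinese characters U+4F60 U+597D.
$gbk = "\xC4\xE3\xBA\xC3";

$viaIconv = iconv('GBK', 'UTF-8', $gbk);
$viaMb    = mb_convert_encoding($gbk, 'UTF-8', 'GBK');

// Print the resulting UTF-8 bytes as hex so they are easy to compare.
echo bin2hex($viaIconv), PHP_EOL;
echo bin2hex($viaMb), PHP_EOL;
```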

A small pitfall when transcoding with mb_convert_encoding in PHP
Most PHP programmers know mb_convert_encoding() for character encoding conversion; it is widely used and generally works well. In one project, however, where we used it to convert from UTF-8 to GBK, we ran into a problem with some special characters. Specifically, mbstring converts characters that exist in UTF-8 but have no GBK encoding to \x00\x80, which corrupts the converted GBK string.
We expected that when the target encoding cannot express a character, the converter would simply discard it: some data is lost, but the converted character sequence stays usable. It is not clear why mbstring emits the bytes above instead of discarding the character.
A temporary solution is to filter the converted string and remove every \x00\x80 sequence. Alternatively, filter the UTF-8 string before conversion and remove every character that UTF-8 can represent but GBK cannot. The first method is the easier one to implement.
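The first filtering method might look like the sketch below. strip_unmapped_gbk is a hypothetical helper name, and the input is a hand-built byte string that simulates the converter output described above rather than a real mb_convert_encoding result.

```php
<?php
/**
 * Hypothetical helper: remove the "\x00\x80" byte pairs described above
 * from a string that has already been converted to GBK.
 */
function strip_unmapped_gbk(string $gbk): string
{
    return str_replace("\x00\x80", '', $gbk);
}

// Simulated converter output: GBK bytes for U+4F60, the two marker bytes,
// then GBK bytes for U+597D.
$converted = "\xC4\xE3" . "\x00\x80" . "\xBA\xC3";
$clean = strip_unmapped_gbk($converted);

echo bin2hex($clean), PHP_EOL; // c4e3bac3
```

The byte 0x00 never occurs inside a two-byte GBK character, so removing this pair does not split a valid character.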
