PHP iconv(): Detected an illegal character in input string

  • 2020-03-31 21:25:02
  • OfStack

This is how it started:
$STR = iconv('utf-8', 'GB2312', unescape(isset($_GET['STR']) ? $_GET['STR'] : ''));
Error: iconv(): Detected an illegal character in input string

Since the GB2312 character set is relatively small, I switched to a larger one, GBK:
$STR = iconv('utf-8', 'GBK', unescape(isset($_GET['STR']) ? $_GET['STR'] : ''));
After going live, it still reported the same error!

After reading the manual carefully, I found this paragraph:
If you append the string //TRANSLIT to out_charset transliteration is activated. This means that when a character can't be represented in the target charset, it can be approximated through one or several similarly looking characters. If you append the string //IGNORE, characters that cannot be represented in the target charset are silently discarded. Otherwise, str is cut from the first illegal character.
So I changed it to:
$STR = iconv('utf-8', 'GBK//IGNORE', unescape(isset($_GET['STR']) ? $_GET['STR'] : ''));
In my local tests, //IGNORE skips any character it does not recognize and carries on without an error, while //TRANSLIT cuts the string at the unrecognized character, drops everything after it, and raises an error. //IGNORE is what I need.
Now I will wait and see how it behaves in production (not a good practice; I will keep studying the manual and searching the Internet), hehe...
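
The difference between the three modes is easiest to see by running the same input through each of them. The following is a minimal sketch, not a definitive test: try_iconv_modes() is a hypothetical helper name, the sample input is made up, and the actual results depend on the iconv implementation PHP was built against, so no expected output is shown.

```php
<?php
// Hypothetical helper (not part of PHP): run the same conversion with no
// suffix, with //TRANSLIT, and with //IGNORE, and collect the raw results.
function try_iconv_modes(string $input, string $from, string $to): array
{
    $modes = ['plain' => '', 'translit' => '//TRANSLIT', 'ignore' => '//IGNORE'];
    $results = [];
    foreach ($modes as $label => $suffix) {
        // @ silences the notice so the return value (string or false)
        // can be inspected instead.
        $results[$label] = @iconv($from, $to . $suffix, $input);
    }
    return $results;
}

// The four escaped bytes are U+1F600 in UTF-8, a character GBK cannot
// represent. Escapes keep the example independent of the file's encoding.
$input = "abc\xF0\x9F\x98\x80def";
foreach (try_iconv_modes($input, 'UTF-8', 'GBK') as $label => $result) {
    echo $label, ': ', var_export($result, true), "\n";
}
```

Running this on the target server before deploying shows which suffix, if any, behaves as needed there.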

I found the following article online and learned that mb_convert_encoding can also be used, although it is less efficient than iconv.


String encoding conversion: iconv versus mb_convert_encoding

iconv -- Convert string to requested character encoding (PHP 4 >= 4.0.5, PHP 5)
mb_convert_encoding -- Convert character encoding (PHP 4 >= 4.0.6, PHP 5)

Usage:
string mb_convert_encoding ( string str, string to_encoding [, mixed from_encoding] )
You need to enable the mbstring extension first: in php.ini, remove the leading ; from the line ;extension=php_mbstring.dll

string iconv ( string in_charset, string out_charset, string str )
Note:
Besides naming the target encoding, the second parameter accepts one of two suffixes: //TRANSLIT and //IGNORE.
Of these:
//TRANSLIT replaces characters that cannot be converted directly with one or more similar-looking characters;
//IGNORE skips characters that cannot be converted. Without either suffix, the string is cut at the first illegal character.
Returns the converted string, or FALSE on failure.
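
Because iconv returns FALSE on failure, the result should be checked before it is used. A minimal sketch of one way to do that, assuming both the iconv and mbstring extensions may be present; convert_or_null() is a hypothetical wrapper name, not a PHP function, and falling back to mb_convert_encoding is only one possible policy:

```php
<?php
// Hypothetical wrapper: returns the converted string, or null when
// neither extension could convert the input.
function convert_or_null(string $input, string $from, string $to): ?string
{
    // Try iconv first; @ silences the notice, the return value is checked below.
    $converted = @iconv($from, $to . '//IGNORE', $input);
    if ($converted !== false) {
        return $converted;
    }
    // Fall back to mbstring only if that extension is loaded.
    if (function_exists('mb_convert_encoding')) {
        $fallback = mb_convert_encoding($input, $to, $from);
        return is_string($fallback) ? $fallback : null;
    }
    return null;
}

// Sample input containing U+1F600, which GBK cannot represent.
$gbk = convert_or_null("abc\xF0\x9F\x98\x80def", 'UTF-8', 'GBK');
var_dump($gbk);
```

Comparing with !== false matters here, because an empty string is a valid conversion result and would otherwise be mistaken for a failure.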

Use:
1. iconv fails when converting the em dash character (U+2014) to gb2312. Without the //IGNORE suffix, everything after this character is lost. Either way, the character itself is never converted and does not appear in the output. mb_convert_encoding does not have this bug.
2. mb_convert_encoding accepts several candidate input encodings and picks one automatically based on the content, but it runs much slower than iconv. $STR = mb_convert_encoding($STR, "euc-jp", "ASCII,JIS,euc-jp,SJIS,utf-8"); The result also depends on the order of the list "ASCII,JIS,euc-jp,SJIS,utf-8".
3. In general, use iconv. Use mb_convert_encoding only when the original encoding cannot be determined, or when the output of iconv does not display correctly.
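
Since the order of the candidate list can change the outcome, one option is to detect the encoding strictly first and only then convert. A minimal sketch, assuming the mbstring extension is enabled; to_utf8_from_candidates() is a hypothetical helper name and the candidate list is only an illustration:

```php
<?php
// Hypothetical helper: detect strictly among $candidates, then convert
// the bytes to UTF-8. Returns null when no candidate fits.
function to_utf8_from_candidates(string $bytes, array $candidates): ?string
{
    // strict = true makes detection return false unless the bytes are
    // valid in one of the candidate encodings.
    $detected = mb_detect_encoding($bytes, $candidates, true);
    if ($detected === false) {
        return null;
    }
    return mb_convert_encoding($bytes, 'UTF-8', $detected);
}

// Two Chinese characters encoded as GBK; these bytes are not valid UTF-8.
$gbkBytes = "\xD6\xD0\xCE\xC4";
var_dump(to_utf8_from_candidates($gbkBytes, ['UTF-8', 'GBK']));
```

When the bytes are valid in more than one candidate, the order of the list can still decide which encoding is chosen, so the most likely encoding should come first.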

from_encoding names the character encoding (or encodings) of the string before conversion. It can be an array or a comma-separated list. If it is not specified, the internal encoding is used.

$STR = mb_convert_encoding($STR, "ucs-2le", "JIS, eucjp-win, sjis-win");
$STR = mb_convert_encoding($STR, "euc-jp", "auto");

Example:
$content = iconv("GBK", "utf-8", $content);
$content = mb_convert_encoding($content, "utf-8", "GBK");
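
To check that the two calls are interchangeable for ordinary text, the same string can be converted both ways and the bytes compared. A minimal sketch, assuming PHP 7 or later with both the iconv and mbstring extensions enabled; utf8_to_gbk_both() is a hypothetical helper name:

```php
<?php
// Hypothetical helper: convert UTF-8 text to GBK with both extensions.
function utf8_to_gbk_both(string $utf8): array
{
    return [
        'iconv'    => iconv('UTF-8', 'GBK', $utf8),
        'mbstring' => mb_convert_encoding($utf8, 'GBK', 'UTF-8'),
    ];
}

// Two Chinese characters (U+4E2D U+6587), written as escapes so the
// example does not depend on the encoding of the source file.
$utf8 = "\u{4E2D}\u{6587}";
$gbk = utf8_to_gbk_both($utf8);

// Print the GBK bytes from each function as hex for comparison.
echo bin2hex($gbk['iconv']), "\n";
echo bin2hex($gbk['mbstring']), "\n";

// Convert back and confirm the round trip restores the original text.
var_dump(iconv('GBK', 'UTF-8', $gbk['iconv']) === $utf8);
```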
