PHP encoding conversion function: automatic character set conversion with array support

  • 2020-05-27 04:34:40
  • OfStack

 
// Automatically convert the character set; arrays are converted recursively (keys and values)
function auto_charset($fContents, $from = 'gbk', $to = 'utf-8') {
    // Normalize the common alias 'UTF8' to 'utf-8'
    $from = strtoupper($from) == 'UTF8' ? 'utf-8' : $from;
    $to   = strtoupper($to) == 'UTF8' ? 'utf-8' : $to;
    if (strtoupper($from) === strtoupper($to) || empty($fContents) || (is_scalar($fContents) && !is_string($fContents))) {
        // Nothing to do if the encodings match, the value is empty, or it is a non-string scalar
        return $fContents;
    }
    if (is_string($fContents)) {
        // Prefer mbstring, fall back to iconv, otherwise return the input unchanged
        if (function_exists('mb_convert_encoding')) {
            return mb_convert_encoding($fContents, $to, $from);
        } elseif (function_exists('iconv')) {
            return iconv($from, $to, $fContents);
        } else {
            return $fContents;
        }
    } elseif (is_array($fContents)) {
        // foreach by value works on a snapshot, so changing $fContents inside the loop is safe
        foreach ($fContents as $key => $val) {
            $_key = auto_charset($key, $from, $to);
            $fContents[$_key] = auto_charset($val, $from, $to);
            // Strict comparison avoids PHP's loose string/number juggling
            if ($key !== $_key) {
                unset($fContents[$key]); // drop the entry stored under the old, unconverted key
            }
        }
        return $fContents;
    } else {
        // Objects, resources, etc. are returned as-is
        return $fContents;
    }
}
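For comparison, here is a more compact sketch built on array_walk_recursive(). It assumes the mbstring extension is loaded, and convert_values_only is a hypothetical name. It converts only string values: the callback receives each key but cannot rename it, so GBK-encoded keys survive unchanged. That limitation is why auto_charset walks the array itself and rebuilds the keys.

```php
<?php
// Hypothetical helper: converts the string *values* of a (nested) array.
// array_walk_recursive() passes values by reference, but keys cannot be
// renamed from the callback, so GBK-encoded keys are left untouched.
function convert_values_only(array $data, string $from = 'GBK', string $to = 'UTF-8'): array
{
    array_walk_recursive($data, function (&$value) use ($from, $to) {
        if (is_string($value)) {
            $value = mb_convert_encoding($value, $to, $from);
        }
    });
    return $data;
}

$input  = ["\xD6\xD0" => ['name' => "\xD6\xD0"]]; // key and value are 中 in GBK
$output = convert_values_only($input);
echo bin2hex($output["\xD6\xD0"]['name']), PHP_EOL; // e4b8ad (value converted)
var_dump(isset($output["\xD6\xD0"]));               // bool(true): key is still GBK
```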

When we accept data submitted by arbitrary clients, the clients do not all use the same encoding, but on the server we can only process data in one encoding. We therefore have to convert the received strings into that one specific encoding.
You might want to call iconv() directly, but iconv() needs both the input encoding and the output encoding, and at this point we do not know the encoding of the received string. What we need is a way to find out the encoding of the received characters.
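To see why the input encoding matters, compare the raw bytes of one Chinese character in GBK and in UTF-8. This small sketch assumes the iconv extension is loaded (it usually is in standard PHP builds).

```php
<?php
// The same character, 中 (U+4E2D), as raw bytes in two encodings.
$gbk  = "\xD6\xD0";     // GBK / GB2312: 2 bytes
$utf8 = "\xE4\xB8\xAD"; // UTF-8: 3 bytes

echo strlen($gbk), ' ', bin2hex($gbk), PHP_EOL;   // 2 d6d0
echo strlen($utf8), ' ', bin2hex($utf8), PHP_EOL; // 3 e4b8ad

// iconv() only produces the right bytes when told the real source encoding.
$converted = iconv('GBK', 'UTF-8', $gbk);
var_dump($converted === $utf8); // bool(true)
```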
There are usually two solutions to such a problem.

Plan 1
Require the client to declare the encoding of the submitted data in one extra parameter.
$string = ($_GET['charset'] ?? '') === 'gbk' ? iconv('gbk', 'utf-8', $_GET['str']) : $_GET['str'];
This only works when there is an agreed convention; if there is none, or if we have no control over the client, it is not a good solution.
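A slightly more defensive version of Plan 1 might accept only a fixed whitelist of declared encodings and reject input that iconv() cannot decode. The sketch below assumes the iconv extension; the helper name decode_declared and the parameter name 'charset' are hypothetical.

```php
<?php
// Hypothetical helper for Plan 1: the client declares its encoding in a
// separate parameter, and the server only accepts whitelisted encodings.
function decode_declared(string $value, ?string $charset): string
{
    $allowed = ['utf-8' => 'UTF-8', 'gbk' => 'GBK', 'gb2312' => 'GB2312'];
    $key = strtolower($charset ?? 'utf-8'); // assume UTF-8 when nothing is declared
    if (!isset($allowed[$key])) {
        throw new InvalidArgumentException("Unsupported charset: $key");
    }
    if ($allowed[$key] === 'UTF-8') {
        return $value; // already in the server's encoding
    }
    $converted = iconv($allowed[$key], 'UTF-8', $value);
    if ($converted === false) {
        throw new UnexpectedValueException("Input is not valid {$allowed[$key]}");
    }
    return $converted;
}

// Typical call: decode_declared($_GET['str'] ?? '', $_GET['charset'] ?? null);
echo bin2hex(decode_declared("\xD6\xD0", 'GBK')), PHP_EOL; // e4b8ad
```

Rejecting unknown charsets keeps a client from passing arbitrary encoding names straight through to iconv().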

Plan 2
Detect the encoding of the received data directly on the server.
This is of course the ideal approach; the question is how to detect the encoding of a string. In PHP, the mbstring extension provides mb_check_encoding(), which does what we need. Check for UTF-8 first: a UTF-8 string can also be a well-formed GBK byte sequence, so testing for GBK first may re-encode text that is already UTF-8.
$str = mb_check_encoding($_GET['str'], 'UTF-8') ? $_GET['str'] : mb_convert_encoding($_GET['str'], 'UTF-8', 'GBK');
However, this requires the mbstring extension, which is not always enabled on production servers. In that case, you can determine the encoding with the following functions instead.
(I did not write the following functions myself.)
 
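// Heuristic: find the first byte in 0xE4-0xE9 (228-233), the UTF-8 lead-byte range of the
// common CJK ideographs (U+4000-U+9FFF). If the next two bytes are UTF-8 continuation bytes
// (0x80-0xBF, 128-191), treat the string as UTF-8; otherwise treat it as GB2312/GBK.
function isGb2312($string) {
    $len = strlen($string);
    for ($i = 0; $i < $len; $i++) {
        $v = ord($string[$i]);
        if ($v > 127) {
            if (($v >= 228) && ($v <= 233)) {
                // Not enough bytes left for a 3-byte UTF-8 sequence
                if (($i + 2) >= $len) return true;
                $v1 = ord($string[$i + 1]);
                $v2 = ord($string[$i + 2]);
                if (($v1 >= 128) && ($v1 <= 191) && ($v2 >= 128) && ($v2 <= 191))
                    return false; // looks like UTF-8
                else
                    return true;
            }
        }
    }
    return true;
}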
function isUtf8($string) { 
return preg_match('%^(?: 
[\x09\x0A\x0D\x20-\x7E] # ASCII 
| [\xC2-\xDF][\x80-\xBF] # non-overlong 2-byte 
| \xE0[\xA0-\xBF][\x80-\xBF] # excluding overlongs 
| [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2} # straight 3-byte 
| \xED[\x80-\x9F][\x80-\xBF] # excluding surrogates 
| \xF0[\x90-\xBF][\x80-\xBF]{2} # planes 1-3 
| [\xF1-\xF3][\x80-\xBF]{3} # planes 4-15 
| \xF4[\x80-\x8F][\x80-\xBF]{2} # plane 16 
)*$%xs', $string); 
} 
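If only UTF-8 validity matters, a shorter check relies on PCRE itself: with the /u modifier, PCRE validates the whole subject as UTF-8, and preg_match() returns false (rather than 0) when the subject is invalid. This is a sketch, and isUtf8Pcre is a hypothetical name. Unlike the isUtf8() regex above, it also accepts ASCII control characters other than tab, LF and CR.

```php
<?php
// Hypothetical alternative to isUtf8(): preg_match() with the /u modifier
// returns false when the subject is not valid UTF-8, and 1 when it is
// (the empty pattern always matches).
function isUtf8Pcre(string $string): bool
{
    return preg_match('//u', $string) === 1;
}

var_dump(isUtf8Pcre("\xE4\xB8\xAD")); // bool(true): UTF-8 for 中
var_dump(isUtf8Pcre("\xD6\xD0"));     // bool(false): GBK for 中
var_dump(isUtf8Pcre("\xC0\xAF"));     // bool(false): overlong encoding
```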

Either of the functions above can be used to detect the encoding before converting the string to the target encoding:
$str = isGb2312($_GET['str']) ? iconv('gbk', 'utf-8', $_GET['str']) : $_GET['str'];
$str = isUtf8($_GET['str']) ? $_GET['str'] : iconv('gbk', 'utf-8', $_GET['str']);
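The checks can also be folded into one helper that works whether or not mbstring is installed. The sketch below assumes the input is either UTF-8 or GBK; the name to_utf8 is hypothetical. It uses preg_match() with the /u modifier as an extension-free UTF-8 check, then converts with whichever extension is available.

```php
<?php
// Hypothetical helper: assumes the input is either UTF-8 or GBK.
function to_utf8(string $value): string
{
    // preg_match('//u', ...) returns 1 only for valid UTF-8 and needs no extension.
    if (preg_match('//u', $value) === 1) {
        return $value;
    }
    if (function_exists('mb_convert_encoding')) {
        return mb_convert_encoding($value, 'UTF-8', 'GBK');
    }
    if (function_exists('iconv')) {
        $converted = iconv('GBK', 'UTF-8', $value);
        if ($converted !== false) { // iconv() returns false on invalid input
            return $converted;
        }
    }
    return $value; // no usable converter: leave the bytes untouched
}

// Usage (assuming the parameter is a plain string): $str = to_utf8($_GET['str'] ?? '');
```

Checking for UTF-8 first means text that is already in the server's encoding is never passed to a converter.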

