How to correctly parse UTF-8 strings in PHP
- 2020-05-26 08:08:19
- OfStack
The conversion relationship between Unicode and UTF-8 was introduced in "MySQL: character encoding (1)", which also summarized the UTF-8 encoding rule. Based on that rule, I wrote a program that parses UTF-8 encoded text. The PHP implementation follows:
<?php
/*
Purpose: $str is a UTF-8 encoded string that mixes Chinese and English.
The function below walks the string according to the UTF-8 encoding rules,
so that it is split on character boundaries and displayed correctly.
*/
$str = ' Today is very Happy All the decisions to go KFC Eat coke wings !!!';
/*
$str is the string to truncate
$len is the number of characters (not bytes) to keep
*/
function utf8sub($str,$len) {
    if($len <= 0){
        return '';
    }
    $offset = 0; // Byte offset of the character currently being read
    $chars = 0; // Number of characters taken so far
    $res = ''; // Holds the truncated result string
    $total = strlen($str); // Length of the input in bytes
    // Stop when enough characters have been taken or the input is used up.
    // Checking the offset avoids treating a NUL byte as the end of the string.
    while($chars < $len && $offset < $total){
        // Read the lead byte of the next character.
        // ord() returns it as an integer from 0 to 255, and the shifts below
        // inspect its high-order bits.
        $high = ord($str[$offset]);
        // echo '$high='. $high .'<br />';
        if(($high>>2) === 0x3F){ // Top 6 bits are 111111: 6-byte sequence (obsolete form, not valid in modern UTF-8)
            $count = 6;
        }else if(($high>>3) === 0x1F){ // Top 5 bits are 11111: 5-byte sequence (obsolete form)
            $count = 5;
        }else if(($high>>4) === 0xF){ // Top 4 bits are 1111: 4-byte sequence
            $count = 4;
        }else if(($high>>5) === 0x7){ // Top 3 bits are 111: 3-byte sequence
            $count = 3;
        }else if(($high>>6) === 0x3){ // Top 2 bits are 11: 2-byte sequence
            $count = 2;
        }else{
            // Top bit is 0: a single-byte ASCII character.
            // Top 2 bits are 10: a stray continuation byte in malformed input;
            // consume it alone so $count is always defined.
            $count = 1;
        }
        // echo '$count='.$count.'<br />';
        $res .= substr($str,$offset,$count); // Append this one character to $res
        $chars += 1; // One more character taken
        $offset += $count; // Move the offset forward by $count bytes
    }
    return $res;
}
echo utf8sub($str,100);
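The bit tests in utf8sub() are easier to follow when the bytes are printed in binary. The following is a minimal sketch, not part of the original program: utf8_bits() is a hypothetical helper name introduced here for illustration, and the sample characters are my own choice. It prints each byte of a string as eight binary digits, so the lead-byte prefixes (0, 110, 1110, 11110) and the continuation-byte prefix (10) become visible.

```php
<?php
// Hypothetical helper (not from the original article): render every byte
// of a string as 8 binary digits, separated by spaces.
function utf8_bits(string $s): string {
    $out = [];
    $n = strlen($s); // strlen() counts bytes, not characters
    for ($i = 0; $i < $n; $i++) {
        $out[] = str_pad(decbin(ord($s[$i])), 8, '0', STR_PAD_LEFT);
    }
    return implode(' ', $out);
}

// "\u{...}" escapes need PHP 7.0 or later.
echo utf8_bits("A"), "\n";          // 01000001
echo utf8_bits("\u{00E9}"), "\n";   // 11000011 10101001
echo utf8_bits("\u{4E2D}"), "\n";   // 11100100 10111000 10101101
echo utf8_bits("\u{1F600}"), "\n";  // 11110000 10011111 10011000 10000000
```

In each line the first byte announces the length of the sequence through its leading 1 bits, and every following byte starts with 10. This is why utf8sub() only needs to inspect the first byte of a character to know how many bytes to copy.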