Correctly Parsing UTF-8 Strings in PHP

  • 2020-05-26 08:08:19
  • OfStack

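The character encoding section (1) of "Learning PHP & MySQL" introduced the conversion relationship between Unicode and UTF-8 and summarized the UTF-8 encoding rule. In short, the number of leading 1 bits in a character's first byte gives the character's total length in bytes, and a first byte that starts with 0 is a single-byte (ASCII) character. Based on this rule, I wrote a program that parses UTF-8 encoded text. The PHP implementation follows: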

<?php 
/*
 Purpose: $str is a UTF-8 encoded string that mixes Chinese and English.
 The program splits it into characters according to the UTF-8 encoding rule
 so that it is displayed correctly.
*/


$str = ' Today is very Happy All the decisions to go KFC Eat coke wings !!!'; 

/*
 $str is the string to take characters from
 $len is the number of characters to take
*/
function utf8sub($str, $len) {
    if ($len <= 0) {
        return '';
    }

    $offset = 0;            // byte offset of the current character's first byte
    $chars  = 0;            // number of characters taken so far
    $res    = '';           // holds the resulting substring
    $total  = strlen($str); // length of $str in bytes

    while ($chars < $len) {
        // Stop at the end of the string. Comparing the byte value with null
        // would also stop at a legitimate NUL (0x00) byte.
        if ($offset >= $total) {
            break;
        }

        // Take the first byte of the current character as an integer (0-255).
        $high = ord(substr($str, $offset, 1));

        // echo '$high=' . $high . '<br />';

        // The number of leading 1 bits in the first byte is the character's
        // length in bytes. The tests run from the longest prefix to the
        // shortest, so each branch only sees bytes the earlier ones rejected.
        // The 5- and 6-byte forms come from the original UTF-8 design;
        // RFC 3629 limits UTF-8 to 4 bytes.
        if (($high >> 2) === 0x3F) {        // top 6 bits are 111111: take 6 bytes
            $count = 6;
        } else if (($high >> 3) === 0x1F) { // top 5 bits are 11111: take 5 bytes
            $count = 5;
        } else if (($high >> 4) === 0xF) {  // top 4 bits are 1111: take 4 bytes
            $count = 4;
        } else if (($high >> 5) === 0x7) {  // top 3 bits are 111: take 3 bytes
            $count = 3;
        } else if (($high >> 6) === 0x3) {  // top 2 bits are 11: take 2 bytes
            $count = 2;
        } else {
            // Top bit is 0 (ASCII), or a stray 10xxxxxx continuation byte
            // in malformed input: take 1 byte.
            $count = 1;
        }
        // echo '$count=' . $count . '<br />';

        $res .= substr($str, $offset, $count); // append one character to $res
        $chars += 1;                           // one more character taken
        $offset += $count;                     // move to the next character's first byte
    }
    return $res;
}

echo utf8sub($str,100);
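
To see the first-byte test from utf8sub on a single character, here is a minimal sketch. The helper name utf8SeqLength is hypothetical and not part of the listing above. It uses mask comparisons instead of the shift cascade, and it assumes only the 1- to 4-byte forms that RFC 3629 allows, returning 0 for anything else.

```php
<?php
// Hypothetical helper (an illustration, not the author's code): given the
// integer value of a first byte, return how many bytes the character
// occupies, or 0 if the byte cannot start a character.
function utf8SeqLength(int $byte): int {
    if (($byte & 0x80) === 0x00) return 1; // 0xxxxxxx: ASCII
    if (($byte & 0xE0) === 0xC0) return 2; // 110xxxxx
    if (($byte & 0xF0) === 0xE0) return 3; // 1110xxxx
    if (($byte & 0xF8) === 0xF0) return 4; // 11110xxx
    return 0; // 10xxxxxx continuation byte, or a lead byte RFC 3629 forbids
}

$char = "\xE4\xB8\xAD"; // the character 中 (U+4E2D) encoded as UTF-8
$lead = ord($char[0]);  // first byte, 0xE4

printf("%08b\n", $lead);         // prints 11100100
echo utf8SeqLength($lead), "\n"; // prints 3
echo strlen($char), "\n";        // prints 3
```

The first byte 11100100 starts with three 1 bits followed by a 0, so the character occupies three bytes, which matches the byte length that strlen reports.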
