PHP UTF8 Chinese character truncation function code

  • 2020-05-24 05:16:38
  • OfStack

PHP truncation function (UTF-8)


<?php
// Truncate a UTF-8 string that may contain Chinese characters.
// $sourcestr  the string to process
// $cutlength  the length to keep, in characters (narrow characters such as
//             lowercase letters and half-width punctuation count as half a character)
// $addstr     the string appended when the text has actually been truncated
function cut_str($sourcestr, $cutlength, $addstr = '...') {
    $returnstr = '';
    $i = 0; // current byte offset
    $n = 0; // characters counted so far
    $str_length = strlen($sourcestr); // length of the string in bytes
    while ($n < $cutlength && $i < $str_length) {
        $ascnum = ord($sourcestr[$i]); // value of the byte at offset $i
        if ($ascnum >= 240) { // lead byte 0xF0-0xF7: 4-byte UTF-8 sequence (e.g. emoji)
            $returnstr .= substr($sourcestr, $i, 4);
            $i += 4; // advance 4 bytes
            $n++;    // but count a single character
        } elseif ($ascnum >= 224) { // lead byte 0xE0-0xEF: 3-byte sequence (most Chinese characters)
            $returnstr .= substr($sourcestr, $i, 3);
            $i += 3; // per the UTF-8 encoding rules, 3 consecutive bytes form one character
            $n++;
        } elseif ($ascnum >= 192) { // lead byte 0xC0-0xDF: 2-byte sequence
            $returnstr .= substr($sourcestr, $i, 2);
            $i += 2; // 2 consecutive bytes form one character
            $n++;
        } elseif ($ascnum >= 65 && $ascnum <= 90) { // uppercase letter A-Z
            $returnstr .= substr($sourcestr, $i, 1);
            $i += 1; // still only 1 byte
            $n++;    // but uppercase letters are wide, so count them as a full character
        } else { // everything else: lowercase letters, digits, half-width punctuation
            $returnstr .= substr($sourcestr, $i, 1);
            $i += 1;
            $n += 0.5; // narrow characters count as half a character
        }
    }
    if ($i < $str_length) { // bytes remain, so the string was really truncated
        $returnstr .= $addstr;
    }
    return $returnstr;
}
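The `cut_str` loop above decides how many bytes to copy from the value of each character's first (lead) byte. As a hedged sketch, the same UTF-8 rule can be pulled out into a small standalone helper and used to split a string into characters with byte-level functions only. The names `utf8_char_bytes` and `utf8_chars` are hypothetical (not part of the original code or of PHP), and the sketch assumes well-formed UTF-8 input.

```php
<?php
// Hypothetical helper: byte length of the UTF-8 character whose lead byte is $leadByte.
// Assumes well-formed UTF-8; a stray continuation byte (0x80-0xBF) is treated as 1 byte.
function utf8_char_bytes(int $leadByte): int
{
    if ($leadByte >= 0xF0) {
        return 4; // 11110xxx: 4-byte sequence (e.g. emoji)
    }
    if ($leadByte >= 0xE0) {
        return 3; // 1110xxxx: 3-byte sequence (most Chinese characters)
    }
    if ($leadByte >= 0xC0) {
        return 2; // 110xxxxx: 2-byte sequence (e.g. accented Latin letters)
    }
    return 1;     // 0xxxxxxx: ASCII
}

// Hypothetical helper: split a UTF-8 string into an array of characters.
function utf8_chars(string $s): array
{
    $chars = [];
    $len = strlen($s);
    for ($i = 0; $i < $len; ) {
        $n = utf8_char_bytes(ord($s[$i])); // ord() reads the byte at offset $i
        $chars[] = substr($s, $i, $n);
        $i += $n;
    }
    return $chars;
}

echo implode("|", utf8_chars("a中é")), "\n"; // prints: a|中|é
```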



 
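<?php
/* UTF-8 Chinese character truncation program */
$str = "123 This is the test string ";
$str1 = "() (a) ";
echo subUTF8str($str, 0, 3) . "<br>";
echo subUTF8str($str, 0, 4) . "<br>";
echo subUTF8str($str1, 0, 4) . "<br>";
echo subUTF8str($str1, 0, 10) . "<br>";

// Return $length characters of the UTF-8 string $str, starting at character $start.
function subUTF8str($str, $start = 0, $length = 80) {
    $cur_len = 0;            // length in characters, as a person counts it
    $all_len = strlen($str); // length in bytes, as the machine counts it
    if ($start == 0 && $length >= $all_len) {
        return $str;         // the string cannot have more than $length characters
    }
    // Skip the first $start characters.
    for ($i = 0; $i < $all_len;) {
        if ($cur_len == $start) {
            break;
        }
        $byte = ord($str[$i]); // the lead byte decides the character's byte length
        $i += $byte >= 240 ? 4 : ($byte >= 224 ? 3 : ($byte >= 192 ? 2 : 1));
        $cur_len++;
    }
    $start_pos = $i;
    $temp_pos = $cur_len;
    // Take up to $length characters from there.
    while ($cur_len - $temp_pos < $length) {
        if ($i >= $all_len) {
            break;
        }
        $byte = ord($str[$i]);
        $i += $byte >= 240 ? 4 : ($byte >= 224 ? 3 : ($byte >= 192 ? 2 : 1));
        $cur_len++;
    }
    $end_pos = $i;
    // substr() takes a length as its third argument, not an end offset
    return substr($str, $start_pos, $end_pos - $start_pos);
}
?>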
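Another way to slice by characters rather than bytes, without tracking lead bytes by hand, is PCRE's `u` (UTF-8) modifier. In the sketch below (the name `utf8_slice` is hypothetical), `preg_split('//u', ...)` with `PREG_SPLIT_NO_EMPTY` breaks the string into single characters of any byte length, and `array_slice` then picks characters by position. It assumes the PCRE extension bundled with PHP, and it returns an empty string when the input is not valid UTF-8, because `preg_split` returns `false` in that case.

```php
<?php
// Hypothetical helper: character-based substring using PCRE's UTF-8 mode.
// preg_split('//u', ...) splits between every character and returns false
// if $str is not valid UTF-8.
function utf8_slice(string $str, int $start = 0, int $length = 80): string
{
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return ''; // invalid UTF-8 input
    }
    return implode('', array_slice($chars, $start, $length));
}

echo utf8_slice("123中文字符串", 2, 3), "\n"; // prints: 3中文
```

The trade-off is memory: the whole string is split into an array even when only a few characters are needed, which is fine for titles and summaries but wasteful for very long texts.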

In fact, PHP already has a native, charset-aware solution for extracting characters from a string: the Multibyte String (mbstring) family of functions, for example:

string mb_substr ( string $str , int $start [, int $length [, string $encoding ]] ) extracts part of a string, counting characters rather than bytes
int mb_strlen ( string $str [, string $encoding ] ) returns the length of a string in characters
...
These functions belong to the mbstring extension, which must be enabled in the PHP build. Please refer to the PHP manual for details.
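As a sketch of how these functions replace the hand-written loops, the hypothetical helper `truncate_chars` below keeps at most `$length` characters using `mb_strlen` and `mb_substr`, and appends a suffix only when something was cut. `mb_strimwidth` is a related mbstring function that truncates by display width instead (full-width characters such as Chinese count as 2 columns, ASCII as 1), which is similar in spirit to the half-width weighting in `cut_str`. The example assumes the mbstring extension is enabled and passes `'UTF-8'` explicitly rather than relying on the internal encoding setting.

```php
<?php
// Hypothetical helper built on the mbstring extension (must be enabled).
// Keeps at most $length characters and appends $suffix only if text was removed.
function truncate_chars(string $str, int $length, string $suffix = '...'): string
{
    if (mb_strlen($str, 'UTF-8') <= $length) {
        return $str; // short enough: return unchanged, no suffix
    }
    return mb_substr($str, 0, $length, 'UTF-8') . $suffix;
}

echo truncate_chars("PHP字符串截取函数", 5), "\n"; // prints: PHP字符...
echo truncate_chars("短文本", 5), "\n";            // prints: 短文本

// Width-based truncation: the trim marker counts toward the width limit.
echo mb_strimwidth("Hello World", 0, 10, "...", 'UTF-8'), "\n"; // prints: Hello W...
```

Checking the length first is what keeps the suffix off strings that were not shortened.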
