PHP UTF8 Chinese character truncation function code
- 2020-05-24 05:16:38
- OfStack
PHP truncation function (UTF-8)
// Truncate a UTF-8 string that contains Chinese characters.
// $sourcestr: the string to process
// $cutlength: the length to keep (number of full-width characters)
// $addstr:    the suffix appended when the string is truncated
function cut_str($sourcestr, $cutlength, $addstr='...'){
$returnstr='';
$i=0;
$n=0;
$str_length=strlen($sourcestr); // Number of bytes in the string
while (($n<$cutlength) and ($i<$str_length)){ // Stop at the length limit or at the end of the string
$temp_str=substr($sourcestr,$i,1);
$ascnum=ord($temp_str); // Byte value of the character at offset $i
if ($ascnum>=240){ // Lead byte >= 240 (0xF0): start of a 4-byte sequence
$returnstr=$returnstr.substr($sourcestr,$i,4); // Per the UTF-8 encoding rules, treat 4 consecutive bytes as one character
$i=$i+4; // Advance 4 bytes
$n++; // Counts as 1 character
}elseif ($ascnum>=224){ // Lead byte >= 224 (0xE0): start of a 3-byte sequence
$returnstr=$returnstr.substr($sourcestr,$i,3); // Per the UTF-8 encoding rules, treat 3 consecutive bytes as one character
$i=$i+3; // Advance 3 bytes
$n++; // Counts as 1 character
}elseif ($ascnum>=192){ // Lead byte >= 192 (0xC0): start of a 2-byte sequence
$returnstr=$returnstr.substr($sourcestr,$i,2); // Per the UTF-8 encoding rules, treat 2 consecutive bytes as one character
$i=$i+2; // Advance 2 bytes
$n++; // Counts as 1 character
}elseif ($ascnum>=65 && $ascnum<=90){ // Uppercase ASCII letter
$returnstr=$returnstr.substr($sourcestr,$i,1);
$i=$i+1; // Advance 1 byte
$n++; // For visual balance, an uppercase letter counts as one full-width character
}else{ // Everything else, including lowercase letters and half-width punctuation
$returnstr=$returnstr.substr($sourcestr,$i,1);
$i=$i+1; // Advance 1 byte
$n=$n+0.5; // Lowercase letters, half-width punctuation, etc. count as half a character width
}
}
if ($i<$str_length){ // Some bytes were left unread, so the string was actually truncated
$returnstr = $returnstr . $addstr; // Append the suffix
}
return $returnstr;
}
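The width rule that cut_str applies (1 for a multibyte character or an uppercase letter, 0.5 for everything else) can be looked at on its own. The following is only a minimal sketch of that rule, not part of the original code: the helper name approx_width is hypothetical, and it splits the string with preg_split and the /u modifier instead of inspecting lead bytes by hand.

```php
<?php
// Hypothetical helper (not in the original article): approximate display
// width of a UTF-8 string under the same counting rule as cut_str().
function approx_width(string $s): float
{
    $width = 0.0;
    // The /u modifier makes preg_split() split on UTF-8 characters, not bytes.
    // It returns false on invalid UTF-8, which is treated as an empty list here.
    $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    foreach ($chars as $ch) {
        $byte = ord($ch); // value of the first byte of the character
        if (strlen($ch) > 1 || ($byte >= 65 && $byte <= 90)) {
            $width += 1;   // multibyte character or uppercase ASCII letter
        } else {
            $width += 0.5; // lowercase letter, digit, half-width punctuation
        }
    }
    return $width;
}

echo approx_width("PHP中文abc"), "\n"; // 6.5
echo approx_width("中文"), "\n";       // 2
```

Under this rule, a $cutlength of 6.5 or more would keep the whole of "PHP中文abc", while a smaller value would cut it short.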
<?php
/* UTF-8 Chinese character truncation program */
$str = "123 This is the test string ";
$str1 = "() (a) ";
echo subUTF8str($str,0,3)."<br>";
echo subUTF8str($str,0,4)."<br>";
echo subUTF8str($str1,0,4)."<br>";
echo subUTF8str($str1,0,10)."<br>";
function subUTF8str($str,$start=0,$length=80){
$cur_len = 0; // Length in characters, as a person would count it
$all_len = strlen($str); // Length in bytes, as the machine counts it
if($start == 0 && $length >= $all_len) // Nothing to cut: the whole string fits
{
return $str;
}
for($i = 0;$i < $all_len;)
{
if($cur_len == $start)
{
break;
}
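if (ord($str[$i]) > 127) // Non-ASCII lead byte: assumed to start a 3-byte sequence (true for common Chinese characters, not for all of UTF-8)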
{
$i += 3;
}else{
$i += 1;
}
$cur_len ++;
}
$start_pos = $i;
$temp_pos = $cur_len;
for(;$cur_len - $temp_pos < $length;)
{
if($i >= $all_len)
break;
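if (ord($str[$i]) > 127) // Non-ASCII lead byte: assumed to start a 3-byte sequence (true for common Chinese characters, not for all of UTF-8)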
{
$i += 3;
}else{
$i += 1;
}
$cur_len ++;
}
$end_pos = $i;
return substr($str,$start_pos,$end_pos - $start_pos); // The third argument of substr() is a length, not an end offset
}
?>
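subUTF8str treats every non-ASCII character as 3 bytes long. One way to lift that assumption is to read the sequence length from the lead byte. The code below is a sketch under that idea, not the original author's code: the names utf8_seq_len and utf8_sub are hypothetical, and the input is assumed to be valid UTF-8.

```php
<?php
// Hypothetical helper: byte length of the UTF-8 sequence whose lead byte is $byte.
function utf8_seq_len(int $byte): int
{
    if ($byte >= 0xF0) { return 4; } // 11110xxx: 4-byte sequence (e.g. emoji)
    if ($byte >= 0xE0) { return 3; } // 1110xxxx: 3-byte sequence (most Chinese characters)
    if ($byte >= 0xC0) { return 2; } // 110xxxxx: 2-byte sequence
    return 1;                        // ASCII, or a stray continuation byte
}

// Hypothetical variant of subUTF8str: $start and $length are counted in characters.
function utf8_sub(string $str, int $start = 0, int $length = 80): string
{
    $bytes = strlen($str);
    $i = 0;
    // Skip $start characters to find the starting byte offset.
    for ($c = 0; $c < $start && $i < $bytes; $c++) {
        $i += utf8_seq_len(ord($str[$i]));
    }
    $start_pos = $i;
    // Advance over at most $length further characters.
    for ($c = 0; $c < $length && $i < $bytes; $c++) {
        $i += utf8_seq_len(ord($str[$i]));
    }
    // substr() takes a byte length, so pass the distance covered.
    return substr($str, $start_pos, $i - $start_pos);
}

echo utf8_sub("123这是测试", 2, 3), "\n"; // 3这是
echo utf8_sub("aé中😀z", 1, 3), "\n";     // é中😀
```

The second call mixes 2-byte, 3-byte and 4-byte characters, which is the case the fixed 3-byte step cannot handle.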
In fact, PHP already provides charset-aware string truncation natively through the
Multibyte String Functions (mbstring) family of functions:
string mb_substr (string $str, int $start [, int $length [, string $encoding]]) extracts part of a string, counting in characters
int mb_strlen (string $str [, string $encoding]) returns the length of the string in characters
....
Please refer to the PHP manual for details
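As a rough illustration of the two functions above, here is a short sketch. It assumes the mbstring extension is enabled; the wrapper name mb_cut and the sample string are made up for this example and do not come from the original article.

```php
<?php
// Assumes the mbstring extension is enabled.
$str = "123这是测试字符串";

echo strlen($str), "\n";                   // 24 (bytes)
echo mb_strlen($str, 'UTF-8'), "\n";       // 10 (characters)
echo mb_substr($str, 0, 5, 'UTF-8'), "\n"; // 123这是

// Hypothetical wrapper: keep at most $length characters and append
// $suffix only when something was actually cut off.
function mb_cut(string $s, int $length, string $suffix = '...'): string
{
    if (mb_strlen($s, 'UTF-8') <= $length) {
        return $s; // short enough, return unchanged
    }
    return mb_substr($s, 0, $length, 'UTF-8') . $suffix;
}

echo mb_cut($str, 5), "\n";  // 123这是...
echo mb_cut("abc", 5), "\n"; // abc
```

Unlike cut_str, this wrapper counts every character as 1, whatever its display width.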