How do I get the most frequent substring in a Chinese string
- 2020-08-22 21:56:02
- OfStack
The substring length is configurable (for example, 4 or 5 consecutive characters).
$str = 'I am Chinese, I am a foreigner, I am Korean, I am American, I am Chinese, I am British, I am Chinese, I am a foreigner';
Count_string($str, 5);
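function Count_string($sstr, $length)
{
    $cnt_tmp = 0;         // occurrences of the substring currently being checked
    $cnt = 0;             // highest occurrence count seen so far
    $str = '';            // output: the most frequent substring(s)
    $str_tmp = array();   // every candidate substring of the given length
    $str_arr = array();   // substring => count, for candidates that matched or beat the running maximum
    // The internal encoding must match the encoding of the input string.
    // The original uses GB2312; use "UTF-8" if the script and data are UTF-8.
    mb_internal_encoding("gb2312");
    $max_length = (mb_strlen($sstr) - $length);
    // Get the set of substrings (one per start position)
    for ($i = 0; $i <= $max_length; $i++)
    {
        $str_tmp[] = mb_substr($sstr, $i, $length);
    }
    // Remove duplicate substrings
    $str_tmp = array_unique($str_tmp);
    // Count occurrences (mb_substr_count counts non-overlapping matches)
    foreach ($str_tmp as $key => $value)
    {
        $cnt_tmp = mb_substr_count($sstr, $value);
        if ($cnt_tmp >= $cnt)
        {
            $cnt = $cnt_tmp;
            $str_arr[$value] = $cnt;
        }
    }
    // Several substrings may share the highest count; keep all of them
    foreach ($str_arr as $key => $value)
    {
        if ($value == $cnt)
        {
            $str .= $key . "<br>";
        }
    }
    echo 'The most common substring is:<br>' . $str . '<br>Occurrences: ' . $cnt;
}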
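The function above rescans the whole string once per distinct substring. As a possible alternative, the sketch below tallies every window in a single pass and returns the result instead of echoing it. The name most_frequent_substrings and the $encoding parameter are hypothetical and not part of the original code, and UTF-8 is assumed as the default encoding. This variant also counts overlapping windows, so its totals can be higher than those from mb_substr_count, which counts non-overlapping matches only.

```php
<?php
// Hypothetical helper (not from the original article): a one-pass tally of
// every $length-character window in $text. Overlapping windows are counted.
function most_frequent_substrings(string $text, int $length, string $encoding = 'UTF-8'): array
{
    $counts = [];
    if ($length >= 1) {
        $last = mb_strlen($text, $encoding) - $length;
        // Slide a window of $length characters across the string
        for ($i = 0; $i <= $last; $i++) {
            $piece = mb_substr($text, $i, $length, $encoding);
            $counts[$piece] = ($counts[$piece] ?? 0) + 1;
        }
    }
    if ($counts === []) {
        // String shorter than $length, or invalid $length
        return ['count' => 0, 'substrings' => []];
    }
    $max = max($counts);
    // PHP casts numeric-string array keys to int, so cast them back to strings
    $winners = array_map('strval', array_keys($counts, $max, true));
    return ['count' => $max, 'substrings' => $winners];
}
```

For example, most_frequent_substrings('aaaa', 2) reports a count of 3 for 'aa' because the three windows overlap, while mb_substr_count('aaaa', 'aa') returns 2. Which behaviour is preferable depends on whether overlapping repeats should count as separate occurrences.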