Detecting spam comments in PHP by the ratio of Chinese characters

  • 2021-07-21 08:06:44
  • OfStack

This article uses an example to show how to flag spam comments in PHP by checking the proportion of Chinese characters in them. It is shared here for your reference. The implementation is as follows:

1. Requirements:

Recently, a certain kind of spam comment has been appearing often: a large block of English text mixed with one or two uncommon Chinese characters. Because such a comment does contain Chinese characters but no sensitive Chinese words, it slips straight through the comment filter. These comments can be caught by checking the proportion of Chinese characters, although the check will also produce some false positives.

2. Solution:

The approach uses two PHP functions, strlen and mb_strlen. For a UTF-8 string, strlen counts bytes, so a common Chinese character has a length of 3, while mb_strlen counts characters, so the same character has a length of 1. The difference between the two lengths of the same string is therefore twice the number of Chinese characters. Dividing that difference by 2 gives the number of Chinese characters, and dividing the result by the mb_strlen length gives the ratio of Chinese characters to all characters.

3. Implementation code:

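 // Length in bytes: a common Chinese character is 3 bytes in UTF-8
 $len_all = strlen($comment['text']);
 // Length in characters: each Chinese character counts as 1
 $len_st = mb_strlen($comment['text'], 'UTF-8');
 // (bytes - characters) / 2 = number of Chinese characters; divide by total characters
 if ($len_st > 0 && ($len_all - $len_st) / (2 * $len_st) < 0.5) {
     $error = "Chinese characters make up less than 50% of the comment";
 }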
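Because the byte-length trick assumes every multibyte character is a 3-byte Chinese character, emoji (4 bytes in UTF-8) or accented Latin letters such as "é" (2 bytes) will distort the count. One alternative, swapped in here and not part of the original method, is to count Han characters directly with PCRE's \p{Han} Unicode script property. The function name han_ratio is hypothetical; treat this as a sketch rather than a drop-in replacement.

```php
<?php
// Hypothetical helper: counts Han (Chinese) characters directly with a
// Unicode regex instead of inferring them from strlen/mb_strlen.
// This is an alternative technique, not the method described above.
function han_ratio(string $text): float
{
    $total = mb_strlen($text, 'UTF-8');
    if ($total === 0) {
        return 0.0;                          // empty comment: nothing to judge
    }
    // The "u" modifier makes PCRE treat the subject as UTF-8 characters.
    $han = preg_match_all('/\p{Han}/u', $text);
    if ($han === false) {
        return 0.0;                          // invalid UTF-8 input
    }
    return $han / $total;
}

if (han_ratio("Buy cheap watches now 啊") < 0.5) {
    echo "Chinese characters make up less than 50% of the comment\n";
}
```

This version counts only Han-script characters, so Japanese kana, Korean Hangul, and fullwidth punctuation such as "，" are not counted as Chinese; whether that is desirable depends on the site.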

Code posted in a comment also lowers the Chinese character ratio, so any code should be filtered out of the comment before this check is run.
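A minimal sketch of that pre-filtering step follows. It assumes, hypothetically, that posted code arrives wrapped in <pre> or <code> tags; how code actually appears depends on the comment system. The helper names strip_code_blocks and is_low_chinese_ratio are illustrative, and the ratio check reuses the strlen/mb_strlen arithmetic with the same 0.5 threshold.

```php
<?php
// Hypothetical: assumes posted code is wrapped in <pre> or <code> tags.
function strip_code_blocks(string $text): string
{
    // Remove <pre>...</pre> and <code>...</code> blocks. "i" makes the tag
    // match case-insensitive; "s" lets "." match newlines inside a block.
    return preg_replace('#<(pre|code)\b[^>]*>.*?</\1>#is', '', $text) ?? $text;
}

// Hypothetical wrapper: true when Chinese characters make up less than
// $threshold of the comment after code has been stripped.
function is_low_chinese_ratio(string $text, float $threshold = 0.5): bool
{
    $text = strip_code_blocks($text);
    $chars = mb_strlen($text, 'UTF-8');
    if ($chars === 0) {
        return false;                        // nothing left to judge
    }
    $chinese = (strlen($text) - $chars) / 2; // same length-difference trick
    return $chinese / $chars < $threshold;
}
```

For example, a Chinese comment followed by a <code> snippet is judged on its Chinese text alone instead of being flagged because of the snippet, while "hello world 中" is still flagged.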

I hope this article is helpful for your PHP programming.
