Using PHP regular expressions to check for Chinese characters in UTF-8 or GBK: approach and implementation

  • 2020-11-25 07:10:20
  • OfStack

UTF-8 matching: in JavaScript it is easy to check whether a string consists entirely of Chinese characters, for example:
 
var str = "php programming "; 
if (/^[\u4e00-\u9fa5]+$/.test(str)) { 
alert(" The string is all Chinese "); 
}else{ 
alert(" The string is not all Chinese "); 
} 
//php , is used \x said 106 Of base data. Then, the transformation is as follows:  
$str = "php programming "; 
if (preg_match("/^[\x4e00-\x9fa5]+$/",$str)) { 
print(" The string is all Chinese "); 
} else { 
print(" The string is not all Chinese "); 
} 

No error is reported and the result looks correct. However, if $str is changed to just "编程" (Chinese for "programming"), the script still prints "The string is not all Chinese", so the check is not reliable. The key point comes from the book Mastering Regular Expressions: inside a character class, [\x4e00-\x9fa5] is read as individual characters and ranges, not as a range of four-digit code points. \x{hex} denotes a hexadecimal value. Without braces, \x takes only one or two hex digits, so a four-digit value must be written in braces. This means [\x4e00-\x9fa5] actually contains \x4e ("N"), "0", the range "0" to \x9f, "a" and "5". In addition, a value larger than \x{FF} is only allowed with the u modifier; without it, the pattern fails to compile.
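Searching online, I could only find a regular expression for full-width (non-ASCII) characters: /^[\x80-\xff]*$/, where the escapes are written without braces. In byte mode it accepts any byte from 0x80 to 0xFF, so it matches any non-ASCII text, not only Chinese.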
[\u4e00-\u9fa5] matches Chinese in JavaScript, but PHP's PCRE does not support the \u escape.
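To see why a byte-range class such as [\x80-\xff] is not Chinese-specific, it helps to look at the UTF-8 bytes. The JavaScript sketch below emulates PHP's byte-mode (non-u) matching by mapping each UTF-8 byte to one character. The helper names utf8Bytes and isAllHighBytes are hypothetical.

```javascript
// Hypothetical helper: encode a string as UTF-8 and return its bytes.
function utf8Bytes(text) {
  return Array.from(new TextEncoder().encode(text));
}

// Hypothetical helper: emulate PHP's byte-mode /^[\x80-\xff]*$/ by mapping
// each byte to a single code unit 0x00..0xFF, then testing without the u flag.
function isAllHighBytes(text) {
  const byteString = String.fromCharCode(...utf8Bytes(text));
  return /^[\x80-\xff]*$/.test(byteString);
}

console.log(utf8Bytes("编").map((b) => b.toString(16))); // [ 'e7', 'bc', '96' ]
console.log(isAllHighBytes("编程")); // true
console.log(isAllHighBytes("é"));   // true: not Chinese, but still non-ASCII
console.log(isAllHighBytes("php")); // false
```

Each Chinese character occupies three bytes in UTF-8. A byte-level class therefore cannot express the code-point range U+4E00 to U+9FA5, which is why the u modifier is needed.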
However, if \x denotes hexadecimal, why does \x4e00-\x9fa5 not behave like the \u4e00-\u9fa5 range in JavaScript? Following the rules above, I switched to the braced form with the u modifier, and the check became accurate:
 
$str = "php programming "; 
if (preg_match("/^[\x{4e00}-\x{9fa5}]+$/u",$str)) { 
print(" The string is all Chinese "); 
} else { 
print(" The string is not all Chinese "); 
} 

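This gives the correct PHP regular expression for matching Chinese characters in UTF-8 text: /^[\x{4e00}-\x{9fa5}]+$/u. It covers the common CJK Unified Ideographs from U+4E00 to U+9FA5. Chinese punctuation such as "，" and characters outside that range are not matched.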

Based on the above, I wrote the following test page (copy the code and save it as a.php):
 
<?php
// Serve the page as UTF-8 so the browser submits the form in UTF-8,
// which is what the /u pattern below expects.
header('Content-Type: text/html; charset=utf-8');

$action = trim($_GET['action'] ?? '');
if ($action == "sub")
{
    $str = $_POST['dir'] ?? '';
    // Escape the input before echoing it back, to avoid HTML/script injection.
    $safe = htmlspecialchars($str, ENT_QUOTES, 'UTF-8');
    //if (!preg_match("/^[".chr(0xa1)."-".chr(0xff)."A-Za-z0-9_]+$/", $str)) // GB2312: Chinese characters, letters, digits, underscore
    if (!preg_match("/^[\x{4e00}-\x{9fa5}A-Za-z0-9_]+$/u", $str)) // UTF-8: Chinese characters, letters, digits, underscore
    {
        echo "<font color=red>Your input [" . $safe . "] contains illegal characters</font>";
    }
    else
    {
        echo "<font color=green>Your input [" . $safe . "] is valid. Passed!</font>";
    }
}
?>

<form method="POST" action="?action=sub">
Enter characters (digits, letters, Chinese characters, underscore):
<input type="text" name="dir" value="">
<input type="submit" value="Submit">
</form>
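The server-side rule in the test page can also be mirrored on the client, for example to give feedback before the form is submitted. Below is a minimal JavaScript sketch. It assumes the same character set as the UTF-8 pattern in the test page. DIR_NAME and isValidDirName are hypothetical names, and a client-side check does not replace the PHP validation.

```javascript
// Same character set as the PHP test page: U+4E00..U+9FA5, ASCII letters,
// digits and underscore; at least one character. Hypothetical names.
const DIR_NAME = /^[\u{4e00}-\u{9fa5}A-Za-z0-9_]+$/u;

function isValidDirName(input) {
  return DIR_NAME.test(input);
}

for (const input of ["编程_2020", "php", "a b", "编程！", ""]) {
  console.log(JSON.stringify(input), isValidDirName(input) ? "valid" : "illegal characters");
}
```

The two engines differ on trailing newlines. In PCRE, $ also matches before a final newline unless the D modifier is used, so the PHP pattern accepts "abc\n", whereas this JavaScript regex rejects it.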

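GBK: preg_match("/^[".chr(0xa1)."-".chr(0xff)."A-Za-z0-9_]+$/",$str); is the GB2312 regular expression for Chinese characters, letters, digits and underscores. It works on raw bytes without the u modifier and relies on both bytes of a GB2312 Chinese character lying between 0xA1 and 0xFE. For GBK it is only an approximation, because GBK lead bytes start at 0x81 and trail bytes at 0x40, so characters using those lower byte values are rejected. The class also does not check that high bytes come in pairs.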
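Because the GB2312 pattern works on raw bytes, its behaviour can be sketched on byte arrays. The JavaScript below assumes the input is already GB2312/GBK-encoded. JavaScript strings are not, so the samples are written as byte values. isGb2312Name is a hypothetical name. The samples are 0xB0 0xA1 (啊, the first Chinese character in GB2312), an ASCII name, the valid GBK double-byte code 0x81 0x40, and a lone 0xA1 byte.

```javascript
// Hypothetical helper: emulate PHP's byte-mode pattern
// "/^[\xa1-\xffA-Za-z0-9_]+$/" (built with chr(0xa1) and chr(0xff)).
function isGb2312Name(bytes) {
  // Map each byte 0x00..0xFF to one code unit so the regex sees bytes.
  const byteString = String.fromCharCode(...bytes);
  return /^[\xa1-\xffA-Za-z0-9_]+$/.test(byteString);
}

const samples = {
  "啊 in GB2312 (B0 A1)": [0xb0, 0xa1],
  "ASCII name_1": Array.from("name_1", (c) => c.charCodeAt(0)),
  "GBK-only code (81 40)": [0x81, 0x40],
  "lone byte A1 (incomplete)": [0xa1],
};

for (const [label, bytes] of Object.entries(samples)) {
  console.log(label, isGb2312Name(bytes));
}
// true, true, false, true
```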
