Detecting Chinese with PHP regular expressions under UTF-8 or GBK: approach and concrete implementation
- 2020-11-25 07:10:20
- OfStack
Matching under UTF-8: in JavaScript, it is easy to test whether a string consists entirely of Chinese characters. For example:
var str = "php programming";
if (/^[\u4e00-\u9fa5]+$/.test(str)) {
    alert("The string is all Chinese");
} else {
    alert("The string is not all Chinese");
}
// In PHP, \x is used to denote hexadecimal data, so the direct translation looks like this:
$str = "php programming";
if (preg_match("/^[\x4e00-\x9fa5]+$/", $str)) {
    print("The string is all Chinese");
} else {
    print("The string is not all Chinese");
}
No error is reported and the result looks correct, but after replacing $str with the Chinese word for "programming", the output is still "The string is not all Chinese", so the pattern is clearly not accurate. Key point: according to the book Mastering Regular Expressions, [\x4e00-\x9fa5] involves the concepts of characters and character classes. \x{hex} denotes a character by its hexadecimal value. Note that hex may be 1-2 digits or 4 digits, but the 4-digit form must be wrapped in curly braces; without braces, \x consumes at most two hex digits, so \x4e00 is read as the byte 0x4E followed by the literal text "00". Also, if hex is greater than FF, the pattern must use the u modifier, otherwise an error is raised.
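To make the difference concrete, here is a minimal sketch that compares the two spellings. It assumes the patterns are written in double-quoted PHP strings (as in the snippets in this article) and that the file is saved as UTF-8; the variable names and the sample string are illustrative only.

```php
<?php
// In a double-quoted PHP string, "\x4e" is consumed by PHP itself as a
// two-digit hex escape (the byte 0x4E, i.e. "N"); the trailing "00" is literal.
$withoutBraces = "/^[\x4e00-\x9fa5]+$/";

// "\x{4e00}" is not a PHP string escape, so it reaches PCRE unchanged,
// and PCRE reads it as the code point U+4E00 when the u modifier is set.
$withBraces = "/^[\x{4e00}-\x{9fa5}]+$/u";

$chinese = "编程"; // assumed sample: an all-Chinese UTF-8 string

var_dump(bin2hex(substr($withoutBraces, 3, 3))); // "4e3030": the bytes "N00"
var_dump(preg_match($withoutBraces, $chinese));  // int(0)
var_dump(preg_match($withBraces, $chinese));     // int(1)
```

The first dump shows that the brace-less pattern never contained the code point U+4E00 at all, which is why it rejects genuine Chinese text.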
The only related pattern to be found on the web is one that matches full-width characters: ^[\x80-\xff]*^/. Because these values fit in two hex digits, the curly braces can be omitted here.
[\u4e00-\u9fa5] matches Chinese characters in JavaScript, but the \u escape is not supported by PHP's preg functions.
However, since \x denotes hexadecimal data, why does the range \x4e00-\x9fa5 behave differently from the one used in JS? So I switched to the following code and found that it was accurate:
$str = "php programming";
if (preg_match("/^[\x{4e00}-\x{9fa5}]+$/u", $str)) {
    print("The string is all Chinese");
} else {
    print("The string is not all Chinese");
}
So the final correct regular expression for matching Chinese characters under UTF-8 encoding in PHP is /^[\x{4e00}-\x{9fa5}]+$/u.
Based on the article above, I wrote the following test code (copy it and save it as a .php file):
<?php
// Read the action parameter defensively; it is absent on the first page load.
$action = isset($_GET['action']) ? trim($_GET['action']) : '';
if ($action == "sub")
{
    $str = isset($_POST['dir']) ? $_POST['dir'] : '';
    // Escape the input before echoing it back into the HTML page.
    $safe = htmlspecialchars($str, ENT_QUOTES, 'UTF-8');
    //if(!preg_match("/^[".chr(0xa1)."-".chr(0xff)."A-Za-z0-9_]+$/",$str)) // GB2312: Chinese characters, letters, digits, underscore
    if (!preg_match("/^[\x{4e00}-\x{9fa5}A-Za-z0-9_]+$/u", $str)) // UTF-8: Chinese characters, letters, digits, underscore
    {
        echo "<font color=red>Your input [".$safe."] contains illegal characters</font>";
    }
    else
    {
        echo "<font color=green>Your input [".$safe."] is valid and passed the check!</font>";
    }
}
?>
<form method="POST" action="?action=sub">
Enter characters (digits, letters, Chinese characters, underscore):
<input type="text" name="dir" value="">
<input type="submit" value="Submit">
</form>
GBK: preg_match("/^[".chr(0xa1)."-".chr(0xff)."A-Za-z0-9_]+$/",$str); This is the regular expression for GB2312 Chinese characters, letters, digits, and underscores.
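The GBK variant can be tried out in the same way. The following sketch is an assumption-laden illustration: the helper name is_gbk_word is hypothetical, and the sample is written as raw bytes (D6 D0 CE C4 is the GB2312 encoding of "中文") so that it does not depend on the encoding of the source file.

```php
<?php
// Hypothetical helper: byte-wise check for GB2312-encoded text.
// There is no u modifier here, so PCRE compares single bytes.
function is_gbk_word(string $bytes): bool
{
    // Same character class as the chr(0xa1)..chr(0xff) pattern above,
    // written with PCRE's two-digit \x escapes inside single quotes.
    return preg_match('/^[\xa1-\xffA-Za-z0-9_]+$/', $bytes) === 1;
}

// "中文" encoded as GB2312 is the byte sequence D6 D0 CE C4.
$gbk = "\xD6\xD0\xCE\xC4";

var_dump(is_gbk_word($gbk . "_php7")); // bool(true)
var_dump(is_gbk_word($gbk . " php"));  // bool(false): the space is not allowed
```

Because the class tests individual bytes, it does not verify that the high bytes pair up into valid double-byte characters, and GBK's extended area uses byte values below 0xA1 that this range does not cover.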