PHP: a detailed explanation of ord($str) > 0x80

  • 2020-05-24 05:14:51
  • OfStack

GBK, the simplified Chinese character set, encodes each character in either 1 or 2 bytes. When the first (high) byte is in the range 0x00 ~ 0x7f, the character takes 1 byte; when the first byte is 0x80 or above, the character takes 2 bytes.

Note: the numbers in parentheses are binary (base 2).

When a byte is greater than 0x7f, it must belong to a Chinese character. How do you check whether a byte is greater than 0x7f?
The number after 0x7f (1111111) is 0x80 (10000000), so for a byte to be greater than 0x7f its highest bit must be 1. We only need to test whether the highest bit is 1.
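To see where the highest bit flips, a short sketch can print a few byte values as 8-bit binary. The helper name byte_bits is hypothetical, chosen only for this illustration.

```php
<?php
// Hypothetical helper: format a byte value (0-255) as 8 binary digits.
function byte_bits(int $byte): string
{
    return sprintf('%08b', $byte);
}

echo byte_bits(0x41), "\n"; // 01000001  (ASCII "A", highest bit is 0)
echo byte_bits(0x7f), "\n"; // 01111111  (largest value with highest bit 0)
echo byte_bits(0x80), "\n"; // 10000000  (smallest value with highest bit 1)
echo byte_bits(0xd6), "\n"; // 11010110  (highest bit is 1)
```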

How to test it:

Bitwise AND (a result bit is 1 only when both input bits are 1, otherwise it is 0):
For example, to test whether the third bit of a number is 1, AND the number with 4 (100); to test whether the second bit is 1, AND it with 2 (10).
Similarly, to test whether the 8th bit is 1, AND the number with 10000000, which is 0x80.
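As a rough sketch of how this test is typically used, the function below walks a GBK byte string and counts characters by checking the highest bit of each leading byte. The name count_gbk_chars is hypothetical, and the code assumes well-formed input in which every byte with the highest bit set starts a two-byte character.

```php
<?php
// Hypothetical helper: count characters in a GBK byte string.
// Assumes well-formed input: a byte with the highest bit set always
// starts a two-byte character.
function count_gbk_chars(string $bytes): int
{
    $count = 0;
    $length = strlen($bytes);
    for ($i = 0; $i < $length; $count++) {
        // Highest bit set: skip 2 bytes; otherwise a 1-byte ASCII character.
        $i += (ord($bytes[$i]) & 0x80) ? 2 : 1;
    }
    return $count;
}

// "a", one Chinese character (D6 D0), "b", another (CE C4), as raw GBK bytes.
echo count_gbk_chars("a\xd6\xd0b\xce\xc4"), "\n"; // 4
```

The bytes are written as escape sequences so the result does not depend on the encoding of the source file.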

Why not simply compare with > 0x7f? That may work in PHP, but in some other strongly typed languages the highest bit of a byte is the sign bit, so a byte with that bit set is read as a negative number, and a negative number can never be greater than 0x7f (the largest value a signed byte can hold).
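The signed-byte effect can be imitated in PHP with unpack(), whose 'c' format reads a byte as a signed char and whose 'C' format reads it as unsigned. This is only an illustration of the argument, not something ord() itself does, since ord() always returns 0 to 255.

```php
<?php
// Read the single byte 0x80 as a signed char ('c') and as an
// unsigned char ('C', which matches what ord() returns).
$signed   = unpack('c', "\x80")[1]; // -128
$unsigned = unpack('C', "\x80")[1]; // 128

var_dump($signed > 0x7f);            // bool(false): the comparison misses it
var_dump($unsigned > 0x7f);          // bool(true)
var_dump(($signed & 0x80) !== 0);    // bool(true): the bit test still works
var_dump(($unsigned & 0x80) !== 0);  // bool(true)
```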

Here is another example:
The ASCII code of a is 97 (1100001)
The ASCII code of A is 65 (1000001)

The ASCII code of b is 98 (1100010)
The ASCII code of B is 66 (1000010)

There is a pattern: for the letters a-z, the sixth bit is 1 in the lowercase form and 0 in the uppercase form. We can use this to determine the case:
Just AND the letter with 0x20 (100000) and test the result:
 
if (ord($a) & 0x20) {
    // the sixth bit is set: $a is a lowercase letter
}
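The 0x20 bit only says something about case when the byte is a letter in the first place; digits such as "0" (0x30) also have that bit set. A minimal sketch with a guard, using the hypothetical helper name letter_case, might look like this:

```php
<?php
// Hypothetical helper: classify a single ASCII character by its 0x20 bit.
function letter_case(string $ch): string
{
    $code = ord($ch);
    // Clear the sixth bit; letters of either case then fall in A-Z.
    $base = $code & ~0x20;
    if ($base < 0x41 || $base > 0x5a) {
        return 'not a letter';
    }
    return ($code & 0x20) ? 'lower' : 'upper';
}

echo letter_case('a'), "\n"; // lower
echo letter_case('A'), "\n"; // upper
echo letter_case('0'), "\n"; // not a letter
```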

How do you convert letters to uppercase? Clear the sixth bit (change it from 1 to 0):
 
$a = 'a';
// ~0x20 has every bit set except the sixth, so the AND clears only that bit
$a = chr(ord($a) & (~0x20));
echo $a; // prints A
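Applied to a whole string, the same trick needs a range check, because clearing the bit on a non-letter corrupts it (a space, 0x20, would become a NUL byte). The sketch below uses the hypothetical name ascii_upper and handles ASCII letters only; in real code PHP's built-in strtoupper() is the usual choice.

```php
<?php
// Hypothetical helper: uppercase the ASCII letters of a string by
// clearing the sixth bit of each lowercase letter.
function ascii_upper(string $s): string
{
    for ($i = 0, $n = strlen($s); $i < $n; $i++) {
        $code = ord($s[$i]);
        // Only touch a-z; other bytes are left unchanged.
        if ($code >= 0x61 && $code <= 0x7a) {
            $s[$i] = chr($code & ~0x20);
        }
    }
    return $s;
}

echo ascii_upper('Hello, world 42!'), "\n"; // HELLO, WORLD 42!
```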
