Checking in C whether a char* string is UTF-8 encoded
- 2020-05-24 05:55:09
- OfStack
I revised the function once to handle ASCII: a pure ASCII string also returns true, because UTF-8 is backward compatible with ASCII (every ASCII byte is a valid one-byte UTF-8 sequence).
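To make the ASCII point concrete, here is a minimal sketch of a helper that tests whether a buffer is pure 7-bit ASCII. The name is_pure_ascii is hypothetical and not part of the original code. Any buffer that passes this test contains only bytes with the high bit clear, so a UTF-8 checker never sees a lead or continuation byte in it.

```c
#include <stddef.h>

/* Hypothetical helper (not from the original article):
   returns 1 if every byte has its high bit clear, i.e. is 7-bit ASCII. */
int is_pure_ascii(const char *str, size_t length) {
    for (size_t i = 0; i < length; i++) {
        /* Cast first: plain char may be signed on some platforms. */
        if (((unsigned char)str[i] & 0x80) != 0) {
            return 0;  /* high bit set: part of a multi-byte sequence */
        }
    }
    return 1;
}
```

An empty buffer counts as ASCII here, which matches how the UTF-8 check treats a zero-length string.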
Example code:
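#include <stddef.h>  /* size_t */

/* Returns 1 if str[0..length) is structurally valid UTF-8, 0 otherwise.
   Pure ASCII input also returns 1. Only the byte structure is checked:
   overlong forms, surrogates and legacy 5- and 6-byte sequences pass. */
int utf8_check(const char* str, size_t length) {
    size_t i = 0;
    int nBytes = 0;  // continuation bytes still expected
    unsigned char chr;
    while (i < length) {
        chr = (unsigned char)str[i];
        if (nBytes == 0) {  // at the start of a character
            if ((chr & 0x80) != 0) {
                // Count the leading 1 bits to get the sequence length
                while ((chr & 0x80) != 0) {
                    chr <<= 1;
                    nBytes++;
                }
                if ((nBytes < 2) || (nBytes > 6)) {
                    return 0;  // a lead byte must be at least 110x xxxx
                }
                nBytes--;  // the lead byte itself is already consumed
            }
        } else {  // continuation byte of a multi-byte sequence
            if ((chr & 0xC0) != 0x80) {
                return 0;  // continuation bytes must have the form 10xx xxxx
            }
            nBytes--;
        }
        i++;
    }
    return (nBytes == 0);  // 0 if the last sequence was cut short
}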
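The function above checks only the lead-byte and continuation-byte bit patterns. If stricter validation is needed, the sketch below also rejects overlong encodings, surrogate code points, values above U+10FFFF, and 5- or 6-byte sequences, following the limits in RFC 3629. The name utf8_check_strict and its structure are assumptions for illustration, not the original author's code.

```c
#include <stddef.h>

/* Hypothetical stricter variant (not from the original article).
   Returns 1 if str[0..length) is well-formed UTF-8 per RFC 3629. */
int utf8_check_strict(const char *str, size_t length) {
    const unsigned char *s = (const unsigned char *)str;
    size_t i = 0;

    while (i < length) {
        unsigned char c = s[i];
        size_t need;        /* continuation bytes expected */
        unsigned long cp;   /* decoded code point */
        unsigned long min;  /* smallest code point valid for this length */

        if (c < 0x80) {                      /* 0xxx xxxx: ASCII */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {     /* 110x xxxx */
            need = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {     /* 1110 xxxx */
            need = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {     /* 1111 0xxx */
            need = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return 0;  /* stray continuation byte, or a 5/6-byte lead */
        }

        if (length - i - 1 < need) {
            return 0;  /* sequence is cut off at the end of the buffer */
        }
        for (size_t k = 1; k <= need; k++) {
            unsigned char cc = s[i + k];
            if ((cc & 0xC0) != 0x80) {
                return 0;  /* continuation bytes must be 10xx xxxx */
            }
            cp = (cp << 6) | (cc & 0x3F);
        }

        if (cp < min) return 0;                       /* overlong encoding */
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0;   /* UTF-16 surrogate */
        if (cp > 0x10FFFF) return 0;                  /* beyond Unicode range */

        i += need + 1;
    }
    return 1;
}
```

For example, the two-byte sequence 0xC0 0x80 (an overlong encoding of NUL) passes utf8_check but is rejected by this variant. The trade-off is a little more work per character in exchange for rejecting input that a conforming decoder would refuse.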
Thank you for reading. I hope this helps, and thank you for supporting this site!