C++ code example: extracting a substring from a mixed Chinese and English string
- 2020-06-15 10:04:52
- OfStack
In C++ you can extract a substring with string::substr(), but it counts bytes, so it only gives correct results for English (single-byte) text.
For a string made up entirely of Chinese characters, you have to work out the byte count yourself. For Chinese mixed with English, a fixed byte offset no longer works at all.
But I happened to need such a function, so I had to implement one myself. The key step is telling Chinese characters apart from English ones, which the is_zh_ch function below handles.
The code is simple and not optimized; if you know a better way, you are welcome to share it. The code is as follows.
#include <iostream>
#include <string>
#include <cstdio>
#include <vector>
#include <typeinfo>
using namespace std;
int is_zh_ch(char p)
{
    /* This code assumes a 2-byte encoding such as GB2312/GBK.
       The first byte of a Chinese character always has its highest bit set to 1,
       while an ASCII (English) byte has it set to 0, so testing that bit is enough.
       The cast to unsigned char keeps the test independent of whether char is signed.
    */
    if(static_cast<unsigned char>(p) & 0x80)
    {
        return 1;// Byte belongs to a Chinese character
    }
    return -1;// ASCII byte
}
// Returns characters start..end of str (1-based, inclusive); end<=0 means "up to the last character"
string sub(const string& str,int start,int end=-1)
{
    if(str.empty())
        return "";
    // First split str into characters: a Chinese character takes 2 bytes, an English one takes 1
    vector<string> dump;
    size_t len=str.length();
    size_t i=0;
    while(i<len)
    {
        if(is_zh_ch(str.at(i))==1)
        {
            dump.push_back(str.substr(i,2));
            i=i+2;
        }
        else
        {
            dump.push_back(str.substr(i,1));
            i=i+1;
        }
    }
    int count=static_cast<int>(dump.size());
    // end defaults to dump.size(); an end past the last character is clamped
    if(end<=0||end>count)
        end=count;
    // Positions are 1-based, so start must lie in [1, end]
    if(start<1||start>end)
    {
        printf("start is wrong\n");
        return "";
    }
    // Then take the requested characters directly from dump
    string tmp;
    for(int k=start; k<=end; k++)
    {
        tmp+=dump[k-1];
    }
    return tmp;
}
int main()
{
string p=" mid-levels wuji";
cout<<sub(p,1,1)<<endl;
cout<<sub(p,2,2)<<endl;
cout<<sub(p,3);
}