C++: extracting substrings from mixed Chinese and English strings (code example)

2020-06-15 10:04:52
OfStack

In C++ you can extract a substring with string.substr(), but it counts bytes, so it only works reliably for English (single-byte) text.

If the string contains only Chinese characters, you can still get by if you work out the byte count of each character yourself. If Chinese is mixed with English, that approach breaks down, because the characters no longer all have the same width.
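To make the problem concrete, here is a minimal sketch. It assumes a two-byte encoding such as GBK, and the sample bytes are written out explicitly so the result does not depend on how the source file is saved. The names kSample and naive_prefix are made up for this illustration.

```cpp
#include <cstddef>
#include <string>

// Two Chinese characters followed by "ab", spelled out as bytes in an
// assumed two-byte encoding (GBK): D6 D0 and CE C4 are one character each.
// The literal is split so that 'a' is not read as part of the \x escape.
const std::string kSample = "\xD6\xD0\xCE\xC4" "ab";

// Hypothetical helper: what a naive "first n characters" call returns.
// substr() counts bytes, so n is really a byte count here.
std::string naive_prefix(const std::string& s, std::size_t n)
{
  return s.substr(0, n);
}
```

Under these assumptions kSample.size() is 6 although the string holds 4 characters, and naive_prefix(kSample, 3) returns one whole Chinese character plus the first half of the next one.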

But I happened to need exactly such a function, so I had to implement one myself. For how to tell Chinese characters from English ones, see here.

The code is simple and not optimized; if you know a better way, you are welcome to improve on it.


#include <iostream>
#include <string>
#include<cstdio>
#include<vector>
#include<typeinfo>
using namespace std;

int is_zh_ch(char p)
{
  /* In a two-byte encoding such as GB2312 or GBK, the first byte of a
     Chinese character has its highest bit set to 1, while an ASCII
     (English) byte has it set to 0. Test that bit directly; the cast to
     unsigned char keeps the result independent of whether plain char
     is signed on this platform.
  */
  if((static_cast<unsigned char>(p) & 0x80) != 0)
  {
    return 1;// high bit set: a byte of a Chinese character
  }

  return -1;// high bit clear: an English (ASCII) byte
}



string sub(string str,int start,int end=-1)
{
  // start and end are 1-based character positions, both inclusive;
  // end <= 0 means "through the last character".
  if(str.empty())
  {
    cerr<<"str is empty\n";
    return "";
  }

  int len=str.length();

  string tmp="";

  // First split str into characters: two bytes for a Chinese character,
  // one byte for an English character
  vector <string> dump;
  int i=0;
  while(i<len)
  {
    if (is_zh_ch(str.at(i))==1)
    {
      dump.push_back(str.substr(i,2));
      i=i+2;
    }
    else
    {
      dump.push_back(str.substr(i,1));
      i=i+1;
    }
  }

  int count=dump.size();
  end=end>0?end:count; // end defaults to dump.size()
  if(end>count)
    end=count; // clamp so that dump is never read past its last element
  if(start<1||start>end)
  {
    cerr<<"start is wrong\n";
    return "";
  }

  // Take the requested characters directly from dump
  for(i=start; i<=end; i++)
  {
    tmp+=dump[i-1];
  }

  return tmp;
}

int main()
{
  string p=" mid-levels wuji";
  cout<<sub(p,1,1)<<endl; // character 1 only
  cout<<sub(p,2,2)<<endl; // character 2 only
  cout<<sub(p,3)<<endl;   // character 3 through the end
  return 0;
}
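
The code above takes two bytes for every Chinese character, which fits GB2312/GBK-style encodings. If the string is UTF-8 instead, a Chinese character usually takes three bytes, and the length has to be read from the first byte of each character. The following is only a sketch under that assumption, not a full UTF-8 validator; utf8_char_len, utf8_split and utf8_sub are names invented for this example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Length in bytes of the UTF-8 character whose first byte is c.
// Returns 1 for ASCII and for unexpected bytes, so the caller always advances.
std::size_t utf8_char_len(unsigned char c)
{
  if ((c & 0x80) == 0x00) return 1; // 0xxxxxxx: ASCII
  if ((c & 0xE0) == 0xC0) return 2; // 110xxxxx
  if ((c & 0xF0) == 0xE0) return 3; // 1110xxxx: most Chinese characters
  if ((c & 0xF8) == 0xF0) return 4; // 11110xxx
  return 1;                         // stray continuation or invalid byte
}

// Split s into one string per UTF-8 character.
std::vector<std::string> utf8_split(const std::string& s)
{
  std::vector<std::string> chars;
  std::size_t i = 0;
  while (i < s.size())
  {
    std::size_t n = utf8_char_len(static_cast<unsigned char>(s[i]));
    if (i + n > s.size()) n = s.size() - i; // truncated last character
    chars.push_back(s.substr(i, n));
    i += n;
  }
  return chars;
}

// Characters start..end, 1-based and inclusive, the same convention as sub();
// end <= 0 means "through the last character".
std::string utf8_sub(const std::string& s, int start, int end = -1)
{
  const std::vector<std::string> chars = utf8_split(s);
  const int count = static_cast<int>(chars.size());
  if (end <= 0 || end > count) end = count;
  if (start < 1 || start > end) return "";

  std::string out;
  for (int i = start; i <= end; ++i) out += chars[i - 1];
  return out;
}
```

For a UTF-8 string made of a Chinese character, "a", another Chinese character and "b", utf8_sub(s, 2, 3) would return "a" plus the second Chinese character, and utf8_sub(s, 3) everything from the second Chinese character to the end.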