Detecting whether a string contains Chinese characters in Java

  • 2020-04-01 04:19:45
  • OfStack

The code is quite useful, so without further ado, here it is.

The main function checks whether a string contains Chinese characters and replaces each one with an ASCII representation: the decimal values of its two GBK bytes, wrapped in slashes.


import java.nio.charset.Charset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ChineseReplace {

    // Matches one character in the range U+4E00 to U+9FA5
    private static final Pattern CHINESE = Pattern.compile("[\u4e00-\u9fa5]");
    private static final Charset GBK = Charset.forName("GBK");

    // Replaces every Chinese character with "/" + decimal value of its first GBK byte
    // + decimal value of its second GBK byte + "/"; other characters are copied unchanged.
    // Returns "" for null or blank input.
    public static String isChinese_Replace(String str_para) {
        if (str_para == null || str_para.trim().isEmpty()) {
            return "";
        }
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < str_para.length(); i++) {
            String ch = String.valueOf(str_para.charAt(i));
            Matcher m = CHINESE.matcher(ch);
            if (m.find()) {
                // Each character in this range is encoded as two GBK bytes
                byte[] b = m.group(0).getBytes(GBK);
                String str_0 = Integer.toHexString(b[0]);
                String str_1 = Integer.toHexString(b[1]);
                result.append('/').append(conver10(str_0)).append(conver10(str_1)).append('/');
            } else {
                result.append(ch);
            }
        }
        return result.toString();
    }

    // Integer.toHexString sign-extends negative bytes (e.g. "ffffffd6"), so keep only
    // the last two hex digits and parse them as an unsigned value (0 to 255).
    // Safe here because every GBK byte for these characters is at least 0x40.
    public static int conver10(String str_0) {
        return Integer.parseInt(str_0.substring(str_0.length() - 2), 16);
    }
}

Here is a simpler version for less demanding cases, when you only need to know whether a string contains Chinese characters:


import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Demo {
    // Matches any character in the range U+4E00 to U+9FA5
    static String regEx = "[\u4e00-\u9fa5]";
    static Pattern pat = Pattern.compile(regEx);

    public static void main(String[] args) {
        String input = "Hello 世界!";
        System.out.println(isContainsChinese(input)); // true
        input = "hello world";
        System.out.println(isContainsChinese(input)); // false
    }

    // Returns true if str contains at least one character in the range above
    public static boolean isContainsChinese(String str) {
        Matcher matcher = pat.matcher(str);
        return matcher.find();
    }
}
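If you also need to catch Chinese characters outside U+4E00 to U+9FA5 (for example the CJK extension blocks), one option is to swap the fixed range for the Unicode script property. The sketch below uses the standard `Character.UnicodeScript` API and iterates over code points, so characters outside the Basic Multilingual Plane are not split into surrogate halves. `HanCheck` and `containsHan` are hypothetical names. Note that the Han script also covers Japanese kanji and a few ideographic symbols, so this is a broader check than the regex, not an exact replacement.

```java
// Hypothetical alternative: checks the Unicode script property instead of a fixed range.
public class HanCheck {

    // True if any code point in str belongs to the Han script. This also covers
    // ideographs outside U+4E00 to U+9FA5, such as CJK Extension A and B.
    public static boolean containsHan(String str) {
        return str != null && str.codePoints()
                .anyMatch(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN);
    }

    public static void main(String[] args) {
        System.out.println(containsHan("Hello 世界!")); // true
        System.out.println(containsHan("hello world")); // false
    }
}
```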

Finally, here are the Unicode ranges for a few character classes:
        * Chinese characters (common CJK Unified Ideographs): [0x4e00, 0x9fa5] (decimal [19968, 40869])
        * digits: [0x30, 0x39] (decimal [48, 57])
        * lowercase letters: [0x61, 0x7a] (decimal [97, 122])
        * uppercase letters: [0x41, 0x5a] (decimal [65, 90])
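These ranges can also be applied directly as numeric comparisons on a `char`, without a regex. A minimal sketch follows; `CharRanges` and `classify` are hypothetical names, and anything outside the four listed ranges is reported as `"other"`.

```java
// Hypothetical helper that classifies a character using the four ranges listed above.
public class CharRanges {

    public static String classify(char c) {
        if (c >= 0x4e00 && c <= 0x9fa5) return "chinese";   // 19968 to 40869
        if (c >= 0x30 && c <= 0x39) return "digit";          // 48 to 57
        if (c >= 0x61 && c <= 0x7a) return "lowercase";      // 97 to 122
        if (c >= 0x41 && c <= 0x5a) return "uppercase";      // 65 to 90
        return "other";
    }

    public static void main(String[] args) {
        for (char c : "aZ9中!".toCharArray()) {
            System.out.println(c + " -> " + classify(c));
        }
    }
}
```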

