Detecting whether a string contains Chinese characters in Java
- 2020-04-01 04:19:45
- OfStack
The code is quite handy, so without further ado, here it is.
Its main job is to determine whether a string contains Chinese characters and to replace each one with the decimal values of its two GBK bytes, wrapped in slashes.
import java.io.UnsupportedEncodingException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Matches a single character in the basic CJK range U+4E00..U+9FA5.
private static String regEx = "[\u4e00-\u9fa5]";

// Replaces every Chinese character in str_para with "/<b0><b1>/", where b0 and b1
// are the decimal values of its two GBK bytes. Other characters are copied unchanged.
private static String isChinese_Replace( String str_para )
{
    Pattern p = Pattern.compile( regEx );
    String str_return_reslut = "";
    // Null or blank input yields an empty string.
    if ( str_para == null || str_para.trim().length() == 0 )
    {
        return(str_return_reslut);
    }
    try {
        // Split into one-character strings and test each one.
        String str_data[] = str_para.split( "" );
        for ( int i = 0; i < str_data.length; i++ )
        {
            Matcher m = p.matcher( str_data[i] );
            if ( m.find() )
            {
                byte[] b = m.group( 0 ).getBytes( "GBK" );
                String str_0 = Integer.toHexString( b[0] );
                String str_1 = Integer.toHexString( b[1] );
                str_return_reslut = str_return_reslut + "/" + conver10( str_0 ) + conver10( str_1 ) + "/";
            } else {
                str_return_reslut = str_return_reslut + str_data[i];
            }
        }
    } catch ( NumberFormatException e ) {
        e.printStackTrace();
    } catch ( UnsupportedEncodingException e ) {
        e.printStackTrace();
    }
    return(str_return_reslut);
}

// Converts the last two hex digits of str_0 to an int (the unsigned byte value).
// Integer.toHexString sign-extends negative bytes to eight digits such as "ffffffd6".
public static int conver10( String str_0 )
{
    return(Integer.parseInt( str_0.substring( str_0.length() - 2, str_0.length() ), 16 ) );
}
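The round trip through a hex string in conver10 can be avoided by masking each byte with 0xFF. What follows is a minimal sketch of that idea, not the original author's code: the class name GbkReplaceSketch and the method replaceChinese are hypothetical, and it assumes the runtime ships the GBK charset (an optional charset, though common JDK distributions include it).

```java
import java.nio.charset.Charset;

final class GbkReplaceSketch {
    // Assumption: the runtime provides the optional GBK charset.
    private static final Charset GBK = Charset.forName("GBK");

    // Replaces each character in U+4E00..U+9FA5 with "/<b0><b1>/", where b0 and b1
    // are the unsigned decimal values of its two GBK bytes.
    static String replaceChinese(String input) {
        if (input == null || input.trim().isEmpty()) {
            return "";
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c >= '\u4e00' && c <= '\u9fa5') {
                byte[] b = String.valueOf(c).getBytes(GBK);
                // b[k] & 0xFF maps a signed byte (-128..127) to 0..255.
                out.append('/').append(b[0] & 0xFF).append(b[1] & 0xFF).append('/');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+4E2D is encoded in GBK as the bytes 0xD6 0xD0, i.e. 214 and 208.
        System.out.println(replaceChinese("a\u4e2db")); // prints a/214208/b
    }
}
```

Using a StringBuilder also avoids building a new String on every loop iteration, which matters for long inputs.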
Here is a simpler version that only checks whether a string contains Chinese characters. It is suitable for less demanding cases where no replacement is needed.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class demo {
    // Matches any character in the basic CJK range U+4E00..U+9FA5.
    static String regEx = "[\u4e00-\u9fa5]";
    static Pattern pat = Pattern.compile(regEx);

    public static void main(String[] args) {
        // U+4E16 U+754C is the Chinese word for "world".
        String input = "Hello \u4e16\u754c!";
        System.out.println(isContainsChinese(input)); // true
        input = "hello world";
        System.out.println(isContainsChinese(input)); // false
    }

    // Returns true if str contains at least one Chinese character.
    public static boolean isContainsChinese(String str)
    {
        Matcher matcher = pat.matcher(str);
        return matcher.find();
    }
}
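The range U+4E00..U+9FA5 covers only the basic CJK Unified Ideographs block, so characters in the extension blocks (for example U+3400 or U+20000) are not matched. If that matters, one possible alternative is to test each code point against the Unicode Han script. The sketch below is an illustration under that assumption, not part of the original code; the names HanCheckSketch and containsHan are hypothetical, and it requires Java 8 or later.

```java
final class HanCheckSketch {
    // Returns true if any code point in str belongs to the Unicode Han script.
    static boolean containsHan(String str) {
        if (str == null) {
            return false;
        }
        // codePoints() walks the string by code point, so supplementary
        // characters stored as surrogate pairs are handled as one unit.
        return str.codePoints()
                  .anyMatch(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN);
    }

    public static void main(String[] args) {
        System.out.println(containsHan("Hello \u4e16\u754c!")); // true
        System.out.println(containsHan("hello world"));         // false
        System.out.println(containsHan("\uD840\uDC00"));        // true (U+20000)
    }
}
```

The trade-off is that the Han script also includes ideographs used in Japanese and Korean text, so this check is broader than "Chinese" in the strict sense.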
Finally, here are the Unicode code point ranges for several character classes:
* Chinese characters: [0x4e00,0x9fa5] (or decimal [19968,40869])
* digits: [0x30,0x39] (or decimal [48, 57])
* lowercase letters: [0x61,0x7a] (or decimal [97, 122])
* capital letters: [0x41,0x5a] (or decimal [65, 90])
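The ranges above can be applied directly with plain comparisons, without a regular expression. The following is a small sketch of that; the class CharRangeSketch, the method classify, and the returned labels are hypothetical names chosen for illustration.

```java
final class CharRangeSketch {
    // Classifies a character using the code point ranges listed above.
    static String classify(char c) {
        if (c >= 0x4e00 && c <= 0x9fa5) return "chinese";
        if (c >= 0x30 && c <= 0x39) return "digit";
        if (c >= 0x61 && c <= 0x7a) return "lowercase";
        if (c >= 0x41 && c <= 0x5a) return "uppercase";
        return "other";
    }

    public static void main(String[] args) {
        System.out.println(classify('\u4e2d')); // chinese
        System.out.println(classify('7'));      // digit
        System.out.println(classify('z'));      // lowercase
        System.out.println(classify('A'));      // uppercase
        System.out.println(classify('!'));      // other
    }
}
```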