A basic way to use regular expressions in Java programming

  • 2020-04-01 04:25:50
  • OfStack

In program development we inevitably need to match, find, replace, and validate strings, and these tasks can get complicated. Hand-coding each one wastes a programmer's time and effort, and regular expressions are the main tool for avoiding that.
A regular expression is a notation for pattern matching and substitution. It is a text pattern made up of ordinary characters (such as the letters a through z) and special characters (metacharacters), and it describes one or more strings to look for when searching a body of text. In other words, a regular expression acts as a template that is matched against the string being searched.
The java.util.regex package, introduced in JDK 1.4, gives Java a solid built-in platform for working with regular expressions.

Regular expressions are a large and complex topic, so this article only covers the basic concepts needed to get started. For more, consult a book on the subject and experiment on your own.
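
As a starting point, here is a minimal sketch of the usual java.util.regex workflow: compile a Pattern, obtain a Matcher for an input string, then ask the Matcher a question. The class and method names (FirstRegexDemo, matchesWhole, foundInside) are made up for this illustration. It also shows the difference between matches(), which must cover the entire input, and find(), which may match anywhere inside it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative class; the names here are not from the article.
class FirstRegexDemo {

    // matches(): the pattern must cover the entire input.
    static boolean matchesWhole(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.matches();
    }

    // find(): the pattern may match anywhere inside the input.
    static boolean foundInside(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find();
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("Java", "Java"));
        System.out.println(matchesWhole("Java", "I like Java"));
        System.out.println(foundInside("Java", "I like Java"));
    }
}
```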

\\         Backslash
\t         Tab ('\u0009')
\n         Line feed ('\u000A')
\r         Carriage return ('\u000D')
\f         Form feed
\e         Escape
\d         A digit; equivalent to [0-9]
\D         A non-digit; equivalent to [^0-9]
\s         A whitespace character; equivalent to [ \t\n\x0B\f\r]
\S         A non-whitespace character; equivalent to [^ \t\n\x0B\f\r]
\w         A word character; equivalent to [a-zA-Z_0-9]
\W         A non-word character; equivalent to [^a-zA-Z_0-9]
\b         A word boundary
\B         A non-word boundary
\G         The end of the previous match
^          Anchors the match at the beginning; ^java matches only text that begins with "java"
$          Anchors the match at the end; java$ matches only text that ends with "java"
.          Any single character other than \n
java..     "java" followed by any two characters other than a line break
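
The sketch below tries a few of these metacharacters against sample strings. The class and method names (MetacharDemo, isAllDigits, and so on) are invented for this example. Note that every backslash has to be doubled inside a Java string literal, so the regular expression \d+ is written "\\d+".

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class MetacharDemo {

    // \d+ : the whole input must consist of one or more digits.
    static boolean isAllDigits(String s) {
        return Pattern.matches("\\d+", s);
    }

    // ^Java : find() succeeds only if the input starts with "Java".
    static boolean startsWithJava(String s) {
        return Pattern.compile("^Java").matcher(s).find();
    }

    // Java$ : find() succeeds only if "Java" sits at the end of the input.
    static boolean endsWithJava(String s) {
        return Pattern.compile("Java$").matcher(s).find();
    }

    // \bcat\b : "cat" as a whole word, not as part of a longer word.
    static boolean containsWordCat(String s) {
        return Pattern.compile("\\bcat\\b").matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(isAllDigits("2020"));
        System.out.println(isAllDigits("20a20"));
        System.out.println(startsWithJava("Java rocks"));
        System.out.println(endsWithJava("I like Java"));
        System.out.println(containsWordCat("concatenate"));
        System.out.println(containsWordCat("a cat sat"));
    }
}
```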


Use "[]" to restrict a position to a set of characters

[a-z]         One character in the lowercase range a to z
[A-Z]         One character in the uppercase range A to Z
[a-zA-Z]      One character in the range a to z or A to Z
[0-9]         One character in the range 0 to 9
[0-9a-z]      One character in the range 0 to 9 or a to z
[0-9[a-z]]    One character in the range 0 to 9 or a to z (a nested class forms a union)

Put ^ at the start of the brackets to negate the set: "[^]"

[^a-z]        One character outside the lowercase range a to z
[^A-Z]        One character outside the uppercase range A to Z
[^a-zA-Z]     One character outside the ranges a to z and A to Z
[^0-9]        One character outside the range 0 to 9
[^0-9a-z]     One character outside the ranges 0 to 9 and a to z
[^0-9[a-z]]   Intended as one character outside the ranges 0 to 9 and a to z; the handling of a nested class inside a negated class changed in Java 9, so [^0-9a-z] is the safer spelling
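
A short sketch of these character classes in use follows. CharClassDemo and its method names are hypothetical. The last method goes slightly beyond the list above: it uses the && operator, which is how java.util.regex spells a true intersection of two classes.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class CharClassDemo {

    // [a-z]+ : one or more lowercase ASCII letters and nothing else.
    static boolean isLowercase(String s) {
        return Pattern.matches("[a-z]+", s);
    }

    // [^0-9]+ : one or more characters, none of which is a digit.
    static boolean hasNoDigits(String s) {
        return Pattern.matches("[^0-9]+", s);
    }

    // [0-9[a-z]]+ : nested class, i.e. the union of 0-9 and a-z.
    static boolean isLowerAlphanumeric(String s) {
        return Pattern.matches("[0-9[a-z]]+", s);
    }

    // [a-z&&[^aeiou]]+ : intersection, lowercase letters that are not vowels.
    static boolean isLowercaseWithoutVowels(String s) {
        return Pattern.matches("[a-z&&[^aeiou]]+", s);
    }

    public static void main(String[] args) {
        System.out.println(isLowercase("java"));
        System.out.println(isLowercase("Java"));
        System.out.println(hasNoDigits("a1"));
        System.out.println(isLowerAlphanumeric("abc123"));
        System.out.println(isLowercaseWithoutVowels("rhythm"));
    }
}
```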

Use "*" when the preceding character may occur zero or more times

J*         Zero or more J's
.*         Zero or more arbitrary characters
J.*D       J and D with zero or more arbitrary characters between them

Use "+" when the preceding character must occur one or more times

J+         One or more J's
.+         One or more arbitrary characters
J.+D       J and D with one or more arbitrary characters between them

Use "?" when the preceding character may occur zero times or once

JA?        J or JA

Use "{a}" to require exactly a consecutive occurrences

J{2}       JJ
J{3}       JJJ

Use "{a,}" to require at least a occurrences

J{3,}      JJJ, JJJJ, JJJJJ, ... (three or more J's)

Use "{a,b}" to require at least a and at most b occurrences

J{3,5}     JJJ or JJJJ or JJJJJ
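
The quantifiers above can be checked quickly with a small helper. This is only a sketch; QuantifierDemo and full are made-up names, and Pattern.matches requires the whole input to match the expression.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class QuantifierDemo {

    // Returns true when the entire input matches the regular expression.
    static boolean full(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(full("J*", ""));           // zero J's is allowed
        System.out.println(full("J+", ""));           // at least one J is required
        System.out.println(full("JA?", "J"));         // the A is optional
        System.out.println(full("J.*D", "JxyzD"));    // anything between J and D
        System.out.println(full("J{2}", "JJ"));       // exactly two J's
        System.out.println(full("J{3,}", "JJJJ"));    // three or more J's
        System.out.println(full("J{3,5}", "JJJJJJ")); // six J's is too many
    }
}
```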

Use "|" to match either of two alternatives

J|A            J or A
Java|Hello     Java or Hello

Use "()" to group part of a pattern
For example, to extract the text Index between <a href> and </a> in <a href="index.html">Index</a>, you can write <a.*href=".*">(.+?)</a>
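
Here is one way the link example might look in code, together with a grouped alternation. GroupDemo, linkText, and greets are hypothetical names. The parentheses form capturing group 1, which Matcher.group(1) returns after a successful find().

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class GroupDemo {

    // Group 1 captures the text between the opening and closing tags.
    // The quotes around the href value are escaped for the Java string literal.
    private static final Pattern LINK =
            Pattern.compile("<a.*href=\".*\">(.+?)</a>");

    // Returns the link text, or null when the input contains no link.
    static String linkText(String html) {
        Matcher m = LINK.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    // (Java|Hello) World : grouping limits the alternation to the first word.
    static boolean greets(String s) {
        return Pattern.matches("(Java|Hello) World", s);
    }

    public static void main(String[] args) {
        System.out.println(linkText("<a href=\"index.html\">Index</a>"));
        System.out.println(greets("Hello World"));
        System.out.println(greets("Goodbye World"));
    }
}
```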

When calling Pattern.compile, you can pass flags that control how the regular expression is matched:
Pattern Pattern.compile(String regex, int flags)

The flags argument accepts the following values:
Pattern.CANON_EQ                 Two characters match only if their full canonical decompositions match. With this flag set, the expression "a\u030A" matches the string "\u00E5". Canonical equivalence is not considered by default.
Pattern.CASE_INSENSITIVE (?i)    Enables case-insensitive matching. By default, case-insensitive matching applies only to the US-ASCII character set. To match Unicode characters regardless of case, combine UNICODE_CASE with this flag.
Pattern.COMMENTS (?x)            Whitespace in the pattern is ignored, and a comment that starts with # runs to the end of the line. Comments mode can also be enabled with the embedded flag (?x).
Pattern.DOTALL (?s)              The expression '.' matches any character, including a line terminator. By default, '.' does not match line terminators.
Pattern.MULTILINE (?m)           '^' and '$' match at the beginning and end of each line, in addition to the beginning and end of the whole input. By default, these two expressions match only at the beginning and end of the whole input.
Pattern.UNICODE_CASE (?u)        When the CASE_INSENSITIVE flag is also enabled, case-insensitive matching covers Unicode characters. By default, case-insensitive matching applies only to the US-ASCII character set.
Pattern.UNIX_LINES (?d)          Only '\n' is recognized as a line terminator in the behavior of '.', '^', and '$'.
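
To make a few of these flags concrete, the sketch below compares matching with and without them. FlagDemo and its methods are invented for illustration. Flags are combined with the bitwise OR operator, and (?i) is the embedded form of CASE_INSENSITIVE.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class FlagDemo {

    // CASE_INSENSITIVE alone folds case for US-ASCII letters only.
    static boolean asciiIgnoreCase(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input).matches();
    }

    // Adding UNICODE_CASE extends case folding to non-ASCII letters.
    static boolean unicodeIgnoreCase(String regex, String input) {
        int flags = Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    // DOTALL lets '.' match a line terminator as well.
    static boolean dotMatchesNewline(boolean dotAll) {
        Pattern p = Pattern.compile("a.b", dotAll ? Pattern.DOTALL : 0);
        return p.matcher("a\nb").matches();
    }

    // MULTILINE lets '^' match at the start of every line.
    static int linesStartingWithJava(String text) {
        Matcher m = Pattern.compile("^Java", Pattern.MULTILINE).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(asciiIgnoreCase("java", "JAVA"));
        System.out.println(Pattern.matches("(?i)java", "JAVA")); // embedded flag
        System.out.println(asciiIgnoreCase("\u00e9", "\u00c9"));
        System.out.println(unicodeIgnoreCase("\u00e9", "\u00c9"));
        System.out.println(dotMatchesNewline(false));
        System.out.println(dotMatchesNewline(true));
        System.out.println(linesStartingWithJava("Java one\nPython\nJava two"));
    }
}
```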


Beyond the basics, here are a few simple examples of using regular expressions in Java:

• Checking whether a string matches a pattern


// The snippets in this article assume:
// import java.util.regex.Matcher; import java.util.regex.Pattern;
// Match strings that start with "Java" and continue with anything
Pattern pattern = Pattern.compile("^Java.*");
Matcher matcher = pattern.matcher("Java is not a person");
boolean b = matcher.matches();
// Prints true when the whole string matches, false otherwise
System.out.println(b);


• Splitting a string on several delimiters


// Split on any run of commas, spaces, or vertical bars
Pattern pattern = Pattern.compile("[, |]+");
String[] strs = pattern.split("Java Hello World Java,Hello,,World|Sun");
for (int i = 0; i < strs.length; i++) {
  System.out.println(strs[i]);
}

• Text replacement (first match only)


Pattern pattern = Pattern.compile("regular expression");
Matcher matcher = pattern.matcher("regular expression Hello World, regular expression Hello World");
// Replace only the first match of the pattern
System.out.println(matcher.replaceFirst("Java"));

• Text replacement (all matches)


Pattern pattern = Pattern.compile("regular expression");
Matcher matcher = pattern.matcher("regular expression Hello World, regular expression Hello World");
// Replace every match of the pattern
System.out.println(matcher.replaceAll("Java"));


• Text replacement, one match at a time (appendReplacement)


Pattern pattern = Pattern.compile("regular expression");
Matcher matcher = pattern.matcher("regular expression Hello World, regular expression Hello World");
StringBuffer sbr = new StringBuffer();
// For each match, append the text before it followed by the replacement
while (matcher.find()) {
  matcher.appendReplacement(sbr, "Java");
}
// Append whatever follows the last match
matcher.appendTail(sbr);
System.out.println(sbr.toString());

• Checking whether a string looks like an email address


String str = "ceponline@yahoo.com.cn";
// Backslashes must be doubled inside a Java string literal
Pattern pattern = Pattern.compile("[\\w\\.\\-]+@([\\w\\-]+\\.)+[\\w\\-]+", Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(str);
System.out.println(matcher.matches());

• Removing HTML tags


// "<.+?>" matches one tag at a time because +? is reluctant
Pattern pattern = Pattern.compile("<.+?>", Pattern.DOTALL);
// Quotes inside the string literal must be escaped
Matcher matcher = pattern.matcher("<a href=\"index.html\">Home page</a>");
String string = matcher.replaceAll("");
System.out.println(string);

• Extracting a matching substring from HTML


Pattern pattern = Pattern.compile("href=\"(.+?)\"");
Matcher matcher = pattern.matcher("<a href=\"index.html\">Home page</a>");
if (matcher.find()) {
  // Group 1 holds the value of the href attribute
  System.out.println(matcher.group(1));
}

• Extracting http:// addresses


// Capture each URL in the input
Pattern pattern = Pattern.compile("(http://|https://){1}[\\w\\.\\-/:]+");
Matcher matcher = pattern.matcher("dsdsds<http://dsds//gfgffdfd>fdf");
StringBuffer buffer = new StringBuffer();
while (matcher.find()) {
  buffer.append(matcher.group());
  buffer.append("\r\n");
}
// Print all captured URLs once the loop has finished
System.out.println(buffer.toString());

             
• Replacing the {} placeholders in a piece of text


String str = "Java's development history so far runs from {0} to {1}";
// Braces are regex metacharacters, so they are escaped (with doubled backslashes in Java source)
String[][] object = {new String[]{"\\{0\\}", "1995"}, new String[]{"\\{1\\}", "2007"}};
System.out.println(replace(str, object));

public static String replace(final String sourceString, Object[] object) {
  String temp = sourceString;
  for (int i = 0; i < object.length; i++) {
    // Each entry is a {regex, replacement} pair
    String[] result = (String[]) object[i];
    Pattern pattern = Pattern.compile(result[0]);
    Matcher matcher = pattern.matcher(temp);
    temp = matcher.replaceAll(result[1]);
  }
  return temp;
}
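
As an alternative sketch, the escaping can be left to the library: Pattern.quote turns a literal such as {0} into a pattern that matches it verbatim, and Matcher.quoteReplacement protects replacement values that contain $ or backslashes. The PlaceholderDemo class and its fill method are hypothetical and not part of the original example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the names are illustrative only.
class PlaceholderDemo {

    // Replaces {0}, {1}, ... in the template with values[0], values[1], ...
    static String fill(String template, String... values) {
        String result = template;
        for (int i = 0; i < values.length; i++) {
            // Quote the placeholder so the braces are matched literally.
            String placeholder = Pattern.quote("{" + i + "}");
            // Quote the value so '$' and '\' are not treated as group references.
            String replacement = Matcher.quoteReplacement(values[i]);
            result = result.replaceAll(placeholder, replacement);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(fill("Java history: {0}-{1}", "1995", "2007"));
        System.out.println(fill("Price: {0}", "$5"));
    }
}
```

Because the placeholders are substituted one after another, a value that itself contains a later placeholder such as {1} would be replaced again; that is acceptable for a sketch but worth knowing.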


• Listing the files in a directory whose names match a regular expression


import java.io.File;
import java.io.FileFilter;
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FilesAnalyze {
    // Caches the list of matching files
    private List<File> files = new ArrayList<File>();
    // Directory to search
    private String _path;
    // Uncompiled regular expression
    private String _regexp;

    class MyFileFilter implements FileFilter {
        public boolean accept(File file) {
            try {
                Pattern pattern = Pattern.compile(_regexp);
                Matcher match = pattern.matcher(file.getName());
                return match.matches();
            } catch (Exception e) {
                // If the pattern cannot be compiled, every file is accepted
                return true;
            }
        }
    }

    FilesAnalyze(String path, String regexp) {
        getFileName(path, regexp);
    }

    private void getFileName(String path, String regexp) {
        _path = path;
        _regexp = regexp;
        File directory = new File(_path);
        File[] filesFile = directory.listFiles(new MyFileFilter());
        // listFiles returns null when the path is not a readable directory
        if (filesFile == null) return;
        for (int j = 0; j < filesFile.length; j++) {
            files.add(filesFile[j]);
        }
    }

    public void print(PrintStream out) {
        for (File file : files) {
            out.println(file.getPath());
        }
    }

    public static void output(String path, String regexp) {
        FilesAnalyze fileGroup1 = new FilesAnalyze(path, regexp);
        fileGroup1.print(System.out);
    }

    public static void main(String[] args) {
        // The backslash in the path must be escaped in a Java string literal.
        // Note: the range A-z also covers the punctuation between Z and a,
        // and | is a literal character inside [].
        output("C:\\", "[A-z|.]*");
    }
}

