In-depth analysis of regular expressions in Java

  • 2020-04-01 01:43:37
  • OfStack

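Regex (regular expression): Java's regular expression support (java.util.regex), which can be used instead of StringTokenizer, is a powerful tool for string processing. Regular expressions are popular on Unix, and Perl uses them even more heavily.
They are mainly used for matching, searching, and replacing strings. For example, checking an IP address (each field must be less than 256) is easy with a regular expression. So is pulling large numbers of email addresses out of web pages (which is how spammers collect addresses), or pulling links out of web pages. The package provides Pattern (a compiled regular expression) and Matcher (which matches a pattern against a string and holds the result).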
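
As a first taste of Pattern and Matcher, here is a minimal sketch of the "pull links from a web page" use case. It assumes every link is written as a double-quoted href attribute; the class name LinkExtractor and the regex are illustrative choices, and this is not a complete HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkExtractor {
    // Assumption: links look like href="...". Group 1 captures the text inside the quotes.
    private static final Pattern HREF = Pattern.compile("href=\"([^\"]*)\"");

    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {          // each call moves on to the next match
            links.add(m.group(1));  // group(1) is the URL without the quotes
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"https://example.com/a\">A</a> <a href=\"/b.html\">B</a>";
        System.out.println(extractLinks(html)); // [https://example.com/a, /b.html]
    }
}
```
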


 
1. String.matches and String.replaceAll

         System.out.println("abc".matches("..."));//true: each "." matches exactly one character


 
         System.out.println("ab54564654sbg48746bshj".replaceAll("[0-9]", "-"));//Replaces every digit with "-": ab--------sbg-----bshj
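
Both lines above use convenience methods on String. The sketch below (StringRegexDemo is a hypothetical class name) contrasts them with two related String methods that also take a regex, replaceFirst and split; the expected output is shown in the comments.

```java
class StringRegexDemo {
    public static void main(String[] args) {
        String s = "ab54564654sbg48746bshj";

        // matches() succeeds only if the WHOLE string matches the regex
        System.out.println("abc".matches("..."));  // true
        System.out.println("abcd".matches("...")); // false: four characters

        // replaceAll() replaces every match, replaceFirst() only the first one
        System.out.println(s.replaceAll("[0-9]", "-"));   // ab--------sbg-----bshj
        System.out.println(s.replaceFirst("[0-9]", "-")); // ab-4564654sbg48746bshj

        // split() cuts the string at every match of the regex
        String[] parts = "a1b22c333".split("\\d+");
        System.out.println(String.join(",", parts)); // a,b,c
    }
}
```
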

2. Pattern and Matcher


         Pattern p = Pattern.compile("[a-z]{3}");//Compiles the regex once into an internal representation (loosely, a state machine, as in compiler theory)
         Matcher m = p.matcher("ggs");//Creates a matcher that matches the given input against this pattern
         //matcher() actually takes a CharSequence (an interface); String implements CharSequence, so a String can be passed directly
         System.out.println(m.matches());//true; "ggss" (four letters) would not match
         //Equivalent to "ggs".matches("[a-z]{3}"), but a compiled Pattern can be reused, which is more efficient, and Pattern and Matcher offer many more features
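
The efficiency point above comes from compiling once and reusing the result. A small sketch, assuming a hypothetical helper isThreeLower: the Pattern lives in a static final field (Pattern objects are immutable and safe to share between threads, Matcher objects are not), and matcher() is given both a String and a StringBuilder, since it accepts any CharSequence.

```java
import java.util.regex.Pattern;

class PatternReuse {
    // Compiled once and shared; each call below creates its own Matcher
    private static final Pattern THREE_LOWER = Pattern.compile("[a-z]{3}");

    static boolean isThreeLower(CharSequence input) {
        // matcher() accepts any CharSequence, e.g. String or StringBuilder
        return THREE_LOWER.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isThreeLower("ggs"));                    // true
        System.out.println(isThreeLower("ggss"));                   // false: four letters
        System.out.println(isThreeLower(new StringBuilder("abc"))); // true
        // One-off alternative; it compiles the regex again on every call
        System.out.println(Pattern.matches("[a-z]{3}", "ggs"));     // true
    }
}
```
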

3. Metacharacters: . * + ? (Eclipse tip: Ctrl+Shift+/ adds a block comment around the selected lines and Ctrl+Shift+\ removes it.)

"a".matches(".");//true: "." matches any single character (except a line terminator), even a Chinese character
         "aa".matches("aa");//true: a plain string is also a regular expression
         
         "aaaa".matches("a*");//true: * means zero or more times
         "".matches("a*");//true
         "aaa".matches("a?");//false: ? means once or zero times
         "".matches("a?");//true
         "a".matches("a?");//true
         "544848154564113".matches("\\d{3,100}");//true: {3,100} means 3 to 100 times
         //The simplest IP check; it cannot reject fields greater than 255
         "192.168.0.aaa".matches("\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}");//false: "aaa" is not made of digits
         "192".matches("[0-2][0-9][0-9]");//true

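The simple \d{1,3} check above accepts fields such as 999. One common way to restrict each field to 0-255 is to spell out the allowed ranges with alternation. This is a sketch of that approach (Ipv4Check and isIpv4 are illustrative names); it also rejects leading zeros such as "01", which may or may not be what you want.

```java
import java.util.regex.Pattern;

class Ipv4Check {
    // One field: 250-255 | 200-249 | 100-199 | 0-99 (no leading zeros)
    private static final String OCTET = "(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)";
    // Four fields separated by literal dots
    private static final Pattern IPV4 = Pattern.compile(OCTET + "(\\." + OCTET + "){3}");

    static boolean isIpv4(String s) {
        return IPV4.matcher(s).matches(); // the whole string must be an address
    }

    public static void main(String[] args) {
        System.out.println(isIpv4("192.168.0.1"));     // true
        System.out.println(isIpv4("255.255.255.255")); // true
        System.out.println(isIpv4("256.1.1.1"));       // false: 256 > 255
        System.out.println(isIpv4("192.168.0.aaa"));   // false
    }
}
```
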
4. Character classes: [abc] matches any one of a, b, or c; [^abc] matches any single character other than a, b, or c (it still has to match exactly one character, so an empty string gives false); [a-zA-Z] is equivalent to [a-z]|[A-Z]; [A-Z&&[ABS]] is an intersection, that is, an uppercase letter that is also one of A, B, S.

//Inside [], && means intersection, while a single & or | (and ||) is just a literal character; a nested class such as [ABS] is added as a union
         System.out.println("C".matches("[A-Z&&[ABS]]"));//false: C is not in the intersection
         System.out.println("C".matches("[A-Z&[ABS]]"));//true: the class is A-Z, '&' and A, B, S
         System.out.println("A".matches("[A-Z&&[ABS]]"));//true
         System.out.println("A".matches("[A-Z&[ABS]]"));//true
         System.out.println("C".matches("[A-Z|[ABS]]"));//true: '|' is a literal here
         System.out.println("C".matches("[A-Z||[ABS]]"));//true
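
Putting the operators side by side: && is intersection, a single & or | is a literal, and intersection with a negated class works as subtraction, a form the Pattern Javadoc itself lists (for example [a-z&&[^bc]]). The isConsonant helper below is a hypothetical example of that subtraction.

```java
class CharClassOps {
    // Lowercase letters minus the vowels: intersection with a negated class
    static boolean isConsonant(String s) {
        return s.matches("[a-z&&[^aeiou]]");
    }

    public static void main(String[] args) {
        System.out.println("C".matches("[A-Z&&[ABS]]")); // false: && is intersection
        System.out.println("&".matches("[A-Z&[ABS]]"));  // true: a single & is a literal
        System.out.println("|".matches("[A-Z|[ABS]]"));  // true: | is a literal inside []
        System.out.println(isConsonant("b"));            // true
        System.out.println(isConsonant("e"));            // false
    }
}
```
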

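5. Predefined character classes: \w is a word character, [a-zA-Z_0-9] (handy for user names); \s is a whitespace character, [ \t\n\x0B\f\r]; \S is a non-whitespace character, [^\s]; \W is a non-word character, [^\w]; \d is a digit, [0-9]. In a Java string literal every backslash must be doubled, so \d is written "\\d".

" \n\t\r".matches("\\s{4}");//true
         " ".matches("\\S");//false
         "a_8".matches("\\w{3}");//true
         //"+" means once or more times
         "abc888&^%".matches("[a-z]{1,3}\\d+[&^#%]+");//true
         
         System.out.println("\\".matches("\\\\"));//true: the regex \\ matches one backslash, and each backslash is doubled again in Java source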
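
Two typical uses of these classes, as a sketch: validating a user name with \w, and splitting text on runs of whitespace with \s+. The 3-to-16 length limit and the helper names are hypothetical.

```java
class ShorthandClasses {
    // Hypothetical rule: 3 to 16 word characters [a-zA-Z_0-9]
    static boolean isValidUserName(String name) {
        return name.matches("\\w{3,16}");
    }

    // \s+ treats any run of spaces, tabs and newlines as one separator
    static String[] words(String text) {
        return text.trim().split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println(isValidUserName("hxsyl_2020"));         // true
        System.out.println(isValidUserName("a b"));                // false: a space is not \w
        System.out.println(words(" one\ttwo \n three ").length);   // 3
    }
}
```
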

6. POSIX character classes (US-ASCII only)

\p{Lower}  Lowercase alphabetic character: [a-z]
\p{Upper}  Uppercase alphabetic character: [A-Z]
\p{ASCII}  Any ASCII character: [\x00-\x7F]
\p{Alpha}  Alphabetic character: [\p{Lower}\p{Upper}]
\p{Digit}  Decimal digit: [0-9]
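
In Java source the POSIX classes need the usual doubled backslash. A short sketch (PosixClasses is an illustrative name) showing that they are ASCII-only by default, and that the Pattern.UNICODE_CHARACTER_CLASS flag (Java 7+) switches them to their Unicode definitions:

```java
import java.util.regex.Pattern;

class PosixClasses {
    public static void main(String[] args) {
        System.out.println("abcXYZ".matches("\\p{Alpha}+")); // true
        System.out.println("2020".matches("\\p{Digit}{4}")); // true
        System.out.println("\u00e9".matches("\\p{Lower}"));  // false: US-ASCII only by default
        // With UNICODE_CHARACTER_CLASS, \p{Lower} means any Unicode lowercase letter
        System.out.println(Pattern.compile("\\p{Lower}", Pattern.UNICODE_CHARACTER_CLASS)
                .matcher("\u00e9").matches());                // true
    }
}
```
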

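7. Boundary matchers
^ the beginning of a line
$ the end of a line
\b a word boundary
\B a non-word boundary
\A the beginning of the input
\G the end of the previous match
\Z the end of the input, but before the final line terminator (if any)
\z the end of the input
(^ and $ match at every line only when Pattern.MULTILINE is set; otherwise they match at the beginning and end of the whole input.)

"hello world".matches("^h.*");//true: ^ the beginning of the line
         "hello world".matches(".*ld$");//true: $ the end of the line
         "hello world".matches("^h[a-z]{1,3}o\\b.*");//true: \b word boundary between "o" and the space
         "helloworld".matches("^h[a-z]{1,3}o\\b.*");//false: no word boundary between "o" and "w"
 " \n".matches("^[\\s&&[^\\n]]*\\n$");//true: a blank line is whitespace other than a newline, followed by a newline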

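Two details that the examples above do not show: \b is handy for counting whole words with find(), and ^ matches at the start of every line only when Pattern.MULTILINE is set. A sketch with hypothetical helper names countWord and firstWords:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Boundaries {
    // Counts whole-word occurrences: \b stops "concat" and "bobcat" from counting as "cat"
    static int countWord(String text, String word) {
        // Pattern.quote() makes any special characters in word literal
        Matcher m = Pattern.compile("\\b" + Pattern.quote(word) + "\\b").matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // First word of each line; without MULTILINE, ^ matches only at the start of the input
    static List<String> firstWords(String text, boolean multiline) {
        Pattern p = multiline
                ? Pattern.compile("^\\w+", Pattern.MULTILINE)
                : Pattern.compile("^\\w+");
        Matcher m = p.matcher(text);
        List<String> result = new ArrayList<>();
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(countWord("cat concat cat's bobcat", "cat")); // 2
        System.out.println(firstWords("one\ntwo\nthree", true));         // [one, two, three]
        System.out.println(firstWords("one\ntwo\nthree", false));        // [one]
    }
}
```
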
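8. matches(), find() and lookingAt()
matches() must match the whole input; find() searches for the next subsequence that matches; lookingAt() must match at the beginning of the input but need not cover all of it. After a successful match, m.start() and m.end() return the start index and the (exclusive) end index of the match; calling them when there is no match throws an IllegalStateException.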

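Pattern p = Pattern.compile("\\d{3,5}");
         String s = "133-34444-333-00";
         Matcher m = p.matcher(s);
         System.out.println(m.matches());//false: matches() requires the whole input to match
         m.reset();//Clears the matcher's state so the searches below start from the beginning
         
         //find() tries to find the next subsequence of the input that matches the pattern
         System.out.println(m.find() + " " + m.start() + "-" + m.end());//true 0-3 ("133")
         System.out.println(m.find() + " " + m.start() + "-" + m.end());//true 4-9 ("34444")
         System.out.println(m.find() + " " + m.start() + "-" + m.end());//true 10-13 ("333")
         System.out.println(m.find());//false: "00" has only two digits; calling m.start() now would throw
         
         //lookingAt() always starts at the beginning of the input, so every call finds "133" again
         System.out.println(m.lookingAt());//true
         System.out.println(m.lookingAt());//true
         System.out.println(m.lookingAt());//true
         System.out.println(m.lookingAt());//true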
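
The usual way to walk all matches is a while (m.find()) loop that reads start() and end() inside it. A sketch (the helper positions is a hypothetical name) that also shows the IllegalStateException thrown by start() when there is no current match:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindPositions {
    // Returns "start-end" for every match; end() is one past the last matched character
    static List<String> positions(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> result = new ArrayList<>();
        while (m.find()) {
            result.add(m.start() + "-" + m.end());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(positions("\\d{3,5}", "133-34444-333-00")); // [0-3, 4-9, 10-13]

        Matcher m = Pattern.compile("\\d{3,5}").matcher("00");
        System.out.println(m.find()); // false
        try {
            m.start(); // there is no current match
        } catch (IllegalStateException e) {
            System.out.println("start() threw IllegalStateException");
        }
    }
}
```
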

9. String substitution

import java.util.regex.Matcher;
 import java.util.regex.Pattern;

 public class TestRegexReplacement {

     public static void main(String[] args) {

         Pattern p = Pattern.compile("java",Pattern.CASE_INSENSITIVE);//CASE_INSENSITIVE makes the match ignore case
         Matcher m = p.matcher("Java java hxsyl Ilovejava java JaVaAcmer");
         while(m.find()) {
             System.out.println(m.group());//group() returns the text of the current match: Java, java, java, java, JaVa

         }

         
         String s = m.replaceAll("Java");//Replaces every match (replaceAll resets the matcher first); String also has this method
         System.out.println(s);//Java Java hxsyl IloveJava Java JavaAcmer

         m.reset();//Required: replaceAll leaves the matcher at the end of the input, so without reset() the loop below finds nothing
         StringBuffer sb = new StringBuffer();
         int i = 0;
         
         while(m.find()) {
             i++;
             //i & 1 is an int, not a boolean, so it must be compared with 1: odd matches become "Java", even ones "java"
             if((i&1)==1) {
                 m.appendReplacement(sb, "Java");
             }else {
                 m.appendReplacement(sb, "java");
             }
         }

         m.appendTail(sb);//Appends the rest of the input after the last match ("Acmer")
         System.out.println(sb);//Java java hxsyl IloveJava java JavaAcmer; without the reset() above, only "Acmer" is printed
     }
 }
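
Besides fixed text, a replacement string can refer to captured groups with $1, $2 and so on ($0 is the whole match). Because $ and \ are special there, Matcher.quoteReplacement is the safe way to insert them literally. A sketch with illustrative helper names:

```java
import java.util.regex.Matcher;

class ReplacementGroups {
    // $1, $2, $3 in the replacement refer to the groups captured by the regex
    static String toDayMonthYear(String isoDate) {
        return isoDate.replaceAll("(\\d{4})-(\\d{2})-(\\d{2})", "$3/$2/$1");
    }

    // quoteReplacement("$") yields an escaped, literal "$"; "$0" then inserts the whole match
    static String addDollarSign(String text) {
        return text.replaceAll("\\d+", Matcher.quoteReplacement("$") + "$0");
    }

    public static void main(String[] args) {
        System.out.println(toDayMonthYear("2020-04-01"));     // 01/04/2020
        System.out.println(addDollarSign("price: 5 and 12")); // price: $5 and $12
    }
}
```
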

10. Grouping


         Pattern p = Pattern.compile("(\\d{3,5})([a-z]{2})");//Groups are numbered by their opening parentheses, from left to right
         String s = "123aaa-77878bb-646dd-00";
         Matcher m = p.matcher(s);
         while(m.find()) {
             System.out.println(m.group());//The whole match: 123aa, 77878bb, 646dd
             System.out.println(m.group(1));//Group 1: the digits of each match
             System.out.println(m.group(2));//Group 2: the two letters of each match
         }
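
Counting parentheses gets error-prone as patterns grow; since Java 7 a group can be given a name with (?<name>...) and read back with group("name"). The same pattern as above, with hypothetical group names digits and letters:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroups {
    // Named groups still get numbers too: digits is group 1, letters is group 2
    private static final Pattern P = Pattern.compile("(?<digits>\\d{3,5})(?<letters>[a-z]{2})");

    static List<String> pairs(String s) {
        Matcher m = P.matcher(s);
        List<String> result = new ArrayList<>();
        while (m.find()) {
            result.add(m.group("digits") + ":" + m.group("letters"));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(pairs("123aaa-77878bb-646dd-00")); // [123:aa, 77878:bb, 646:dd]
        System.out.println(P.matcher("").groupCount());        // 2
    }
}
```
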

11. Extracting email addresses from a web page

import java.io.BufferedReader;
 import java.io.FileReader;
 import java.io.IOException;
 import java.util.regex.Matcher;
 import java.util.regex.Pattern;

 public class EmailSpider {

     //The compiler cannot check a regex because it is just a string: a malformed one only fails at runtime (PatternSyntaxException)
     //Compiled once here instead of once per line
     private static final Pattern EMAIL = Pattern.compile("[\\w[.-]]+@[\\w[.-]]+\\.\\w+");

     public static void main(String[] args) {
         //"\\" is required: in "F:\regex.html" the \r would be a carriage return
         //try-with-resources closes the reader automatically
         try (BufferedReader br = new BufferedReader(new FileReader("F:\\regex.html"))) {
             String line;
             while((line=br.readLine())!=null) {
                 solve(line);
             }
         } catch (IOException e) {//also covers FileNotFoundException
             e.printStackTrace();
         }
     }

     private static void solve(String line) {
         Matcher m = EMAIL.matcher(line);

         while(m.find()) {
             System.out.println(m.group());
         }

     }

 }
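
To try the email regex without a file on disk, the matching step can be run on a String directly. A sketch with a hypothetical EmailExtractor class using the same pattern; it is deliberately loose and is not a full validation of email address syntax.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmailExtractor {
    // Word characters, '.' or '-' on both sides of '@', then a dot and a word-character suffix
    private static final Pattern EMAIL = Pattern.compile("[\\w[.-]]+@[\\w[.-]]+\\.\\w+");

    static List<String> extract(String text) {
        Matcher m = EMAIL.matcher(text);
        List<String> result = new ArrayList<>();
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(extract("Contact: alice.smith@example.com, bob-99@mail.example.org."));
        // [alice.smith@example.com, bob-99@mail.example.org]
    }
}
```
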

12. Counting lines of code

import java.io.BufferedReader;
 import java.io.File;
 import java.io.FileReader;
 import java.io.IOException;

 public class CoderCount {

     static long normalLines = 0;
     static long commentLines = 0;
     static long whiteLines = 0;

     public static void main(String[] args) {
         File f = new File("D:\\share\\src");//Backslashes must be doubled in a Java string literal
         File[] codeFiles = f.listFiles();
         if (codeFiles == null) {//listFiles() returns null if the path is not a readable directory
             System.out.println("Not a directory: " + f);
             return;
         }
         for(File child : codeFiles){
             if(child.getName().matches(".*\\.java$")) {
                 solve(child);
             }
         }

         System.out.println("normalLines:" + normalLines);
         System.out.println("commentLines:" + commentLines);
         System.out.println("whiteLines:" + whiteLines);

     }

     private static void solve(File f) {
         boolean comment = false;//true while inside an unfinished block comment
         //try-with-resources closes the reader automatically
         try (BufferedReader br = new BufferedReader(new FileReader(f))) {
             String line;
             while((line = br.readLine()) != null) {
                 /*
                  * Trim because some comment lines are indented with tabs.
                  * Calling trim() directly on readLine() in the loop condition
                  * would throw a NullPointerException at the end of the file,
                  * when readLine() returns null.
                  */
                 line = line.trim();
                 //readLine() strips the line terminator, so a blank line is just optional whitespace
                 if(line.matches("^[\\s&&[^\\n]]*$")) {
                     whiteLines ++;
                 } else if (line.startsWith("/*") && !line.endsWith("*/")) {
                     commentLines ++;//a block comment that continues on later lines
                     comment = true;
                 } else if (line.startsWith("/*") && line.endsWith("*/")) {
                     commentLines ++;//a block comment on a single line
                 } else if (comment) {
                     commentLines ++;
                     if(line.endsWith("*/")) {
                         comment = false;
                     }
                 } else if (line.startsWith("//")) {
                     commentLines ++;
                 } else {
                     normalLines ++;
                 }
             }
         } catch (IOException e) {//also covers FileNotFoundException
             e.printStackTrace();
         }
     }

 }
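
The classification rules in solve() can be checked without any files by running them over a list of lines. A sketch with a hypothetical LineClassifier class, recognizing block comments by a leading /* and a trailing */; like the original, it does not handle a line that mixes code and a comment.

```java
import java.util.Arrays;
import java.util.List;

class LineClassifier {
    // Returns {normal, comment, blank} line counts
    static long[] classify(List<String> lines) {
        long normal = 0, comment = 0, blank = 0;
        boolean inBlockComment = false; // inside an unfinished block comment
        for (String raw : lines) {
            String line = raw.trim();
            if (line.matches("^[\\s&&[^\\n]]*$")) {
                blank++;
            } else if (line.startsWith("/*") && !line.endsWith("*/")) {
                comment++;
                inBlockComment = true;   // the comment continues on later lines
            } else if (line.startsWith("/*") && line.endsWith("*/")) {
                comment++;               // one-line block comment
            } else if (inBlockComment) {
                comment++;
                if (line.endsWith("*/")) {
                    inBlockComment = false;
                }
            } else if (line.startsWith("//")) {
                comment++;
            } else {
                normal++;
            }
        }
        return new long[] {normal, comment, blank};
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "int a = 1;", "   ", "// note", "/* one-liner */",
                "/*", " * body", " */", "return a;");
        System.out.println(Arrays.toString(classify(sample))); // [2, 5, 1]
    }
}
```
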

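13. Quantifiers
The quantifiers ?, *, + and {n,m} come in three kinds: greedy (the default, e.g. X*), reluctant (append ?, e.g. X*?), and possessive (append +, e.g. X*+). A greedy quantifier first takes as much as it can and gives characters back only when the rest of the pattern fails; a reluctant one takes as little as it can; a possessive one takes as much as it can and never gives anything back.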

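//Greedy, reluctant and possessive versions of the same quantifier. Note: "(.{3,10})+" would NOT be possessive;
     //the parentheses turn "+" into "one or more repetitions of the group", so here it behaves like the greedy version
     String s = "aaaa5bbbb6";//The length is ten
     String[] regexes = {".{3,10}[0-9]", ".{3,10}?[0-9]", ".{3,10}+[0-9]"};
     for (String regex : regexes) {
         Matcher m = Pattern.compile(regex).matcher(s);
         if(m.find()) {
             System.out.println(regex + ": " + m.start() + "----" + m.end());
         }else {
             System.out.println(regex + ": Not match!");
         }
     }
     //Greedy: 0----10; reluctant: 0----5; possessive: Not match!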
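
The difference matters in practice when a pattern has to stop at the first closing delimiter. A sketch (findAll is a hypothetical helper) on a small HTML-like string:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {
    static List<String> findAll(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> result = new ArrayList<>();
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<b>x</b><b>y</b>";
        // Greedy .* runs to the LAST </b>, so there is a single match
        System.out.println(findAll("<b>.*</b>", html));  // [<b>x</b><b>y</b>]
        // Reluctant .*? stops at the FIRST </b>, so each element is a match
        System.out.println(findAll("<b>.*?</b>", html)); // [<b>x</b>, <b>y</b>]
        // Possessive .{3,10}+ never gives characters back, so [0-9] can never match
        System.out.println(findAll(".{3,10}+[0-9]", "aaaa5bbbb6")); // []
    }
}
```
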

14. Supplement: lookahead (a special non-capturing construct)

//(?=a) is a lookahead: it checks that the next character is "a" without consuming it and without capturing anything
     Pattern p = Pattern.compile("(?=a).{3}");
     
     String s = "444a66b";
     Matcher m = p.matcher(s);
     while(m.find()) {
         System.out.println(m.group());//a66: the match starts at the "a"
     }
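
Lookaheads are often stacked at the start of a pattern to check several conditions at once, and (?:...) is the plain non-capturing group. A sketch with a hypothetical password policy (at least 8 characters, at least one digit and one lowercase letter):

```java
import java.util.regex.Pattern;

class LookaheadDemo {
    // Each (?=...) checks a condition from the start without consuming characters
    private static final Pattern POLICY = Pattern.compile("^(?=.*\\d)(?=.*[a-z]).{8,}$");

    static boolean meetsPolicy(String password) {
        return POLICY.matcher(password).matches();
    }

    public static void main(String[] args) {
        System.out.println(meetsPolicy("abc12345")); // true
        System.out.println(meetsPolicy("abcdefgh")); // false: no digit
        System.out.println(meetsPolicy("a1"));       // false: too short

        // (?:ab) groups without capturing, so only (c) gets a group number
        System.out.println(Pattern.compile("(?:ab)+(c)").matcher("").groupCount()); // 1
    }
}
```
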

15. Back references

Pattern p = Pattern.compile("(\\d\\d)\\1");//\1 must repeat exactly what group 1 matched
     
     String s = "1212";
     Matcher m = p.matcher(s);
     System.out.println(m.matches());//true; "1213" would give false
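
A classic use of a back reference is finding a word typed twice in a row. A sketch with illustrative helper names; the leading \b keeps the "is" at the end of "this" from pairing with a following "is".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DoubledWords {
    // \1 must repeat exactly the text captured by group 1
    private static final Pattern DOUBLED = Pattern.compile("\\b(\\w+)\\s+\\1\\b");

    static List<String> findDoubled(String text) {
        Matcher m = DOUBLED.matcher(text);
        List<String> result = new ArrayList<>();
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    // Replaces each doubled pair with a single copy of the word ($1)
    static String removeDoubles(String text) {
        return DOUBLED.matcher(text).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(findDoubled("this is is a test test"));   // [is is, test test]
        System.out.println(removeDoubles("this is is a test test")); // this is a test
    }
}
```
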

16. Flag shorthands
By default "." does not match a line terminator (Pattern.DOTALL changes that). Of the flags, CASE_INSENSITIVE is the one most worth remembering; as the Javadoc puts it, case-insensitive matching "can also be enabled via the embedded flag expression (?i)".

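The flags in action, as a sketch (FlagsDemo is an illustrative name): CASE_INSENSITIVE versus the embedded (?i), DOTALL versus the embedded (?s), and two flags combined with |.

```java
import java.util.regex.Pattern;

class FlagsDemo {
    public static void main(String[] args) {
        // CASE_INSENSITIVE as a compile flag, or (?i) embedded in the regex itself
        System.out.println(Pattern.compile("java", Pattern.CASE_INSENSITIVE).matcher("JaVa").matches()); // true
        System.out.println("JaVa".matches("(?i)java")); // true

        // By default "." does not match a line terminator; DOTALL or (?s) changes that
        System.out.println("a\nb".matches("a.b"));     // false
        System.out.println("a\nb".matches("(?s)a.b")); // true

        // Flags combine with the bitwise OR operator
        System.out.println(Pattern.compile("A.B", Pattern.CASE_INSENSITIVE | Pattern.DOTALL)
                .matcher("a\nb").matches());            // true
    }
}
```
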
