A detailed look at the Pattern and Matcher classes in Java regular expressions

  • 2020-05-24 05:39:32
  • OfStack

Preface

This article introduces the Pattern and Matcher classes used by Java regular expressions. A regular expression written as a string must first be compiled into an instance of the Pattern class, so a solid understanding of these two classes is something every Java programmer needs.

Here's a look at each of these classes:

1. The concept of capture groups

Capture groups are numbered by counting their opening parentheses from left to right, starting at 1. For example, the expression ((A)(B(C))) contains four such groups:


1  ((A)(B(C)))
2  (A)
3  (B(C))
4  (C)

Group zero always represents the entire expression. Groups that begin with (? are pure non-capturing groups (named groups aside): they do not capture text and do not count toward the group total.

The captured input associated with a group is always the subsequence that the group most recently matched. If a group is evaluated a second time because of quantification, its previously captured value (if any) is retained when the second evaluation fails. For example, matching the string "aba" against the expression (a(b)?)+ leaves group two set to "b". At the beginning of each match, all captured input is discarded.
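The numbering and retention rules above can be checked with a small program. This is only a sketch: the class name CaptureGroupDemo and the helper groups() are made up for illustration, and the helper simply performs a full match and collects group 0 through groupCount().

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaptureGroupDemo {
    // Hypothetical helper: returns groups 0..groupCount() of a full match,
    // or an empty array when the regex does not match the whole input.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        // Groups are numbered by their opening parenthesis, left to right.
        String[] g = groups("((A)(B(C)))", "ABC");
        for (int i = 0; i < g.length; i++) {
            System.out.println(i + ": " + g[i]); // 0: ABC, 1: ABC, 2: A, 3: BC, 4: C
        }
        // (?:...) is non-capturing, so only (b) is counted.
        System.out.println(Pattern.compile("(?:a)(b)").matcher("ab").groupCount()); // 1
        // The second pass of (a(b)?)+ matches only "a"; group 2 keeps "b" from the first pass.
        System.out.println(groups("(a(b)?)+", "aba")[2]); // b
    }
}
```

Note that group 1 of (a(b)?)+ ends up as "a", the most recent iteration, while group 2 still holds the value from the earlier iteration.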

2. Details of Pattern class and Matcher class

Java implements regular expressions through the Pattern and Matcher classes in the java.util.regex package. (While reading this article, it helps to keep the Java API documentation open and look up each method as it is introduced.)

The Pattern class represents a compiled regular expression, that is, a matching pattern. Its constructor is private, so it cannot be instantiated directly; instead, the static factory method Pattern.compile(String regex) creates one.

Java code example:


Pattern p = Pattern.compile("\\w+");
p.pattern(); // returns \w+

pattern() returns the regular expression in string form, i.e. the regex argument that was passed to Pattern.compile(String regex).

1. Pattern.split(CharSequence input)

Pattern has a split(CharSequence input) method that splits a string around matches of the pattern and returns a String[]. String.split(String regex) yields the same result as compiling the regex and calling Pattern.split(CharSequence input) on the string.

Java code example:


Pattern p = Pattern.compile("\\d+");
String[] str = p.split(" my QQ is :456456 My phone number is :0532214 My email is :aaa@aaa.com");

Result: str[0]=" my QQ is :" str[1]=" My phone number is :" str[2]=" My email is :aaa@aaa.com"
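As a rough illustration of how the two split methods relate, the sketch below splits the same input with a precompiled Pattern and with String.split, and also shows the split(CharSequence input, int limit) overload. The class name SplitDemo and the helper splitOnDigits() are made up for this example.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitDemo {
    // Compiled once so it can be reused for many inputs.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Hypothetical helper: splits the input around runs of digits.
    static String[] splitOnDigits(String input) {
        return DIGITS.split(input);
    }

    public static void main(String[] args) {
        String text = "a1b22c333";
        // Trailing empty strings are dropped, so the final "333" leaves no extra element.
        System.out.println(Arrays.toString(splitOnDigits(text)));   // [a, b, c]
        // String.split gives the same pieces.
        System.out.println(Arrays.toString(text.split("\\d+")));    // [a, b, c]
        // A limit of 2 stops after the first separator.
        System.out.println(Arrays.toString(DIGITS.split(text, 2))); // [a, b22c333]
    }
}
```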

2. Pattern.matches(String regex, CharSequence input)

This is a static method for quickly matching a string against a regular expression. It always matches against the entire string and is suited to cases where the pattern is used only once.

Java code example:


Pattern.matches("\\d+", "2223");   // returns true
Pattern.matches("\\d+", "2223aa"); // returns false: the whole string must match, and aa cannot be matched
Pattern.matches("\\d+", "22bb23"); // returns false: the whole string must match, and bb cannot be matched
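Because Pattern.matches compiles the regular expression on every call, code that checks many strings against the same expression usually compiles it once and reuses the Pattern. The following is a minimal sketch of both styles; ReusePatternDemo, isNumberOneOff(), isNumber(), and countNumbers() are illustrative names, not part of the Java API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class ReusePatternDemo {
    // Compiled once; Pattern instances are immutable and safe to share.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // One-off style: the regex is compiled again on each call.
    static boolean isNumberOneOff(String s) {
        return Pattern.matches("\\d+", s);
    }

    // Reuse style: only a new Matcher is created per call.
    static boolean isNumber(String s) {
        return DIGITS.matcher(s).matches();
    }

    // Counts the inputs that consist entirely of digits.
    static long countNumbers(List<String> inputs) {
        return inputs.stream().filter(ReusePatternDemo::isNumber).count();
    }

    public static void main(String[] args) {
        List<String> inputs = Arrays.asList("2223", "2223aa", "22bb23", "7");
        for (String s : inputs) {
            // Both styles always agree; they differ only in how often the regex is compiled.
            System.out.println(s + " -> " + isNumberOneOff(s) + " / " + isNumber(s));
        }
        System.out.println(countNumbers(inputs)); // 2
    }
}
```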

3. Pattern.matcher(CharSequence input)

This brings us to the Matcher class: Pattern.matcher(CharSequence input) returns a Matcher object.
The constructor of the Matcher class is also private, so an instance can only be obtained through the Pattern.matcher(CharSequence input) method.
The Pattern class on its own can only perform simple matching operations; for more powerful and convenient matching, Pattern has to work together with Matcher. The Matcher class adds support for groups and for matching a pattern multiple times against the same input.

Java code example:


Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher("22bb23");
m.pattern(); // returns p, the Pattern object that created this Matcher

4. Matcher.matches() / Matcher.lookingAt() / Matcher.find()

The Matcher class provides three matching methods, all of which return a boolean: true when a match is found and false otherwise.

matches() matches against the entire string and returns true only if the whole string matches.

Java code example:


Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher("22bb23");
m.matches();  // returns false: bb cannot be matched by \d+, so matching the entire string fails
Matcher m2 = p.matcher("2223");
m2.matches(); // returns true: \d+ matches the entire string

Looking back at Pattern.matches(String regex, CharSequence input), it is equivalent to the following code
Pattern.compile(regex).matcher(input).matches();

lookingAt() matches against the beginning of the string and returns true only if a match starts at the first character.

Java code example:


Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher("22bb23");
m.lookingAt();  // returns true: \d+ matches the leading 22
Matcher m2 = p.matcher("aa2223");
m2.lookingAt(); // returns false: \d+ cannot match the leading aa

find() searches for a match anywhere in the string.

Java code example:


Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher("22bb23");
m.find();  // returns true
Matcher m2 = p.matcher("aa2223");
m2.find(); // returns true
Matcher m3 = p.matcher("aa2223bb");
m3.find(); // returns true
Matcher m4 = p.matcher("aabb");
m4.find(); // returns false
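To see the three methods side by side, the sketch below runs all of them against the same input, using a fresh Matcher for each call so that no match state carries over. MatchModesDemo and modes() are names invented for this example.

```java
import java.util.regex.Pattern;

public class MatchModesDemo {
    // Hypothetical helper: reports the result of each matching method for one input.
    static String modes(String regex, String input) {
        Pattern p = Pattern.compile(regex);
        boolean whole = p.matcher(input).matches();     // entire string must match
        boolean prefix = p.matcher(input).lookingAt();  // match must start at index 0
        boolean anywhere = p.matcher(input).find();     // match may start anywhere
        return "matches=" + whole + " lookingAt=" + prefix + " find=" + anywhere;
    }

    public static void main(String[] args) {
        System.out.println(modes("\\d+", "2223"));   // matches=true lookingAt=true find=true
        System.out.println(modes("\\d+", "22bb23")); // matches=false lookingAt=true find=true
        System.out.println(modes("\\d+", "aa2223")); // matches=false lookingAt=false find=true
        System.out.println(modes("\\d+", "aabb"));   // matches=false lookingAt=false find=false
    }
}
```

Whenever matches() returns true, lookingAt() and find() on a fresh Matcher return true as well, and whenever lookingAt() returns true, so does find().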

5. Matcher.start() / Matcher.end() / Matcher.group()

After a match has been performed with matches(), lookingAt(), or find(), the three methods above provide more detailed information about it.

start() returns the index of the first character of the matched substring within the input string.

end() returns the index just past the last character of the matched substring.

group() returns the matched substring.

Java code example:


Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher("aaa2223bb");
m.find();   // matches 2223
m.start();  // returns 3
m.end();    // returns 7, the index just past the end of 2223
m.group();  // returns 2223

Matcher m2 = p.matcher("2223bb");
m2.lookingAt(); // matches 2223
m2.start();     // returns 0: lookingAt() only matches at the beginning of the string, so start() always returns 0 after it succeeds
m2.end();       // returns 4
m2.group();     // returns 2223

// \d+ cannot match the whole of "2223bb", so a pattern that can (\w+) is used for matches()
Matcher m3 = Pattern.compile("\\w+").matcher("2223bb");
m3.matches();   // matches the entire string
m3.start();     // returns 0
m3.end();       // returns 6, because matches() must match the entire string
m3.group();     // returns 2223bb

Now that the methods above are clear, let's look at how regular expression groups are used in Java.
start(), end(), and group() each have an overload, start(int i), end(int i), and group(int i), dedicated to group operations. The Matcher class also has a groupCount() method that returns the number of capture groups.

Java code example:


Pattern p = Pattern.compile("([a-z]+)(\\d+)");
Matcher m = p.matcher("aaa2223bb");
m.find();       // matches aaa2223
m.groupCount(); // returns 2, because there are 2 groups
m.start(1);     // returns 0, the index at which the substring matched by group 1 starts
m.start(2);     // returns 3
m.end(1);       // returns 3, the index just past the last character matched by group 1
m.end(2);       // returns 7
m.group(1);     // returns aaa, the substring matched by group 1
m.group(2);     // returns 2223, the substring matched by group 2
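When a pattern has several groups, it can be convenient to loop from 0 to groupCount() instead of calling each overload by hand. The sketch below does that for the first match only; GroupDumpDemo and describeFirstMatch() are made-up names, and the text[start,end) output format is just one possible choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDumpDemo {
    // Hypothetical helper: describes group 0 and every capture group of the
    // first match as "number:text[start,end)", or returns an empty list.
    static List<String> describeFirstMatch(String regex, String input) {
        List<String> lines = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        if (m.find()) {
            // groupCount() excludes group 0, hence the <= in the loop condition.
            for (int i = 0; i <= m.groupCount(); i++) {
                lines.add(i + ":" + m.group(i) + "[" + m.start(i) + "," + m.end(i) + ")");
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Prints 0:aaa2223[0,7), 1:aaa[0,3), 2:2223[3,7), one per line.
        for (String line : describeFirstMatch("([a-z]+)(\\d+)", "aaa2223bb")) {
            System.out.println(line);
        }
    }
}
```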

Now for a slightly more advanced matching operation. Suppose a piece of text contains many numbers scattered through it, and we want to pull all of them out. With Java's regular expression support this is simple.

Java code example:


Pattern p = Pattern.compile("\\d+");
// Illustrative input, chosen so that the offsets agree with the start/end values in the output below
Matcher m = p.matcher("My QQ:456456 phone:0532214 email:aaa123@aaa.com");
while (m.find()) {
    System.out.println(m.group());
}

Output:


456456
0532214
123

If the while() loop above is replaced with


while (m.find()) {
    System.out.println(m.group());
    System.out.print("start:" + m.start());
    System.out.println(" end:" + m.end());
}

The output:


456456 
start:6 end:12 
0532214 
start:19 end:26 
123 
start:36 end:39 

By now it should be clear that each successful match operation updates the values returned by start(), end(), and group() to describe the substring just matched, and their overloads are updated with the corresponding group information.
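The same loop can be wrapped in a reusable method that collects every match together with its position. This is only a sketch under assumed names (FindAllDemo, findAll()) and an assumed text@start-end output format; the input string is invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindAllDemo {
    // Hypothetical helper: collects each match as "text@start-end".
    static List<String> findAll(Pattern p, CharSequence input) {
        List<String> hits = new ArrayList<>();
        Matcher m = p.matcher(input);
        // Each successful find() moves on to the next match and refreshes
        // what group(), start(), and end() report.
        while (m.find()) {
            hits.add(m.group() + "@" + m.start() + "-" + m.end());
        }
        return hits;
    }

    public static void main(String[] args) {
        Pattern digits = Pattern.compile("\\d+");
        System.out.println(findAll(digits, "order 12, item 345, qty 6"));
        // [12@6-8, 345@15-18, 6@24-25]
    }
}
```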

Note: start(), end(), and group() can only be used after a successful match operation, that is, after matches(), lookingAt(), or find() has returned true; otherwise a java.lang.IllegalStateException is thrown.
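A short sketch of this rule follows: one helper guards group() with the result of find(), and another shows the exception that results when group() is called without a successful match. The names MatchStateDemo, firstNumber(), and groupThrows() are invented for this example.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchStateDemo {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Safe usage: group() is only called when find() has returned true.
    static Optional<String> firstNumber(String input) {
        Matcher m = DIGITS.matcher(input);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    // Returns true if group() throws IllegalStateException. When attemptFind
    // is true, find() is called first and its result is deliberately ignored.
    static boolean groupThrows(String input, boolean attemptFind) {
        Matcher m = DIGITS.matcher(input);
        if (attemptFind) {
            m.find();
        }
        try {
            m.group();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(firstNumber("aa2223bb"));        // Optional[2223]
        System.out.println(firstNumber("aabb"));            // Optional.empty
        System.out.println(groupThrows("aa2223bb", false)); // true: no match attempted yet
        System.out.println(groupThrows("aabb", true));      // true: find() failed
        System.out.println(groupThrows("aa2223bb", true));  // false: find() succeeded
    }
}
```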
