A detailed look at the Pattern and Matcher classes in Java regular expressions
- 2020-05-24 05:39:32
- OfStack
Preface
This article introduces the Pattern and Matcher classes in Java regular expressions. First, let's make it clear that a regular expression specified as a string must be compiled into an instance of the Pattern class before it can be used. A solid understanding of these two classes is therefore essential for any Java programmer.
Here's a look at each of these classes:
1. Concept of capture group
Capturing groups are numbered by counting their opening parentheses from left to right, starting at 1. For example, in the expression ((A)(B(C))), there are four such groups:
1 ((A)(B(C)))
2 (A)
3 (B(C))
4 (C)
Group zero always represents the entire expression. A group that begins with (?, such as (?:X), is a pure non-capturing group: it does not capture text and does not count toward the group total.
The captured input associated with a group is always the subsequence that the group most recently matched. If a group is evaluated a second time because of a quantifier, its previously captured value (if any) is retained when the second evaluation fails. For example, matching the string "aba" against the expression (a(b)?)+ leaves group two set to "b". At the beginning of each match, all captured input is discarded.
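The numbering and retention rules above can be checked with a small program. This is only a minimal sketch: the class name CaptureGroupDemo and the helper groupsOf are hypothetical names chosen for illustration, and the input "ABC" is an assumed sample.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of java.util.regex.
class CaptureGroupDemo {
    // Returns groups 0..groupCount() of a full match, or null if the input does not match.
    static String[] groupsOf(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return null;
        }
        String[] groups = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            groups[i] = m.group(i);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Groups are numbered by their opening parenthesis, left to right.
        String[] nested = groupsOf("((A)(B(C)))", "ABC");
        for (int i = 0; i < nested.length; i++) {
            System.out.println(i + ": " + nested[i]);
        }
        // A quantified group keeps the value from its last successful evaluation.
        String[] repeated = groupsOf("(a(b)?)+", "aba");
        System.out.println("group 1 = " + repeated[1] + ", group 2 = " + repeated[2]);
        // A (?:...) group is not counted, so only one capturing group remains.
        System.out.println("groups in (?:a)(b): " + (groupsOf("(?:a)(b)", "ab").length - 1));
    }
}
```

For "aba", group 1 ends up as "a" (its last iteration) while group 2 still holds "b" from the first iteration, which is the retention behavior described above.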
2. Details of Pattern class and Matcher class
Java regular expressions are implemented by the Pattern and Matcher classes in the java.util.regex package. (While reading this article, it helps to keep the Java API documentation open and look up each method as it is introduced.)
The Pattern class represents a compiled regular expression, that is, a matching pattern. Its constructor is private, so a Pattern cannot be instantiated directly; instead, the static factory method Pattern.compile(String regex) creates one.
Java code example:
Pattern p=Pattern.compile("\\w+");
p.pattern();// return \w+
pattern() returns the string form of the regular expression, that is, the regex argument that was passed to Pattern.compile(String regex).
1.Pattern.split(CharSequence input)
Pattern has a split(CharSequence input) method that splits a string around matches of the pattern and returns a String[]. I believe String.split(String regex) is implemented through Pattern.split(CharSequence input).
Java code example:
Pattern p=Pattern.compile("\\d+");
String[] str=p.split(" my QQ is :456456 My phone number is :0532214 My email is :aaa@aaa.com");
Result: str[0]=" my QQ is :" str[1]=" My phone number is :" str[2]=" My email is :aaa@aaa.com"
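As a rough, runnable illustration of the splitting behavior, the sketch below splits a sample string on runs of digits and compares the result with String.split. The class name SplitDemo, the helper splitOnDigits, and the sample text are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class for Pattern.split(CharSequence).
class SplitDemo {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Splits the input around every run of digits.
    static String[] splitOnDigits(String input) {
        return DIGITS.split(input);
    }

    public static void main(String[] args) {
        String text = "QQ:456456 phone:0532214 email:aaa@aaa.com"; // assumed sample input
        String[] parts = splitOnDigits(text);
        System.out.println(Arrays.toString(parts));
        // String.split with the same regex yields the same array.
        System.out.println(Arrays.equals(parts, text.split("\\d+")));
        // Trailing empty strings are dropped, so "a1b2" yields two elements.
        System.out.println(splitOnDigits("a1b2").length);
    }
}
```

Compiling the pattern once and reusing it, as the static field does here, avoids recompiling the expression on every call.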
2. Pattern.matches(String regex, CharSequence input) is a static method for quick, one-off matching. It is suitable when the entire string has to be matched and the pattern is used only once.
Java code example:
Pattern.matches("\\d+","2223");// returns true
Pattern.matches("\\d+","2223aa");// returns false; the entire string must match, and aa cannot be matched
Pattern.matches("\\d+","22bb23");// returns false; the entire string must match, and bb cannot be matched
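Because Pattern.matches compiles the expression on every call, code that checks many inputs against the same expression usually compiles it once instead. The following sketch contrasts the two styles; the class name MatchesDemo and its two helper methods are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class comparing one-shot and precompiled matching.
class MatchesDemo {
    // Compiled once and reused for every call to precompiled().
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Compiles the regex on each call; convenient for one-off checks.
    static boolean oneShot(String input) {
        return Pattern.matches("\\d+", input);
    }

    // Reuses the compiled Pattern; gives the same result as oneShot().
    static boolean precompiled(String input) {
        return DIGITS.matcher(input).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"2223", "2223aa", "22bb23"}) {
            System.out.println(s + " -> " + oneShot(s) + " / " + precompiled(s));
        }
    }
}
```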
3.Pattern.matcher(CharSequence input)
Having said that, it's finally time for the Matcher class. Pattern.matcher(CharSequence input) returns a Matcher object.
The constructor of the Matcher class is not publicly accessible either, so a Matcher cannot be created directly; the only way to get an instance is through the Pattern.matcher(CharSequence input) method.
The Pattern class on its own can only perform simple matching operations. For more powerful and convenient matching, Pattern has to work together with Matcher. The Matcher class provides support for groups and for multiple successive matches.
Java code example:
Pattern p=Pattern.compile("\\d+");
Matcher m=p.matcher("22bb23");
m.pattern();// returns p, that is, the Pattern object that created this Matcher
4.Matcher.matches()/ Matcher.lookingAt()/ Matcher.find()
The Matcher class provides three matching methods, all of which return a boolean: true if the match succeeds and false otherwise.
matches()
Matches the entire string, returning true only if the entire string matches
Java code example:
Pattern p=Pattern.compile("\\d+");
Matcher m=p.matcher("22bb23");
m.matches();// returns false; bb cannot be matched by \d+, so matching the entire string fails
Matcher m2=p.matcher("2223");
m2.matches();// returns true, because \d+ matches the entire string
Now let's go back to Pattern.matches(String regex, CharSequence input), which is equivalent to the following code:
Pattern.compile(regex).matcher(input).matches();
lookingAt()
Matches only at the beginning of the string; true is returned only if a prefix of the string matches
Java code example:
Pattern p=Pattern.compile("\\d+");
Matcher m=p.matcher("22bb23");
m.lookingAt();// returns true, because \d+ matches the leading 22
Matcher m2=p.matcher("aa2223");
m2.lookingAt();// returns false, because \d+ cannot match the leading aa
find()
Scans the string for the next substring that matches the pattern; the match can be anywhere in the string.
Java code example:
Pattern p=Pattern.compile("\\d+");
Matcher m=p.matcher("22bb23");
m.find();// return true
Matcher m2=p.matcher("aa2223");
m2.find();// return true
Matcher m3=p.matcher("aa2223bb");
m3.find();// return true
Matcher m4=p.matcher("aabb");
m4.find();// return false
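To see the difference between the three methods side by side, the sketch below runs all of them against the same inputs. The class name MatchModesDemo and the helper modes are hypothetical; a fresh Matcher is created for each call so that the calls do not affect one another.

```java
import java.util.regex.Pattern;

// Hypothetical demo class comparing matches(), lookingAt(), and find().
class MatchModesDemo {
    // Reports the result of each matching method for one regex/input pair.
    static String modes(String regex, String input) {
        Pattern p = Pattern.compile(regex);
        boolean whole = p.matcher(input).matches();     // entire input must match
        boolean prefix = p.matcher(input).lookingAt();  // match must start at index 0
        boolean anywhere = p.matcher(input).find();     // match may start anywhere
        return "matches=" + whole + " lookingAt=" + prefix + " find=" + anywhere;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"2223", "22bb23", "aa2223", "aabb"}) {
            System.out.println(s + ": " + modes("\\d+", s));
        }
    }
}
```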
5.Matcher.start()/ Matcher.end()/ Matcher.group()
After a successful match with matches(), lookingAt(), or find(), you can use the three methods above to get more detailed information about the match.
start()
Returns the index of the first character of the matched substring in the string.
end()
Returns the index just after the last character of the matched substring in the string.
group()
Returns the matched substring.
Java code example:
Pattern p=Pattern.compile("\\d+");
Matcher m=p.matcher("aaa2223bb");
m.find();// matches 2223
m.start();// returns 3
m.end();// returns 7, the index just after the last character of 2223
m.group();// returns 2223
Matcher m2=p.matcher("2223bb");
m2.lookingAt(); // matches 2223
m2.start(); // returns 0; lookingAt() only matches at the beginning of the string, so start() always returns 0 after a successful lookingAt()
m2.end(); // returns 4
m2.group(); // returns 2223
Matcher m3=p.matcher("2223");
m3.matches(); // matches the entire string; "2223bb" would fail here because bb cannot be matched by \d+
m3.start(); // returns 0, for the same reason
m3.end(); // returns 4, the length of the string, because matches() must match the entire string
m3.group(); // returns 2223
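One way to remember how the three methods relate is that group() is the substring of the input from start() (inclusive) to end() (exclusive). The sketch below checks this relationship; the class name MatchSpanDemo and the helper describeFirstMatch are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class relating start(), end(), and group().
class MatchSpanDemo {
    // Describes the first match of regex in input, or returns "no match".
    static String describeFirstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return "no match";
        }
        // end() is exclusive, so substring(start, end) reproduces group().
        String viaSubstring = input.substring(m.start(), m.end());
        return m.group() + " [" + m.start() + "," + m.end() + ") sameAsSubstring="
                + viaSubstring.equals(m.group());
    }

    public static void main(String[] args) {
        System.out.println(describeFirstMatch("\\d+", "aaa2223bb"));
        System.out.println(describeFirstMatch("\\d+", "aabb"));
    }
}
```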
With that said, I'm sure you all understand the use of the above methods, so let's talk about how regular expression grouping is used in java.
start(), end(), and group() each have an overloaded version, start(int i), end(int i), and group(int i), dedicated to group operations. The Matcher class also has a groupCount() method that returns the number of capturing groups.
Java code example:
Pattern p=Pattern.compile("([a-z]+)(\\d+)");
Matcher m=p.matcher("aaa2223bb");
m.find(); // matches aaa2223
m.groupCount(); // returns 2, because there are 2 groups
m.start(1); // returns 0, the start index of the substring matched by group 1
m.start(2); // returns 3
m.end(1); // returns 3, the index just after the last character matched by group 1
m.end(2); // returns 7
m.group(1); // returns aaa, the substring matched by group 1
m.group(2); // returns 2223, the substring matched by group 2
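The group overloads are often used together in a loop over groupCount(). The following is a minimal sketch under that assumption; the class name GroupWalkDemo and the helper describeGroups are hypothetical. It also shows what the overloads return for a group that took no part in the match: group(int) returns null, while start(int) and end(int) return -1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class that walks over all capturing groups of the first match.
class GroupWalkDemo {
    // Returns one line per capturing group; the list is empty when nothing matches.
    static List<String> describeGroups(String regex, String input) {
        List<String> lines = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        if (m.find()) {
            // Group 0 (the whole match) is not included in groupCount().
            for (int i = 1; i <= m.groupCount(); i++) {
                lines.add("group " + i + ": " + m.group(i)
                        + " [" + m.start(i) + "," + m.end(i) + ")");
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        describeGroups("([a-z]+)(\\d+)", "aaa2223bb").forEach(System.out::println);
        // The optional second group does not participate in this match.
        describeGroups("([a-z]+)(\\d+)?", "abc").forEach(System.out::println);
    }
}
```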
Now let's try a slightly more advanced matching operation. Suppose we have a piece of text that contains several numbers separated by other characters, and we want to extract all of the numbers. With Java's regular expression API this is simple.
Java code example:
Pattern p=Pattern.compile("\\d+");
Matcher m=p.matcher("My QQ:456456 phone:0532214 email:aaa123@aaa.com"); // illustrative input
while(m.find()) {
System.out.println(m.group());
}
Output:
456456
0532214
123
If the above while() loop is replaced with
while(m.find()) {
System.out.println(m.group());
System.out.print("start:"+m.start());
System.out.println(" end:"+m.end());
}
The output:
456456
start:6 end:12
0532214
start:19 end:26
123
start:36 end:39
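In practice the extracted numbers are usually collected rather than printed. The sketch below wraps the find() loop in a helper that returns a list; the class name NumberExtractor, the method extractNumbers, and the sample input are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class built around the find() loop.
class NumberExtractor {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Collects every run of digits in the text, in order of appearance.
    static List<String> extractNumbers(String text) {
        List<String> numbers = new ArrayList<>();
        Matcher m = DIGITS.matcher(text);
        // Each successful find() moves on to the next match.
        while (m.find()) {
            numbers.add(m.group());
        }
        return numbers;
    }

    public static void main(String[] args) {
        // Assumed sample input for this sketch.
        String text = "My QQ:456456 phone:0532214 email:aaa123@aaa.com";
        System.out.println(extractNumbers(text));
    }
}
```

The numbers are kept as strings so that leading zeros, as in 0532214, are preserved.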
Now you should see that each time a match operation is performed, the values returned by start(), end(), and group() change to reflect the newly matched substring, and the values returned by their overloaded versions change accordingly.
Note: start(), end(), and group() can only be used after a successful match operation; otherwise they throw java.lang.IllegalStateException. In other words, they are available only after matches(), lookingAt(), or find() has returned true.
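A simple way to respect this rule is to check the boolean result before calling group(). The sketch below shows both the guarded call and the exception that results from an unguarded one; the class name MatchStateDemo and both helper methods are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for the "successful match first" rule.
class MatchStateDemo {
    // Returns the first match of regex in input, or null when there is none.
    static String firstMatchOrNull(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        // group() is only called when find() has returned true.
        return m.find() ? m.group() : null;
    }

    // Returns true if group() throws IllegalStateException after a failed find().
    static boolean groupThrowsAfterFailedFind(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (m.find()) {
            return false; // the match succeeded, so group() would be legal
        }
        try {
            m.group();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(firstMatchOrNull("\\d+", "aa22"));
        System.out.println(firstMatchOrNull("\\d+", "aabb"));
        System.out.println(groupThrowsAfterFailedFind("\\d+", "aabb"));
    }
}
```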
Conclusion