Tutorial: Basic Use of Regular Expressions in Java
- 2021-09-05 00:02:43
- OfStack
Regular expression syntax
The simplest regular expression is a plain string. For example, hello world is itself a regular expression, and it matches the string "hello world". On top of this, other symbols are added so that an expression can match every string of a given form rather than only one literal string. These symbols fall roughly into three kinds, [], {}, and (); other symbols (such as ., +, *, and \d) can be regarded as shorthand for them.
[]
[] matches any single character listed inside the brackets. For example, [abc] matches a, b, or c.
You can use - to denote a range of characters: [a-c] and [abc] are equivalent. You can write several ranges together, and you can add individual characters after the ranges. For example, [a-z0-9A] matches any single character in a-z, in 0-9, or the character A.
You can use ^ to take the complement: [^a-c] matches any character other than a, b, and c, [^0-9] matches any character other than a digit, and [^a] matches any character other than a.
Some shorthands equivalent to []:
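The bracket rules above can be checked with a small sketch. The class name CharClassDemo and the helper matchesWhole are made up for this illustration; Pattern.matches is the standard library call that tests whether the whole input matches.

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // Hypothetical helper: true when the WHOLE input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("[abc]", "b"));      // true: b is listed
        System.out.println(matchesWhole("[a-c]", "d"));      // false: d is outside a-c
        System.out.println(matchesWhole("[a-z0-9A]", "A"));  // true: A is listed after the ranges
        System.out.println(matchesWhole("[a-z0-9A]", "B"));  // false: B is in no range
        System.out.println(matchesWhole("[^0-9]", "x"));     // true: x is not a digit
        System.out.println(matchesWhole("[^0-9]", "7"));     // false: 7 is a digit
    }
}
```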
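Shorthand | Meaning |
---|---|
. | Matches any single character except line terminators such as "\r" and "\n". |
\d | Matches a digit. Equivalent to [0-9]. |
\D | Matches a non-digit. Equivalent to [^0-9]. |
\s | Matches any whitespace character, including space, tab, and form feed. Equivalent to [ \t\n\x0B\f\r]. |
\S | Matches any non-whitespace character. Equivalent to [^ \t\n\x0B\f\r]. |
\w | Matches any word character, including the underscore. Equivalent to [A-Za-z0-9_]. |
\W | Matches any non-word character. Equivalent to [^A-Za-z0-9_]. |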
In addition, since \\ in a Java string literal stands for the single \ that other languages write, the shorthands above must be written as \\d, \\D, and so on in Java.
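As a quick check of the escaping rule, the sketch below writes a few shorthands as Java string literals. The class name ShorthandDemo and the helper matchesWhole are illustrative names, not part of any library.

```java
import java.util.regex.Pattern;

class ShorthandDemo {
    // Hypothetical helper: true when the WHOLE input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // The literal "\\d" holds two characters, a backslash and d,
        // which is exactly the regex \d.
        System.out.println("\\d".length());             // 2
        System.out.println(matchesWhole("\\d", "5"));   // true: 5 is a digit
        System.out.println(matchesWhole("\\D", "5"));   // false: \D excludes digits
        System.out.println(matchesWhole("\\w", "_"));   // true: underscore is a word character
        System.out.println(matchesWhole("\\s", "\t"));  // true: tab is whitespace
        System.out.println(matchesWhole(".", "\n"));    // false: . skips line terminators by default
    }
}
```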
{}
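{} specifies how many times the preceding character or subexpression is matched.
Expression | Meaning |
---|---|
{n} | n is a non-negative integer. Matches exactly n times. For example, o{2} matches two consecutive o characters. |
{n,} | n is a non-negative integer. Matches at least n times. |
{n,m} | m and n are non-negative integers with n <= m. Matches at least n times and at most m times. |
Expression | Meaning |
---|---|
* | Matches the preceding character or subexpression zero or more times. Equivalent to {0,}. |
+ | Matches the preceding character or subexpression one or more times. Equivalent to {1,}. |
? | Matches the preceding character or subexpression zero or one time. Equivalent to {0,1}. |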
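A short sketch of the quantifiers in the two tables above follows. QuantifierDemo and matchesWhole are hypothetical names chosen for the example; each check tests the whole input string.

```java
import java.util.regex.Pattern;

class QuantifierDemo {
    // Hypothetical helper: true when the WHOLE input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("o{2}", "oo"));     // true: exactly two
        System.out.println(matchesWhole("o{2}", "ooo"));    // false: three is too many
        System.out.println(matchesWhole("o{2,}", "ooo"));   // true: at least two
        System.out.println(matchesWhole("o{1,2}", "ooo"));  // false: at most two
        System.out.println(matchesWhole("ab*", "a"));       // true: * allows zero b
        System.out.println(matchesWhole("ab+", "a"));       // false: + needs at least one b
        System.out.println(matchesWhole("ab?", "abb"));     // false: ? allows at most one b
    }
}
```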
()
() denotes a capture group, so you can use () to split an expression into several groups and extract the required information from a string. Adding ?<name> right after the opening parenthesis names the group, which makes it easier to extract information.
For example, (?<name>[A-Za-z]+) describes a person's name consisting of at least one letter, and calling matcher.group("name") is all that is needed to get the matched name.
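The named group described above can be tried out as follows. This is a minimal sketch: the class NamedGroupDemo and the method extractName are invented for the illustration, while Pattern.compile, Matcher.find, and Matcher.group(String) are the standard java.util.regex calls.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupDemo {
    // Hypothetical helper: returns the first run of letters in text,
    // captured through the group called "name", or null if there is none.
    static String extractName(String text) {
        Matcher m = Pattern.compile("(?<name>[A-Za-z]+)").matcher(text);
        return m.find() ? m.group("name") : null;
    }

    public static void main(String[] args) {
        System.out.println(extractName("Hello, world"));  // Hello
        System.out.println(extractName("1234"));          // null: no letters to match
    }
}
```

Calling group before a successful find() throws IllegalStateException, which is why the helper checks the result of find() first.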
Writing it in Java
Template
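import java.util.regex.Matcher;
import java.util.regex.Pattern;
String text = "hello world";// The string to search
String pattern = "(?<value1>[a-z]+)";// Regular expression with a named group
Pattern r = Pattern.compile(pattern);// Compile the expression
Matcher matcher = r.matcher(text);// Use text as the string to match against
if (matcher.find()) {// Look for the next match
    String value1 = matcher.group("value1");// Extract information
}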
Example
Description
Depending on the time precision, a mail message may take one of the following four valid formats:
username@domain-yyyy-mm-dd
Example: lethean@buaa.edu.cn-2020-12-02
username@domain-yyyy-mm-dd-hh
Example: myname-lethean@buaa.edu.cn-2020-12-02-15
username@domain-yyyy-mm-dd-hh:mimi
Example: Lethean@buaa.edu.cn-2020-12-02-15:01
username@domain-yyyy-mm-dd-hh:mimi:ss
Example: myname-lethean@buaa.edu.cn-2020-12-20-15:01:20
Where:
username@domain is the mailbox address of the sender of the message; username is the user name and domain is the domain name.
yyyy-mm-dd / yyyy-mm-dd-hh / yyyy-mm-dd-hh:mimi / yyyy-mm-dd-hh:mimi:ss is the sending time.
Each 'y' is one digit of the year, each 'm' one digit of the month, each 'd' one digit of the day, each 'h' one digit of the hour, each 'mi' one digit of the minute, and each 's' one digit of the second.
username is a non-empty string that contains only upper- and lower-case letters and the hyphen -, and is case-insensitive.
domain is a non-empty string that contains only upper- and lower-case letters, digits, and the dot ., and is case sensitive.
Implementation
String pattern = "(?<username>[A-Za-z-]+)@(?<domain>[A-Za-z0-9.]+)-(?<yyyy>\\d{4})-(?<mm>\\d{2})-(?<dd>\\d{2})(-)?(?<hh>\\d{2})?(:)?(?<mimi>\\d{2})?(:)?(?<ss>\\d{2})?";
String text = "myname--lethean@buaa.edu.cn-2020-12-20-15:01:20";
Pattern r = Pattern.compile(pattern);
Matcher matcher = r.matcher(text);
if (matcher.find()) {
    System.out.println("username:" + matcher.group("username"));
    System.out.println("domain:" + matcher.group("domain"));
    System.out.println("yyyy:" + matcher.group("yyyy"));
    System.out.println("mm:" + matcher.group("mm"));
    System.out.println("dd:" + matcher.group("dd"));
    System.out.println("hh:" + matcher.group("hh"));
    System.out.println("mimi:" + matcher.group("mimi"));
    System.out.println("ss:" + matcher.group("ss"));
}
Replacing text with a message in any of the four formats gives the correct result; fields that are absent (hh, mimi, and ss are optional) come back as null.
The running results are as follows:
username:myname--lethean
domain:buaa.edu.cn
yyyy:2020
mm:12
dd:20
hh:15
mimi:01
ss:20
With format 3, for example when text is Lethean@buaa.edu.cn-2020-12-02-15:01, the output is as follows:
username:Lethean
domain:buaa.edu.cn
yyyy:2020
mm:12
dd:02
hh:15
mimi:01
ss:null
ss is null because this format has no seconds field.
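To see all four formats side by side, the same expression can be wrapped in a small helper. This is only a sketch: the class MailFormatsDemo, the FIELDS array, and the parse method are names introduced for this illustration, and returning a map is just one convenient way to collect the groups.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MailFormatsDemo {
    // Same expression as in the example, split across lines for readability.
    static final Pattern MAIL = Pattern.compile(
        "(?<username>[A-Za-z-]+)@(?<domain>[A-Za-z0-9.]+)"
        + "-(?<yyyy>\\d{4})-(?<mm>\\d{2})-(?<dd>\\d{2})"
        + "(-)?(?<hh>\\d{2})?(:)?(?<mimi>\\d{2})?(:)?(?<ss>\\d{2})?");

    static final String[] FIELDS =
        {"username", "domain", "yyyy", "mm", "dd", "hh", "mimi", "ss"};

    // Hypothetical helper: maps each group name to its captured text.
    // Groups that did not take part in the match map to null.
    // Returns null when the text contains no match at all.
    static Map<String, String> parse(String text) {
        Matcher m = MAIL.matcher(text);
        if (!m.find()) {
            return null;
        }
        Map<String, String> out = new LinkedHashMap<>();
        for (String field : FIELDS) {
            out.put(field, m.group(field));
        }
        return out;
    }

    public static void main(String[] args) {
        String[] samples = {
            "lethean@buaa.edu.cn-2020-12-02",
            "myname-lethean@buaa.edu.cn-2020-12-02-15",
            "Lethean@buaa.edu.cn-2020-12-02-15:01",
            "myname-lethean@buaa.edu.cn-2020-12-20-15:01:20"
        };
        for (String sample : samples) {
            System.out.println(parse(sample));
        }
    }
}
```

For the first sample, hh, mimi, and ss are all null; each later sample fills in one more of them. Because find() accepts a match anywhere in the text, this sketch does not reject input with trailing garbage; Matcher.matches() would be the call to use if the whole string had to conform.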
Modified example
Description
The mail message input format is changed to: (ss:mimi:hh-)dd-mm-yyyy-username@domain-place
Depending on the time precision, a mail message may take one of the following four valid formats:
dd-mm-yyyy-username@domain-place
Example: 02-12-2020-abc@buaa.edu.cn-Wuhu
hh-dd-mm-yyyy-username@domain-place
Example: 03-02-12-2020-abc@buaa.edu.cn-wuhu
mimi:hh-dd-mm-yyyy-username@domain-place
Example: 00:03-02-12-2020-abc@buaa.edu.cn-Wuhu
ss:mimi:hh-dd-mm-yyyy-username@domain-place
Example: 01:00:03-02-12-2020-abc@buaa.edu.cn-wuhu
place is a newly added field that names a location. It consists of English letters and is case sensitive, so Beijing and beijing are treated as different places.
Implementation
String pattern = "(((?<ss>\\d{2}):)?((?<mimi>\\d{2}):))?((?<hh>\\d{2})-)?(?<dd>\\d{2})-(?<mm>\\d{2})-(?<yyyy>\\d{4})-(?<username>[A-Za-z-]+)@(?<domain>[A-Za-z0-9.]+)-(?<place>[A-Za-z]+)";
String text = "01:11:03-02-12-2020-abc@buaa.edu.cn-wuhu";
Pattern r = Pattern.compile(pattern);
Matcher matcher = r.matcher(text);
if (matcher.find()) {
    System.out.println("username:" + matcher.group("username"));
    System.out.println("domain:" + matcher.group("domain"));
    System.out.println("yyyy:" + matcher.group("yyyy"));
    System.out.println("mm:" + matcher.group("mm"));
    System.out.println("dd:" + matcher.group("dd"));
    System.out.println("hh:" + matcher.group("hh"));
    System.out.println("mimi:" + matcher.group("mimi"));
    System.out.println("ss:" + matcher.group("ss"));
    System.out.println("place:" + matcher.group("place"));
}
Note that the leading (((?<ss>\\d{2}):)?((?<mimi>\\d{2}):))? must be nested. If the two optional groups were written side by side, the minutes of a message without seconds would be captured as ss instead of mimi, because both fields have the same format and ss is tried first. With nesting, ss can only match when a mimi field follows it, so the error does not occur.
The running results are as follows:
username:abc
domain:buaa.edu.cn
yyyy:2020
mm:12
dd:02
hh:03
mimi:11
ss:01
place:wuhu
If you change text to 11:03-02-12-2020-abc@buaa.edu.cn-wuhu, the output is as follows, and ss does not match.
username:abc
domain:buaa.edu.cn
yyyy:2020
mm:12
dd:02
hh:03
mimi:11
ss:null
place:wuhu
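The reason the ss and mimi groups have to be nested can be shown by comparing two variants of the time prefix. The sketch below is an illustration under assumptions: FLAT is a hypothetical non-nested variant that does not appear in the example, both expressions are trimmed to the time and date part, and the names NestingDemo and secondsAndMinutes are invented here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NestingDemo {
    // Hypothetical variant: ss and mimi are independent optional groups.
    static final Pattern FLAT = Pattern.compile(
        "((?<ss>\\d{2}):)?((?<mimi>\\d{2}):)?((?<hh>\\d{2})-)?"
        + "(?<dd>\\d{2})-(?<mm>\\d{2})-(?<yyyy>\\d{4})");

    // Prefix of the expression used in the example: ss sits inside the mimi group.
    static final Pattern NESTED = Pattern.compile(
        "(((?<ss>\\d{2}):)?((?<mimi>\\d{2}):))?((?<hh>\\d{2})-)?"
        + "(?<dd>\\d{2})-(?<mm>\\d{2})-(?<yyyy>\\d{4})");

    // Hypothetical helper: reports what the ss and mimi groups captured.
    static String secondsAndMinutes(Pattern p, String text) {
        Matcher m = p.matcher(text);
        if (!m.find()) {
            return "no match";
        }
        return "ss=" + m.group("ss") + " mimi=" + m.group("mimi");
    }

    public static void main(String[] args) {
        String text = "11:03-02-12-2020";  // mimi:hh-dd-mm-yyyy, no seconds
        System.out.println(secondsAndMinutes(FLAT, text));    // ss=11 mimi=null
        System.out.println(secondsAndMinutes(NESTED, text));  // ss=null mimi=11
    }
}
```

With the flat variant the engine takes 11: for ss, then finds that 03 is not followed by a colon and simply skips mimi. With the nested form the whole time prefix fails unless a mimi field is present, so the engine backtracks, drops ss, and assigns 11 to mimi.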