Basic Tutorial on Using Regular Expressions in Java

  • 2021-09-05 00:02:43
  • OfStack

Regular expression syntax

The simplest regular expression is a plain string. For example, hello world is itself a regular expression, and it matches the string "hello world". On top of this we add special symbols so that one pattern can match a whole family of well-formed strings rather than a single literal string. These symbols fall roughly into three groups: [], {}, and (). Other symbols (such as ., +, *, and \d) can be regarded as abbreviations of these.

[]

[] matches any one character listed inside the brackets. For example, [abc] matches a, b, or c.

You can use - to express a character range, so [a-c] and [abc] are equivalent. You can write several ranges together, and you can add single characters after the ranges. For example, [a-z0-9A] matches any one character that is in a-z, in 0-9, or is A.

You can use ^ to take the complement: [^a-c] matches any character other than a, b, or c; [^0-9] matches any character that is not a digit; and [^a] matches any character other than a.
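The rules for [] can be checked with a few one-character inputs. The following is a minimal sketch; the class name CharClassDemo and the helper matchesWhole are made up for illustration, and Pattern.matches requires the whole input to match the expression.

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // Hypothetical helper: true when the whole input matches the expression.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("[abc]", "b"));     // true: b is listed
        System.out.println(matchesWhole("[a-c]", "d"));     // false: d is outside a-c
        System.out.println(matchesWhole("[a-z0-9A]", "A")); // true: A is listed after the ranges
        System.out.println(matchesWhole("[a-z0-9A]", "B")); // false: B is in no range and not listed
        System.out.println(matchesWhole("[^a-c]", "x"));    // true: x is not a, b, or c
        System.out.println(matchesWhole("[^0-9]", "7"));    // false: 7 is a digit
    }
}
```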

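Some shorthands equivalent to []:

Shorthand    Meaning
.            Matches any single character except line terminators such as \r and \n.
\d           Matches a digit. Equivalent to [0-9].
\D           Matches a non-digit. Equivalent to [^0-9].
\s           Matches any whitespace character, including space, tab, and form feed. Equivalent to [ \f\n\r\t\x0B] (\x0B is the vertical tab).
\S           Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\x0B].
\w           Matches any word character, including the underscore. Equivalent to [A-Za-z0-9_].
\W           Matches any non-word character. Equivalent to [^A-Za-z0-9_].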

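In addition, a backslash inside a Java string literal must itself be escaped, so \\ in Java source code stands for the single \ used in other languages. The shorthands above therefore have to be written as \\d, \\D, and so on in Java.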
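As a small illustration of the escaping rule and the shorthands, the sketch below counts how many characters of a sample string each shorthand matches. The class name ShorthandDemo, the helper countMatches, and the sample input are assumptions made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ShorthandDemo {
    // Hypothetical helper: counts the non-overlapping matches of regex in input.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "a1 b2_c3";
        System.out.println(countMatches("\\d", input)); // 3: the digits 1, 2, 3
        System.out.println(countMatches("\\s", input)); // 1: the single space
        System.out.println(countMatches("\\w", input)); // 7: letters, digits, and the underscore
        System.out.println(countMatches("\\W", input)); // 1: only the space is a non-word character
        // The Java literal "\\d" holds two characters: a backslash and the letter d.
        System.out.println("\\d".length());             // 2
    }
}
```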

{}

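{} specifies how many times the preceding character or subexpression is matched.

Expression    Meaning
{n}           n is a non-negative integer. Matches exactly n times. For example, o{2} matches two consecutive o characters.
{n,}          n is a non-negative integer. Matches at least n times.
{n,m}         m and n are non-negative integers with n <= m. Matches at least n times and at most m times.

Expression    Meaning
*             Matches the preceding character or subexpression zero or more times. Equivalent to {0,}.
+             Matches the preceding character or subexpression one or more times. Equivalent to {1,}.
?             Matches the preceding character or subexpression zero or one time. Equivalent to {0,1}.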
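The quantifiers can be tried out in the same way. This is only a sketch; QuantifierDemo and matchesWhole are hypothetical names, and each check requires the whole input to match.

```java
import java.util.regex.Pattern;

class QuantifierDemo {
    // Hypothetical helper: true when the whole input matches the expression.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("o{2}", "oo"));      // true: exactly two
        System.out.println(matchesWhole("o{2}", "ooo"));     // false: three is too many
        System.out.println(matchesWhole("o{2,}", "oooo"));   // true: at least two
        System.out.println(matchesWhole("o{1,3}", "oooo"));  // false: more than three
        System.out.println(matchesWhole("ab*", "a"));        // true: * allows zero b
        System.out.println(matchesWhole("ab+", "a"));        // false: + needs at least one b
        System.out.println(matchesWhole("ab?", "abb"));      // false: ? allows at most one b
        System.out.println(matchesWhole("(ab){2}", "abab")); // true: the quantifier applies to the subexpression
    }
}
```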

()

() represents a capture group. You can use () to split an expression into several groups and extract the required pieces of information from a string. Adding ?<name> right after the opening parenthesis gives the group a name, which makes extraction easier.

For example, (?<name>[A-Za-z]+) describes a person's name made up of at least one letter. Once the pattern has matched, matcher.group("name") is all that is needed to get the matched name.
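A named group can be exercised as follows. This is a minimal sketch: the class NamedGroupDemo, the helper extractName, and the sample inputs are invented for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupDemo {
    // Hypothetical helper: returns the first run of letters in text, or null if there is none.
    static String extractName(String text) {
        Matcher matcher = Pattern.compile("(?<name>[A-Za-z]+)").matcher(text);
        // group(...) may only be called after a successful find() or matches().
        return matcher.find() ? matcher.group("name") : null;
    }

    public static void main(String[] args) {
        System.out.println(extractName("42 Alice 7")); // Alice
        System.out.println(extractName("12345"));      // null: no letters to capture
    }
}
```

A named group is still numbered like any other group, so matcher.group(1) would return the same text here.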

Usage in Java

Template


import java.util.regex.Matcher;
import java.util.regex.Pattern;

String pattern = "(?<value1>[a-z]+)"; // Regular expression with a named group
Pattern r = Pattern.compile(pattern); // Compile the expression
Matcher matcher = r.matcher(text); // text is the string to be matched
if (matcher.find()) { // Search for the next match
    String value1 = matcher.group("value1"); // Extract information
}
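For reference, here is one way the template might look as a complete program. It is a sketch under assumptions: the class TemplateDemo, the helper findAllWords, and the group name value1 are illustrative, and the loop calls find() repeatedly to collect every match rather than only the first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TemplateDemo {
    // Hypothetical helper: collects every run of lowercase letters found in text.
    static List<String> findAllWords(String text) {
        Pattern r = Pattern.compile("(?<value1>[a-z]+)");
        Matcher matcher = r.matcher(text);
        List<String> words = new ArrayList<>();
        // Each successful find() moves on to the next match in the text.
        while (matcher.find()) {
            words.add(matcher.group("value1"));
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(findAllWords("abc 123 def")); // [abc, def]
    }
}
```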

Example

Description

Depending on the time precision, a mail message can take one of the following four valid formats:

username@domain-yyyy-mm-dd

Example: lethean@buaa.edu.cn-2020-12-02

username@domain-yyyy-mm-dd-hh

Example: myname-lethean@buaa.edu.cn-2020-12-02-15

username@domain-yyyy-mm-dd-hh:mimi

Example: Lethean@buaa.edu.cn-2020-12-02-15:01

username@domain-yyyy-mm-dd-hh:mimi:ss

Example: myname-lethean@buaa.edu.cn-2020-12-20-15:01:20

Here:

username@domain is the mailbox address of the message sender, where username is the user name and domain is the domain name.

yyyy-mm-dd, yyyy-mm-dd-hh, yyyy-mm-dd-hh:mimi, and yyyy-mm-dd-hh:mimi:ss are the sending time. Each 'y' stands for one digit of the year, each 'm' for one digit of the month, each 'd' for one digit of the day, each 'h' for one digit of the hour, each 'mi' for one digit of the minute, and each 's' for one digit of the second.

username is a non-empty string that contains only upper- and lower-case letters (the examples also contain the hyphen -, which the pattern in this article accepts) and is case-insensitive.

domain is a non-empty string that contains only upper- and lower-case letters, digits, and the period (.), and is case-sensitive.

Implementation


String pattern = "(?<username>[A-Za-z-]+)@(?<domain>[A-Za-z0-9.]+)-(?<yyyy>\\d{4})-(?<mm>\\d{2})-(?<dd>\\d{2})(-)?(?<hh>\\d{2})?(:)?(?<mimi>\\d{2})?(:)?(?<ss>\\d{2})?";
String text = "myname--lethean@buaa.edu.cn-2020-12-20-15:01:20";
Pattern r = Pattern.compile(pattern);
Matcher matcher = r.matcher(text);
if (matcher.find()) {
    System.out.println("username:" + matcher.group("username"));
    System.out.println("domain:" + matcher.group("domain"));
    System.out.println("yyyy:" + matcher.group("yyyy"));
    System.out.println("mm:" + matcher.group("mm"));
    System.out.println("dd:" + matcher.group("dd"));
    System.out.println("hh:" + matcher.group("hh"));
    System.out.println("mimi:" + matcher.group("mimi"));
    System.out.println("ss:" + matcher.group("ss"));
}

Replacing text with a message in any of the four formats gives the correct result. Groups that are absent from the input (hh, mimi, and ss may be missing) are returned as null.

The running results are as follows:

username:myname--lethean
domain:buaa.edu.cn
yyyy:2020
mm:12
dd:20
hh:15
mimi:01
ss:20

For an input in the third format, for example when text is Lethean@buaa.edu.cn-2020-12-02-15:01, the output is as follows:

username:Lethean
domain:buaa.edu.cn
yyyy:2020
mm:12
dd:02
hh:15
mimi:01
ss:null

ss is null because this format has no seconds field.
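The pattern above makes every separator and time field independently optional. Checked against a whole string, it would therefore also accept an input such as a@b.c-2020-12-0215, which is none of the four formats. One possible tightening, which is not part of the original solution, nests each optional field inside the previous one and uses matches() so that the entire text has to fit. The class StrictMailDemo and the helper parse are hypothetical names for this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StrictMailDemo {
    // Each finer time field is optional only inside the coarser one:
    // -hh may appear alone, :mimi only after hh, and :ss only after mimi.
    static final Pattern STRICT = Pattern.compile(
            "(?<username>[A-Za-z-]+)@(?<domain>[A-Za-z0-9.]+)"
            + "-(?<yyyy>\\d{4})-(?<mm>\\d{2})-(?<dd>\\d{2})"
            + "(-(?<hh>\\d{2})(:(?<mimi>\\d{2})(:(?<ss>\\d{2}))?)?)?");

    // Hypothetical helper: returns the matcher when the WHOLE text fits a format, else null.
    static Matcher parse(String text) {
        Matcher m = STRICT.matcher(text);
        return m.matches() ? m : null;
    }

    public static void main(String[] args) {
        String[] samples = {
            "lethean@buaa.edu.cn-2020-12-02",
            "Lethean@buaa.edu.cn-2020-12-02-15:01",
            "myname-lethean@buaa.edu.cn-2020-12-20-15:01:20",
            "a@b.c-2020-12-0215"
        };
        for (String s : samples) {
            Matcher m = parse(s);
            String result = (m == null)
                    ? "rejected"
                    : "hh=" + m.group("hh") + ", mimi=" + m.group("mimi") + ", ss=" + m.group("ss");
            System.out.println(s + " -> " + result);
        }
    }
}
```

Fields that are missing from a valid input are still null, as in the original version, but the malformed last sample is rejected.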

Modified example

Description

The mail message input format is changed to: (ss:mimi:hh-)dd-mm-yyyy-username@domain-place

Depending on the time precision, a mail message can take one of the following four valid formats:

dd-mm-yyyy-username@domain-place

Example: 02-12-2020-abc@buaa.edu.cn-Wuhu

hh-dd-mm-yyyy-username@domain-place

Example: 03-02-12-2020-abc@buaa.edu.cn-wuhu

mimi:hh-dd-mm-yyyy-username@domain-place

Example: 00:03-02-12-2020-abc@buaa.edu.cn-Wuhu

ss:mimi:hh-dd-mm-yyyy-username@domain-place

Example: 01:00:03-02-12-2020-abc@buaa.edu.cn-wuhu

place is a newly added field that represents a location. It consists of English letters and is case-sensitive, that is, Beijing and beijing are treated as different places.

Implementation


String pattern = "(((?<ss>\\d{2}):)?((?<mimi>\\d{2}):))?((?<hh>\\d{2})-)?(?<dd>\\d{2})-(?<mm>\\d{2})-(?<yyyy>\\d{4})-(?<username>[A-Za-z-]+)@(?<domain>[A-Za-z0-9.]+)-(?<place>[A-Za-z]+)";
String text = "01:11:03-02-12-2020-abc@buaa.edu.cn-wuhu";
Pattern r = Pattern.compile(pattern);
Matcher matcher = r.matcher(text);
if (matcher.find()) {
    System.out.println("username:" + matcher.group("username"));
    System.out.println("domain:" + matcher.group("domain"));
    System.out.println("yyyy:" + matcher.group("yyyy"));
    System.out.println("mm:" + matcher.group("mm"));
    System.out.println("dd:" + matcher.group("dd"));
    System.out.println("hh:" + matcher.group("hh"));
    System.out.println("mimi:" + matcher.group("mimi"));
    System.out.println("ss:" + matcher.group("ss"));
    System.out.println("place:" + matcher.group("place"));
}

Note that the leading (((?<ss>\\d{2}):)?((?<mimi>\\d{2}):))? must be nested. If ss and mimi were written as two independent optional groups, the minutes of a text with no seconds would be captured as ss, because the two fields have the same form and ss is tried first. With nesting, ss can only match when a mimi field follows it, so this error does not occur.
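The difference can be seen by running both variants on a text that has minutes but no seconds. The following sketch assumes the names NestingDemo, FLAT, NESTED, and describe, none of which come from the original code; the rest of the pattern is the one shown above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NestingDemo {
    // The part of the pattern that follows the ss and mimi fields.
    static final String TAIL =
            "((?<hh>\\d{2})-)?(?<dd>\\d{2})-(?<mm>\\d{2})-(?<yyyy>\\d{4})"
            + "-(?<username>[A-Za-z-]+)@(?<domain>[A-Za-z0-9.]+)-(?<place>[A-Za-z]+)";

    // ss and mimi as two independent optional groups.
    static final Pattern FLAT =
            Pattern.compile("((?<ss>\\d{2}):)?((?<mimi>\\d{2}):)?" + TAIL);

    // ss nested so that it can only match when mimi follows.
    static final Pattern NESTED =
            Pattern.compile("(((?<ss>\\d{2}):)?((?<mimi>\\d{2}):))?" + TAIL);

    // Hypothetical helper: summarises which time fields were captured.
    static String describe(Pattern p, String text) {
        Matcher m = p.matcher(text);
        if (!m.find()) {
            return "no match";
        }
        return "ss=" + m.group("ss") + ", mimi=" + m.group("mimi") + ", hh=" + m.group("hh");
    }

    public static void main(String[] args) {
        String text = "11:03-02-12-2020-abc@buaa.edu.cn-wuhu";
        System.out.println(describe(FLAT, text));   // ss=11, mimi=null, hh=03
        System.out.println(describe(NESTED, text)); // ss=null, mimi=11, hh=03
    }
}
```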

The running results are as follows:

username:abc
domain:buaa.edu.cn
yyyy:2020
mm:12
dd:02
hh:03
mimi:11
ss:01
place:wuhu

If you change text to 11:03-02-12-2020-abc@buaa.edu.cn-wuhu, the output is as follows, and ss is not matched:

username:abc
domain:buaa.edu.cn
yyyy:2020
mm:12
dd:02
hh:03
mimi:11
ss:null
place:wuhu
