Three Matching Modes of Regular Expressions in C#

  • 2021-09-20 21:20:20
  • OfStack

In C#, a regular expression is usually represented by the Regex class. The regular expression engine supports three commonly used matching modes: single-line mode (Singleline), multi-line mode (Multiline), and ignore-case mode (IgnoreCase).

1. Single-line Mode (Singleline)
MSDN definition: Changes the meaning of the dot (.) so that it matches every character (instead of every character except \n).
A typical scenario for using single-line mode is to obtain information from the source code of a web page.

Example:

Using the WebBrowser control, we obtained the following HTML source code from http://www.xxx.com/1.htm, which is stored in the variable str:

<html>
<body>
<div>
Line 1
Line 2
</div>
</body>
</html>

We want to extract the div tag and its contents, and write the following code:


string pattern = @"<div>.*</div>";
Regex regex = new Regex(pattern);
if (regex.IsMatch(str))
  Console.WriteLine(regex.Match(str).Value);
else
  Console.WriteLine("Mismatch!");

//Result: Mismatch!

Error analysis:

It is commonly assumed that the dot (.) matches any single character and that (.*) matches any number of characters. In fact, by default the dot does not match the newline character \n. In .NET the dot is equivalent to [^\n]; it still matches \r, so on text with Windows line endings (\r\n) a match made with the dot can end with a trailing \r.
HTML source fetched from a website almost always contains line breaks. This is where single-line mode comes in handy, because it changes the meaning of the dot. Modify the constructor call of the regex instance to enable single-line mode with RegexOptions.Singleline:


string pattern = @"<div>.*</div>";
Regex regex = new Regex(pattern, RegexOptions.Singleline);
if (regex.IsMatch(str))
  Console.WriteLine(regex.Match(str).Value);
else
  Console.WriteLine("Mismatch!");

/*
Result:
<div>
Line 1
Line 2
</div>
*/
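
To make the dot's behavior around line breaks concrete, here is a minimal sketch. The sample string with \r\n line endings is a made-up stand-in for the downloaded page, and the code is written as top-level statements, which assumes C# 9 or later.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample input with Windows line endings (\r\n).
string sample = "<div>\r\nLine 1\r\nLine 2\r\n</div>";

// Default mode: the dot stops at \n, so the match cannot span lines.
bool defaultMatch = Regex.IsMatch(sample, @"<div>.*</div>");

// Singleline mode: the dot also matches \n, so the whole block matches.
bool singlelineMatch = Regex.IsMatch(sample, @"<div>.*</div>", RegexOptions.Singleline);

// Even in default mode the dot matches \r, so the match keeps it.
string firstLine = Regex.Match(sample, @"<div>.*").Value;
bool dotMatchedCarriageReturn = firstLine == "<div>\r";

Console.WriteLine(defaultMatch);              // False
Console.WriteLine(singlelineMatch);           // True
Console.WriteLine(dotMatchedCarriageReturn);  // True
```

If the trailing \r is unwanted, calling TrimEnd('\r') on the matched value is one simple way to drop it.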

Embedded modifier for single-line mode:

We can embed single-line mode directly in the regular expression:

(?s)<div>.*</div>

The (?s) modifier puts the expression that follows it into single-line mode, so do not place it at the end of the pattern. You can use (?-s) to turn single-line mode off again.

Note: An embedded modifier takes precedence over the RegexOptions setting passed to the Regex class, so a pattern that starts with (?s) is parsed in single-line mode whether or not RegexOptions.Singleline is specified.
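
The following sketch illustrates where an embedded modifier takes effect and how it interacts with RegexOptions. The html string is an assumed sample, and the code uses top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Assumed sample input spanning several lines.
string html = "<div>\nLine 1\nLine 2\n</div>";

// (?s) at the start enables single-line mode without any RegexOptions.
bool inlineOn = Regex.IsMatch(html, @"(?s)<div>.*</div>");

// (?-s) turns the mode off, even though RegexOptions.Singleline is passed.
bool inlineOff = Regex.IsMatch(html, @"(?-s)<div>.*</div>", RegexOptions.Singleline);

// A modifier at the end comes too late to affect the dot before it.
bool modifierAtEnd = Regex.IsMatch(html, @"<div>.*</div>(?s)");

Console.WriteLine(inlineOn);       // True
Console.WriteLine(inlineOff);      // False
Console.WriteLine(modifierAtEnd);  // False
```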

2. Multiline Mode (Multiline)

MSDN definition: Changes the meaning of ^ and $ so that they match at the beginning and end of any line, respectively, not just at the beginning and end of the entire string.

Example:

There is a text file with one user name per line. We read the file into the variable str for processing. Its contents are as follows:

2104 Painter
TerryLee
Don't meet each other
Dflying Chen
Rainy

(The names are borrowed from long-time Blog Park bloggers :)

We want to find a user name that starts with English letters, and write the following code:


string pattern = @"^[A-Za-z]+.*";
Regex regex = new Regex(pattern);
if (regex.IsMatch(str))
  Console.WriteLine(regex.Match(str).Value);
else
  Console.WriteLine("Mismatch!");

//Result: Mismatch!

Error analysis:

(^) anchors the start of the string, and the first character of str is not an English letter, so nothing matches. We can use multi-line mode to change the meaning of (^) so that it matches at the start of every line instead of only at the start of the whole string.

The modified code is as follows:


string pattern = @"^[A-Za-z]+.*";
Regex regex = new Regex(pattern, RegexOptions.Multiline);
if (regex.IsMatch(str))
  Console.WriteLine(regex.Match(str).Value);
else
  Console.WriteLine("Mismatch!");

//Result: TerryLee

Multi-line mode also changes the meaning of ($) so that it matches at the end of every line instead of only at the end of the whole string.
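
One practical detail when relying on ($) in multi-line mode: in .NET it matches just before \n, not before \r\n. The sketch below shows the effect on an assumed sample with Windows line endings and one possible workaround (an optional \r before the anchor). It uses top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Assumed sample: three names separated by Windows line endings.
string names = "TerryLee\r\nDflying Chen\r\nRainy";

// $ matches just before \n, so the \r left over from \r\n blocks the
// match on "TerryLee"; "Dflying Chen" fails anyway because of the space.
MatchCollection strict = Regex.Matches(names, @"^[A-Za-z]+$", RegexOptions.Multiline);

// Allowing an optional \r before $ handles both \n and \r\n endings.
MatchCollection tolerant = Regex.Matches(names, @"^[A-Za-z]+\r?$", RegexOptions.Multiline);

Console.WriteLine(strict.Count);    // 1 (only "Rainy")
Console.WriteLine(tolerant.Count);  // 2 ("TerryLee" and "Rainy")
```

Note that the tolerant pattern includes the \r in the matched value; a capturing group around [A-Za-z]+ would isolate the name itself.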

Unlike (^) and ($), (\A) and (\Z) are not affected by multi-line mode and always match at the beginning and end of the entire string (\Z also matches just before a final newline).

Embedded modifiers for multi-line mode: (?m) and (?-m)
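
As a rough illustration of the difference between (^) and (\A) under the embedded (?m) modifier, consider the sketch below. The users string reuses some of the sample names with \n line endings, which is an assumption about the file, and the code uses top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Assumed sample: one user name per line, \n line endings.
string users = "2104 Painter\nTerryLee\nDflying Chen\nRainy";

// (?m) is the embedded equivalent of RegexOptions.Multiline.
string viaCaret = Regex.Match(users, @"(?m)^[A-Za-z]+.*").Value;

// \A still means "start of the whole string", so nothing matches
// because the string begins with a digit.
bool viaStartAnchor = Regex.IsMatch(users, @"(?m)\A[A-Za-z]+.*");

// Regex.Matches returns every line that starts with a letter.
int lineCount = Regex.Matches(users, @"(?m)^[A-Za-z]+.*").Count;

Console.WriteLine(viaCaret);        // TerryLee
Console.WriteLine(viaStartAnchor);  // False
Console.WriteLine(lineCount);       // 3
```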

3. Ignore-case Mode (IgnoreCase)

MSDN definition: Specifies a case-insensitive match.

This mode is easy to understand: uppercase and lowercase letters are treated as the same. We reuse the example above to illustrate.

Example:


string pattern = @"^[a-z]+.*";
Regex regex = new Regex(pattern, RegexOptions.Multiline | RegexOptions.IgnoreCase);
if (regex.IsMatch(str))
  Console.WriteLine(regex.Match(str).Value);
else
  Console.WriteLine("Mismatch!");

//Result: TerryLee

Analysis: Note the regular expression used this time. It contains no uppercase letters, yet it matches a name that starts with an uppercase letter. This is the effect of ignoring case.

Embedded modifiers for ignoring case: (?i) and (?-i)
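
Because embedded modifiers apply from the point where they appear, (?i) and (?-i) can limit case-insensitivity to part of a pattern. The sketch below shows this on an assumed one-line input, using top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Assumed sample input: a single user name.
string line = "TerryLee";

// (?i) applies to everything that follows it.
bool wholePattern = Regex.IsMatch(line, @"(?i)^terrylee$");

// (?-i) switches case sensitivity back on for the rest of the pattern:
// "terry" is matched case-insensitively, "Lee" must match exactly.
bool mixedExact = Regex.IsMatch(line, @"(?i)^terry(?-i)Lee$");
bool mixedWrong = Regex.IsMatch(line, @"(?i)^terry(?-i)lee$");

Console.WriteLine(wholePattern);  // True
Console.WriteLine(mixedExact);    // True
Console.WriteLine(mixedWrong);    // False
```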

Summary:

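Finally, the following table summarizes the three modes:

Mode | Definition | Affected expressions | RegexOptions value | Embedded modifier
Single-line mode | Changes the meaning of the dot (.) so that it matches every character (instead of every character except \n). | . | Singleline | (?s)
Multi-line mode | Changes the meaning of ^ and $ so that they match at the beginning and end of any line, respectively, not just at the beginning and end of the entire string. | ^ and $ | Multiline | (?m)
Ignore case | Specifies a case-insensitive match. | | IgnoreCase | (?i)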

