Introduction to Regular Expressions in C#

  • 2021-08-21 21:09:44
  • OfStack

1. Introduction to Regular Expressions

Regular expressions provide a powerful, flexible, and efficient way to process text. Their pattern-matching notation lets you quickly parse large amounts of text to find specific character patterns; extract, edit, replace, or delete text substrings; or add the extracted strings to a collection to generate a report. Regular expressions are an indispensable tool for many applications that deal with strings, such as HTML processing, log file parsing, and HTTP header parsing.

.NET Framework regular expressions incorporate the most popular features of other regular expression implementations. They are designed to be compatible with Perl 5 regular expressions and also include functionality not provided by other implementations. The .NET Framework regular expression classes are part of the base class library and can be used with any language or tool that targets the common language runtime.

2. String search

A regular expression language consists of two basic character types: literal (normal) text characters and metacharacters. It is the set of metacharacters that gives regular expressions their processing power. Virtually all text editors today have a search function. Usually, you open a dialog box and type the string to be located into a text box. If you also want to replace it, you type a replacement string as well. Notepad in the Windows operating system and the document editors in the Office series, for example, all have this function.

This is the simplest kind of search. It can easily be handled with the String.Replace() method of the String class, but what if you need to identify repeated words in a document?

Writing a routine that picks out repeated words using only the methods of the String class is considerably more complex. This is where a regular expression language fits well.

A regular expression language is a language for writing search expressions. In it, you can combine the text to be searched for, escape sequences, and other characters with special meanings into a single expression. For example, the sequence \b represents the beginning or end of a word (a word boundary). To search for words that begin with the characters th, you can write the regular expression \bth (that is, the sequence word boundary, t, h). To search for all words ending in th, you can write th\b (the sequence t, h, word boundary). However, regular expressions can be much more sophisticated than this. For example, they provide a facility for storing portions of the text found during a search operation.
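
The word-boundary patterns described above can be tried out with a short console program. This is a minimal sketch rather than a definitive implementation: the class name WordBoundaryDemo, the helper FindAll, and the sample sentence are made up for illustration, and the patterns are extended with \w* so that whole words are returned instead of just the th fragment. The last pattern uses a backreference (\1) as one possible way to spot a repeated word.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordBoundaryDemo
{
    // Hypothetical helper: returns every substring of text matched by pattern, in order.
    public static string[] FindAll(string text, string pattern)
    {
        MatchCollection mc = Regex.Matches(text, pattern);
        string[] found = new string[mc.Count];
        for (int i = 0; i < mc.Count; i++)
        {
            found[i] = mc[i].Value;
        }
        return found;
    }

    public static void Main()
    {
        string text = "the moth thinks that both is is fine";

        // \bth\w* : word boundary, then "th", then the rest of the word
        Console.WriteLine(string.Join(", ", FindAll(text, @"\bth\w*")));

        // \w*th\b : any word characters, then "th" directly before a word boundary
        Console.WriteLine(string.Join(", ", FindAll(text, @"\w*th\b")));

        // \b(\w+)\s+\1\b : a word, whitespace, and then the same word again
        Console.WriteLine(string.Join(", ", FindAll(text, @"\b(\w+)\s+\1\b")));
    }
}
```

The patterns are written as verbatim strings (prefixed with @) so that the backslashes reach the regular expression engine unchanged.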

3. Regular expression classes of the .NET Framework

The following sections introduce the regular expression classes of the .NET Framework and show how regular expressions are used under .NET.

3.1 The Regex class represents a read-only regular expression

The Regex class represents an immutable (read-only) regular expression. It also contains various static methods that allow you to use the other regular expression classes without explicitly instantiating objects of those classes. The following code example creates an instance of the Regex class and defines a simple regular expression when the object is initialized. Note the additional backslash used as an escape character: it marks the backslash in the \s character class as a literal character in the C# string.


Regex r; // Declare a variable of the Regex class
r = new Regex("\\s2000"); // Define the expression: a whitespace character followed by 2000
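
As a rough illustration of the static methods mentioned above, the sketch below calls Regex.IsMatch and Regex.Replace without creating a Regex instance. The class name StaticRegexDemo, the two helper methods, and the sample inputs are assumptions made for this example. It also shows that a verbatim string (@"\s2000") is an alternative to doubling the backslash.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StaticRegexDemo
{
    // Hypothetical helper: true if the input contains whitespace followed by 2000.
    public static bool HasSpaceBefore2000(string input)
    {
        // @"\s2000" (verbatim string) is the same pattern as "\\s2000".
        return Regex.IsMatch(input, @"\s2000");
    }

    // Hypothetical helper: replaces every digit with '#'.
    public static string MaskDigits(string input)
    {
        return Regex.Replace(input, @"\d", "#");
    }

    public static void Main()
    {
        Console.WriteLine(HasSpaceBefore2000("Year 2000")); // True
        Console.WriteLine(HasSpaceBefore2000("Year2000"));  // False
        Console.WriteLine(MaskDigits("Room 42"));           // Room ##
    }
}
```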

3.2 The Match class represents the result of a regular expression match operation

The following example uses the Match method of the Regex class to return an object of type Match that holds the first match in the input string. It then uses the Match.Success property to check whether a match was found.


Regex r = new Regex("abc"); // Define a Regex object instance
Match m = r.Match("123abc456"); // Search the string for the first match
if (m.Success)
{
    Console.WriteLine("Found match at position " + m.Index); // Output the position of the match
}
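
Match returns only the first match. One way to continue from there, sketched below, is to call Match.NextMatch() until Success becomes false. The class name NextMatchDemo, the helper FindPositions, and the input string are illustrative assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NextMatchDemo
{
    // Hypothetical helper: collects the start index of every match of pattern in input.
    public static List<int> FindPositions(string input, string pattern)
    {
        List<int> positions = new List<int>();
        Match m = Regex.Match(input, pattern);
        while (m.Success)
        {
            positions.Add(m.Index);
            m = m.NextMatch(); // Continue searching after the previous match
        }
        return positions;
    }

    public static void Main()
    {
        foreach (int position in FindPositions("123abc4abcd", "abc"))
        {
            Console.WriteLine("Found match at position " + position);
        }
    }
}
```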

3.3 The MatchCollection class represents a sequence of non-overlapping matches

The collection is read-only and has no public constructor. An instance of MatchCollection is returned by the Regex.Matches method, which populates the collection with all matches found in the input string. The following code example demonstrates how to copy the collection into an array of strings (holding every match) and an array of integers (holding the position of every match).


Regex r = new Regex("abc"); // Define a Regex object instance
MatchCollection mc = r.Matches("123abc4abcd"); // Find all matches in the input string
String[] results = new String[mc.Count];
int[] matchposition = new int[mc.Count];
for (int i = 0; i < mc.Count; i++)
{
    results[i] = mc[i].Value; // Add the matched string to the string array
    matchposition[i] = mc[i].Index; // Record the position of the match
}

3.4 The GroupCollection class represents a collection of captured groups

The collection is read-only and has no public constructor. An instance of GroupCollection is returned by the Match.Groups property. The following console application finds and outputs the number of groups captured by a regular expression.


using System;
using System.Text.RegularExpressions;

public class RegexTest
{
    public static void RunTest()
    {
        Regex r = new Regex("(a(b))c"); // Define the groups
        Match m = r.Match("abdabc");
        Console.WriteLine("Number of groups found = " + m.Groups.Count);
    }

    public static void Main()
    {
        RunTest();
    }
}

The example produces the following output:

Number of groups found = 3
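
To see where the count of 3 comes from, it can help to print the value of each group. In the sketch below (the class name GroupValuesDemo and the helper GroupValues are made up for illustration), group 0 is the whole match and the remaining groups are numbered by the position of their opening parenthesis.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupValuesDemo
{
    // Hypothetical helper: returns the value of every group of the first match.
    public static string[] GroupValues(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        string[] values = new string[m.Groups.Count];
        for (int i = 0; i < m.Groups.Count; i++)
        {
            values[i] = m.Groups[i].Value;
        }
        return values;
    }

    public static void Main()
    {
        string[] values = GroupValues("abdabc", "(a(b))c");
        for (int i = 0; i < values.Length; i++)
        {
            // Group 0 is "abc", group 1 is "ab", group 2 is "b"
            Console.WriteLine("Group " + i + " = " + values[i]);
        }
    }
}
```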

3.5 The CaptureCollection class represents a sequence of captured substrings

Because of quantifiers, a capturing group can capture more than one string in a single match. The Captures property (an object of the CaptureCollection class) is provided as a member of the Match and Group classes to give access to the collection of captured substrings. For example, if the regular expression ((a(b))c)+ (where the + quantifier specifies one or more matches) is used to capture matches from the string "abcabcabc", the CaptureCollection of each of the three parenthesized groups will contain three members.

The following program uses the regular expression (Abc)+ to find one or more matches in the string "XYZAbcAbcAbcXYZAbcAb", illustrating how the Captures property returns multiple captured substrings per group.


using System;
using System.Text.RegularExpressions;

public class RegexTest
{
    public static void RunTest()
    {
        int counter;
        Match m;
        CaptureCollection cc;
        GroupCollection gc;
        Regex r = new Regex("(Abc)+"); // Look for one or more repetitions of "Abc"
        m = r.Match("XYZAbcAbcAbcXYZAbcAb"); // Set the string to search
        gc = m.Groups;
        // Output the number of groups found
        Console.WriteLine("Captured groups = " + gc.Count.ToString());
        // Loop through each group
        for (int i = 0; i < gc.Count; i++)
        {
            cc = gc[i].Captures;
            counter = cc.Count;
            Console.WriteLine("Captures count = " + counter.ToString());
            for (int ii = 0; ii < counter; ii++)
            {
                // Print the capture and its position
                Console.WriteLine(cc[ii] + " Starts at character " +
                    cc[ii].Index);
            }
        }
    }

    public static void Main()
    {
        RunTest();
    }
}

This example returns the following output:


Captured groups = 2 
Captures count = 1 
AbcAbcAbc Starts at character 3 
Captures count = 3 
Abc Starts at character 3 
Abc Starts at character 6 
Abc Starts at character 9 

3.6 The Capture class contains results captured from a single subexpression

The following example loops through the Group collection, extracts the CaptureCollection from each Group, and assigns to the variables posn and length the character position in the original string where each capture was found and the length of each capture, respectively.


Regex r;
Match m;
CaptureCollection cc;
int posn, length;
r = new Regex("(abc)*");
m = r.Match("bcabcabc");
for (int i = 0; i < m.Groups.Count; i++) // Loop through every group of the match
{
    cc = m.Groups[i].Captures;
    for (int j = 0; j < cc.Count; j++)
    {
        posn = cc[j].Index; // Position of the capture
        length = cc[j].Length; // Length of the capture
    }
}

Grouping characters with parentheses causes every group to be returned as a Group object, which may not be what you want. Requiring groups as part of the search pattern also adds overhead. For a single group, you can disable capturing by starting the group with the character sequence ?:, as in (?:abc). For all groups, you can specify the RegexOptions.ExplicitCapture flag on the Regex.Matches() method.
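
The effect of ?: and RegexOptions.ExplicitCapture can be checked by counting the groups a match returns. The following is a minimal sketch: the class name ExplicitCaptureDemo, the helper GroupCount, the input "abcabc", and the group name word are all assumptions made for illustration. It uses the static Regex.Match overload that accepts options; Regex.Matches accepts the same flag.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ExplicitCaptureDemo
{
    // Hypothetical helper: number of groups (including group 0) in the first match.
    public static int GroupCount(string pattern, RegexOptions options)
    {
        Match m = Regex.Match("abcabc", pattern, options);
        return m.Groups.Count;
    }

    public static void Main()
    {
        // Ordinary parentheses capture: group 0 plus group 1
        Console.WriteLine(GroupCount("(abc)+", RegexOptions.None));                   // 2

        // (?: ... ) groups without capturing: only group 0 remains
        Console.WriteLine(GroupCount("(?:abc)+", RegexOptions.None));                 // 1

        // ExplicitCapture turns every unnamed group into a non-capturing group
        Console.WriteLine(GroupCount("(abc)+", RegexOptions.ExplicitCapture));        // 1

        // Named groups are still captured under ExplicitCapture
        Console.WriteLine(GroupCount("(?<word>abc)+", RegexOptions.ExplicitCapture)); // 2
    }
}
```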

I hope this introduction to regular expressions is helpful to you.

