Recursive matching analysis of C# regular expressions

  • 2020-10-23 20:16:29
  • OfStack

In C# programming you often need to match the contents of paired curly braces, but the (?R) recursion syntax found in other regular expression flavors does not seem to be supported in C#. After a lot of searching and testing, I finally found the following passage:

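Note: the parentheses in these patterns must be escaped with a backslash (\( and \)), not with a forward slash (/( and /)). Some copies of the passage show a forward slash by mistake.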

Matching nested constructs

Microsoft has included an interesting innovation for matching balanced (nested) constructs, which regular expressions have historically been unable to do. It is not easy to master: this section is short, but be warned that it is dense.
It is probably easiest to start with an example, so I'll begin with this code:


Regex r = new Regex(@"\((?>[^()]+|\((?<DEPTH>)|\)(?<-DEPTH>))*(?(DEPTH)(?!))\)");

This matches the first properly paired set of nested parentheses, such as the "(yes (here) okay)" in "before (nope (yes (here) okay) after". Notice that the first opening parenthesis is not matched because it has no closing parenthesis to pair with.
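To see this behavior directly, here is a minimal console sketch that runs the pattern (written with backslash escapes) against the example string. The class name NestedParensDemo and the helper FirstBalanced are hypothetical names chosen for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

class NestedParensDemo
{
    // The balanced-parentheses pattern from the text, with backslash escapes.
    public const string Pattern =
        @"\((?>[^()]+|\((?<DEPTH>)|\)(?<-DEPTH>))*(?(DEPTH)(?!))\)";

    // Returns the first balanced parenthesized span, or "" if there is none.
    public static string FirstBalanced(string input)
    {
        return Regex.Match(input, Pattern).Value;
    }

    static void Main()
    {
        Console.WriteLine(FirstBalanced("before (nope (yes (here) okay) after"));
        // prints: (yes (here) okay)
    }
}
```

The match starts at "(yes" rather than "(nope" because the attempt that begins at "(nope" runs out of input before its closing parenthesis is found.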

Here's an overview of how it works:

1. Each time a "(" is matched, "(?<DEPTH>)" adds one to the depth value (the "\(" at the very start of the regular expression is not counted).

2. Each time a ")" is matched, "(?<-DEPTH>)" subtracts one from the depth value.

3. "(?(DEPTH)(?!))" makes sure the depth is zero before the final closing parenthesis is allowed to match.
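The three steps can be mirrored by a hand-written counter. The following is only a sketch of the same logic in ordinary code, not a description of how the regex engine works internally; the class DepthCounterSketch and its method are hypothetical.

```csharp
using System;

class DepthCounterSketch
{
    // Finds the first balanced "(...)" span by counting depth explicitly.
    public static string FirstBalanced(string s)
    {
        for (int start = 0; start < s.Length; start++)
        {
            if (s[start] != '(') continue;   // plays the role of the leading \(
            int depth = 0;                   // the leading "(" is not counted

            for (int i = start + 1; i < s.Length; i++)
            {
                if (s[i] == '(')
                {
                    depth++;                 // step 1: like (?<DEPTH>)
                }
                else if (s[i] == ')')
                {
                    // Step 3: at depth zero this ")" closes the whole span.
                    if (depth == 0) return s.Substring(start, i - start + 1);
                    depth--;                 // step 2: like (?<-DEPTH>)
                }
            }
            // Input ended with this "(" unpaired; try the next candidate.
        }
        return "";
    }

    static void Main()
    {
        Console.WriteLine(FirstBalanced("before (nope (yes (here) okay) after"));
        // prints: (yes (here) okay)
    }
}
```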

The reason it works is that the engine's backtracking stack keeps track of the groups that have matched successfully. "(?<DEPTH>)" is nothing more than a named grouping construct that always matches (it matches nothing, that is, the empty string). Because it is placed immediately after "\(", its successful match (which stays on the stack until removed) serves as a counter for opening parentheses.

Another way of writing it is "(?<DEPTH>\()"; I personally prefer this form to "\((?<DEPTH>)". The same applies to the "(?<-DEPTH>)" that follows "\)".
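As a quick check that the two spellings behave the same, the sketch below compares both patterns on a few inputs. The class and constant names are hypothetical, and the handful of sample strings is only a spot check, not a proof of equivalence.

```csharp
using System;
using System.Text.RegularExpressions;

class AlternateFormDemo
{
    // Form used in the text: empty groups placed after the literal parentheses.
    public const string EmptyGroups =
        @"\((?>[^()]+|\((?<DEPTH>)|\)(?<-DEPTH>))*(?(DEPTH)(?!))\)";

    // Alternate form: the literal parentheses are moved inside the groups.
    public const string WrappedGroups =
        @"\((?>[^()]+|(?<DEPTH>\()|(?<-DEPTH>\)))*(?(DEPTH)(?!))\)";

    // True when both patterns return the same first match for the input.
    public static bool SameMatch(string input)
    {
        return Regex.Match(input, EmptyGroups).Value
            == Regex.Match(input, WrappedGroups).Value;
    }

    static void Main()
    {
        string[] samples = { "before (nope (yes (here) okay) after", "(a))", "((a)" };
        foreach (string s in samples)
            Console.WriteLine(s + " -> " + SameMatch(s));
    }
}
```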

In this way, a count of successful "DEPTH" group matches builds up on the backtracking stack. When we find a closing parenthesis we want to subtract one from the depth value, which is done with the special .NET construct "(?<-DEPTH>)": it removes the most recent successful "DEPTH" group from the stack. If there is no such record on the stack, "(?<-DEPTH>)" itself fails, which prevents the regular expression from matching an extra closing parenthesis.
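A small sketch of that last point: with a stray closing parenthesis in the input, the match stops before it. The class name StrayCloseDemo and the helper AllBalanced are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class StrayCloseDemo
{
    const string Pattern =
        @"\((?>[^()]+|\((?<DEPTH>)|\)(?<-DEPTH>))*(?(DEPTH)(?!))\)";

    // Collects every non-overlapping balanced span found in the input.
    public static List<string> AllBalanced(string input)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, Pattern))
            results.Add(m.Value);
        return results;
    }

    static void Main()
    {
        // The second ")" has no partner, so it is never part of a match.
        Console.WriteLine(string.Join(" | ", AllBalanced("(a)) (b)")));
        // prints: (a) | (b)
    }
}
```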

Finally, "(?(DEPTH)(?!))" is a conditional that applies "(?!)" if the "DEPTH" group is still successful at that point. If it is, there is an unpaired opening parenthesis that has not been removed by "(?<-DEPTH>)". In that case we want to abandon the match (we do not want to match an unpaired parenthesis), so we use "(?!)". This is a zero-width negative lookahead assertion, which lets the match continue only if its subexpression does not match to the right of the current position. Because the subexpression here is empty, and an empty subexpression always matches, "(?!)" always fails.
This is how nested constructs are matched in the .NET regular expression implementation.
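The effect of the conditional is easiest to see by removing it. In the sketch below, WithoutCheck is a deliberately weakened variant that I introduce only for comparison; it is not a pattern from the passage, and the class and constant names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

class ConditionalDemo
{
    // Pattern from the text, including the (?(DEPTH)(?!)) check.
    public const string WithCheck =
        @"\((?>[^()]+|\((?<DEPTH>)|\)(?<-DEPTH>))*(?(DEPTH)(?!))\)";

    // Same pattern with the check removed (for comparison only).
    public const string WithoutCheck =
        @"\((?>[^()]+|\((?<DEPTH>)|\)(?<-DEPTH>))*\)";

    static void Main()
    {
        // An empty negative lookahead can never succeed.
        Console.WriteLine(Regex.IsMatch("anything", "(?!)"));    // prints: False

        // With the check, the unpaired leading "(" is skipped.
        Console.WriteLine(Regex.Match("((a)", WithCheck).Value);    // prints: (a)

        // Without it, the engine backtracks into an unbalanced match.
        Console.WriteLine(Regex.Match("((a)", WithoutCheck).Value); // prints: ((a)
    }
}
```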

The passage above may seem hard to follow. If you cannot fully understand it, that is fine: you can simply use the pattern as it is, replacing ( and ) with the delimiters you need. I believe it can solve a lot of your problems.
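Swapping the delimiters by hand means getting the escaping right in several places. One possible way to avoid that is a small builder that emits each delimiter as a \uXXXX escape, which .NET accepts both inside and outside a character class. The helper names below (BalancedPatternBuilder, Build, Hex) are hypothetical, and the sketch assumes single-character delimiters in the Basic Multilingual Plane.

```csharp
using System;
using System.Text.RegularExpressions;

class BalancedPatternBuilder
{
    // Encodes a character as a \uXXXX regex escape.
    static string Hex(char c)
    {
        return "\\u" + ((int)c).ToString("X4");
    }

    // Builds the balanced-construct regex for an arbitrary delimiter pair.
    public static Regex Build(char open, char close)
    {
        string o = Hex(open), c = Hex(close);
        string pattern =
            o + "(?>[^" + o + c + "]+|" + o + "(?<DEPTH>)|" + c + "(?<-DEPTH>))*"
              + "(?(DEPTH)(?!))" + c;
        return new Regex(pattern);
    }

    static void Main()
    {
        Console.WriteLine(Build('{', '}').Match("if (x) { a { b } c } else").Value);
        // prints: { a { b } c }
        Console.WriteLine(Build('[', ']').Match("x[1[2]]y").Value);
        // prints: [1[2]]
    }
}
```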

Here is a test case based on this usage


// Requires: using System.Text; using System.Text.RegularExpressions; using System.Windows.Forms;
private void button3_Click( object sender, EventArgs e )
{
    // Same construct as above, with ( and ) replaced by [ and ]
    Regex r = new Regex( @"\[(?>[^\[\]]+|\[(?<DEPTH>)|\](?<-DEPTH>))*(?(DEPTH)(?!))\]" );
    StringBuilder sb = new StringBuilder();
    MatchString( "[111[222[333]]][222[333]][333]", r, sb );
    MessageBox.Show( sb.ToString(), "Information retrieved" );
}

// Appends every balanced [...] group in OutString to sb, then recurses into each group's contents
private void MatchString( string OutString, Regex r, StringBuilder sb )
{
    MatchCollection ms = r.Matches( OutString ); // Get all the matches
    foreach ( Match m in ms )
    {
        sb.AppendLine( m.Value );
        // Strip the outer "[" and "]" before recursing; otherwise the same
        // string would match again and recurse until the stack overflows
        MatchString( m.Value.Substring( 1, m.Value.Length - 2 ), r, sb );
    }
}

This produces:


[111[222[333]]]
[222[333]]
[333]
[222[333]]
[333]
[333]
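As an aside, the same list can also be collected without recursion. This is a different technique from the one in the test case above and is not from the quoted passage: the balanced pattern is wrapped in a positive lookahead with a capturing group, so the engine tries every start position and reports nested groups as overlapping captures. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class OverlappingGroupsSketch
{
    // The balanced [ ] pattern inside (?=( ... )) so each match is zero-width
    // and the bracket group is available as capture group 1.
    const string Pattern =
        @"(?=(\[(?>[^\[\]]+|\[(?<DEPTH>)|\](?<-DEPTH>))*(?(DEPTH)(?!))\]))";

    // Returns every balanced [...] group, nested ones included,
    // ordered by starting position.
    public static List<string> AllGroups(string input)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, Pattern))
            results.Add(m.Groups[1].Value);
        return results;
    }

    static void Main()
    {
        foreach (string g in AllGroups("[111[222[333]]][222[333]][333]"))
            Console.WriteLine(g);
    }
}
```

For this input the groups come out in the same order as the recursive version, because ordering by start position coincides with the depth-first order of the recursion.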

I believe this article will be a useful reference for your C# programming.

