Detailed Explanation of Grouping in JavaScript Regular Expressions

  • 2021-07-04 17:45:04
  • OfStack

Some time ago I wrote an introductory article on regular expressions. I thought I understood them fairly well, but today I fell into a pitfall, probably because I was not careful enough. In this article I focus on grouping in JavaScript regular expressions. If you are new to JS regular expressions, you may want to read that introductory article first.

Grouping is widely used in regular expressions. As I understand it, a group is written as a pair of parentheses (), and each pair of parentheses defines one group.
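Here is a rough illustration. The pattern and the input string are invented for this sketch. JavaScript numbers capturing groups by the position of their opening parenthesis, counting from left to right, so each nested pair gets its own number:

```javascript
// Three pairs of parentheses: the outer pair wraps the two inner pairs.
// Groups are numbered by the order of their opening "(" from left to right.
const nested = /((\d+)-(\d+))/;
const nestedMatch = nested.exec('range 10-20');

console.log(nestedMatch[0]); // 10-20  (whole match)
console.log(nestedMatch[1]); // 10-20  (group 1: the outer pair)
console.log(nestedMatch[2]); // 10     (group 2)
console.log(nestedMatch[3]); // 20     (group 3)
```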

Groups can be divided into:

  • Capturing groups
  • Non-capturing groups

Capturing groups

With a capturing group, methods such as match and exec return what each group matched as extra items in the result array: the second item holds group 1, the third item holds group 2, and so on. Let's look at an example first:


 var reg = /test(\d+)/;
 var str = 'new test001 test002';
 console.log(str.match(reg));
//["test001", "001", index: 4, input: "new test001 test002"]

In this code, (\d+) is a group. Some people call it a subpattern instead, but both terms mean the same thing. In the example above, test001 is the overall match.
The group's match is the part of that overall match (test001) consumed by the subpattern \d+, which here is 001.
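When the regex has no g flag, exec returns the same kind of array as match, so you can read each group by index. The sketch below adds a second capturing group around test. This is an illustrative change, not the original pattern, and it exists so that an item 3 also appears:

```javascript
// Two capturing groups: (test) is group 1, (\d+) is group 2.
const twoGroups = /(test)(\d+)/;
const input = 'new test001 test002';

const viaExec = twoGroups.exec(input);
const viaMatch = input.match(twoGroups);

console.log(viaExec[0]);    // test001  item 1: the whole match
console.log(viaExec[1]);    // test     item 2: group 1
console.log(viaExec[2]);    // 001      item 3: group 2
console.log(viaExec.index); // 4

// Without the g flag, String.prototype.match gives the same result as exec.
console.log(viaMatch[2] === viaExec[2]); // true
```

With the g flag, match behaves differently: it returns only the whole matches and drops the group items. This index-based access therefore assumes a non-global regex.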

What happened to me today, though, was this:


 var reg = /test(\d)+/;
 var str = 'new test001 test002';
 console.log(str.match(reg));
//["test001", "1", index: 4, input: "new test001 test002"]

The only difference is that (\d+) was changed to (\d)+. The whole match is still test001, but the first group now captures 1 instead of 001.

Let's analyze the difference step by step.

In (\d+) there is a single group with the quantifier inside it. Quantifiers are greedy by default, which means they match as many characters as possible, so \d+ matches 001.
Because the parentheses wrap the whole \d+, the first group captures 001.

Now look at (\d)+. The + is greedy here too, so the group is repeated as many times as possible. The first repetition matches 0, the second matches 0, and the third matches 1. Together they again match 001.

The overall match looks no different from the first example, but here the group (\d) matches only a single digit on each repetition.
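To make the per-repetition behavior visible, the sketch below emulates (\d)+ by hand. It applies /(\d)/ with the sticky y flag repeatedly, starting just after test, and records each capture. This emulation is hypothetical and only illustrates the idea, not how the engine works internally. It keeps the capture the same way the engine does, though: each repetition replaces the previous value.

```javascript
const str = 'new test001 test002';

// Sticky (y) regex: each exec must match exactly at lastIndex.
const digitRe = /(\d)/y;
digitRe.lastIndex = 8; // position right after "test" in "new test001"

const seen = [];  // every digit the group matched, in order
let lastCapture;  // what a repeated group would finally hold
let m;
while ((m = digitRe.exec(str)) !== null) {
  seen.push(m[1]);
  lastCapture = m[1]; // each repetition overwrites the previous capture
}

console.log(seen.join(',')); // 0,0,1
console.log(lastCapture);    // 1
```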

My earlier understanding was that the group would hold the result of its first repetition, 0, but that is wrong. Because + is greedy, the group is repeated as many times as possible, and each repetition overwrites what the group captured before.

So the group (\d) ends up with 1, the digit from the last repetition.
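You can see the last-repetition rule directly by wrapping the repeated group in an outer group. The outer group records the whole run of digits, while the inner group holds only the digit from the final repetition. The nested pattern is added here for illustration and does not come from the original example:

```javascript
const str = 'new test001 test002';

// Group 1 (outer) wraps the whole repetition; group 2 (inner) is the repeated (\d).
const nestedRepeat = /test((\d)+)/;
const r = str.match(nestedRepeat);

console.log(r[0]); // test001
console.log(r[1]); // 001  all repetitions, captured by the outer group
console.log(r[2]); // 1    only the last repetition survives in the inner group
```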

If the quantifier is not greedy (that is, lazy), it matches as few times as possible:


 var reg = /test(\d)+?/;
 var str = 'new test001 test002';
 console.log(str.match(reg));

//["test0", "0", index: 4, input: "new test001 test002"]

Here the group matches 0, and the whole match is only test0. The +? quantifier needs at least one repetition, and because nothing after it forces more, it stops after the first one.
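Lazy does not mean "capture the first repetition." It means "use as few repetitions as the rest of the pattern allows." In the sketch below, a \s is appended to the pattern as an assumption for illustration. The engine must then keep repeating (\d) until it reaches the space, so the group again ends up holding the last repetition, 1:

```javascript
const str = 'new test001 test002';

// Lazy: try one repetition first and add more only if the rest of the pattern fails.
const lazyOnly = /test(\d)+?/;
// Same, but a whitespace character must follow, which forces extra repetitions.
const lazyForced = /test(\d)+?\s/;

const a = str.match(lazyOnly);
const b = str.match(lazyForced);

console.log(a[0]);                 // test0
console.log(a[1]);                 // 0
console.log(JSON.stringify(b[0])); // "test001 "
console.log(b[1]);                 // 1
```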

Non-capturing groups


 var reg = /test(?:\d)+/;
 var str = 'new test001 test002';
 console.log(str.match(reg));
//["test001", index: 4, input: "new test001 test002"]

You use a non-capturing group where you need a pair of parentheses but don't want a capturing group. In other words, you don't want methods like match and exec to return what that group matched.

To create one, put ?: right after the opening parenthesis: (?:pattern). This turns the group into a non-capturing group.
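A common reason to need parentheses at all is to apply a quantifier to more than one character. The short sketch below uses a made-up input. It compares ab+, where the + applies only to b, with (?:ab)+, where the + applies to the whole pair without creating a capture:

```javascript
const text = 'ababab';

// Without parentheses, + repeats only the b.
const noGroup = text.match(/ab+/);
// (?:ab) groups the pair so + repeats it, without capturing anything.
const nonCapturing = text.match(/(?:ab)+/);

console.log(noGroup[0]);      // ab
console.log(nonCapturing[0]); // ababab
```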

As a result, the array returned by match does not contain what the group matched. Compared with the (\d)+ example, the second item, 1, is missing.
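The sketch below checks this by comparing the array lengths. It also shows a related consequence: a non-capturing group takes no number, so it does not shift the numbers of later capturing groups. The (?:test) wrapper in the last pattern is added only for illustration.

```javascript
const str = 'new test001 test002';

const captured = str.match(/test(\d)+/);
const notCaptured = str.match(/test(?:\d)+/);

console.log(captured.length);    // 2  (test001 and group 1: 1)
console.log(notCaptured.length); // 1  (test001 only)

// (?:test) is not counted, so (\d+) is still group 1.
const numbered = str.match(/(?:test)(\d+)/);
console.log(numbered[1]); // 001
```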

This article has focused on the difference between (\d+) and (\d)+, which is the pitfall I ran into today. If you spot any mistakes, please point them out.

