A deep understanding of grouping in JS regular expressions

  • 2021-07-04 17:43:30
  • OfStack


I wrote a beginner's article on regular expressions a while ago and thought I knew them fairly well, but today I fell into a pit, probably because I was not careful enough. In this article I want to focus on grouping in JavaScript regular expressions. If you are not yet familiar with JS regular expressions, it is worth reading an introduction first.

Grouping is used everywhere in regular expressions. As I understand it, a group is a pair of parentheses (), and each pair of parentheses is one group. Groups come in two kinds:

Capture grouping
Non-capture grouping

Capture grouping

With a capturing group, functions such as match and exec return the text matched by each group as the second, third, and later items of the result array (the first item is the whole match). Let's look at an example first:


var reg = /test(\d+)/;
var str = 'new test001 test002';
console.log(str.match(reg)); // ["test001", "001", index: 4, input: "new test001 test002"]

In this code, (\d+) is a group (some people call it a subpattern; the two terms mean the same thing). In the example above, test001 is the whole match. The group's result is the part of that whole match (test001) that was matched by the subpattern \d+, which is obviously 001. But what I ran into today looks like this:


var reg = /test(\d)+/;
var str = 'new test001 test002';
console.log(str.match(reg)); // ["test001", "1", index: 4, input: "new test001 test002"]

The only difference is that (\d+) became (\d)+. The whole match is still test001, but the first group now holds something different. Let's take the time to analyze the difference.

In (\d+), the whole of \d+ is one group. Quantifiers are greedy by default, that is, they match as much as possible, so \d+ matches 001, and the parentheses around it capture that result. The first group is therefore 001.

Now look at (\d)+ in the second example. The quantifier is greedy here too: the group first matches 0, then 0 again, and finally 1. The whole match ends up exactly the same as in the first example. But the group (\d) matches only a single digit per repetition. I used to think it would hold the first 0, and that understanding was wrong. When a group is repeated, each repetition overwrites what the group captured before, so after a greedy match the group holds the result of the last repetition, 1. If the quantifier is made non-greedy, it repeats as few times as possible:


var reg = /test(\d)+?/;
var str = 'new test001 test002';
console.log(str.match(reg)); // ["test0", "0", index: 4, input: "new test001 test002"]

So here (\d) captures 0. Nothing follows the lazy quantifier in the pattern, so a single repetition is enough: the group matches once, and the whole match stops at test0 instead of continuing to test001.
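
To see the greedy and lazy versions side by side, here is a small sketch that prints both the whole match and the first group. The variable names are only illustrative, and the third pattern (with a trailing space) is an extra case that is not in the examples above; it shows that a lazy quantifier still repeats as often as it must for the rest of the pattern to match.

```javascript
var str = 'new test001 test002';

// Greedy: (\d) repeats three times, and the group keeps the last digit.
var greedy = str.match(/test(\d)+/);
console.log(greedy[0], greedy[1]); // whole match "test001", group "1"

// Lazy: nothing follows the quantifier, so one repetition is enough.
var lazy = str.match(/test(\d)+?/);
console.log(lazy[0], lazy[1]); // whole match "test0", group "0"

// Lazy, but a space must follow the digits: the quantifier has to keep
// repeating until the space can match, so the group ends on the last digit.
var lazyWithSpace = str.match(/test(\d)+? /);
console.log(lazyWithSpace[0], lazyWithSpace[1]); // whole match "test001 ", group "1"
```

In every case the group reports only the most recent repetition; greedy versus lazy changes how many repetitions happen, not how the group stores them.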

Non-capture grouping

A non-capturing group is for when you need a pair of parentheses but don't want them to capture, that is, you don't want the group to show up in the results of functions like match and exec. To get one, put ?: right after the opening parenthesis, as in (?:pattern):


var reg = /test(?:\d)+/;
var str = 'new test001 test002';
console.log(str.match(reg)); // ["test001", index: 4, input: "new test001 test002"]

This way the text matched by the group no longer appears in the result of match, that is, the 1 that used to be the second item is gone.
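
A related point that may help: a non-capturing group does not take a slot in the result array, so it does not shift the numbering of the capturing groups around it. The sketch below uses two hypothetical patterns of my own (they do not appear earlier in this article) to compare the two cases.

```javascript
var str = 'new test001 test002';

// (?:test) groups without capturing, so (\d+) is still group 1.
var mixed = str.match(/(?:test)(\d+)/);
console.log(mixed.length, mixed[1]); // length 2, group 1 is "001"

// With two capturing groups, the digits move to group 2.
var both = str.match(/(test)(\d+)/);
console.log(both.length, both[1], both[2]); // length 3, groups are "test" and "001"
```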

This article highlights the difference between (\d+) and (\d)+, which is the pit I fell into today. If you spot any mistakes, please point them out.
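
If what you actually wanted from (\d)+ was every digit it stepped through, one possible workaround (a sketch, not the only approach) is to capture the whole run once with (\d+) and split it afterwards. The names m and digits are only illustrative.

```javascript
var str = 'new test001 test002';

// Capture the whole run of digits once, then split it into single digits.
var m = str.match(/test(\d+)/);
var digits = m ? m[1].split('') : [];
console.log(digits); // ["0", "0", "1"]
```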

