Implementation of Python re.sub Backreferences

  • 2021-11-14 06:11:25
  • OfStack

Contents
  • match grouping
  • re.sub matching and replacement
  • Backreferences
  • Reference

The re module is part of the Python standard library and handles regular expressions. With it you can conveniently match strings, replace text, and perform other regular-expression operations.

match grouping

The Python re module provides the match function, which matches a regular expression against a string. For example, to match Isaac Newton in "Isaac Newton, physicist", you can use the regular expression \w+ \w+ and run the following:


>>> import re
>>> m = re.match(r"\w+ \w+", "Isaac Newton, physicist")
>>> m
<re.Match object; span=(0, 12), match='Isaac Newton'>

The first parameter of re.match is the regular expression, and the second parameter is the string to be matched. The regular expression \w+ matches one or more consecutive word characters (letters, digits, or underscores). \w+ \w+ therefore matches two such runs of characters separated by a single space.
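As a small illustration of how \w+ and re.match behave, the sketch below checks two edge cases. The extra sample strings are made up for this example. Note that re.match only attempts a match at the beginning of the string.

```python
import re

# \w+ needs at least one word character, so it cannot match an empty string.
empty = re.match(r"\w+", "")

# re.match only tries to match at the beginning of the string,
# so a leading comma makes the whole match fail.
shifted = re.match(r"\w+ \w+", ", Isaac Newton")

# With two words at the start, the match covers both words and the space.
found = re.match(r"\w+ \w+", "Isaac Newton, physicist")

print(empty)           # None
print(shifted)         # None
print(found.group(0))  # Isaac Newton
```

Because a failed match returns None, it is usually worth checking the result before calling group() on it.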

A match result is divided into groups, which you can inspect through the group() method of the object that match returns:


>>> m.group(0)
'Isaac Newton'

By default, match produces only one group, group 0, which represents the entire match. In the example above, group 0 is the complete match of \w+ \w+, which is Isaac Newton.
You can define your own groups by using parentheses () in the regular expression. For example, to capture Isaac and Newton as two separate groups, change the regular expression to (\w+) (\w+):


>>> m = re.match(r"(\w+) (\w+)", "Isaac Newton, physicist")

The regular expression (\w+) (\w+) defines two groups, each matching \w+, separated by a space.
Use groups() to view all the groups in the match result:


>>> m.groups()
('Isaac', 'Newton')

You can also use group() to view each group separately: group(0) still represents the complete match, group(1) the first group, group(2) the second group, and so on:


>>> m.group(0)
'Isaac Newton'
>>> m.group(1)
'Isaac'
>>> m.group(2)
'Newton'
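The group accessors above can be wrapped in a small helper. This is only a sketch: the function name split_name is hypothetical and not part of the re module, and it assumes the name sits at the very start of the text.

```python
import re

def split_name(text):
    """Return (first, last) taken from the start of text, or None if there is no match."""
    m = re.match(r"(\w+) (\w+)", text)
    if m is None:
        # re.match returns None when the pattern does not match.
        return None
    return m.group(1), m.group(2)

print(split_name("Isaac Newton, physicist"))  # ('Isaac', 'Newton')
print(split_name("physicist"))                # None
```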

re.sub matching and replacement

The re.match() function only matches a regular expression. re.sub() both matches the regular expression and replaces the matched text, returning a new string.
For example, to replace whatever (\w+) (\w+) matches in the string with Albert Einstein, you can write:


>>> re.sub(r"(\w+) (\w+)", "Albert Einstein", "Isaac Newton, physicist")
'Albert Einstein, physicist'

The first parameter of re.sub is the regular expression to match, the second is the replacement expression, and the third is the original string.

The replacement expression here is a fixed new string, Albert Einstein, which has nothing to do with the contents of the original string. If you want to reuse parts of the original string, you need the backreference feature of re.sub.
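One detail worth knowing before moving on: re.sub replaces every non-overlapping match, not just the first. The sketch below uses a made-up sample string and the optional count parameter of re.sub to show the difference.

```python
import re

text = "a1 b22 c333"

# By default every non-overlapping match is replaced.
all_replaced = re.sub(r"\d+", "N", text)

# count=1 limits the replacement to the first match.
first_only = re.sub(r"\d+", "N", text, count=1)

print(all_replaced)  # aN bN cN
print(first_only)    # aN b22 c333
```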

Backreferences

A backreference lets the replacement expression refer to text that was matched in the original string. For example, (\w+) (\w+) matches Isaac Newton in the original string, and using the matched text the result can be rewritten as FirstName: Isaac, LastName: Newton.
To refer to matched text, there must be a notation that represents it. The match produced by re.sub is grouped in the same way as the match produced by re.match, so the replacement expression only needs to refer to those groups. There are two ways to do so:

  • \number: for example, \1 refers to the first group in the match result, which is the Isaac part of the example.
  • \g<number>: for example, \g<1>, which means the same as \1 and also refers to the first group. Compared with \number, the \g<number> notation avoids ambiguity. Suppose you want to replace the Isaac matched by the first group with Isaac0. With \number you would write \10, intending a 0 after the first group, but Python reads it as a reference to the tenth group. With \g<number> you simply write \g<1>0.
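The ambiguity described above can be checked directly. In this sketch the variable names are chosen for illustration only. Because the pattern has just two groups, the reference to group 10 is invalid and re.sub raises re.error.

```python
import re

pattern = r"(\w+) (\w+)"
text = "Isaac Newton, physicist"

# \g<1>0 is read as "group 1, followed by a literal 0".
result = re.sub(pattern, r"\g<1>0", text)
print(result)  # Isaac0, physicist

# \10 is read as a reference to group 10, which this pattern does not have,
# so re.sub raises re.error instead of returning a string.
try:
    re.sub(pattern, r"\10", text)
    raised = False
except re.error:
    raised = True
print(raised)  # True
```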

Returning to the opening example, the match Isaac Newton can be rewritten as FirstName: Isaac, LastName: Newton with the following expression:


>>> re.sub(r"(\w+) (\w+)", r"FirstName: \g<1>, LastName: \g<2>", "Isaac Newton, physicist")
'FirstName: Isaac, LastName: Newton, physicist'
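For comparison, here is a short sketch of the same replacement written with the \number notation, plus a variant that swaps the two names. The variable names are illustrative. Raw strings (the r prefix) keep Python from interpreting the backslashes before re sees them.

```python
import re

pattern = r"(\w+) (\w+)"
text = "Isaac Newton, physicist"

# Both notations refer to the same groups when no digit follows the reference.
numbered = re.sub(pattern, r"FirstName: \1, LastName: \2", text)
explicit = re.sub(pattern, r"FirstName: \g<1>, LastName: \g<2>", text)
print(numbered == explicit)  # True

# References can appear in any order, for example to swap the two names.
swapped = re.sub(pattern, r"\2, \1", text)
print(swapped)  # Newton, Isaac, physicist
```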

Reference

  • re - Regular expression operations
  • Python re(gex)? - Groupings and backreferences

