Python regular expression grouping: concepts and usage in detail

  • 2020-06-07 04:45:24
  • OfStack

This article explains the concept and usage of grouping in Python regular expressions, with examples. The details are as follows:

Regular expression grouping

A group is a part of a regular expression enclosed in a pair of parentheses "()"; the text matched by that part is the content of the group. Groups are numbered by counting left parentheses "(" from the left side of the regular expression: the first "(" opens group 1, the second opens group 2, and so on. Note that there is also an implicit global group, group 0, which is the text matched by the whole regular expression.

Once a pattern contains groups, the content of a group can be read from the match object with its group(num) and groups() methods.
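The numbering rule can be checked with a small sketch. The sample text and the date pattern below are assumptions chosen only for illustration; they are not part of the original article.

```python
import re

# Hypothetical sample text and pattern, chosen only to show group numbering.
m = re.search(r'((\d{4})-(\d{2}))-(\d{2})', 'released on 2020-06-07')

whole = m.group(0)       # group 0: the whole match
year_month = m.group(1)  # group 1 opens at the first "("
year = m.group(2)        # group 2 opens at the second "(" (nested inside group 1)
all_groups = m.groups()  # tuple of groups 1..n; group 0 is not included

print(whole)             # 2020-06-07
print(year_month, year)  # 2020-06 2020
print(all_groups)        # ('2020-06', '2020', '06', '07')
```

Nested groups are numbered by the position of their opening parenthesis, which is why group 2 lies inside group 1 here.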

For example, extract the link text from a hyperlink in HTML code:


>>> import re
>>> s='<div><a href="https://support.google.com/chrome/?p=ui_hotword_search" rel="external nofollow" target="_blank"> More and more </a><p>dfsl</p></div>'
>>> print(re.search(r'<a.*>(.*)</a>',s).group(1))
 More and more 

or


>>> print(re.match(r'.*<a.*>(.*)</a>',s).group(1))
 More and more 
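The patterns above rely on the string containing a single link, because ".*" is greedy. As a rough sketch of what changes with several links, the following compares the greedy pattern with a non-greedy variant. The HTML string and the variant pattern are assumptions for illustration, and a regular expression is not a substitute for a real HTML parser.

```python
import re

# Hypothetical HTML with two links, used only for illustration.
html = '<p><a href="/one">first</a> and <a href="/two">second</a></p>'

# Greedy: "<a.*>" swallows as much as it can, so only the last link text is captured.
greedy = re.findall(r'<a.*>(.*)</a>', html)

# Non-greedy: "[^>]*" stays inside the opening tag and "(.*?)" stops at the first "</a>".
lazy = re.findall(r'<a[^>]*>(.*?)</a>', html)

print(greedy)  # ['second']
print(lazy)    # ['first', 'second']
```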

With numbered groups as above, we can extract the string we want. But when a regular expression contains many parentheses, we have to count which parenthesis corresponds to the string we want, which quickly becomes tedious. For this case Python provides another kind of grouping: the named group, as opposed to the unnamed (numbered) groups used so far.

Named groups

Naming a group gives an additional name to a group that already has a default group number. The syntax for named groups is as follows:


(?P<name>regular expression)  # name must be a valid identifier

For example, extract the IP address from a string:


>>> s = "ip='230.192.168.78',version='1.0.0'"
>>> res = re.search(r"ip='(?P<ip>\d+\.\d+\.\d+\.\d+).*", s)
>>> res.group('ip')  # Refer to the group by its name
'230.192.168.78'
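A named group keeps its default number, and all named groups can be read at once with groupdict(). The sketch below reuses the same string; the second group name, version, and the simplified "[\d.]+" sub-pattern are assumptions added for illustration.

```python
import re

s = "ip='230.192.168.78',version='1.0.0'"

# "version" is a hypothetical second named group, added only for this sketch.
m = re.search(r"ip='(?P<ip>[\d.]+)',version='(?P<version>[\d.]+)'", s)

by_name = m.group('ip')   # access by name
by_number = m.group(1)    # the same group, accessed by its default number
as_dict = m.groupdict()   # mapping of every group name to its matched text

print(by_name == by_number)  # True
print(as_dict)               # {'ip': '230.192.168.78', 'version': '1.0.0'}
```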

Backreferences

In a regular expression, whatever is enclosed in parentheses "()" is a group. An operator can then be applied to the group as a whole, for example a repetition operator.
Note that only parentheses "()" form a group: "[]" defines a character set and "{}" defines a repetition count.
When groups are defined with "()", the regular expression engine numbers the matched groups in order and remembers what each one matched. To refer back to something that has already been matched, use "\number" or, for a named group, "(?P=name)". \1 refers to the first group, \2 to the second group, and so on, with \n referring to the nth group. Unlike some other regex flavors, Python has no \0 backreference: inside a pattern, \0 is an octal escape for the NUL character. The whole match is available as group(0) or, in a re.sub replacement string, as \g<0>. A backreference inside a pattern matches the same text that the group matched earlier, so it is typically used to match repeated strings.
For example:


# Backreference by group name
>>> re.search(r'(?P<name>go)\s+(?P=name)\s+(?P=name)', 'go go go').group('name')
'go'
# Backreference by default group number
>>> re.search(r'(go)\s+\1\s+\1', 'go go go').group()
'go go go'
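A common use of backreferences is finding accidentally doubled words. The following is a minimal sketch; the sample sentence and the group name word are assumptions for illustration.

```python
import re

# Hypothetical sample sentence containing two doubled words.
text = 'this is is a test of of the pattern'

# \1 must match exactly the text that group 1 matched; \b keeps whole words only.
doubled = re.findall(r'\b(\w+)\s+\1\b', text)

# The same idea with a named group and a (?P=name) backreference.
first = re.search(r'\b(?P<word>\w+)\s+(?P=word)\b', text).group('word')

print(doubled)  # ['is', 'of']
print(first)    # is
```

Without the \b anchors, the pattern could also match inside words, for example the "is" at the end of "this" followed by a separate "is".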

Swap two parts of a string by referring to the groups in the replacement string of re.sub:


>>> s = 'abc.xyz'
>>> re.sub(r'(.*)\.(.*)', r'\2.\1', s)
'xyz.abc'
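In a replacement string, groups can also be referenced with the \g<name> and \g<number> forms, and \g<0> stands for the whole match. A small sketch of the same swap with named groups follows; the group names left and right are assumptions for illustration.

```python
import re

s = 'abc.xyz'

# \g<name> refers to a named group inside the replacement string.
swapped = re.sub(r'(?P<left>.*)\.(?P<right>.*)', r'\g<right>.\g<left>', s)

# \g<0> refers to the whole match, here each run of word characters.
wrapped = re.sub(r'\w+', r'[\g<0>]', s)

print(swapped)  # xyz.abc
print(wrapped)  # [abc].[xyz]
```

The \g<1> form is also useful when a group reference is followed by a digit, since \g<1>0 is unambiguous while \10 would be read as group 10.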

Forward positive assertion (lookahead), backward positive assertion (lookbehind)

Syntax for a forward positive assertion, also known as a positive lookahead:

(?=pattern)

Syntax for a backward positive assertion, also known as a positive lookbehind:

(?<=pattern)

Note that when both a forward and a backward positive assertion are needed in the same pattern, the backward positive assertion is written before the main expression and the forward positive assertion after it. The match is then the text that comes after the backward pattern and before the forward pattern; the assertions themselves are not part of the match.
For example, extract the contents of the comments in C code:


>>> s1='''char *a="hello world"; char b='c'; /* this is comment */ int c=1; /* this is multiline comment */'''
>>> re.findall( r'(?<=/\*).+?(?=\*/)' , s1 ,re.M|re.S)
[' this is comment ', ' this is multiline comment ']

(?<=/\*) is the backward positive assertion: the match must come after "/*". (?=\*/) is the forward positive assertion: the match must come before "*/". Together the two delimit an interval, which is why the backward positive assertion is written before the forward positive assertion.
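Because assertions only check the surrounding text and do not consume it, the delimiters never appear in the result. The sketch below compares the assertion-based pattern with an ordinary capturing group on a shortened C snippet, then shows a lookbehind on its own; the sample strings are assumptions for illustration.

```python
import re

# Hypothetical shortened C snippet.
code = "int a; /* first */ int b; /* second */"

# Assertions: the whole match is already just the comment text.
with_assertions = re.findall(r'(?<=/\*).+?(?=\*/)', code, re.S)

# Capturing group: "/*" and "*/" are consumed, and findall returns group 1.
with_group = re.findall(r'/\*(.+?)\*/', code, re.S)

# A lookbehind alone: digits that directly follow a dollar sign.
amounts = re.findall(r'(?<=\$)\d+', 'price: $42, discount: $7, code: 99')

print(with_assertions)  # [' first ', ' second ']
print(with_group)       # [' first ', ' second ']
print(amounts)          # ['42', '7']
```

Note that Python's re module requires the pattern inside a lookbehind to match text of a fixed length.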

Forward negative assertion (negative lookahead), backward negative assertion (negative lookbehind)

Syntax for a forward negative assertion, also known as a negative lookahead:

(?!pattern)

Syntax for a backward negative assertion, also known as a negative lookbehind:

(?<!pattern)

Examples of forward and backward negative assertions:


# Match file names that do not end with txt
>>> f1 = 'aaa.txt'
>>> re.findall(r'.*\..*$(?<!txt$)',f1)
[]
# Match file names that do not begin with a digit
>>> re.findall(r'^(?!\d+).*','1txt.txt')
[]
# Match file names that neither begin with a digit nor end with py
>>> re.findall(r'^(?!\d+).+?\..*$(?<!py$)','test.py')
[]
>>> re.findall(r'^(?!\d+).+?\..*$(?<!py$)','test.txt')
['test.txt']
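The same two negative assertions can be used to filter a list of file names. The sketch below uses a simplified variant of the pattern above, which is an assumption and not the article's exact expression: it does not require the name to contain a dot. The file names are also made up for illustration.

```python
import re

# Hypothetical file names.
names = ['test.py', 'test.txt', '1txt.txt', 'notes.md']

# ^(?!\d)     the name must not begin with a digit
# (?<!\.py)$  the name must not end with ".py"
pattern = re.compile(r'^(?!\d).*(?<!\.py)$')

kept = [name for name in names if pattern.match(name)]

print(kept)  # ['test.txt', 'notes.md']
```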

PS: Here are two more handy regular expression tools for your reference:

JavaScript Regular Expression online test tool:
http://tools.ofstack.com/regex/javascript

Regular expression online generation tool:
http://tools.ofstack.com/regex/create_reg


I hope this article helps you with Python programming.

