Python regular expressions tutorial 2: capture

  • 2020-05-26 09:33:34
  • OfStack

preface

In the previous article, we introduced the basics of Python regular expressions. In this article, we summarize how to use regular expressions for capturing. Without further ado, let's look at the details.

capture

Capturing and grouping are closely related in regular expressions. In general, a group is also a capture, and it is written with parentheses (so parentheses are special characters in regular expressions and must be escaped, as \( and \), when they are meant literally):

(...)    Group and capture

(?:...)  Group, but do not capture

For example, suppose we need to match a landline number:


>>> import re
>>> m = re.search(r'^(\d{3,4}-)?(\d{7,8})$','020-82228888')
>>> m.group(0)
'020-82228888'
>>> m.group(1)
'020-'
>>> m.group(2)
'82228888'

Here, group(0) is the entire match, and the subsequent groups are numbered in the order in which their opening parentheses appear in the pattern.
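
One detail worth seeing in code is what happens when the optional area-code group does not take part in the match. The following is a minimal sketch that reuses the landline pattern above; the variable names are only illustrative.

```python
import re

# Same landline pattern as above: optional area code, then 7 or 8 digits.
pattern = re.compile(r'^(\d{3,4}-)?(\d{7,8})$')

with_area = pattern.search('020-82228888')
without_area = pattern.search('82228888')

# groups() returns every numbered group (1, 2, ...) as a tuple.
print(with_area.groups())     # ('020-', '82228888')

# A group that did not participate in the match is reported as None.
print(without_area.groups())  # (None, '82228888')
```

So code that reads group(1) should be prepared for None when the area code is missing.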

Next, suppose we want to find all the landline numbers in a whole piece of text. For this we need re.findall:


>>> re.findall(r'(\d{3,4}-)?(\d{7,8})','020-82228888\n0357-4227865') 
[('020-', '82228888'), ('0357-', '4227865')]

One feature of findall is that if the pattern contains capturing groups, the result holds the captured groups (as tuples when there is more than one group) rather than the whole match. Knowing this, we can use the non-capturing group syntax mentioned above to get the result we want:


>>> re.findall(r'(?:\d{3,4}-)?\d{7,8}','020-82228888\n0357-4227865') 
['020-82228888', '0357-4227865']
>>> re.findall(r'(?:\d{3,4}-)?\d{7,8}','020-82228888\n4227865')  
['020-82228888', '4227865']
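
If both the whole number and its parts are needed, one possible alternative is re.finditer, which yields match objects instead of strings or tuples. The sketch below uses the capturing version of the pattern; the variable names are illustrative only.

```python
import re

text = '020-82228888\n4227865'
pattern = re.compile(r'(\d{3,4}-)?(\d{7,8})')

# findall fills a group that did not take part in the match with ''.
print(pattern.findall(text))
# [('020-', '82228888'), ('', '4227865')]

# finditer yields match objects, so group(0) (the whole match) stays
# available next to the individual groups; a missing group is None here.
numbers = [(m.group(0), m.group(1), m.group(2)) for m in pattern.finditer(text)]
print(numbers)
# [('020-82228888', '020-', '82228888'), ('4227865', None, '4227865')]
```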

In a regular expression, you can also use \1, \2, and so on to refer back to the text captured by an earlier group. This is often used to match opening and closing quotation marks correctly, whether single or double:


>>> sentence = """You said "why?" and I say "I don't know"."""
>>> re.findall(r'["\'](.*?)["\']', sentence)
['why?', 'I don']
>>> re.findall(r'(["\'])(.*?)\1', sentence)
[('"', 'why?'), ('"', "I don't know")]
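
Since findall returns the quote character together with the quoted text, a small helper can keep only the text. This is a sketch under the assumption that quotes are never escaped inside the quoted string; the function name extract_quoted is hypothetical.

```python
import re

# Group 1 captures the opening quote; \1 requires the same character to close.
QUOTED = re.compile(r'(["\'])(.*?)\1')

def extract_quoted(text):
    """Return the contents of each quoted span, without the quote characters."""
    return [m.group(2) for m in QUOTED.finditer(text)]

sample = """He said 'it is "fine"' and "don't worry"."""
print(extract_quoted(sample))
# ['it is "fine"', "don't worry"]
```

Because the closing quote must equal the opening one, a double quote inside single quotes (or an apostrophe inside double quotes) does not end the match early.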

In addition, if you find \1 and \2 hard to read, you can give a group a name with (?P<name>...) and refer to it in the replacement string as \g<name>. The following example converts between two date formats:


>>> sentence = "from 12/22/1629 to 11/14/1643"
>>> re.sub(r'(?P<month>\d{2})/(?P<day>\d{2})/(?P<year>\d{4})', r'\g<year>-\g<month>-\g<day>', sentence) 
'from 1629-12-22 to 1643-11-14'
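
Named groups are also readable from the match object itself. As a brief sketch using the same date pattern, group() accepts a group name and groupdict() returns all named groups at once:

```python
import re

date_pattern = re.compile(r'(?P<month>\d{2})/(?P<day>\d{2})/(?P<year>\d{4})')

m = date_pattern.search("from 12/22/1629 to 11/14/1643")

# A named group can be read by name or by its number.
print(m.group('year'))   # 1629
print(m.group(3))        # 1629

# groupdict() maps every group name to the text it captured.
print(m.groupdict())
# {'month': '12', 'day': '22', 'year': '1629'}
```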

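However, the \g<name> syntax belongs to the replacement string of re.sub and is not recognized inside a pattern, so it does not work as a backreference in findall or search. Inside a pattern, a named group can still be referenced by number with \1, or by name with (?P=quote). The session below was recorded on an older Python 3 release, where \g<quote> in the pattern simply fails to match; Python 3.6 and later raise re.error for the unknown escape instead: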


>>> sentence = """You said "why?" and I say "I don't know"."""
>>> re.findall(r'(?P<quote>["\'])(.*?)\g<quote>', sentence)  
[]
>>> re.search(r'(?P<quote>["\'])(.*?)\g<quote>', sentence)   
>>> re.search(r'(?P<quote>["\'])(.*?)\1', sentence)  
<_sre.SRE_Match object; span=(9, 15), match='"why?"'>
>>> re.search(r'(?P<quote>["\'])(.*?)\1', sentence).groupdict()
{'quote': '"'}
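
For comparison, Python's pattern syntax does provide a named backreference, written (?P=name). A minimal sketch reusing the sentence above:

```python
import re

sentence = """You said "why?" and I say "I don't know"."""

# (?P<quote>...) names the group; (?P=quote) matches the same text again.
named = re.compile(r'(?P<quote>["\'])(.*?)(?P=quote)')

print(named.findall(sentence))
# [('"', 'why?'), ('"', "I don't know")]

first = named.search(sentence)
print(first.group(0))        # "why?"
print(first.group('quote'))  # "
```

This gives the same result as the \1 version while keeping the group name visible in the pattern.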

conclusion

That is all for group capture in Python regular expressions. I hope this article is of some help to everyone learning or using Python. If you have any questions, you can leave a message to discuss them. In the next article, I'll continue by summarizing greedy and non-greedy matching in regular expressions. Stay tuned to this site.

