Introduction to the Python re module

  • 2020-04-02 14:27:43
  • OfStack

Escape characters in Python

Regular expressions use the backslash "\" to introduce special sequences (such as \d) and to escape characters that would otherwise have a special meaning. This clashes with Python's own use of the backslash in string literals. To match one literal backslash, the regular expression must be \\ (the backslash escaped for the regex engine). In an ordinary Python string literal, each of those two backslashes must be escaped again, so the pattern ends up written as "\\\\".
To make regular expressions more readable, Python provides raw strings, written with an r prefix. For example, r'\n' is the two characters '\' and 'n' rather than a newline, so the pattern above can be written simply as r'\\'. This form is recommended for writing regular expressions in Python. Be careful when writing file paths, though: a raw string cannot end with an odd number of backslashes, so r'C:\temp\' is a syntax error.
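The following is a minimal Python 3 sketch of the escaping rules above. The sample path string is invented for illustration. Both spellings of the pattern find the same single backslash.

```python
import re

# A raw string keeps the backslash: r'\n' is two characters, '\n' is one (a newline).
assert len(r'\n') == 2
assert len('\n') == 1

text = r'C:\temp'  # contains one literal backslash at index 2

# Without a raw string, the regex \\ has to be typed as '\\\\'.
plain = re.search('\\\\', text)
# With a raw string, the same regex is typed as r'\\'.
raw = re.search(r'\\', text)

print(plain.group() == raw.group() == '\\')  # True
```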

Regular expression metacharacters


.       Matches any character except a newline
^       Matches the beginning of the string
$       Matches the end of the string
[]      Matches any single character from the set inside the brackets, e.g. [abc] or [0-9]
?       Repeats the preceding character 0 or 1 times
*       Repeats the preceding character 0 or more times
+       Repeats the preceding character 1 or more times
{m}     Repeats the preceding character exactly m times
{m,n}   Repeats the preceding character m to n times
\d      Matches any digit, equivalent to [0-9]
\D      Matches any non-digit character, equivalent to [^0-9]
\s      Matches any whitespace character, equivalent to [ \t\n\r\f\v]
\S      Matches any non-whitespace character, equivalent to [^ \t\n\r\f\v]
\w      Matches any alphanumeric character or underscore, equivalent to [a-zA-Z0-9_]
\W      Matches any character not matched by \w, equivalent to [^a-zA-Z0-9_]
\b      Matches the empty string at the beginning or end of a word

The bracket equivalents for \d, \s and \w are exact only with the re.ASCII flag. For str patterns, Python 3 by default also matches Unicode digits, whitespace and word characters.
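The sketch below (Python 3 assumed; the sample sentence is invented) exercises several metacharacters from the table with re.findall, re.search and re.match. The comparison between r'on' and r'\bon\b' shows what the word boundary adds.

```python
import re

sample = 'Order 66 shipped on 2020-04-02\tto station_12'

digits = re.findall(r'\d+', sample)         # \d+ : runs of one or more digits
blanks = re.findall(r'\s', sample)          # \s  : each whitespace char (space, tab, ...)
words = re.findall(r'\w+', sample)          # \w+ : runs of letters, digits, underscores
year = re.search(r'\d{4}', sample).group()  # {m} : exactly four digits in a row
any_on = re.findall(r'on', sample)          # 'on' anywhere, including inside 'station'
word_on = re.findall(r'\bon\b', sample)     # \b  : 'on' only as a whole word
date_ok = re.match(r'^\d{4}-\d{2}-\d{2}$', '2020-04-02') is not None  # ^ and $ anchor

print(digits)  # ['66', '2020', '04', '02', '12']
print(words)   # ['Order', '66', 'shipped', 'on', '2020', '04', '02', 'to', 'station_12']
print(year, len(any_on), len(word_on), date_ok)  # 2020 2 1 True
```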

Module functions with examples
re.compile compiles a regular expression into a pattern object


compile(pattern, flags=0)

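First argument: pattern, the regular expression
Second argument: flags, which modify how the pattern is matched (for example re.IGNORECASE); several flags can be combined with |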
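Here is a small sketch of the flags argument, assuming Python 3; the sample text is invented. re.IGNORECASE and re.MULTILINE are standard re flags. With re.MULTILINE, ^ matches at the start of every line instead of only at the start of the string.

```python
import re

# Flags change how the pattern is matched; combine several with |.
text = 'first line\nJB51.net on the second line'

strict = re.compile(r'^jb51')                                 # no flags
relaxed = re.compile(r'^jb51', re.IGNORECASE | re.MULTILINE)  # case-insensitive, per-line ^

print(strict.search(text))           # None: ^ means start of string, and the case differs
print(relaxed.search(text).group())  # JB51: ^ now also matches right after '\n'
```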

re.match tries to match only at the beginning of the string. If the beginning of the string does not match the regular expression, the match fails and the function returns None.


match(pattern, string, flags=0)

First argument: pattern, the regular expression
Second argument: string, the string to match against
Third argument: flags, which control how the regular expression is matched

re.search scans through the whole string and returns the first location where the pattern matches, or None if it matches nowhere.


search(pattern, string, flags=0)

First argument: pattern, the regular expression
Second argument: string, the string to search
Third argument: flags, which control how the regular expression is matched


>>> import re
>>> pattern = re.compile(r'jb51')
>>> match = pattern.match('jb51.net')
>>> print(match)
<re.Match object; span=(0, 4), match='jb51'>
>>> print(match.group())
jb51
>>> m = pattern.match('blog.jb51.net')  # match only tries the beginning, so nothing is found
>>> print(m)
None
>>> m = pattern.search('blog.jb51.net')  # search scans the whole string until a match is found
>>> print(m)
<re.Match object; span=(5, 9), match='jb51'>
>>> print(m.group())
jb51

>>> m = re.match(r'jb51', 'jb51.net')  # no need to call re.compile first
>>> print(m)
<re.Match object; span=(0, 4), match='jb51'>
>>> print(m.group())
jb51
>>> m = re.match(r'jb51', 'www.jb51.net')
>>> print(m)
None
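The next sketch compares the two functions side by side by recording the span each one returns. The helper spans is a hypothetical name introduced only for this illustration. Anchoring a search with \A gives the same "start of string only" behaviour as match.

```python
import re

def spans(pattern, s):
    """Hypothetical helper: return (re.match span, re.search span), None where nothing matches."""
    m = re.match(pattern, s)   # only tries position 0
    f = re.search(pattern, s)  # scans left to right, first hit wins
    return (m.span() if m else None, f.span() if f else None)

results = {s: spans(r'jb51', s) for s in ['jb51.net', 'blog.jb51.net', 'example.com']}
for s, pair in results.items():
    print(s, pair)
# jb51.net ((0, 4), (0, 4))
# blog.jb51.net (None, (5, 9))
# example.com (None, None)

# Anchoring with \A makes search behave like match.
assert re.search(r'\Ajb51', 'blog.jb51.net') is None
```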

re.split splits a string at every match of the pattern


split(pattern, string, maxsplit=0, flags=0)

First argument: pattern, the regular expression
Second argument: string, the string to split
Third argument: maxsplit, the maximum number of splits; the default 0 means split at every match
Fourth argument: flags
Example: splitting strings


>>> import re
>>> test_str = "1 2 3 4 5"
>>> re.split(r'\s+', test_str)
['1', '2', '3', '4', '5']
>>> re.split(r'\s+', test_str, maxsplit=2)  # split at the first 2 matches only
['1', '2', '3 4 5']

>>> test_str = "1 . 2. 3 .4 . 5"
>>> re.split(r'\.', test_str)  # escape the dot: a bare '.' matches any character
['1 ', ' 2', ' 3 ', '4 ', ' 5']
>>> re.split(r'\.', test_str, maxsplit=3)
['1 ', ' 2', ' 3 ', '4 . 5']
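split also accepts a character class, so one pattern can cover several separators. If the pattern contains a capturing group, the matched separators are kept in the result list. Here is a short sketch with invented input:

```python
import re

line = 'apple, banana;cherry  date'

# A character class plus + treats any run of commas, semicolons or whitespace as one separator.
parts = re.split(r'[,;\s]+', line)
print(parts)      # ['apple', 'banana', 'cherry', 'date']

# A capturing group keeps the separators in the result.
with_seps = re.split(r'([,;])', 'a,b;c')
print(with_seps)  # ['a', ',', 'b', ';', 'c']
```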

re.findall finds every non-overlapping substring of the target string that matches the pattern


findall(pattern, string, flags=0)

First argument: pattern, the regular expression
Second argument: string, the target string
Third argument: flags (optional)
The result is a list of the substrings that match the pattern. If nothing matches, an empty list is returned.


>>> import re
>>> test_mail = '<test01@gmail.com> <test02@gmail.org> test03@gmail.net'
>>> mail_re = re.compile(r'\w+@g.....[a-z]{3}')
>>> re.findall(mail_re,test_mail)
['test01@gmail.com', 'test02@gmail.org', 'test03@gmail.net']
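When the pattern contains capturing groups, findall returns the text of the groups instead of the whole match. If there are several groups, it returns one tuple per match. The sketch below reuses the test_mail string from above; the escaped \. and the literal gmail are a tightened variant of the original pattern, chosen for illustration.

```python
import re

test_mail = '<test01@gmail.com> <test02@gmail.org> test03@gmail.net'

# No groups: findall returns the whole matched strings.
whole = re.findall(r'\w+@gmail\.[a-z]{3}', test_mail)
# One group: findall returns only that group's text.
users = re.findall(r'(\w+)@gmail\.[a-z]{3}', test_mail)
# Two groups: findall returns a tuple per match.
pairs = re.findall(r'(\w+)@gmail\.([a-z]{3})', test_mail)
# No match at all: an empty list, not None.
missing = re.findall(r'\w+@yahoo\.com', test_mail)

print(users)  # ['test01', 'test02', 'test03']
print(pairs)  # [('test01', 'com'), ('test02', 'org'), ('test03', 'net')]
```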

re.sub replaces the parts of a string that match a regular expression


sub(pattern, repl, string, count=0, flags=0)

First argument: pattern, the regular expression
Second argument: repl, the replacement string
Third argument: string, the string to search
Fourth argument: count, the maximum number of substitutions; the default 0 means every match is replaced
Fifth argument: flags


>>> import re
>>> test = 'blog.jb51.net jb51.net'
>>> test_re = re.compile(r'\.')  # escape the dot so it matches only a literal '.'
>>> re.sub(test_re, '--', test)
'blog--jb51--net jb51--net'
>>> re.sub(test_re, '--', test, count=1)
'blog--jb51.net jb51.net'
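The replacement does not have to be a fixed string. It may contain backreferences such as \1 that refer to captured groups, or it may be a function that receives each match object and returns the replacement text. Here is a sketch with invented input; re.subn is the companion function that also reports how many substitutions were made.

```python
import re

# Backreferences: \1, \2, \3 refer to the groups captured by the pattern.
date = re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\3/\2/\1', 'Posted 2020-04-02')
print(date)     # Posted 02/04/2020

# repl can be a function: it is called once per match with the match object.
doubled = re.sub(r'\d+', lambda m: str(int(m.group()) * 2), 'a1 b22 c333')
print(doubled)  # a2 b44 c666

# subn returns the new string together with the number of substitutions.
result = re.subn(r'\.', '--', 'blog.jb51.net jb51.net')
print(result)   # ('blog--jb51--net jb51--net', 3)
```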
