Commonly used regular expression notation in Python

2020-04-02 13:55:14
OfStack

Understanding regular expressions in Python is mainly a matter of understanding the symbols. This article briefly goes through the regular expression symbols most commonly used in Python. The main symbols are:

.
By default, matches any single character except a newline; if DOTALL is set, it also matches a newline
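The effect of DOTALL on "." can be checked with a small sketch. The sample string and variable names here are only illustrative:

```python
import re

text = "a\nb"

# Without DOTALL, "." refuses to match the newline, so there is no match.
default_match = re.match(r"a.b", text)

# With DOTALL, "." also matches the newline.
dotall_match = re.match(r"a.b", text, re.DOTALL)

print(default_match)                # None
print(repr(dotall_match.group()))   # 'a\nb'
```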

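^
Matches the start of the string; in MULTILINE mode it also matches at the start of each line

$
Matches the end of the string (or just before a trailing newline); in MULTILINE mode it also matches at the end of each line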
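A short sketch of how ^ and $ behave with and without MULTILINE. The two-line sample text is made up for illustration:

```python
import re

text = "first line\nsecond line"

# Default mode: ^ anchors at the start of the string, $ at the end.
starts_default = re.findall(r"^\w+", text)   # ['first']
ends_default = re.findall(r"\w+$", text)     # ['line']

# MULTILINE mode: ^ and $ also anchor at every line boundary.
starts_multi = re.findall(r"^\w+", text, re.MULTILINE)   # ['first', 'second']
ends_multi = re.findall(r"\w+$", text, re.MULTILINE)     # ['line', 'line']

print(starts_default, ends_default)
print(starts_multi, ends_multi)
```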

*
Matches 0 or more repetitions of the preceding expression

+
Matches 1 or more repetitions of the preceding expression

?
Matches 0 or 1 repetitions of the preceding expression

*?, +?, ??
Non-greedy versions of *, + and ?: match as few characters as possible
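The difference between greedy and non-greedy quantifiers is easiest to see side by side. This is a minimal sketch; the sample strings are arbitrary:

```python
import re

html = "<b>bold</b>"

# Greedy: .* grabs as much as possible, up to the last ">".
greedy = re.match(r"<.*>", html).group()    # '<b>bold</b>'

# Non-greedy: .*? stops at the first ">".
lazy = re.match(r"<.*?>", html).group()     # '<b>'

# The same idea applies to + and ?.
plus_lazy = re.match(r"a+?", "aaa").group()      # 'a'
optional = re.match(r"a?", "aaa").group()        # 'a'
optional_lazy = re.match(r"a??", "aaa").group()  # ''

print(greedy, lazy, plus_lazy, optional, repr(optional_lazy))
```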

{m}, {m,n}, {m,n}?
Match exactly m repetitions, from m to n repetitions (as many as possible), and from m to n repetitions in non-greedy mode (as few as possible)
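The three counted forms can be compared on the same input. A small sketch, with an arbitrary string of five "a" characters:

```python
import re

text = "aaaaa"

exactly_two = re.match(r"a{2}", text).group()          # 'aa'
two_to_four = re.match(r"a{2,4}", text).group()        # 'aaaa' (greedy)
two_to_four_lazy = re.match(r"a{2,4}?", text).group()  # 'aa'   (non-greedy)

print(exactly_two, two_to_four, two_to_four_lazy)
```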

\
Escapes a special character so that it is matched literally, or introduces a special sequence such as \d

[]
Character set, for example [ABC], [a-z], [^a-z]

|
Alternation (or): 'a|b' matches either a or b

(...)
Capturing group
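Character sets, alternation and groups can be tried out as follows. This is only a sketch; the sample strings and variable names are invented for the example:

```python
import re

# Character set: any one of the listed characters.
vowels = re.findall(r"[aeiou]", "regex")        # ['e', 'e']

# Negated set: any character that is NOT a lowercase letter.
non_lower = re.findall(r"[^a-z]", "abc-123")    # ['-', '1', '2', '3']

# Alternation: either side of | may match.
either = re.findall(r"cat|dog", "cat, dog, bird")   # ['cat', 'dog']

# Groups: each pair of parentheses captures part of the match.
parts = re.match(r"(\w+)@(\w+)", "user@example").groups()   # ('user', 'example')

print(vowels, non_lower, either, parts)
```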


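(?iLmsux)
Inline flags: each letter sets the corresponding flag (re.I, re.L, re.M, re.S, re.U, re.X) for the pattern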
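As a rough illustration, an inline flag placed at the start of the pattern behaves like passing the flag as an argument. The strings below are arbitrary examples:

```python
import re

# (?i) inside the pattern is equivalent to passing re.IGNORECASE.
inline = re.match(r"(?i)hello", "HELLO").group()                # 'HELLO'
explicit = re.match(r"hello", "HELLO", re.IGNORECASE).group()   # 'HELLO'

# (?m) turns on MULTILINE, so ^ matches at the start of each line.
line_numbers = re.findall(r"(?m)^\d+", "1 a\n2 b")   # ['1', '2']

print(inline, explicit, line_numbers)
```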

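(?:...)
Non-capturing group: groups the pattern without saving the matched text

(?P<name>...)
Named group: the matched text can be retrieved by name
>>> re.match('(?P<name>abc){2}','abcabc').groupdict()
{'name': 'abc'}

(?P=name)
Backreference to a named group: matches the same text that the group called name matched earlier
>>> re.match(r'(?P<name>abc)----(?P=name)','abc----abc').group()
'abc----abc'

(?#...)
A comment; the contents of the parentheses are ignored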
(?=...)
Positive lookahead: matches only if the text that follows matches ..., without consuming it

>>> re.match(r'phone(?=\d{3})','phone123').group()
'phone'

(?!...)
Negative lookahead: matches only if the text that follows does not match ...

>>> re.match(r'phone(?!\d{3})','phoneabc123').group()
'phone'

(?<=...)
Positive lookbehind: matches only if the text immediately before the current position matches ...

(?<!...)
Negative lookbehind: matches only if the text immediately before the current position does not match ...
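The lookbehind forms work like the lookahead ones, but inspect the text before the current position. A minimal sketch, using made-up price strings:

```python
import re

# Positive lookbehind: digits that are immediately preceded by "$".
price = re.search(r"(?<=\$)\d+", "cost: $42").group()   # '42'

# Negative lookbehind: whole numbers that are NOT preceded by "$".
# \b keeps the match from starting in the middle of "42".
unpriced = re.findall(r"(?<!\$)\b\d+", "$42 and 17")    # ['17']

print(price, unpriced)
```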

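(?(id/name)yes-pattern|no-pattern)
Conditional match: tries yes-pattern if the group with the given id or name matched, otherwise tries the optional no-pattern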
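The conditional form is rarely seen, so a sketch may help. The pattern below follows the example given in the Python re documentation: a closing ">" is required only if an opening "<" was matched. The sample addresses are placeholders:

```python
import re

# Group 1 is the optional "<". If it matched, ">" must follow the address;
# otherwise the address must run to the end of the string.
pattern = re.compile(r"(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)")

bracketed = pattern.match("<user@host.com>")   # matches
plain = pattern.match("user@host.com")         # matches
unclosed = pattern.match("<user@host.com")     # None
unopened = pattern.match("user@host.com>")     # None

print(bracketed.group(2), plain.group(2), unclosed, unopened)
```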
\number matches the same text as the group with that number
\A matches the start of the string
\b matches word boundaries

\B
The opposite of \b: matches only where there is no word boundary
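Numbered backreferences and the boundary assertions \b and \B can be sketched as follows. The sentences used are arbitrary test input:

```python
import re

# \1 must repeat exactly what group 1 matched: this finds a doubled word.
doubled = re.search(r"\b(\w+) \1\b", "it is is fine").group()   # 'is is'

text = "cat concat cats"

# \b...\b: "cat" only as a whole word.
whole_words = re.findall(r"\bcat\b", text)    # ['cat']

# \B: "cat" only where it does NOT start at a word boundary (inside "concat").
inner_start = re.search(r"\Bcat", text).start()   # 7

print(doubled, whole_words, inner_start)
```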

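\d matches a digit, equivalent to [0-9] for ASCII text
\D matches a non-digit, equivalent to [^0-9] for ASCII text
\s matches a whitespace character, equivalent to [ \t\n\r\f\v] for ASCII text
\S matches any non-whitespace character
\w matches a word character, equivalent to [a-zA-Z0-9_] for ASCII text
\W is the opposite of \w, equivalent to [^a-zA-Z0-9_] for ASCII text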

\Z matches the end of the string
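The character classes and the string anchors \A and \Z can be checked with a short sketch. The sample text is invented, and the results shown hold for this ASCII-only input:

```python
import re

text = "Room 101, floor_2\n"

digits = re.findall(r"\d+", text)      # ['101', '2']
words = re.findall(r"\w+", text)       # ['Room', '101', 'floor_2']
spaces = re.findall(r"\s", text)       # [' ', ' ', '\n']
non_words = re.findall(r"\W+", text)   # [' ', ', ', '\n']

# Unlike ^ and $, \A and \Z ignore MULTILINE and only anchor
# at the very start and very end of the string.
lines = "one\ntwo"
first_only = re.findall(r"\A\w+", lines, re.MULTILINE)   # ['one']
last_only = re.findall(r"\w+\Z", lines, re.MULTILINE)    # ['two']

print(digits, words, spaces, non_words, first_only, last_only)
```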