Regular Expressions with the Python re Module

  • 2020-04-02 13:49:57
  • OfStack

The re module is mainly used for string and text processing: finding, searching, replacing and so on.

Review the basic regular expressions

  . : matches any single character except a newline

  * : matches the preceding character zero or more times; matching is greedy, so it takes as many repetitions as possible

  + : matches the preceding character one or more times

  | : matches the expression before or after |

  ^ : matches the beginning of the string (and of each line in multi-line mode)

  $ : matches the end of the string (and of each line in multi-line mode)

  ? : matches the preceding character zero or one time, never more than once

  \ : escapes the character that follows it, or introduces a special sequence

  [] : matches any single character listed in []. [0-9] matches any digit from 0 to 9

  () : treats the contents of () as a single group

  {} : repeats the preceding item the number of times given in {}. 100[0-9]{3} matches 100 followed by any three digits (100000 to 100999).
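The metacharacters above are easy to try out in an interactive session. The following is a minimal sketch; the sample strings and variable names are made up for illustration.

```python
import re

# '.' matches any single character except a newline
dot = re.findall(r"b.n", "bin ban b\nn")
print(dot)        # ['bin', 'ban']

# '*' repeats the preceding character zero or more times
star = re.findall(r"ab*", "a ab abbb")
print(star)       # ['a', 'ab', 'abbb']

# '+' repeats the preceding character one or more times
plus = re.findall(r"ab+", "a ab abbb")
print(plus)       # ['ab', 'abbb']

# '?' makes the preceding character optional
optional = re.findall(r"colou?r", "color colour")
print(optional)   # ['color', 'colour']

# '|' matches either alternative
either = re.findall(r"cat|dog", "cat, dog, bird")
print(either)     # ['cat', 'dog']

# '()' groups characters so that '+' applies to the whole group
grouped = re.search(r"(ab)+", "ababab").group()
print(grouped)    # ababab

# '{3}' repeats the preceding item exactly three times
counted = re.findall(r"100[0-9]{3}", "100123 1005 100999")
print(counted)    # ['100123', '100999']
```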

Special sequences starting with \ in Python:

Special sequence : meaning
\A : matches only at the beginning of the string
\Z : matches only at the end of the string
\b : matches an empty string at the beginning or end of a word
\B : matches an empty string that is not at the beginning or end of a word
\d : the equivalent of [0-9]
\D : the equivalent of [^0-9]
\s : matches any whitespace character: [ \t\n\r\f\v]
\S : matches any non-whitespace character: [^ \t\n\r\f\v]
\w : matches any letter, digit or underscore: [a-zA-Z0-9_]
\W : matches any character that is not a letter, digit or underscore: [^a-zA-Z0-9_]
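To see the special sequences in action, here is a small sketch. The sample strings are invented for illustration, and the character-class equivalents hold for ASCII text.

```python
import re

text = "user42 logged in at 10:05"

digits = re.findall(r"\d+", text)       # runs of digits
print(digits)       # ['42', '10', '05']

non_digits = re.findall(r"\D+", text)   # runs of anything except digits
print(non_digits)   # ['user', ' logged in at ', ':']

words = re.findall(r"\w+", text)        # runs of letters, digits, underscore
print(words)        # ['user42', 'logged', 'in', 'at', '10', '05']

fields = re.split(r"\s+", text)         # split on whitespace
print(fields)       # ['user42', 'logged', 'in', 'at', '10:05']

# \b only matches at a word boundary, \B only inside a word
whole_word = re.findall(r"\bin\b", "in bin inn in")
print(whole_word)   # ['in', 'in']
inside_word = re.findall(r"\Bin\B", "in bin inn king")
print(inside_word)  # ['in']

# \A and \Z anchor to the very start and end of the string
starts = re.search(r"\Auser", text) is not None
ends = re.search(r"05\Z", text) is not None
print(starts, ends)  # True True
```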

Regular expression syntax table

Syntax : meaning and example
"." : any character except a newline
"^" : beginning of the string. '^hello' matches 'helloworld' but not 'aaaahellobbb'
"$" : end of the string. Works like "^", at the other end
"*" : zero or more repetitions of the preceding character (greedy). '<.*>' matches all of '<title>chinaunix</title>'
"+" : one or more repetitions of the preceding character (greedy). Behaves like "*" in the example above
"?" : zero or one repetition of the preceding character (greedy). Behaves like "*" in the example above
"*?", "+?", "??" : non-greedy versions of the three above, which take the shortest possible match. '<.*?>' matches only '<title>'
"{m,n}" : repeats the preceding character m to n times; {m} alone means exactly m times. 'a{6}' matches six a's, 'a{2,4}' matches 2 to 4 a's
"{m,n}?" : repeats the preceding character m to n times, taking as few as possible. In 'aaaaaa', 'a{2,4}?' matches only two a's
"\" : escapes a special character or introduces a special sequence
"[]" : a character set, such as [0-9], [a-z], [A-Z], [^0]
"|" : A|B, the "or" operation
"(...)" : matches the expression inside the parentheses as a group
"(?#...)" : a comment, which is ignored
"(?=...)" : matches if ... matches next, but doesn't consume the string. 'hello(?=test)' matches 'hello' in 'hellotest'
"(?!...)" : matches if ... doesn't match next. 'hello(?!test)' matches 'hello' only when it is not followed by 'test'
"(?<=...)" : matches if preceded by ... (must be fixed length). '(?<=hello)test' matches 'test' in 'hellotest'
"(?<!...)" : matches if not preceded by ... (must be fixed length). '(?<!hello)test' does not match 'test' in 'hellotest'
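The greedy and non-greedy quantifiers and the lookaround assertions from the table can be checked with a short script. This is only a sketch that reuses the table's own example strings; the variable names are arbitrary.

```python
import re

html = "<title>chinaunix</title>"

# Greedy: '.*' grabs as much as it can, up to the last '>'
greedy = re.search(r"<.*>", html).group()
print(greedy)   # <title>chinaunix</title>

# Non-greedy: '.*?' stops at the first '>'
lazy = re.search(r"<.*?>", html).group()
print(lazy)     # <title>

# {m,n} takes as many repetitions as allowed, {m,n}? as few as allowed
most = re.search(r"a{2,4}", "aaaaaa").group()
fewest = re.search(r"a{2,4}?", "aaaaaa").group()
print(most, fewest)   # aaaa aa

# Lookahead: 'hello' only if 'test' follows; 'test' is not consumed
ahead = re.search(r"hello(?=test)", "hellotest").group()
print(ahead)    # hello

# Negative lookahead: 'hello' only if 'test' does not follow
not_ahead = re.search(r"hello(?!test)", "hellotest")
print(not_ahead)   # None

# Lookbehind: 'test' only if 'hello' comes right before it
behind = re.search(r"(?<=hello)test", "hellotest").group()
print(behind)   # test

# Negative lookbehind: 'test' only if 'hello' does not come before it
not_behind = re.search(r"(?<!hello)test", "hellotest")
print(not_behind)  # None
```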

Matching flags and meanings

Flag : meaning
re.I : ignore case
re.L : make \w, \W, \b, \B, \s, \S depend on the current locale settings
re.M : multi-line mode, in which ^ and $ also match at the start and end of each line
re.S : make the "." metacharacter match newline characters as well
re.U : match according to Unicode character properties
re.X : verbose mode, which ignores whitespace in the pattern and allows comments introduced by "#"
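A quick way to get a feel for the flags is to run the same pattern with and without each one. The two-line sample string below is invented for illustration.

```python
import re

text = "Root:x:0\nroot:x:1"

# re.I: ignore case
plain = re.findall(r"root", text)
ignore_case = re.findall(r"root", text, re.I)
print(plain, ignore_case)        # ['root'] ['Root', 'root']

# re.M: '^' matches at the start of every line, not just the string
first_only = re.findall(r"^\w+", text)
every_line = re.findall(r"^\w+", text, re.M)
print(first_only, every_line)    # ['Root'] ['Root', 'root']

# re.S: '.' also matches the newline between the two lines
no_dotall = re.search(r"0.root", text)
dotall = re.search(r"0.root", text, re.S).group()
print(no_dotall, repr(dotall))   # None '0\nroot'

# re.X: whitespace in the pattern is ignored and '#' starts a comment
verbose = re.compile(r"""
    (\w+)   # user name
    :       # field separator
    x       # password placeholder
""", re.X)
user = verbose.match(text).group(1)
print(user)                      # Root
```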

Sample text (an extract from the password file on Linux)


The re module has three search functions, each of which takes three parameters (the pattern, the string to search, and optional flags). search() and match() return a match object if the pattern matches and None if it doesn't; findall() returns a list.

findall(): finds every substring that matches the regular expression and returns them as a list of strings

search(): scans the entire string and returns a match object for the first match

match(): matches only at the very beginning of the string and does not look any further; returns a match object
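The difference between the three functions shows up clearly on a password-file style string. The sketch below is written for Python 3 and uses a hypothetical two-line sample in place of a real file.

```python
import re

# Hypothetical sample resembling lines from a Linux password file
sample = (
    "root:x:0:0:root:/root:/bin/bash\n"
    "daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin\n"
)

# findall() returns every match as a list of strings
all_bin = re.findall(r"bin", sample)
print(all_bin)              # ['bin', 'bin', 'bin']

# search() scans the whole string and returns the first match object
first = re.search(r"bin", sample)
print(first.group())        # bin

# match() only tries at the very start of the string
at_start = re.match(r"bin", sample)
print(at_start)             # None, because the string starts with 'root'

root_at_start = re.match(r"root", sample)
print(root_at_start.group())  # root
```

Because match() is anchored at the start, a pattern such as 'bin' that only occurs later in the text is found by search() but not by match().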

lovelinux@LoveLinux:~/py/boke$ cat text 
lovelinux@LoveLinux:~/py/boke$ cat
#!/usr/bin/env python
import re
with open('text','r') as txt:
 f =
 print re.match('bin',f)
lovelinux@LoveLinux:~/py/boke$ python 
lovelinux@LoveLinux:~/py/boke$ vim
lovelinux@LoveLinux:~/py/boke$ python 
<_sre.SRE_Match object at 0x7f12fc9f9ed0>

The returned value is a match object. Two of its methods are:

start() : returns the index at which the matched text begins

end() : returns the index just past the last matched character
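As a short illustration of start() and end(), the sketch below searches a single hypothetical password-file line (Python 3 syntax). The two indices can be used directly as a slice.

```python
import re

# Hypothetical line in password-file format
line = "root:x:0:0:root:/root:/bin/bash"

m = re.search(r"bin", line)

begin = m.start()   # index of the first matched character
finish = m.end()    # index just past the last matched character
print(begin, finish)          # 23 26

# Slicing with start() and end() gives back the matched text
matched = line[begin:finish]
print(matched)                # bin

# span() returns both indices as a tuple
print(m.span())               # (23, 26)
```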

lovelinux@LoveLinux:~/py/boke$ python 
lovelinux@LoveLinux:~/py/boke$ cat 
#!/usr/bin/env python
import re
with open('text','r') as txt:
 f =
 print re.match('bin',f)
