Regular Expressions with the Python re Module

  • 2020-04-02 13:49:57
  • OfStack

The re module is used mainly for string and text processing: finding, searching, replacing and so on.
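As a quick orientation, here is a minimal Python 3 sketch of the three jobs named above (find, search, replace), using `re.findall`, `re.search` and `re.sub`. The phone-number string is made up for illustration; the article's own code below uses Python 2.

```python
import re

text = 'Call 555-1234 or 555-9876'
pattern = r'\d{3}-\d{4}'  # three digits, a hyphen, four digits

print(re.findall(pattern, text))         # find all: ['555-1234', '555-9876']
print(re.search(pattern, text).group())  # search:   555-1234 (first match only)
print(re.sub(r'\d', '#', text))          # replace:  Call ###-#### or ###-####
```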

A review of basic regular expression syntax

  . : matches any single character except the newline character

  * : matches zero or more occurrences of the preceding character (or group); it matches as much as it can, which is commonly called greedy matching

  + : matches one or more occurrences of the preceding character (or group)

  | : matches either the expression before | or the expression after it

  ^ : matches the beginning of the string (or of every line when re.M is used)

  $ : matches the end of the string (or of every line when re.M is used)

  ? : matches zero or one occurrence of the preceding character, never more than one

  \ : escapes the character that follows it, so a metacharacter such as . is matched literally

  [] : matches any single character listed inside []. [0-9] matches any single digit from 0 to 9

  () : groups the contents of () so they are treated as a single unit

  {} : repeats the preceding item the number of times given in {}. 100[0-9]{3} matches 100 followed by any three digits (000 to 999), for example 100123
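The metacharacters in the list above can be tried out directly with `re.findall` and `re.search`. This is a minimal Python 3 sketch; the sample strings are invented for illustration. Note that `re.fullmatch` (Python 3.4+) is used for the group example because `findall` returns only the captured group when a pattern contains parentheses.

```python
import re

# '.' matches any character except a newline
print(re.findall(r'a.c', 'abc a-c a\nc'))               # ['abc', 'a-c']
# '*' zero or more, '+' one or more, '?' zero or one (of the preceding item)
print(re.findall(r'ab*', 'a ab abbb'))                  # ['a', 'ab', 'abbb']
print(re.findall(r'ab+', 'a ab abbb'))                  # ['ab', 'abbb']
print(re.findall(r'colou?r', 'color colour'))           # ['color', 'colour']
# '|' alternation, '[]' character set, '{}' repetition count
print(re.findall(r'cat|dog', 'cat bird dog'))           # ['cat', 'dog']
print(re.findall(r'100[0-9]{3}', '100123 1009 100999'))  # ['100123', '100999']
# '^' and '$' anchor to the start and end of the string (without re.M)
print(bool(re.search(r'^hello', 'hello world')))        # True
print(bool(re.search(r'world$', 'hello world')))        # True
# '()' groups a sub-pattern so the quantifier applies to the whole group
print(re.fullmatch(r'(ab)+', 'ababab') is not None)     # True
# '\' escapes a metacharacter so it is matched literally
print(re.findall(r'\.', 'a.b.c'))                       # ['.', '.']
```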

Special sequences beginning with \ in Python:

Sequence   Meaning
\A         Matches only at the start of the string
\Z         Matches only at the end of the string
\b         Matches the empty string at the beginning or end of a word
\B         Matches the empty string only where it is not at the beginning or end of a word
\d         Matches any digit; equivalent to [0-9]
\D         Matches any non-digit; equivalent to [^0-9]
\s         Matches any whitespace character: [ \t\n\r\f\v]
\S         Matches any non-whitespace character: [^ \t\n\r\f\v]
\w         Matches any letter, digit or underscore: [a-zA-Z0-9_]
\W         Matches any character that is not a letter, digit or underscore: [^a-zA-Z0-9_]

(For str patterns in Python 3, \d, \s and \w also match the corresponding non-ASCII Unicode characters.)
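A short Python 3 sketch of the special sequences in the table above; the log-style string is invented for illustration.

```python
import re

text = 'user_1 logged in at 09:30\tfrom host-7'

# \d: runs of digits
print(re.findall(r'\d+', text))      # ['1', '09', '30', '7']
# \w: letters, digits and underscore ('-' and ':' split the words)
print(re.findall(r'\w+', text))      # ['user_1', 'logged', 'in', 'at', '09', '30', 'from', 'host', '7']
# \s: whitespace, here five spaces and one tab
print(len(re.findall(r'\s', text)))  # 6
# \b: word boundary, so the 'in' inside 'login' is skipped
print(re.findall(r'\bin\b', 'login in'))  # ['in']
# \A and \Z: the very start and the very end of the string
print(bool(re.search(r'\Auser', text)))    # True
print(bool(re.search(r'host-7\Z', text)))  # True
```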

Regular expression syntax table

Syntax     Meaning                                                   Example
.          Any character except a newline
^          Start of the string                                       '^hello' matches 'helloworld' but not 'aaaahellobbb'
$          End of the string                                         'world$' matches 'helloworld' but not 'worldhello'
*          Zero or more of the preceding item (greedy)               '<.*>' matches all of '<title>chinaunix</title>'
+          One or more of the preceding item (greedy)                Same idea as *
?          Zero or one of the preceding item (greedy)                Same idea as *
*? +? ??   Non-greedy forms of the three above: match as little      '<.*?>' matches only '<title>'
           as possible
{m,n}      Repeat the preceding item m to n times; {m} means         'a{6}' matches 6 a's; 'a{2,4}' matches 2 to 4 a's
           exactly m times
{m,n}?     Repeat the preceding item m to n times, taking as         In 'aaaaaa', 'a{2,4}?' matches only 2 a's
           few as possible
\          Escapes a special character or starts a special sequence
[]         Character set                                             [0-9], [a-z], [A-Z], [^0] (any character except 0)
|          Alternation (or)                                          'A|B' matches A or B
(...)      Matches the expression inside the parentheses and
           captures it as a group
(?#...)    Comment; ignored
(?=...)    Lookahead: matches if ... comes next, without             'hello(?=test)' matches 'hello' in 'hellotest'
           consuming it
(?!...)    Negative lookahead: matches if ... does not come next     'hello(?!test)' matches 'hello' only if it is not followed by 'test'
(?<=...)   Lookbehind: matches if preceded by ... (fixed length)     '(?<=hello)test' matches 'test' in 'hellotest'
(?<!...)   Negative lookbehind: matches if not preceded by ...       '(?<!hello)test' does not match 'test' in 'hellotest'
           (fixed length)
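The examples in the syntax table can be checked with the Python 3 sketch below. It contrasts greedy and non-greedy quantifiers and shows that lookahead and lookbehind assertions are not part of the matched text.

```python
import re

html = '<title>chinaunix</title>'
# Greedy: .* runs as far as it can, up to the last '>'
print(re.match(r'<.*>', html).group())          # <title>chinaunix</title>
# Non-greedy: .*? stops at the first '>'
print(re.match(r'<.*?>', html).group())         # <title>
# {m,n} is greedy; {m,n}? takes as few repetitions as possible
print(re.match(r'a{2,4}', 'aaaaaa').group())    # aaaa
print(re.match(r'a{2,4}?', 'aaaaaa').group())   # aa
# Lookahead / lookbehind: the asserted text is checked but not consumed
print(re.search(r'hello(?=test)', 'hellotest').group())   # hello
print(re.search(r'hello(?!test)', 'hellotest'))           # None
print(re.search(r'(?<=hello)test', 'hellotest').group())  # test
print(re.search(r'(?<!hello)test', 'hellotest'))          # None
# (?#...) is a comment inside the pattern and is ignored
print(re.findall(r'\d+(?#one or more digits)', 'a1b22'))  # ['1', '22']
```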

Matching flags and their meanings

Flag   Meaning
re.I   Ignore case
re.L   Make \w, \W, \b, \B, \s and \S depend on the current locale
re.M   Multi-line mode: ^ and $ also match at the start and end of each line
re.S   Make the "." metacharacter match newlines as well
re.U   Make \w, \W, \b, \B, \d, \D, \s and \S depend on the Unicode character database
re.X   Verbose mode: whitespace in the pattern is ignored and "#" starts a comment

Flags can be combined with |, for example re.I | re.M.
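Here is a minimal Python 3 sketch of the most commonly used flags (re.I, re.M, re.S, re.X) on a made-up two-line string. re.L and re.U are left out because their effect depends on the locale and on the Python version.

```python
import re

text = 'First line\nsecond LINE'

# re.I: case-insensitive matching
print(re.findall(r'line', text, re.I))      # ['line', 'LINE']
# re.M: ^ matches at the start of every line, not only of the string
print(re.findall(r'^\w+', text))            # ['First']
print(re.findall(r'^\w+', text, re.M))      # ['First', 'second']
# re.S: '.' also matches the newline
print(re.search(r'line.second', text))      # None
print(re.search(r'line.second', text, re.S) is not None)  # True
# re.X: whitespace and # comments inside the pattern are ignored
pattern = re.compile(r'''
    (\w+)   # a word
    \s+     # whitespace
    (\w+)   # another word
''', re.X)
print(pattern.match(text).groups())         # ('First', 'line')
# Flags combine with |
print(re.findall(r'^s\w+', text, re.I | re.M))  # ['second']
```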


Sample text (a line extracted from the Linux password file, /etc/passwd):


man:x:6:12:man:/var/cache/man:/bin/nologin

The re module has three commonly used search functions. Each takes the pattern, the string to search, and optional flags. match() and search() return a match object if the pattern matches and None if it does not; findall() returns a list.

findall(): finds all non-overlapping substrings that match the regular expression and returns them as a list of strings

search(): scans the entire string and returns a match object for the first position where the pattern matches

match(): tries to match only at the beginning of the string; if the pattern does not match there, it returns None
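The difference between the three functions shows up clearly on the passwd line above. This Python 3 sketch assumes that line is held in a string rather than read from a file.

```python
import re

# The /etc/passwd-style sample line from above
line = 'man:x:6:12:man:/var/cache/man:/bin/nologin'

# match() only tries position 0, so 'bin' is not found
print(re.match('bin', line))                # None
print(re.match('man', line) is not None)    # True
# search() scans the whole string and returns the first match
print(re.search('bin', line))  # e.g. <re.Match object; span=(31, 34), match='bin'> on Python 3.7+
# findall() returns every non-overlapping match as a list
print(re.findall('man', line))              # ['man', 'man', 'man']
print(re.findall(r'\d+', line))             # ['6', '12']
```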


lovelinux@LoveLinux:~/py/boke$ cat text 
man:x:6:12:man:/var/cache/man:/bin/sh
lovelinux@LoveLinux:~/py/boke$ cat test.py
#!/usr/bin/env python
#coding:utf-8
import re
with open('text', 'r') as txt:
    f = txt.read()
    print(re.match('bin', f))
    print(re.search('bin', f).end())
lovelinux@LoveLinux:~/py/boke$ python test.py 
None
34
lovelinux@LoveLinux:~/py/boke$ vim test.py
lovelinux@LoveLinux:~/py/boke$ python test.py 
None
<_sre.SRE_Match object at 0x7f12fc9f9ed0>

The returned match object provides, among others, these two methods:

start(): returns the index where the match begins

end(): returns the index just past the last character of the match
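The two methods, together with the related `span()` and `group()`, can be sketched in Python 3 as follows. The string is the same passwd line; because search() returns None when nothing matches, the result is checked before its methods are called.

```python
import re

line = 'man:x:6:12:man:/var/cache/man:/bin/nologin'

m = re.search('bin', line)
if m:  # guard against None before calling match-object methods
    print(m.start())  # 31: index of the first matched character
    print(m.end())    # 34: index just past the last matched character
    print(m.span())   # (31, 34): start and end as a tuple
    print(m.group())  # bin: the matched text
    print(line[m.start():m.end()] == m.group())  # True
```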


lovelinux@LoveLinux:~/py/boke$ python test.py 
None
31
34
lovelinux@LoveLinux:~/py/boke$ cat test.py 
#!/usr/bin/env python
#coding:utf-8
import re
with open('text', 'r') as txt:
    f = txt.read()
    print(re.match('bin', f))
    m = re.search('bin', f)  # search once, reuse the match object
    print(m.start())
    print(m.end())

