Regular Expressions with Python's re Module
- 2020-04-02 13:49:57
- OfStack
The re module is used mainly for string and text processing: finding, searching, replacing, and so on.
Review the basic regular expressions
. : matches any single character except the newline character
* : matches zero or more of the character before *; it takes as many as it can, which is commonly known as greedy matching
+ : matches one or more of the character before +
| : matches the expression before or after |
^ : matches the beginning of a line
$ : matches the end of a line
? : matches zero or one of the character before ?; it does not match more than one
\ : escapes the character after \
[] : matches any single character in []. [0-9] matches any digit from 0 to 9
() : treats the contents within () as a whole
{} : matches the number of times given in {}. 100[0-9]{3} matches 100 followed by any three digits (000-999), for example 100123.
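The basic metacharacters above can be tried out directly. The following is a minimal sketch, assuming Python 3; the sample strings and variable names are made up for illustration and do not come from the original article.

```python
import re

# "." matches any single character except a newline
dot = re.findall(r"a.c", "abc a-c a\nc")

# "*" allows zero or more of the preceding character, "+" requires at least one
star = re.findall(r"ab*", "a ab abbb")
plus = re.findall(r"ab+", "a ab abbb")

# "?" allows zero or one of the preceding character
optional = re.findall(r"colou?r", "color colour colouur")

# "|" matches the expression on either side
either = re.findall(r"cat|dog", "cat dog bird")

# "^" anchors at the beginning, "$" at the end
starts = re.search(r"^hello", "helloworld") is not None
not_at_start = re.search(r"^hello", "aaaahellobbb") is None
ends = re.search(r"world$", "helloworld") is not None

# "\" escapes a metacharacter so it is matched literally
dots = re.findall(r"\.", "a.b.c")

# "[]" is a character set, "()" groups, "{}" gives a repeat count
digits = re.findall(r"[0-9]", "a1b22")
grouped = re.search(r"(ab)+", "ababab").group()
six_digit = re.findall(r"100[0-9]{3}", "100123 10012 1000000")

print(dot)        # ['abc', 'a-c']
print(star)       # ['a', 'ab', 'abbb']
print(plus)       # ['ab', 'abbb']
print(optional)   # ['color', 'colour']
print(either)     # ['cat', 'dog']
print(dots)       # ['.', '.']
print(digits)     # ['1', '2', '2']
print(grouped)    # ababab
print(six_digit)  # ['100123', '100000']
```

The last result shows that [0-9]{3} accepts any three digits, including 000, and that a string with too few digits after 100 is not matched at all.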
Special sequences starting with \ in Python:
Special sequence | Meaning |
\A | Matches only at the beginning of the string |
\Z | Matches only at the end of the string |
\b | Matches the empty string at the beginning or end of a word |
\B | Matches the empty string that is not at the beginning or end of a word |
\d | Equivalent to [0-9] |
\D | Equivalent to [^0-9] |
\s | Matches any whitespace character: [ \t\n\r\f\v] |
\S | Matches any non-whitespace character: [^ \t\n\r\f\v] |
\w | Matches any letter, digit, or underscore: [a-zA-Z0-9_] |
\W | Matches any character that is not a letter, digit, or underscore: [^a-zA-Z0-9_] |
These equivalences hold for ASCII matching (Python 2 byte strings, or the re.ASCII flag in Python 3). With Python 3 str patterns, \d, \s, and \w also match the corresponding Unicode characters.
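To make the special sequences more concrete, here is a short sketch, assuming Python 3 and ASCII-only input. The sample strings and variable names are invented for this illustration.

```python
import re

text = "Order 66 shipped_to: Bob\tSmith"

# \d matches digits; \w matches letters, digits, and the underscore
numbers = re.findall(r"\d+", text)
words = re.findall(r"\w+", text)

# \s matches whitespace, here three spaces and one tab
spaces = re.findall(r"\s", text)

# \D is the complement of \d, so removing \D leaves only the digits
only_digits = re.sub(r"\D", "", "tel: 555-0100")

# \b matches at a word boundary, \B only where there is no boundary
whole_words = re.findall(r"\bcat\b", "cat concat cat's")
inside_word = re.findall(r"\Bcat", "cat concat")

# \A and \Z anchor to the whole string even in multi-line mode,
# whereas ^ and $ then match at every line
two_lines = "man\nman"
a_anchor = re.findall(r"\Aman", two_lines, re.M)
caret = re.findall(r"^man", two_lines, re.M)
z_anchor = re.findall(r"man\Z", two_lines, re.M)
dollar = re.findall(r"man$", two_lines, re.M)

print(numbers)      # ['66']
print(words)        # ['Order', '66', 'shipped_to', 'Bob', 'Smith']
print(only_digits)  # 5550100
print(whole_words)  # ['cat', 'cat']
print(inside_word)  # ['cat']
print(a_anchor, caret)   # ['man'] ['man', 'man']
print(z_anchor, dollar)  # ['man'] ['man', 'man']
```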
Regular expression syntax table
Syntax | Meaning | Example |
"." | Any character except a newline | |
"^" | Beginning of the string | '^hello' matches 'helloworld' but not 'aaaahellobbb' |
"$" | End of the string | Works like "^", but anchors at the end |
"*" | Zero or more of the preceding character (greedy) | '<.*>' matches all of '<title>chinaunix</title>' |
"+" | One or more of the preceding character (greedy) | '<.+>' behaves the same way |
"?" | Zero or one of the preceding character (greedy) | 'ab?' matches 'a' or 'ab' |
*?, +?, ?? | Non-greedy versions of the three above: they match as few characters as possible | '<.*?>' matches only '<title>' |
{m,n} | Repeats the preceding character m to n times; {m} alone is also allowed | a{6} matches six a's; a{2,4} matches 2 to 4 a's |
{m,n}? | Repeats the preceding character m to n times, taking as few as possible | In 'aaaaaa', a{2,4}? matches only 2 a's |
"\\" | Escapes a special character or introduces a special sequence | |
[] | A character set | [0-9], [a-z], [A-Z], [^0] |
"|" | Or | A|B matches A or B |
(...) | Matches the expression inside the parentheses and treats it as a group | |
(?#...) | A comment; it is ignored | |
(?=...) | Matches if ... matches next, but doesn't consume any of the string | 'hello(?=test)' matches 'hello' in 'hellotest' |
(?!...) | Matches if ... doesn't match next | 'hello(?!test)' matches 'hello' only when it is not followed by 'test' |
(?<=...) | Matches if preceded by ... (must be fixed length) | '(?<=hello)test' matches 'test' in 'hellotest' |
(?<!...) | Matches if not preceded by ... (must be fixed length) | '(?<!hello)test' does not match 'test' in 'hellotest' |
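The difference between greedy and non-greedy quantifiers, and the behaviour of the lookahead and lookbehind assertions, is easiest to see by running the table's examples. This is a sketch assuming Python 3; the variable names and the extra strings 'helloworld' and 'my test' are illustrative additions.

```python
import re

html = "<title>chinaunix</title>"

# Greedy: ".*" grabs as much as possible, so the match runs to the last ">"
greedy = re.search(r"<.*>", html).group()
# Non-greedy: ".*?" stops at the first ">" it can
lazy = re.search(r"<.*?>", html).group()

# The same idea with a counted repeat
most = re.search(r"a{2,4}", "aaaaaa").group()
fewest = re.search(r"a{2,4}?", "aaaaaa").group()

# Lookahead: 'hello' only if 'test' follows; 'test' is not part of the match
ahead = re.search(r"hello(?=test)", "hellotest").group()
# Negative lookahead: 'hello' only if 'test' does NOT follow
not_ahead_fail = re.search(r"hello(?!test)", "hellotest")
not_ahead_ok = re.search(r"hello(?!test)", "helloworld").group()

# Lookbehind: 'test' only if 'hello' comes right before it
behind = re.search(r"(?<=hello)test", "hellotest").group()
# Negative lookbehind: 'test' only if 'hello' does NOT come right before it
not_behind_fail = re.search(r"(?<!hello)test", "hellotest")
not_behind_ok = re.search(r"(?<!hello)test", "my test").group()

# (?#...) is a comment inside the pattern and is ignored
commented = re.search(r"ab(?#this text is ignored)c", "abc").group()

print(greedy)           # <title>chinaunix</title>
print(lazy)             # <title>
print(most, fewest)     # aaaa aa
print(ahead)            # hello
print(not_ahead_fail)   # None
print(behind)           # test
print(not_behind_fail)  # None
```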
Matching flags and meanings
Flag | Meaning |
re.I | Ignore case |
re.L | Make \w, \W, \b, \B, \s, \S depend on the current locale |
re.M | Multi-line mode: "^" and "$" also match at the start and end of each line |
re.S | Make the "." metacharacter match newline characters as well |
re.U | Match according to Unicode character properties |
re.X | Verbose mode: whitespace in the pattern is ignored and "#" starts a comment |
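The effect of the most commonly used flags can be sketched as follows, assuming Python 3. The sample strings and names such as `uid_gid` are hypothetical and chosen only for this example.

```python
import re

# re.I: case-insensitive matching
any_case = re.findall(r"linux", "Linux LINUX linux", re.I)

# re.M: "^" matches at the start of every line, not just the string
lines = "root:x\nman:x"
first_only = re.findall(r"^\w+", lines)
every_line = re.findall(r"^\w+", lines, re.M)

# re.S: "." also matches a newline
without_s = re.search(r"a.b", "a\nb")
with_s = re.search(r"a.b", "a\nb", re.S)

# re.X: whitespace in the pattern is ignored and "#" starts a comment
uid_gid = re.compile(r"""
    (\d+)   # first number (user id in a passwd line)
    :
    (\d+)   # second number (group id)
""", re.X)
ids = uid_gid.search("man:x:6:12:man").groups()

# Flags are combined with the bitwise OR operator
combined = re.findall(r"^man", "MAN\nman", re.I | re.M)

print(any_case)    # ['Linux', 'LINUX', 'linux']
print(first_only)  # ['root']
print(every_line)  # ['root', 'man']
print(without_s)   # None
print(ids)         # ('6', '12')
print(combined)    # ['MAN', 'man']
```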
Sample text (an excerpt from the password file under Linux):
man:x:6:12:man:/var/cache/man:/bin/nologin
The re module has three commonly used search functions. Each takes three parameters (the pattern, the string to search, and optional flags). match() and search() return a match object on success and None otherwise; findall() returns a list, which is empty when nothing matches.
findall(): finds all substrings that match the regular expression and returns them as a list
search(): scans the entire string and returns a match object for the first match
match(): matches only at the beginning of the string and does not look at later positions; returns a match object
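The difference between the three functions can be sketched on the sample line without reading a file. This assumes Python 3; the variable names are illustrative.

```python
import re

line = "man:x:6:12:man:/var/cache/man:/bin/sh"

# findall() returns every non-overlapping match as a list of strings
all_man = re.findall("man", line)
no_hits = re.findall("zsh", line)

# search() scans the whole string and stops at the first match
found_bin = re.search("bin", line)

# match() only tries at position 0, so 'bin' is not found this way
bin_at_start = re.match("bin", line)
man_at_start = re.match("man", line)

# The optional third argument takes flags such as re.I
ignore_case = re.match("MAN", line, re.I)

print(all_man)                    # ['man', 'man', 'man']
print(no_hits)                    # []
print(found_bin is not None)      # True
print(bin_at_start)               # None
print(man_at_start.group())       # man
print(ignore_case.group())        # man
```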
lovelinux@LoveLinux:~/py/boke$ cat text
man:x:6:12:man:/var/cache/man:/bin/sh
lovelinux@LoveLinux:~/py/boke$ cat test.py
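#!/usr/bin/env python
# coding: utf-8
import re

# Read the whole sample file into one string
with open('text', 'r') as txt:
    f = txt.read()

print(re.match('bin', f))         # None: the text does not start with 'bin'
print(re.search('bin', f).end())  # index just past the first 'bin'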
lovelinux@LoveLinux:~/py/boke$ python test.py
None
34
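lovelinux@LoveLinux:~/py/boke$ vim test.py
(Here the last line of test.py is changed to print the result of re.search('bin', f) itself, without calling .end().)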
lovelinux@LoveLinux:~/py/boke$ python test.py
None
<_sre.SRE_Match object at 0x7f12fc9f9ed0>
The value returned is a match object. Among its methods are these two:
start(): returns the index at which the matched text begins
end(): returns the index just past the end of the matched text
lovelinux@LoveLinux:~/py/boke$ python test.py
None
31
34
lovelinux@LoveLinux:~/py/boke$ cat test.py
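#!/usr/bin/env python
# coding: utf-8
import re

# Read the whole sample file into one string
with open('text', 'r') as txt:
    f = txt.read()

print(re.match('bin', f))           # None: the text does not start with 'bin'
print(re.search('bin', f).start())  # index where the first 'bin' begins
print(re.search('bin', f).end())    # index just past the first 'bin'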
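Calling .start() or .end() directly on the result of re.search() raises an AttributeError when nothing matches, because the result is then None. A slightly more defensive variant is sketched below, assuming Python 3; the helper name `locate` is hypothetical and the sample line is embedded so that no file is needed.

```python
import re

line = "man:x:6:12:man:/var/cache/man:/bin/sh"


def locate(pattern, text):
    """Return (start, end) of the first match, or None if there is none."""
    m = re.search(pattern, text)
    if m is None:
        # No match: there is no object to call start()/end() on
        return None
    return m.start(), m.end()


span = locate("bin", line)
missing = locate("zsh", line)

print(span)                      # (31, 34)
print(missing)                   # None
print(line[span[0]:span[1]])     # bin
```

Because end() is the index just past the match, slicing the string with the two values returns exactly the matched text.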