Common methods of Python's re module for regular expressions

  • 2020-04-02 09:49:42
  • OfStack

1. Introduction to re
Python's re module cannot handle every complex matching situation, but in most cases it is enough to analyze complex strings effectively and extract the relevant information. Python compiles each regular expression into bytecode and runs it on a matching engine written in C, which performs a depth-first (backtracking) search for a match.

 
import re
print(re.__doc__)


This prints the module's documentation, which lists the functions re provides. The sections below illustrate the most common ones with examples.

2. Regular expression syntax for re

The regular expression syntax table is as follows:

Syntax      Meaning and example
.           Any character except a newline (see re.S below).
^           Start of the string. '^hello' matches 'helloworld' but not 'aaaahellobbb'.
$           End of the string. Works the same way: 'world$' matches 'helloworld' but not 'worldhello'.
*           0 or more repetitions of the preceding item (greedy). '<.*>' matches all of '<title>chinaunix</title>'.
+           1 or more repetitions of the preceding item (greedy). Works the same way as *.
?           0 or 1 repetition of the preceding item (greedy). Works the same way as *.
*?, +?, ??  Non-greedy versions of the three above: they take the shortest possible match. '<.*?>' matches only '<title>'.
{m,n}       The preceding item repeated m to n times; {m} means exactly m times. 'a{6}' matches six a's, and 'a{2,4}' matches 2 to 4 a's.
{m,n}?      The preceding item repeated m to n times, taking as few as possible. In 'aaaaaa', 'a{2,4}?' matches only 2 a's.
\           Escapes a special character or introduces a special sequence.
[]          A character set, such as [0-9], [a-z], [A-Z], or [^0] (any character except 0).
|           Alternation: A|B matches either A or B.
(...)       Matches the expression inside the parentheses and captures it as a group.
(?#...)     A comment; it is ignored.
(?=...)     Lookahead: matches if ... matches next, but does not consume any of the string. 'hello(?=test)' matches the hello in 'hellotest'.
(?!...)     Negative lookahead: matches if ... does not match next. 'hello(?!test)' matches hello only when it is not followed by test.
(?<=...)    Lookbehind: matches if preceded by ... (which must be fixed length). '(?<=hello)test' matches the test in 'hellotest'.
(?<!...)    Negative lookbehind: matches if not preceded by ... (which must be fixed length). '(?<!hello)test' does not match the test in 'hellotest'.
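A few rows of the table can be checked directly in Python 3. This is a minimal sketch that reuses the table's own sample strings; the variable names and the 'cat, dog, bird' string are made up for illustration.

```python
import re

html = '<title>chinaunix</title>'

# Greedy '.*' runs as far as it can, so the match spans the whole string.
greedy = re.search(r'<.*>', html).group()
# Non-greedy '.*?' stops at the first '>' it reaches.
lazy = re.search(r'<.*?>', html).group()

# Bounded repetition: greedy takes 4 a's, non-greedy only 2.
most = re.match(r'a{2,4}', 'aaaaaa').group()
fewest = re.match(r'a{2,4}?', 'aaaaaa').group()

# Alternation: either word is accepted.
pets = re.findall(r'cat|dog', 'cat, dog, bird')

# Lookarounds inspect the neighboring text without consuming it.
ahead = re.search(r'hello(?=test)', 'hellotest').group()
not_ahead = re.search(r'hello(?!test)', 'hellotest')
behind = re.search(r'(?<=hello)test', 'hellotest').group()
not_behind = re.search(r'(?<!hello)test', 'hellotest')

print(greedy, lazy, most, fewest, pets, ahead, behind)
print(not_ahead, not_behind)
```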

The list of special regular expression sequences is as follows:

Sequence    Meaning (ASCII equivalents shown; in Python 3, str patterns also match non-ASCII Unicode characters unless re.ASCII is set)
\A          Matches only at the start of the string.
\Z          Matches only at the end of the string.
\b          Matches the empty string at the start or end of a word.
\B          Matches the empty string when it is not at the start or end of a word.
\d          A digit; equivalent to [0-9].
\D          A non-digit; equivalent to [^0-9].
\s          Any whitespace character; equivalent to [ \t\n\r\f\v].
\S          Any non-whitespace character; equivalent to [^ \t\n\r\f\v].
\w          Any letter, digit or underscore; equivalent to [a-zA-Z0-9_].
\W          Any character that is not a letter, digit or underscore; equivalent to [^a-zA-Z0-9_].
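The following Python 3 sketch exercises several of these sequences. The sample strings are invented for the demonstration. It also shows two differences that are easy to miss: \A versus ^ under re.M, and \Z versus $ when the string ends with a newline.

```python
import re

text = 'order 66 shipped to room_12 at 9am'

digits = re.findall(r'\d+', text)        # runs of digits
words = re.findall(r'\w+', text)         # \w includes '_', so 'room_12' is one word
parts = re.split(r'\s+', 'a \tb\nc')     # any run of whitespace

# \b: 'at' as a whole word; \B: an 'at' that starts inside a word ('cat')
whole = re.findall(r'\bat\b', 'at cat at')
inner = re.findall(r'\Bat\b', 'at cat at')

# With re.M, '^' matches at every line start, but \A only at the string start.
lines = 'first\nsecond'
caret = re.findall(r'^\w+', lines, re.M)
start = re.findall(r'\A\w+', lines, re.M)

# '$' may match just before a trailing newline; \Z only at the very end.
dollar = re.search(r'end$', 'the end\n')
big_z = re.search(r'end\Z', 'the end\n')

print(digits, words, parts, whole, inner, caret, start)
print(dollar is not None, big_z is None)
```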

3. Main functions of re

The most commonly used functions are compile, search, match, split, findall (and finditer), and sub (and subn).
compile
re.compile(pattern[, flags])
Purpose: compiles a regular expression pattern into a regular expression object.
Commonly used flags:
re.I: ignore case.
re.L: make \w, \W, \b, \B, \s and \S depend on the current locale (Python 2 behavior; in Python 3 this flag can only be used with bytes patterns).
re.M: multi-line mode; ^ and $ also match at the start and end of each line.
re.S: make '.' match any character, including a newline (without this flag, '.' matches anything except a newline).
re.U: make \w, \W, \b, \B, \d, \D, \s and \S depend on the Unicode character properties database (the default for str patterns in Python 3).
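Here is a minimal Python 3 sketch of the flags re.I, re.S and re.M. It also shows how they combine with the | operator. The two-line sample string is an assumption chosen to make the newline behavior visible.

```python
import re

text = 'Hello\nworld'

ignore_case = re.search('hello', text, re.I)      # re.I: case-insensitive
no_dotall = re.search('Hello.world', text)        # '.' does not cross '\n'
dotall = re.search('Hello.world', text, re.S)     # re.S: '.' matches '\n' too
line_starts = re.findall(r'^\w', text, re.M)      # re.M: '^' at every line start

# Flags are combined with the bitwise OR operator.
combined = re.compile('^WORLD$', re.I | re.M)
found = combined.search(text).group()

print(ignore_case.group(), no_dotall, repr(dotall.group()), line_starts, found)
```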

search
re.search(pattern, string[, flags])
RegexObject.search(string[, pos[, endpos]])
Purpose: scans the string for the first location where the regular expression pattern matches and returns a MatchObject, or None if no position in the string matches.
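To see what search() returns, consider this minimal Python 3 sketch. It also shows how the optional pos and endpos arguments of a compiled pattern restrict the part of the string that is scanned. The pattern and sample string are made up for illustration.

```python
import re

pattern = re.compile(r'\d+')
s = 'abc 123 def 456'

m = pattern.search(s)                 # first match anywhere in s
print(m.group(), m.span())            # 123 (4, 7)

m2 = pattern.search(s, 8)             # pos=8: start scanning at index 8
print(m2.group(), m2.span())          # 456 (12, 15)

m3 = pattern.search(s, 0, 6)          # endpos=6: as if s were 'abc 12'
print(m3.group())                     # 12

print(pattern.search('no digits here'))   # None
```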

match
re.match(pattern, string[, flags])
RegexObject.match(string[, pos[, endpos]])
Purpose: match() only tries to match the regular expression at the beginning of the string, that is, it only reports a match that starts at position 0, whereas search() scans the entire string. To look for a match anywhere in the string, use search().
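The next Python 3 sketch contrasts match() with search(). It also illustrates a subtle point: with a compiled pattern's pos argument, match() anchors at pos, but '^' still refers to the real start of the string. The last line uses re.fullmatch, which is not covered above and requires Python 3.4 or later.

```python
import re

s = 'helloworld'

at_start = re.match('world', s)           # None: s does not start with 'world'
anywhere = re.search('world', s)          # found at index 5

p = re.compile('world')
with_pos = p.match(s, 5)                  # match() anchors at pos, so this succeeds
caret = re.compile('^world').match(s, 5)  # '^' still means the real start: None

# re.fullmatch (Python 3.4+) requires the whole string to match.
whole = re.fullmatch(r'hello\w+', s)

print(at_start, anywhere.span(), with_pos.group(), caret, whole.group())
```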

Here are some examples:
Example: the most basic usage, calling methods on a compiled RegexObject


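#!/usr/bin/env python3
import re
r1 = re.compile(r'world')
# match() fails: 'helloworld' does not start with 'world'
if r1.match('helloworld'):
    print('match succeeds')
else:
    print('match fails')
# search() succeeds: 'world' occurs later in the string
if r1.search('helloworld'):
    print('search succeeds')
else:
    print('search fails')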

The r prefix marks a raw string. In an ordinary string literal, backslash escape sequences such as '\n' (a newline) are interpreted, so a literal backslash has to be written as '\\'. To express a backslash followed by the letter n without the r prefix, you would therefore write '\\n'; with the prefix, r'\n' says the same thing much more clearly.
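This short Python 3 sketch makes the difference between ordinary and raw strings concrete. It also shows that the re module understands the \n escape itself, which is why both a real newline and r'\n' work as patterns. The path string is a made-up example.

```python
import re

plain = '\n'       # one character: a newline
raw = r'\n'        # two characters: a backslash followed by 'n'
escaped = '\\n'    # the same two characters, written without the r prefix
print(len(plain), len(raw), raw == escaped)   # 1 2 True

# re interprets the \n escape, so both patterns find the newline.
both = bool(re.search('\n', 'a\nb')) and bool(re.search(r'\n', 'a\nb'))

# To match one literal backslash the regex must be two backslashes:
# r'\\' (raw) and '\\\\' (ordinary string) spell the same pattern.
path = 'C:\\temp'
print(both, re.findall(r'\\', path) == re.findall('\\\\', path))   # True True
```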

Example: set a flag


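import re
# r2 = re.compile(r'\n$', re.S)
# r2 = re.compile('\n$', re.S)
r2 = re.compile('World$', re.I)
# re.I lets 'World' match 'world'; '$' also matches just before a trailing newline
if r2.search('helloworld\n'):
    print('search succeeds')
else:
    print('search fails')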

Example: calling the module-level function directly

import re
if re.search(r'abc', 'helloaaabcdworld\n'):
    print('search succeeds')
else:
    print('search fails')

split
re.split(pattern, string[, maxsplit=0, flags=0])
RegexObject.split(string[, maxsplit=0])
Purpose: splits the string at each substring that matches the regular expression and returns the pieces as a list.
Example: a simple analysis of an IP address


#!/usr/bin/env python3
import re
r1 = re.compile(r'\W+')
print(r1.split('192.168.1.1'))
print(re.split(r'(\W+)', '192.168.1.1'))
print(re.split(r'(\W+)', '192.168.1.1', maxsplit=1))

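The results are as follows (when the pattern contains a capturing group, the separators are kept in the list, and a maxsplit of 1 stops after the first split):
['192', '168', '1', '1']
['192', '.', '168', '.', '1', '.', '1']
['192', '.', '168.1.1']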
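Two more split() cases are worth knowing, shown in the Python 3 sketch below with invented inputs. A character class lets one pattern split on several different separators at once, and a separator at the very start produces an empty first element. The sketch passes maxsplit by keyword, which reads more clearly; recent Python versions also deprecate passing it positionally to re.split.

```python
import re

# Several different separators at once: commas, semicolons, whitespace.
fields = re.split(r'[,;\s]+', 'alpha, beta;gamma  delta')
print(fields)        # ['alpha', 'beta', 'gamma', 'delta']

# maxsplit passed by keyword: at most 2 splits, the rest stays together.
head = re.split(r'\W+', '192.168.1.1', maxsplit=2)
print(head)          # ['192', '168', '1.1']

# A separator at the very start produces an empty first element.
lead = re.split(r'\W+', '.a.b')
print(lead)          # ['', 'a', 'b']
```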

findall
re.findall(pattern, string[, flags])
RegexObject.findall(string[, pos[, endpos]])
Purpose: finds all non-overlapping substrings that the regular expression matches and returns them as a list.
Example: find the text enclosed in [] (greedy and non-greedy), plus a few other patterns


#!/usr/bin/env python3
import re
r1 = re.compile(r'(\[.*\])')    # greedy
print(re.findall(r1, "hello[hi]heldfsdsf[iwonder]lo"))   # ['[hi]heldfsdsf[iwonder]']
r1 = re.compile(r'(\[.*?\])')   # non-greedy
print(re.findall(r1, "hello[hi]heldfsdsf[iwonder]lo"))   # ['[hi]', '[iwonder]']
print(re.findall('[0-9]{2}', "fdskfj1323jfkdj"))          # ['13', '23']
print(re.findall('([0-9][a-z])', "fdskfj1323jfkdj"))      # ['3j']
print(re.findall('(?=www)', "afdsfwwwfkdjfsdfsdwww"))     # ['', ''] (lookarounds match empty strings)
print(re.findall('(?<=www)', "afdsfwwwfkdjfsdfsdwww"))    # ['', '']
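What findall() returns depends on how many capturing groups the pattern has: whole matches with no groups, the group's text with one group, and a list of tuples with several. The Python 3 sketch below shows all three using a made-up key=value string. It also uses the non-capturing group (?:...), which is not listed in the syntax table above.

```python
import re

log = 'alice=30 bob=25 carol=41'

whole = re.findall(r'\w+=\d+', log)          # no group: whole matches
names = re.findall(r'(\w+)=\d+', log)        # one group: just that group
pairs = re.findall(r'(\w+)=(\d+)', log)      # several groups: list of tuples
ages = re.findall(r'(?:\w+)=(\d+)', log)     # (?:...) groups without capturing

print(whole)   # ['alice=30', 'bob=25', 'carol=41']
print(names)   # ['alice', 'bob', 'carol']
print(pairs)   # [('alice', '30'), ('bob', '25'), ('carol', '41')]
print(ages)    # ['30', '25', '41']
```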

finditer
re.finditer(pattern, string[, flags])
RegexObject.finditer(string[, pos[, endpos]])
Purpose: like findall, it finds every substring that the regular expression matches, but instead of a list it returns an iterator that yields a MatchObject for each match.
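Because finditer() yields match objects, each result carries its groups and its position in the string, not just the matched text. The Python 3 sketch below reuses the bracket example from findall.

```python
import re

text = 'hello[hi]heldfsdsf[iwonder]lo'

# Each item is a match object, so groups and positions are available.
found = [(m.group(1), m.span()) for m in re.finditer(r'\[(.*?)\]', text)]
for inner, span in found:
    print(inner, span)
# hi (5, 9)
# iwonder (18, 27)
```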

sub
re.sub(pattern, repl, string[, count=0, flags=0])
RegexObject.sub(repl, string[, count=0])
Purpose: finds every substring of string that matches the regular expression pattern and replaces it with repl. If count is given, at most count replacements are made. If nothing matches, the string is returned unchanged. repl can be either a string or a function.
Example:


#!/usr/bin/env python3
import re
p = re.compile('(one|two|three)')
print(p.sub('num', 'one word two words three words apple', count=2))
# num word num words three words apple
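The two other forms of repl mentioned above are shown in the Python 3 sketch below: a replacement string with a backreference, and a function that receives each match object. The NUMBERS table and the to_digit function are hypothetical helpers written for this illustration.

```python
import re

p = re.compile('(one|two|three)')

# String repl with a backreference: \1 reinserts the text of group 1.
bracketed = p.sub(r'[\1]', 'one word two words')

# Function repl: called once per match object, returns the replacement text.
# NUMBERS and to_digit are hypothetical helpers for this example.
NUMBERS = {'one': '1', 'two': '2', 'three': '3'}

def to_digit(match):
    return NUMBERS[match.group(1)]

digits = p.sub(to_digit, 'one word two words three words apple')

print(bracketed)   # [one] word [two] words
print(digits)      # 1 word 2 words 3 words apple
```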

subn
re.subn(pattern, repl, string[, count=0, flags=0])
RegexObject.subn(repl, string[, count=0])

Purpose: does the same thing as sub(), but returns a tuple containing the new string and the number of substitutions made.
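A minimal Python 3 sketch of subn(), reusing the sub() example. It shows that the second element of the tuple counts the replacements actually made, including when count limits them or nothing matches.

```python
import re

p = re.compile('(one|two|three)')
text = 'one word two words three words apple'

new_text, n = p.subn('num', text)
print(new_text, n)        # num word num words num words apple 3

limited = p.subn('num', text, count=2)
print(limited)            # ('num word num words three words apple', 2)

unchanged = re.subn(r'\d', '#', 'no digits')
print(unchanged)          # ('no digits', 0)
```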

