A primer on regular expressions in Python

  • 2020-04-02 14:18:11
  • OfStack

Regular expressions have two basic operations: matching and substitution.

Matching searches a text string for text that fits a given expression.

Substitution finds the text in a string that matches a given expression and replaces it with another string.
 
1. Basic elements
 
A regular expression defines a series of special character elements to perform matching actions.

Regular expression basic characters

Character   Description
text        Matches the literal text string
.           Matches any single character except a newline
^           Matches the beginning of a string
$           Matches the end of a string
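
The basic characters in the table above can be tried out with a short sketch like the following. The sample string and variable names are made up for illustration.

```python
import re

text = "cat hat\nbat"

# Literal text matches itself wherever it occurs.
literal = re.search(r"hat", text)

# "." matches any single character except a newline.
dot_matches = re.findall(r".at", text)

# "^" anchors the pattern to the beginning of the string.
starts_with_cat = re.search(r"^cat", text) is not None
starts_with_hat = re.search(r"^hat", text) is not None

# "$" anchors the pattern to the end of the string.
ends_with_bat = re.search(r"bat$", text) is not None

print(literal.group())                                  # hat
print(dot_matches)                                      # ['cat', 'hat', 'bat']
print(starts_with_cat, starts_with_hat, ends_with_bat)  # True False True
```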

In regular expressions, we can also constrain how many times an element may repeat by using a quantifier.
 
Quantifiers

Greedy   Non-greedy   Description
*        *?           Repeats the previous expression zero or more times
+        +?           Repeats the previous expression one or more times
?        ??           Repeats the previous expression zero or one time
{m}      {m}?         Repeats the previous expression exactly m times
{m,}     {m,}?        Repeats the previous expression at least m times
{m,n}    {m,n}?       Repeats the previous expression at least m times and at most n times

According to the table above, ".*" is a greedy (maximum) match and consumes as much of the source string as it can, while ".*?" is a non-greedy (minimum) match and stops at the first point where the rest of the expression can match. The pattern d.*g matches any string that begins with d and ends with g, such as "debug" and "debugging", or even "dog is walking". The pattern d.*?g, by contrast, matches only "debug" within "debugging", and only "dog" within "dog is walking".
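
The difference between the two kinds of quantifier is easiest to see side by side. The sketch below uses the strings from the paragraph above; the results dictionary and the digit example are additions for illustration.

```python
import re

# Compare greedy d.*g with non-greedy d.*?g on the same input.
results = {}
for text in ("debugging", "dog is walking"):
    greedy = re.search(r"d.*g", text).group()
    lazy = re.search(r"d.*?g", text).group()
    results[text] = (greedy, lazy)
    print(repr(text), "->", repr(greedy), "vs", repr(lazy))

# The same rule applies to counted repetition.
greedy_digits = re.findall(r"\d{2,3}", "1 22 333 4444")
lazy_digits = re.findall(r"\d{2,3}?", "1 22 333 4444")
print(greedy_digits)  # ['22', '333', '444']
print(lazy_digits)    # ['22', '33', '44', '44']
```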
 
In more complex matches, we can use groups and operators.
 
Groups and operators

Group      Description
[...]      Matches any one character from a set, such as [a-z], [1-9] or [,./;']
[^...]     Matches any one character not in the set; the inverse of [...]
A|B        Matches expression A or expression B; equivalent to an OR operation
(...)      Groups an expression; each pair of parentheses forms one group, as in ([a-b]+)([A-Z]+)([1-9]+)
\number    Matches the same text that the group with that number matched earlier
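
A minimal sketch of sets, alternation, groups and backreferences follows. The input strings are invented for the example.

```python
import re

# [...] matches one character from the set; [^...] matches one character outside it.
vowels = re.findall(r"[aeiou]", "regex")
non_digits = re.findall(r"[^0-9]", "a1b2")

# A|B matches either alternative.
pets = re.findall(r"cat|dog", "cat, dog, bird")

# Each pair of parentheses captures one group.
parts = re.search(r"([a-z]+)([A-Z]+)([1-9]+)", "abcXYZ123").groups()

# \1 matches whatever group 1 matched, here a repeated word.
doubled = re.search(r"\b(\w+) \1\b", "it was the the best").group(1)

print(vowels)      # ['e', 'e']
print(non_digits)  # ['a', 'b']
print(pets)        # ['cat', 'dog']
print(parts)       # ('abc', 'XYZ', '123')
print(doubled)     # the
```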

There are also special character sequences that match a particular type of character or a particular position in the text. For example, \b matches a word boundary, so food\b matches in "food" and "zoofood" but not in "foodies".
 
Special character sequences

Sequence   Description
\A         Matches only at the beginning of the string
\b         Matches at a word boundary
\B         Matches anywhere that is not a word boundary
\d         Matches any decimal digit; equivalent to r'[0-9]' for ASCII text
\D         Matches any character that is not a decimal digit; equivalent to r'[^0-9]' for ASCII text
\s         Matches any whitespace character (space, tab, newline, carriage return, form feed, vertical tab)
\S         Matches any non-whitespace character
\w         Matches any alphanumeric character or underscore
\W         Matches any character that is not alphanumeric or an underscore
\Z         Matches only at the end of the string
\\         Matches a backslash character
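
The special sequences can be checked with a sketch like this one, which reuses the food\b example from the text. The other strings are made up for illustration.

```python
import re

# food\b matches only where "food" is followed by a word boundary.
words = ["food", "zoofood", "foodies"]
ends_in_food = [w for w in words if re.search(r"food\b", w)]

# \d, \w and \s match digits, word characters and whitespace.
digits = re.findall(r"\d+", "order 66, aisle 7")
tokens = re.findall(r"\w+", "snake_case, CamelCase!")
pieces = re.split(r"\s+", "a  b\tc\nd")

# \Z matches only at the very end; $ also matches just before a trailing newline.
strict_end = re.search(r"66\Z", "order 66\n")
loose_end = re.search(r"66$", "order 66\n")

print(ends_in_food)  # ['food', 'zoofood']
print(digits)        # ['66', '7']
print(tokens)        # ['snake_case', 'CamelCase']
print(pieces)        # ['a', 'b', 'c', 'd']
print(strict_end is None, loose_end is not None)  # True True
```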

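Extension notation, written (?...), covers inline flags, special kinds of groups, and assertions that test the text around the current position without consuming it.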
 
Regular expression extension notation

Notation        Description
(?iLmsux)       Matches the empty string; each of the letters i, L, m, s, u, x turns on the corresponding regular expression flag (see the flags table below).
(?:...)         Matches the expression inside the parentheses without creating a capture group.
(?P<name>...)   Matches the expression inside the parentheses and also makes the group accessible by the symbolic name "name".
(?P=name)       Matches the same text that the earlier group called "name" matched.
(?#...)         A comment; the contents of the parentheses are ignored.
(?=...)         Lookahead: matches if ... matches at the current position, without consuming any text, so the rest of the expression is analysed as usual. For example, Martin(?=Brown) matches "Martin" only when it is followed by "Brown".
(?!...)         Negative lookahead: matches only if ... does not match at the current position; the inverse of (?=...).
(?<=...)        Lookbehind: matches if the text immediately before the current position matches .... For example, (?<=abc)def matches the "def" in "abcdef". The lookbehind expression must match text of a fixed length.
(?<!...)        Negative lookbehind: matches if the text immediately before the current position does not match ...; the inverse of (?<=...).
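
The following sketch exercises most of the extension notations, including the Martin/Brown and abcdef examples from the table. The remaining strings and all variable names are chosen only for illustration.

```python
import re

# (?P<name>...) creates a named group; (?P=name) matches the same text again.
date = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "released 2020-04")
repeated = re.search(r"(?P<ch>\w)(?P=ch)", "hello")

# (?:...) groups without capturing, so only (c) appears in groups().
only_c = re.search(r"(?:ab)+(c)", "ababc").groups()

# (?=...) and (?!...) look ahead without consuming text.
followed = re.search(r"Martin(?=Brown)", "MartinBrown")
not_followed = re.search(r"Martin(?=Brown)", "MartinSmith")
negative = re.search(r"Martin(?!Brown)", "MartinSmith")

# (?<=...) and (?<!...) look behind the current position.
behind = re.search(r"(?<=abc)def", "abcdef")
not_behind = re.search(r"(?<!abc)def", "abcdef")

# (?i) turns on case-insensitive matching; (?#...) is a comment.
inline_flag = re.search(r"(?i)martin", "MARTIN")
commented = re.search(r"\d+(?# one or more digits)", "abc 42")

print(date.group("year"), date.group("month"))  # 2020 04
print(repeated.group())                         # ll
print(only_c)                                   # ('c',)
print(followed.group(), not_followed)           # Martin None
print(behind.group(), not_behind)               # def None
```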

Regular expressions also support processing flags that affect how the expression is evaluated.
 
Processing flags

Flag              Description
I or IGNORECASE   Ignores case when matching the expression against the text.
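
As a quick illustration of the IGNORECASE flag, the sketch below runs the same pattern with and without it. The sample text is invented.

```python
import re

text = "Python, PYTHON, python"

# Without a flag, matching is case-sensitive.
case_sensitive = re.findall(r"python", text)

# re.IGNORECASE (short form re.I) ignores case.
ignore_case = re.findall(r"python", text, re.IGNORECASE)
short_form = re.findall(r"python", text, re.I)

# The inline form (?i) has the same effect from inside the pattern.
inline = re.findall(r"(?i)python", text)

print(case_sensitive)  # ['python']
print(ignore_case)     # ['Python', 'PYTHON', 'python']
```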

2. Operation
 

With the re module, we can use regular expressions to search, extract, and replace strings in Python. For example, the re.search() function performs a basic search and returns a MatchObject, and the re.findall() function returns a list of all matches.
 


>>> import re
>>> a = "this is my re module test"
>>> obj = re.search(r'.*is', a)
>>> print(obj)
<re.Match object; span=(0, 7), match='this is'>
>>> obj.group()
'this is'
>>> re.findall(r'.*is', a)
['this is']

MatchObject methods and attributes

Method / attribute       Description
m.expand(template)       Returns the template string with its backslash escapes and group references replaced by the matched text.
m.group([group, ...])    Returns the text matched by the given group, identified by name or by index number. With several arguments it returns a tuple; with no argument it returns the entire match.
m.groups([default])      Returns a tuple containing the text matched by every group in the pattern. The default argument is the value returned for groups that did not take part in the match; it defaults to None.
m.groupdict([default])   Returns a dictionary of all named groups, keyed by group name. The default argument is the value returned for groups that did not take part in the match; it defaults to None.
m.start([group])         Returns the start position of the text matched by group, or of the entire match if no group is given.
m.end([group])           Returns the end position of the text matched by group, or of the entire match if no group is given.
m.span([group])          Returns the two-element tuple (m.start(group), m.end(group)) for the given group, or for the entire match if no group is given.
m.pos                    The pos value passed to match() or search().
m.endpos                 The endpos value passed to match() or search().
m.lastindex              The index of the last matched capturing group, or None if no group was matched.
m.lastgroup              The name of the last matched capturing group, or None if the group has no name or no group was matched.
m.re                     The regular expression object that created this MatchObject.
m.string                 The string passed to match() or search().
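
To see the methods and attributes above in action, here is a small sketch. The pattern, the group names user and host, and the sample string are hypothetical.

```python
import re

m = re.search(r"(?P<user>\w+)@(?P<host>\w+)\.com", "mail bob@example.com now")

whole = m.group()            # entire match
first = m.group(1)           # group by index
host = m.group("host")       # group by name
both = m.group(1, 2)         # several groups give a tuple
all_groups = m.groups()
named = m.groupdict()
span = (m.start(), m.end())  # same as m.span()
host_span = m.span("host")
expanded = m.expand(r"\g<host>:\1")

# default fills in groups that did not take part in the match.
optional = re.search(r"(a)(b)?", "a").groups("none")

print(whole)                     # bob@example.com
print(both)                      # ('bob', 'example')
print(named)                     # {'user': 'bob', 'host': 'example'}
print(span, host_span)           # (5, 20) (9, 16)
print(m.lastindex, m.lastgroup)  # 2 host
print(expanded)                  # example:bob
print(optional)                  # ('a', 'none')
```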

The sub() and subn() functions perform substitution on strings. The basic format of the sub() function is as follows:
  sub(pattern, replace, string[, count])
 
Example:

 


>>> text = 'The dog on my bed'    # avoid shadowing the built-in str
>>> rep = re.sub('dog', 'cat', text)
>>> print(rep)
The cat on my bed

The replace parameter can also be a function, which is called with each match object and returns the replacement string. To find out how many substitutions were made, use the subn() function, which returns a tuple containing the new text and the number of replacements.
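
A short sketch of both features follows. The helper function double and the sample sentences are made up for the example.

```python
import re

def double(match):
    """Replacement function: receives a match object, returns a string."""
    return str(int(match.group()) * 2)

# A function as the replacement is called once per match.
doubled = re.sub(r"\d+", double, "3 apples and 10 pears")

# subn() also reports how many replacements were made.
new_text, n = re.subn(r"dog", "cat", "dog eat dog")

# count limits the number of replacements.
limited = re.sub(r"dog", "cat", "dog eat dog", count=1)

print(doubled)      # 6 apples and 20 pears
print(new_text, n)  # cat eat cat 2
print(limited)      # cat eat dog
```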
 
If we need the same regex for several matching operations, we can compile it into an internal form to speed up processing. Compilation is done with the compile() function, whose basic format is as follows:
  compile(str[, flags])
 
Here str is the pattern string to compile, and flags is the modifier flags. Compiling a regex produces an object with several methods and attributes.
 
Regular expression object methods and attributes

Method / attribute                   Description
r.search(string[, pos[, endpos]])    Same as the search() function, but lets you specify where the search begins and ends
r.match(string[, pos[, endpos]])     Same as the match() function, but lets you specify where the match begins and ends
r.split(string[, max])               Same as the split() function
r.findall(string)                    Same as the findall() function
r.sub(replace, string[, count])      Same as the sub() function
r.subn(replace, string[, count])     Same as the subn() function
r.flags                              The flags given when the object was created
r.groupindex                         A dictionary mapping the symbolic group names defined with r'(?P<id>...)' to group numbers
r.pattern                            The pattern string from which the object was compiled
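
Using a compiled object might look like the sketch below. The patterns, the group name id and the sample strings are chosen only for illustration.

```python
import re

word = re.compile(r"(?P<id>[a-z]+)", re.IGNORECASE)
comma = re.compile(r",\s*")

found = word.findall("One two THREE")

# pos tells search() and match() where to start in the string.
from_pos = word.search("One two", 3).group()
at_pos = word.match("One two", 4).group()

# maxsplit and count limit how much work split() and sub() do.
split_once = comma.split("a, b,c", maxsplit=1)
replaced = word.sub("X", "ab cd", count=1)

group_numbers = dict(word.groupindex)
has_ignorecase = bool(word.flags & re.IGNORECASE)

print(found)             # ['One', 'two', 'THREE']
print(from_pos, at_pos)  # two two
print(split_once)        # ['a', 'b,c']
print(replaced)          # X cd
print(group_numbers)     # {'id': 1}
print(word.pattern)      # (?P<id>[a-z]+)
```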

The re.escape() function escapes the special characters in a string so that it can be used as a literal pattern.
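
For instance, a string containing +, ( and ? needs escaping before it can be searched for literally. The sketch below uses an invented string; the exact escaped form is not shown because it can differ between Python versions.

```python
import re

literal = "1+1=2 (really?)"
haystack = "proof: 1+1=2 (really?)"

# Escaped, the string matches itself character for character.
escaped = re.escape(literal)
found = re.search(escaped, haystack)

# Unescaped, "+", "(", ")" and "?" act as regex syntax, so the search fails here.
unescaped = re.search(literal, haystack)

print(found.group())      # 1+1=2 (really?)
print(unescaped is None)  # True
```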
 
Getting a reference to an object's method with getattr
 


>>> li = ['a', 'b']
>>> getattr(li, 'append')
<built-in method append of list object at 0xb7d4a52c>
>>> getattr(li, 'append')('c')          # equivalent to li.append('c')
>>> li
['a', 'b', 'c']
>>> handler = getattr(li, 'append', None)
>>> handler
<built-in method append of list object at 0xb7d4a52c>
>>> handler('cc')                      # equivalent to li.append('cc')
>>> li
['a', 'b', 'c', 'cc']
>>> result = handler('bb')
>>> li
['a', 'b', 'c', 'cc', 'bb']
>>> print(result)
None

