Python regular expressions and examples of using them

  • 2020-07-21 08:56:49
  • OfStack

Regular expression

A regular expression is a pattern used to match strings.

The regular expression matching process

The characters of the expression are compared, one by one, with the characters of the text. If every character matches, the match succeeds; as soon as one character does not match, the match fails. If the expression contains quantifiers or boundaries, the matching process is slightly different.
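The process described above can be seen with literal patterns. This is a minimal sketch, assuming Python 3 and the standard re module; the sample words are chosen only for illustration.

```python
import re

# Every character lines up, so the match succeeds.
same = re.fullmatch(r"cat", "cat")

# The second character differs ('a' vs 'u'), so the match fails.
different = re.fullmatch(r"cat", "cut")

# A quantifier changes the process: 'a+' may consume several characters.
repeated = re.fullmatch(r"ca+t", "caaat")

print(same is not None, different is not None, repeated is not None)
```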

Regular expression syntax rules

Each entry below gives the syntax, a description, an example expression, and the strings that the example matches.

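Characters

.  Matches any single character except the newline "\n". Example: a.c matches abc.
\  Escape character; changes the meaning of the character that follows it. Example: a\\c matches a\c.
[...]  Character set; the corresponding position can be any one character from the set. The characters can be listed one by one or given as a range, such as [abc] or [a-c]. If the first character is ^, the set is negated: [^abc] matches any character other than a, b, or c. Most special characters lose their special meaning inside a set. To include ^, ] or - literally, escape them with a backslash. Example: a[bcd]e matches abe, ace, ade.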
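The character entries above can be checked with a short script. This is a sketch assuming Python 3 and the standard re module; the extra test strings ("afe", "abcdef") are made up for illustration.

```python
import re

# '.' matches any single character except a newline.
dot = re.fullmatch(r"a.c", "abc")
dot_newline = re.fullmatch(r"a.c", "a\nc")

# The pattern a\\c matches the three-character text a\c.
escaped = re.fullmatch(r"a\\c", "a\\c")

# A character set accepts exactly one of the listed characters.
words = ["abe", "ace", "ade", "afe"]
char_set = [w for w in words if re.fullmatch(r"a[bcd]e", w)]

# A leading ^ negates the set.
negated = re.findall(r"[^abc]", "abcdef")

print(char_set, negated)
```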

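Predefined character sets

\d  Digit: [0-9]. Example: a\dc matches a1c.
\D  Non-digit: [^0-9]. Example: a\Dc matches abc.
\s  Whitespace character: [<space>\t\r\n\f\v]. Example: a\sc matches "a c".
\S  Non-whitespace character: [^\s]. Example: a\Sc matches abc.
\w  Word character: [a-zA-Z0-9_]. Example: a\wc matches abc.
\W  Non-word character: [^\w]. Example: a\Wc matches "a c".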
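As a quick check of the predefined sets, the sketch below runs each example expression against one sample text. It assumes Python 3 and the standard re module; the sample text is invented for illustration.

```python
import re

text = "a1c abc a c"

digits = re.findall(r"a\dc", text)        # 'a', one digit, 'c'
non_digits = re.findall(r"a\Dc", text)    # 'a', one non-digit, 'c'
spaces = re.findall(r"a\sc", text)        # 'a', one whitespace, 'c'
non_spaces = re.findall(r"a\Sc", text)    # 'a', one non-whitespace, 'c'
word_chars = re.findall(r"a\wc", text)    # 'a', one word character, 'c'
non_words = re.findall(r"a\Wc", text)     # 'a', one non-word character, 'c'

print(digits, non_digits, spaces, non_spaces, word_chars, non_words)
```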

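Quantifiers

*  Matches the preceding character zero or more times. Example: abc* matches ab, abc, abccc.
+  Matches the preceding character one or more times. Example: abc+ matches abc, abccc.
?  Matches the preceding character zero times or once. Example: abc? matches ab, abc.
{m}  Matches the preceding character exactly m times. Example: abc{2} matches abcc.
{m,n}  Matches the preceding character m to n times. Example: abc{2,3} matches abcc, abccc.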
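The quantifier entries can be compared side by side by testing the same candidate strings against each expression. A minimal sketch, assuming Python 3; the helper name matching is hypothetical and exists only for this example.

```python
import re

candidates = ["ab", "abc", "abcc", "abccc"]

def matching(pattern):
    """Return the candidates that the pattern matches in full."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

star = matching(r"abc*")        # zero or more 'c'
plus = matching(r"abc+")        # one or more 'c'
optional = matching(r"abc?")    # zero or one 'c'
exact = matching(r"abc{2}")     # exactly two 'c'
ranged = matching(r"abc{2,3}")  # two or three 'c'

print(star, plus, optional, exact, ranged)
```

Note that the quantifier applies only to the character directly before it (here c), not to the whole of abc.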

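Boundary matching

^  Matches the start of the string. Example: ^abc matches abc.
$  Matches the end of the string. Example: abc$ matches abc.
\A  Matches only at the start of the string. Example: \Aabc matches abc.
\Z  Matches only at the very end of the string. In Python, unlike $, it does not match before a trailing newline. Example: abc\Z matches abc.
\b  Matches a word boundary, that is, the position between a word character and a non-word character such as a space. For example, 'er\b' matches the 'er' in "never" but not the 'er' in "verb".
\B  Matches a non-word boundary. 'er\B' matches the 'er' in "verb" but not the 'er' in "never".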
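The boundary entries can be verified with the "never" and "verb" examples from the text. The sketch below assumes Python 3 and the standard re module; the string with a trailing newline is added here only to show how $ and \Z differ in Python.

```python
import re

# \b: 'er' must end at a word boundary.
never_b = re.search(r"er\b", "never")
verb_b = re.search(r"er\b", "verb")

# \B: 'er' must NOT end at a word boundary.
verb_nb = re.search(r"er\B", "verb")
never_nb = re.search(r"er\B", "never")

# $ also matches just before a trailing newline; \Z does not.
dollar = re.search(r"abc$", "abc\n")
end_only = re.search(r"abc\Z", "abc\n")

print(never_b is not None, verb_b is not None)
print(verb_nb is not None, never_nb is not None)
print(dollar is not None, end_only is not None)
```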

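Logic and grouping

|  Matches either the expression on its left or the expression on its right. Example: abc|def matches abc, def.
(...)  Group; each ( increases the group number by 1. The advantage of a group is that the matched substring is saved in the group for later use. Example: (abc){2} matches abcabc.
(?P<name>...)  Group that has an alias in addition to its number. Example: (?P<id>abc){2} matches abcabc.
\<number>  References the string matched by the group numbered number. Example: (\d)ab\1 matches 1ab1, 5ab5.
(?P=name)  References the string matched by the group with the alias name. Example: (?P<id>abc)ee(?P=id) matches abceeabc.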
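To see groups and back-references at work, the sketch below runs the example expressions from this part of the table. It assumes Python 3; the non-matching string "1ab2" is added only for contrast.

```python
import re

# Alternation: either side of | may match.
either = re.findall(r"abc|def", "abcdef")

# A numbered back-reference must repeat exactly what group 1 matched.
samples = ["1ab1", "5ab5", "1ab2"]
numbered = [s for s in samples if re.fullmatch(r"(\d)ab\1", s)]

# A named group can be referenced by its alias with (?P=id).
named = re.fullmatch(r"(?P<id>abc)ee(?P=id)", "abceeabc")

print(either, numbered, named.group("id"), named.group(1))
```

The named group is still reachable by number, which is why group("id") and group(1) return the same text.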

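Special constructs (non-grouping)

(?:...)  Non-grouping version of (...), used with | or followed by a quantifier. Example: (?:abc){2} matches abcabc.
(?iLmsux)  Each of the letters i, L, m, s, u, x stands for one matching mode of the regular expression. It can only be used at the start of the expression, and several letters can be combined. Example: (?i)abc matches AbC.
(?#...)  The text after # is treated as a comment and ignored. Example: abc(?#comment)def matches abcdef.
(?=...)  Succeeds only if the text that follows matches the expression; does not consume any of the string. Example: a(?=\d) matches an a that is followed by a digit.
(?!...)  Succeeds only if the text that follows does not match the expression; does not consume any of the string. Example: a(?!\d) matches an a that is not followed by a digit.
(?<=...)  Succeeds only if the text that precedes matches the expression; does not consume any of the string. Example: (?<=\d)a matches an a that is preceded by a digit.
(?<!...)  Succeeds only if the text that precedes does not match the expression; does not consume any of the string. Example: (?<!\d)a matches an a that is not preceded by a digit.
(?(id/name)yes_pattern|no_pattern)  If the group with number id or alias name matched, yes_pattern must match next; otherwise no_pattern must match. Example: (\d)?abc(?(1)\d|def) matches 1abc3, abcdef.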
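The lookaround and conditional constructs are easier to follow with concrete positions. The sketch below assumes Python 3 and the standard re module; the sample text is invented, and the conditional example makes the digit group optional, written as (\d)?, so that the no_pattern branch can be reached.

```python
import re

text = "a1 ab 2a ba"   # the letter 'a' occurs at indexes 0, 3, 7, 10

followed_by_digit = [m.start() for m in re.finditer(r"a(?=\d)", text)]
not_followed_by_digit = [m.start() for m in re.finditer(r"a(?!\d)", text)]
preceded_by_digit = [m.start() for m in re.finditer(r"(?<=\d)a", text)]
not_preceded_by_digit = [m.start() for m in re.finditer(r"(?<!\d)a", text)]

# Conditional: if group 1 (a leading digit) matched, a digit must follow
# 'abc'; otherwise 'def' must follow.
conditional = r"(\d)?abc(?(1)\d|def)"
samples = ["1abc3", "abcdef", "1abcdef", "abc3"]
accepted = [s for s in samples if re.fullmatch(conditional, s)]

# Inline flag and inline comment.
ignore_case = re.fullmatch(r"(?i)abc", "AbC")
with_comment = re.fullmatch(r"abc(?#comment)def", "abcdef")

print(followed_by_digit, not_followed_by_digit)
print(preceded_by_digit, not_preceded_by_digit)
print(accepted)
```

Lookarounds only test the neighbouring text, so each reported match is the single character a.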

Greedy mode and non-greedy mode

Greedy mode matches as much of the string as possible, and Python uses greedy mode by default. Non-greedy mode matches as little of the string as possible; it is selected by adding ? after the quantifier. For example, for the string abcccb, the greedy expression ab.*c matches abccc, while the non-greedy expression ab.*?c matches abc. For the string abbb, the greedy expression ab? matches ab, while the non-greedy expression ab?? matches a.
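The difference between the two modes can be confirmed directly on the strings abcccb and abbb. A minimal sketch, assuming Python 3 and the standard re module:

```python
import re

# Greedy: .* takes as much as it can while still allowing a final 'c'.
greedy = re.search(r"ab.*c", "abcccb").group()

# Non-greedy: .*? stops at the first 'c' that completes the match.
lazy = re.search(r"ab.*?c", "abcccb").group()

# The same idea applies to the ? quantifier itself.
greedy_optional = re.search(r"ab?", "abbb").group()
lazy_optional = re.search(r"ab??", "abbb").group()

print(greedy, lazy, greedy_optional, lazy_optional)
```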

The re module in Python

Methods of re module:

1. compile(pattern[, flags]): Compiles the regular expression pattern into a pattern object. Reusing the compiled object avoids processing the pattern string again on every call.

2. match(pattern, string[, flags]): Matches from the beginning of string. Returns a match object if the match succeeds, otherwise returns None.

3. search(pattern, string[, flags]): Scans through string for the first match. Returns a match object if a match is found, otherwise returns None.

4. findall(pattern, string[, flags]): Finds all non-overlapping occurrences of pattern in string and returns them as a list of strings (or of group contents, if the pattern contains groups).

5. finditer(pattern, string[, flags]): The same as findall(), but returns an iterator instead of a list. For each match, the iterator yields one match object.

6. split(pattern, string, maxsplit=0): Splits string into a list, using the matches of pattern as delimiters, and returns the list. At most maxsplit splits are made (the default, 0, splits at every match).

7. sub(pattern, repl, string, count=0): Replaces the matches of pattern in string with repl and returns the new string. At most count replacements are made; if count is not given, all matches are replaced.
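The functions listed above can be tried on one short input. This is a sketch assuming Python 3; the pattern and the sample text are invented for illustration, and the calls use the methods of a compiled pattern object, which mirror the module-level functions.

```python
import re

pattern = re.compile(r"\d+")
text = "a1b22c333"

at_start = pattern.match(text)      # text starts with 'a', so no match
anywhere = pattern.search(text)     # first run of digits anywhere
every = pattern.findall(text)       # list of matched strings
positions = [(m.group(), m.start()) for m in pattern.finditer(text)]

split_all = pattern.split(text)               # split at every match
split_once = pattern.split(text, maxsplit=1)  # split at the first match only

replaced_all = pattern.sub("#", text)             # replace every match
replaced_once = pattern.sub("#", text, count=1)   # replace the first match only

print(at_start, anywhere.group(), every, positions)
print(split_all, split_once)
print(replaced_all, replaced_once)
```

Because the text ends with a delimiter match, split() returns an empty string as the last item.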

Methods and attributes of match objects:

string: The text that was used for matching.
re: The pattern object that was used for matching.
group(num=0): Returns the whole match (or the subgroup whose number is num).
groups(): Returns a tuple with all matched subgroups (if the pattern has no groups, returns an empty tuple).
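A short sketch of these attributes, assuming Python 3 and the standard re module; the pattern and the text "range 10-20" are made up for illustration.

```python
import re

m = re.search(r"(\d+)-(\d+)", "range 10-20")

source_text = m.string        # the text that was searched
pattern_text = m.re.pattern   # the pattern string of the pattern object
whole = m.group()             # same as m.group(0)
second = m.group(2)           # text matched by the second group
all_groups = m.groups()       # tuple with every group's text

print(source_text, pattern_text, whole, second, all_groups)
```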

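The flags parameter:

re.I  Makes matching case-insensitive.
re.L  Performs locale-aware matching.
re.M  Multi-line matching; affects ^ and $.
re.S  Makes . match every character, including the newline.
re.U  Interprets characters according to the Unicode character set. This flag affects \w, \W, \b, \B.
re.X  Allows a more flexible layout so that the regular expression can be written in a more readable way.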
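The effect of the most commonly used flags can be seen by running the same pattern with and without each flag. A minimal sketch, assuming Python 3; the sample strings and the decimal-number pattern are invented for illustration.

```python
import re

# re.I: case-insensitive matching.
no_flag = re.search(r"abc", "ABC")
ignore_case = re.search(r"abc", "ABC", re.I)

# re.M: ^ matches at the start of every line, not only of the string.
first_line_only = re.findall(r"^\w+", "one\ntwo")
each_line = re.findall(r"^\w+", "one\ntwo", re.M)

# re.S: '.' also matches the newline.
dot_default = re.search(r"a.c", "a\nc")
dot_all = re.search(r"a.c", "a\nc", re.S)

# re.X: whitespace and comments inside the pattern are ignored.
decimal = re.compile(r"""
    (\d+)   # integer part
    \.      # decimal point
    (\d+)   # fractional part
""", re.X)
parts = decimal.search("pi is 3.14").groups()

print(ignore_case.group(), first_line_only, each_line, parts)
```

Several flags can be combined with the | operator, for example re.I | re.M.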

An example of using regular expressions in Python


>>> import re
>>> pattern = re.compile(r'foo')
>>> res1 = re.search(pattern, 'foo')
>>> res1.group()  # search() returns a match object; call group() to get the matched text
'foo'
>>> res1.groups()  # The pattern has no groups, so an empty tuple is returned
()
>>> res2 = re.findall(pattern, 'foobbfoo')
>>> res2  # findall() directly returns a list of all matched strings
['foo', 'foo']
>>> pattern2 = re.compile(r'(\d+)aa')
>>> res3 = re.search(pattern2, 'bb32aa')
>>> res3.group()  # Returns the whole match
'32aa'
>>> res3.groups()  # Unlike res1.groups(), the pattern has a group, so the group's match is returned
('32',)
>>> res4 = re.findall(pattern2, 'bb32aacc5aacc')
>>> res4  # Unlike res2, the list contains only the text matched by the group
['32', '5']
