A Detailed Explanation of Python's re Module
- 2021-12-04 10:35:23
- OfStack
Python--re Module
Brief introduction
Definition: re is Python's built-in module for working with regular expressions.
Function: A regular expression describes a text pattern. It is used to validate text and to find text that matches the pattern, and is widely used in search engines, account and password validation, and similar tasks.
Predefined character classes
"""
\d Matches any decimal digit 0-9
\D Matches any non-digit character, including the underscore
\s Matches any whitespace character (space, tab, newline, etc.)
\S Matches any non-whitespace character, including the underscore
\w Matches any letter, Chinese character, digit, or underscore: a-z A-Z 0-9 _
\W Matches any character that \w does not match (so the underscore is not included)
"""
Special characters
1. $: Matches the end of the string (or the end of each line in re.MULTILINE mode); normally placed at the end of the regular expression
2. ^: Matches the beginning of the string (or the beginning of each line in re.MULTILINE mode); normally placed at the front of the regular expression
3. *: The preceding character can appear 0 or more times (0 ~ unlimited)
4. +: The preceding character can appear 1 or more times (1 ~ unlimited)
5. ?: The preceding character can appear 0 times or 1 time; placed after another quantifier (as in *? or +?), it changes that quantifier from "greedy mode" to "reluctant mode"
6. .: Matches any single character except the newline "\n"
7. |: Alternation; matches either the expression on its left or the expression on its right
8. []: Represents a set, which has the following three cases:
[abc]: Matches any single one of the listed characters
[a-z0-9]: Matches any single character in the specified ranges; the set can be negated by putting ^ first, as in [^a-z0-9]
[2-9][1-3]: Sets can be combined in sequence; this matches one character from the first set followed by one from the second
9. {}: Marks how many times the preceding character repeats, as follows (write the braces without spaces, otherwise they are treated as literal text):
{n,m}: The preceding character appears at least n and at most m times
{n,}: The preceding character appears at least n times, with no upper limit
{,m}: The preceding character appears at most m times, with no lower limit (0 times is allowed)
{n}: The preceding character must appear exactly n times
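The predefined classes and special characters described above can be tried out with a minimal sketch like the following. The sample strings and variable names are made up for illustration, and the results assume Python 3, where str patterns are Unicode-aware by default.

```python
import re

# Hypothetical sample strings, used only for illustration
sample = "user_01 paid 250 yuan\t(中文)"
quoted = 'say "one" and "two"'

# Predefined character classes
digits = re.findall(r"\d+", sample)      # runs of decimal digits
words = re.findall(r"\w+", sample)       # letters, digits, underscore, Chinese characters
non_words = re.findall(r"\W", sample)    # whatever \w does not match

# Greedy versus reluctant (non-greedy) quantifiers
greedy = re.findall(r'".+"', quoted)     # .+ grabs as much as possible
lazy = re.findall(r'".+?"', quoted)      # .+? stops at the nearest closing quote

# Optional character, alternation, sets, counted repetition, anchors
optional = re.findall(r"colou?r", "color colour")   # u appears 0 or 1 time
either = re.findall(r"cat|dog", "cat, dog, bird")   # either alternative
ranged = re.findall(r"[2-9][1-3]", "21 13 93 94")   # two sets in sequence
negated = re.findall(r"[^0-9 ]+", "ab 12 cd")       # anything but digits and spaces
counted = re.findall(r"\d{2,3}", "1 22 333 4444")   # 2 to 3 digits at a time
anchored = re.findall(r"^\w+$", "hello")            # the whole string is one word

print(digits, words, non_words)
print(greedy, lazy)
print(optional, either, ranged, negated, counted, anchored)
```

Note that the underscore in user_01 is picked up by \w and never shows up in the \W results, and that the greedy pattern returns one long match while the reluctant one returns two short ones.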
Backslash problem
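If a string contains a backslash, either escape it with a second backslash or write the string as a raw string:
str = "\\123 223" # \123 223
str = r"\123 223" # \123 223
In a regular expression, matching one literal backslash takes several backslashes, because the pattern is unescaped twice: first by Python's string literal rules, then by the regex engine:
find = re.search('\\\\\\w+', str) # plain string: every backslash in the regex is doubled
find = re.search(r'\\\w+', str) # raw string: \\ is one literal backslash, \w+ is a run of word characters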
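To make the two layers of escaping concrete, here is a small sketch that compares the plain and raw spellings. The variable names are illustrative only and do not come from the original snippets.

```python
import re

# Both literals produce the same 8-character string: \123 223
plain = "\\123 223"
raw = r"\123 223"

# Both literals produce the same regex: one literal backslash, then \w+
pattern_plain = "\\\\\\w+"   # without r, every backslash is doubled
pattern_raw = r"\\\w+"       # raw string: written as the regex engine sees it

found = re.search(pattern_raw, raw)

print(plain == raw)                  # True
print(pattern_plain == pattern_raw)  # True
print(found.group())                 # \123
```

Raw strings are usually the easier choice for patterns, since what is typed is exactly what the regex engine receives.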
Matching method
1. match: Match only at the beginning of the target text
find = re.match('hello', str1) # Returns a match object (here matching hello) on success, None on failure
2. search: Scan the whole target text and return the first match, or None if there is none
3. findall: Scan the whole target text and return a list of all substrings that match the rule. If there is no match, return an empty list
4. finditer: Scan the whole target text and return an iterator of match objects, one for each substring that matches the rule
5. fullmatch: Require the entire target text to match the rule, otherwise return None
6. sub: Replace the substrings that match the rule with other text
str1 = re.sub(r'\w+', 'aaa', str, count=0) # count defaults to 0, which replaces all matches
7. split: Split the target text at each substring that matches the rule and return a list of the resulting pieces
8. Methods of match objects (used on the object returned by a successful match):
(): Grouping syntax; parentheses in the pattern capture part of the match so that it can be retrieved quickly
group: Returns what the specified group matched
str = '<p>This is a <a href="###">text</a></p>'
find = re.search(r'<a href="(.+)">(\w+)</a>', str)
print(find.group()) # The argument defaults to 0, meaning the whole matched text; passing 1 returns group 1, here ###
groups: Returns a tuple with the contents of all groups (the case above gives ('###', 'text'))
groupdict: Returns a dictionary mapping group names to their matched text; the groups must be named
find = re.search(r'<a href="(?P<href>.+)">(?P<text>\w+)</a>', str)
start: Returns the starting index of the matched content in the text
end: Returns the ending index (exclusive) of the matched content in the text
span: Returns a tuple consisting of the starting index and the ending index
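The functions and match-object methods listed above can be compared side by side in a short sketch. The sample text, the group names greeting and target, and the variable names are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical sample text, used only for illustration
text = "hello world, hello regex"

at_start = re.match(r"world", text)        # None: "world" is not at the start
anywhere = re.search(r"world", text)       # found further into the text
whole = re.fullmatch(r"[a-z, ]+", text)    # the entire string fits the pattern

all_hits = re.findall(r"hello", text)                         # list of substrings
positions = [m.span() for m in re.finditer(r"hello", text)]   # match objects
replaced = re.sub(r"hello", "bye", text, count=1)             # first hit only
pieces = re.split(r",\s*", text)                              # cut at the comma

# Named groups feed groupdict(); start/end/span locate the match
m = re.search(r"(?P<greeting>\w+) (?P<target>\w+)", text)

print(at_start, anywhere.span(), whole is not None)
print(all_hits, positions, replaced, pieces)
print(m.group(0), m.group(1), m.groups(), m.groupdict())
print(m.start(), m.end(), m.span())
```

Because match, search, and fullmatch return None on failure, real code would normally check the result before calling group or span on it.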
A recommended site for practicing regular expressions: https://alf.nu/RegexGolf