Detailed Explanation of Python's re Module

  • 2021-12-04 10:35:23
  • OfStack

Contents: Brief introduction, Predefined characters, Special characters, Backslash problem, Matching methods, Summary

Python--re Module

Brief introduction

Definition: re is Python's built-in module for working with regular expressions.

Function: A regular expression describes a pattern of text. It is used to validate and find text that conforms to the pattern, and is widely used in search engines, account and password validation, and similar tasks.

Predefined characters


"""
\d	 Matches any decimal digit 	0-9
\D	 Matches any character that is not a digit, including underscores 
\s	 Matches any whitespace character (space, tab, newline, etc.) 
\S	 Matches any non-whitespace character, including underscores 
\w	 Matches any letter, Chinese character, digit, or underscore 	a-z A-Z 0-9 _
\W	 Matches any character that is not a letter, Chinese character, digit, or underscore 
"""

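As a quick check of the classes above, here is a minimal sketch that runs each one over a sample string. The string and the variable names are made up for illustration.

```python
import re

# Hypothetical sample text: contains an underscore, a tab, and punctuation.
sample = "Room_42\tis open 24h!"

digits = re.findall(r"\d", sample)      # single decimal digits
words = re.findall(r"\w+", sample)      # runs of letters, digits, underscores
spaces = re.findall(r"\s", sample)      # whitespace characters
non_words = re.findall(r"\W", sample)   # everything \w does not match

print(digits)     # ['4', '2', '2', '4']
print(words)      # ['Room_42', 'is', 'open', '24h']
print(spaces)     # ['\t', ' ', ' ']
print(non_words)  # ['\t', ' ', ' ', '!']
```

Note that the underscore stays inside 'Room_42' and does not show up in the \W result.
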
Special characters

1. $: Matches the end of the text, or the end of each line when re.MULTILINE is set (must be placed at the end of the regular expression)

2. ^: Matches the beginning of the text, or the beginning of each line when re.MULTILINE is set (must be placed at the start of the regular expression)

3. *: The preceding character can appear 0 or more times (0 ~ unlimited)

4. +: The preceding character can appear 1 or more times (1 ~ unlimited)

5. ?: The preceding character can appear 0 times or 1 time; placed after another quantifier, it changes "greedy mode" to "reluctant mode"

6. .: Matches any single character except the newline "\n"

7. |: Matches either of the two alternatives on its left and right

8. []: Represents a set, which has the following three cases

[abc]: Matches a single character from those listed

[a-z0-9]: Matches a character in the specified ranges; the set can be negated by putting ^ first, as in [^a-z0-9]

[2-9][1-3]: Sets can be combined, each one matching one character in sequence

9. {}: Specifies how many times the preceding character may repeat (written without spaces inside the braces), as follows:

{n,m}: The preceding character appears at least n and at most m times

{n,}: The preceding character appears at least n times, with no upper limit

{,m}: The preceding character appears at most m times, with no lower limit

{n}: The preceding character must appear exactly n times
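The special characters listed above can be tried out one by one. The following is a small sketch with made-up sample strings; the variable names are only for illustration.

```python
import re

# ^ and $ anchor the pattern to the start and end of the text.
starts = bool(re.search(r"^cat", "catalog"))        # True
ends = bool(re.search(r"cat$", "catalog"))          # False

# *, + and ? control how often the preceding character may repeat.
star = re.findall(r"ab*", "a ab abbb")              # ['a', 'ab', 'abbb']
plus = re.findall(r"ab+", "a ab abbb")              # ['ab', 'abbb']
optional = re.findall(r"colou?r", "color colour")   # ['color', 'colour']

# A ? after a quantifier makes it reluctant (match as little as possible).
greedy = re.findall(r"<.+>", "<b>bold</b>")         # ['<b>bold</b>']
lazy = re.findall(r"<.+?>", "<b>bold</b>")          # ['<b>', '</b>']

# | picks between alternatives; [] matches one character from a set.
either = re.findall(r"cat|dog", "cat, dog, cow")    # ['cat', 'dog']
vowels = re.findall(r"[aeiou]", "regex")            # ['e', 'e']
not_digits = re.findall(r"[^0-9]", "a1b2")          # ['a', 'b']

# {n,m} bounds the repeat count (no spaces inside the braces).
counted = re.findall(r"\d{2,3}", "1 22 333 4444")   # ['22', '333', '444']
```
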

Backslash problem

If a string contains a backslash, either escape it with a second backslash or write the string as a raw string (prefix r):


str = "\\123 223"		# \123 223
str = r"\123 223"		# \123 223

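In a regular expression the backslash is also the escape character, so matching one literal backslash takes two backslashes in the pattern. Written as an ordinary string that becomes four backslashes; written as a raw string it stays two:


import re
find = re.search('\\\\\\w+', str)		# ordinary string: every backslash is doubled
find = re.search(r'\\\w+', str)		# raw string: \\ matches one backslash, \w+ the word characters after it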
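To see that the two spellings really describe the same pattern, here is a minimal sketch. The sample text and variable names are assumptions chosen for illustration.

```python
import re

# Hypothetical sample text containing one literal backslash.
text = r"C:\temp 42"

plain_pattern = "\\\\\\w+"   # ordinary string literal: six backslashes typed
raw_pattern = r"\\\w+"       # raw string literal: three backslashes typed

# Both literals produce the same five-character pattern: \\\w+
same_pattern = plain_pattern == raw_pattern

match = re.search(raw_pattern, text)
found = match.group()        # one backslash followed by "temp"

print(same_pattern)  # True
print(found)         # \temp
```

Raw strings are usually the easier choice for patterns, since what is typed is what the regular expression engine receives.
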

Matching methods

1. match: Match at the beginning of the target text


find = re.match('hello', str1)		# Returns a match object for "hello" on success, None on failure

2. search: Scans the target text and returns the first match found anywhere in it, or None if there is no match

3. findall: Scan the whole target text and return a list of all substrings that match the rule. If there is no match, return an empty list

4. finditer: Scan the whole target text and return an iterator of match objects, one for each substring matching the rule

5. fullmatch: Require the target text to exactly match the rule, otherwise return None

6. sub: Replace the substring that matches the rule with other text


str1 = re.sub(r'\w+', 'aaa', str, count=0)		# count defaults to 0, which replaces all matches

7. split: Split the target text at every substring that matches the rule and return a list of the resulting pieces
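The functions described so far can be compared side by side on one input. This is only a sketch; the sample text, the date pattern, and the variable names are made up for illustration.

```python
import re

text = "2021-12-04 and 2022-01-15"
date = r"\d{4}-\d{2}-\d{2}"

at_start = re.match(date, text)       # matches, because the text starts with a date
not_at_start = re.match(r"and", text) # None: "and" is not at the start
anywhere = re.search(r"and", text)    # match object: search looks through the whole text

all_dates = re.findall(date, text)    # ['2021-12-04', '2022-01-15']
positions = [m.span() for m in re.finditer(date, text)]  # [(0, 10), (15, 25)]

whole = re.fullmatch(date, text)            # None: the text is more than one date
exact = re.fullmatch(date, "2021-12-04")    # match object

replaced_once = re.sub(date, "DATE", text, count=1)  # 'DATE and 2022-01-15'
replaced_all = re.sub(date, "DATE", text)            # 'DATE and DATE'
parts = re.split(r"\s+and\s+", text)                 # ['2021-12-04', '2022-01-15']
```
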

8. Method of matching objects (used for matched objects):

(): Grouping characters. They group parts of the pattern so that the text matched by each group can be retrieved quickly

group: Returns what the specified group matched

str = '<p>This is a <a href="###">text</a></p>'
find = re.search(r'<a href="(.+)">(\w+)</a>', str)
print(find.group())		# Defaults to group 0, the whole matched text; find.group(1) returns group 1: ###

groups: Returns a tuple with the contents of all groups (for the example above: ('###', 'text'))

groupdict: Returns a dictionary of key-value pairs for the groups; the groups must be named

find = re.search(r'<a href="(?P<href>.+)">(?P<text>\w+)</a>', str)

start: Returns the starting index of the matched content in the text

end: Returns the ending index of the matched content in the text

span: Returns a tuple consisting of the starting index and the ending index
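The match object methods can be exercised together on a single match. The sketch below uses its own sample string and variable names, chosen only for illustration.

```python
import re

html = '<p>This is a <a href="###">text</a></p>'
m = re.search(r'<a href="(?P<href>.+)">(?P<text>\w+)</a>', html)

whole = m.group()          # group 0: the entire matched text
first = m.group(1)         # group by number
named = m.group("text")    # group by name
all_groups = m.groups()    # tuple of every group
as_dict = m.groupdict()    # named groups as a dictionary
bounds = (m.start(), m.end())

print(whole)       # <a href="###">text</a>
print(first)       # ###
print(named)       # text
print(all_groups)  # ('###', 'text')
print(as_dict)     # {'href': '###', 'text': 'text'}
print(bounds == m.span())  # True
```

Named groups remain numbered as well, so group(1) and group("href") return the same text here.
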

A recommended site for practicing regular expressions: https://alf.nu/RegexGolf

Summary

That concludes this article. I hope it helps you.

