Python 3.x sample code for extracting Chinese characters with regular expressions
- 2021-07-26 07:55:56
- OfStack
Example 1: Extracting Chinese characters from a .txt file
Contents of test.txt (UTF-8 encoded):
Hello world
China
Hello, you are good
This is a txt File
s2f Programmer's Journal 12d3 Programmer's Journal 22d3 Programmer's Journal 32d3 Programmer's Journal 42d3
The script first opens the file, then reads it line by line and matches each line against the pattern:
import re  # written for Python 3.x

pattern = "[\u4e00-\u9fa5]+"  # one or more Chinese characters (CJK Unified Ideographs U+4E00 to U+9FA5)
found = []  # collects every run of Chinese characters in the file

# Open in binary mode and decode each line explicitly as UTF-8; text mode
# without an encoding uses the platform default and may raise a decoding error
with open('test.txt', 'rb') as f:
    for raw_line in f:  # iterate over the lines of the file
        line = raw_line.decode('utf-8')  # bytes -> str
        found += re.findall(pattern, line)  # add this line's Chinese runs to the list
print(found)
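A minimal sketch of the same approach packaged as a function. It assumes the file is UTF-8 and opens it in text mode with an explicit `encoding="utf-8"`, which decodes each line for you instead of decoding bytes by hand. The helper name `extract_chinese` and the sample lines (written to a temporary file) are hypothetical illustrations, not part of the original article.

```python
import os
import re
import tempfile

# Same character range as the article: CJK Unified Ideographs U+4E00 to U+9FA5
CHINESE_RE = re.compile("[\u4e00-\u9fa5]+")


def extract_chinese(path):
    """Return every run of Chinese characters in a UTF-8 text file (hypothetical helper)."""
    results = []
    # Text mode with an explicit encoding decodes each line as UTF-8
    with open(path, encoding="utf-8") as f:
        for line in f:
            results.extend(CHINESE_RE.findall(line))
    return results


if __name__ == "__main__":
    # Hypothetical sample file, written to a temporary directory for the demo
    sample = "Hello world\n中国\n你好，你很好\ns2f程序员1d3\n"
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write(sample)
        print(extract_chinese(path))
```

Run as a script, it prints `['中国', '你好', '你很好', '程序员']`. The full-width comma '，' (U+FF0C) lies outside the matched range, so it splits '你好，你很好' into two matches.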
***********************************************
Example 2: Extracting Chinese characters from a given string
import re  # written for Python 3.x
# A Python 3 str is already Unicode text, so it can be searched directly;
# encoding it to bytes and decoding it back with UTF-8 would return the same string
temp = "s2f Programmer's Journal 12d3 Programmer's Journal 22d3 Programmer's Journal 32d3 Programmer's Journal 42d3"
pattern = "[\u4e00-\u9fa5]+"  # regular expression for runs of Chinese characters
regex = re.compile(pattern)  # compile once so the pattern object can be reused
results = regex.findall(temp)  # list of every matching run
for result in results:  # print each match on its own line
    print(result)
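The pattern only covers U+4E00 to U+9FA5, the original CJK Unified Ideographs range. The sketch below checks what that means in practice: ASCII characters and full-width punctuation end a match, and characters from CJK Extension A (U+3400 to U+4DBF) or the later additions U+9FA6 to U+9FFF are not matched. The wider pattern `WIDE` is an assumption added for comparison, not the article's method.

```python
import re

# The article's pattern: CJK Unified Ideographs U+4E00 to U+9FA5
NARROW = re.compile("[\u4e00-\u9fa5]+")
# Hypothetical wider pattern for comparison: CJK Extension A (U+3400-U+4DBF)
# plus the full CJK Unified Ideographs block (U+4E00-U+9FFF)
WIDE = re.compile("[\u3400-\u4dbf\u4e00-\u9fff]+")

text = "s2f程序员1d3，你好"

# findall returns each maximal run of matching characters;
# ASCII letters, digits and the full-width comma U+FF0C all break a run
print(NARROW.findall(text))

# Range endpoints: both U+4E00 and U+9FA5 are accepted by the narrow pattern
print(NARROW.fullmatch("\u4e00") is not None, NARROW.fullmatch("\u9fa5") is not None)

# Characters outside the narrow range but inside the wide one
for ch in ("\u3400", "\u9fa6"):
    print(hex(ord(ch)), NARROW.fullmatch(ch) is not None, WIDE.fullmatch(ch) is not None)
```

It prints `['程序员', '你好']`, then `True True`, then one line per test character showing that only `WIDE` accepts U+3400 and U+9FA6.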
Summary