Python 3.x sample code for extracting Chinese characters with regular expressions

  • 2021-07-26 07:55:56
  • OfStack

Example 1: Extracting the Chinese characters from a txt file


import re  # Written for Python 3.x
d="[\u4e00-\u9fa5]+"  # Pattern matching one or more consecutive Chinese characters
f=open('test.txt','rb')  # Open test.txt in binary mode and decode each line explicitly; text mode without a stated encoding can raise a decoding error
# Document content (the Chinese text is shown here in translation):
#   Hello world
#   China
#   Hello, you are good
#   This is a txt file
#   s2f Programmer's Journal 12d3 Programmer's Journal 22d3 Programmer's Journal 32d3 Programmer's Journal 42d3
# Read the file first, then match each line of the document against the pattern
L=[]  # List that collects the Chinese strings found
for i in f:  # Iterate over the lines of the txt document
  i=i.decode('utf-8')  # Decode the bytes as UTF-8
  l=re.findall(d,i)  # Find every run of Chinese characters in the line
  L+=l  # Add the matches to the list
print(L)
f.close()
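
As a variation on Example 1, the file can also be opened in text mode with an explicit encoding, so that no manual decode step is needed and the file is closed automatically. The following is only a minimal sketch: the function name `extract_chinese`, the temporary file, and the sample text are assumptions made for illustration and are not part of the original example.

```python
import re
import tempfile
from pathlib import Path

# Same character range as in Example 1
CHINESE_RUN = re.compile("[\u4e00-\u9fa5]+")

def extract_chinese(path):
    """Return every run of Chinese characters found in a UTF-8 text file."""
    found = []
    # Text mode with an explicit encoding decodes each line automatically
    with open(path, encoding="utf-8") as f:
        for line in f:
            found.extend(CHINESE_RUN.findall(line))
    return found

# Hypothetical sample data written to a temporary file
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "test.txt"
    sample.write_text("Hello world\n中国\ns2f程序员2d3杂志\n", encoding="utf-8")
    print(extract_chinese(sample))  # ['中国', '程序员', '杂志']
```

The `with` statement closes the file even if a line fails to decode, which the explicit `f.close()` call does not guarantee.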

***********************************************

Example 2: Extracting the Chinese characters from a given string


import re  # Written for Python 3.x
s = "s2f Programmer's Journal 12d3 Programmer's Journal 22d3 Programmer's Journal 32d3 Programmer's Journal 42d3".encode()  # Encode to UTF-8 bytes to mimic data read from a file; a Python 3 str could be searched directly
temp = s.decode('utf-8')  # Decode the bytes back into a str
pattern="[\u4e00-\u9fa5]+"  # Regular expression for Chinese characters
regex = re.compile(pattern)  # Compile the pattern into a regex object
results = regex.findall(temp)  # Find all matches
for result in results:  # Iterate over the matches
  print (result)
