Python: A Simple Method for Converting Chinese Numerals into Arabic Numerals
- 2021-09-24 23:00:59
- OfStack
Extracting numbers from text with regular expressions
Here we extract the years written in Chinese digits from the sentence below:
import re
# Demo sentence (Chinese) containing three years written in Chinese digits:
# 一九四九 (1949), 一九九零 (1990) and 一九九六 (1996)
m0 = "在一九四九年新中国成立比一九九零年低百分之五点二的人在一九九六年击败俄军，取得实质独立"
# Runs of 4 or more common Chinese digits
pattrern1 = '[零一二三四五六七八九]{4,}'
# Broader class that also covers 〇 and the financial (uppercase) numerals
pattrern2 = '[〇一二三四五六七八九零壹贰叁肆伍陆柒捌玖貮两]{4,}'
time1 = re.findall(pattrern1, m0)  # years, still in Chinese digits
The extracted years (still in Chinese digits) are: '一九四九', '一九九零', '一九九六'
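`pattrern2` is defined but never used. Presumably it is meant to also catch years written with 〇 or with the financial (uppercase) numerals such as 壹 and 玖. A minimal sketch under that assumption; the sample string is made up for illustration:

```python
import re

# Broader character class: common digits, 〇/零, and financial forms
# (assumed intent of pattrern2)
pattrern2 = '[〇一二三四五六七八九零壹贰叁肆伍陆柒捌玖貮两]{4,}'

# Hypothetical sample text mixing financial and common numerals
text = '壹玖肆玖年与二〇二一年'

years = re.findall(pattrern2, text)
print(years)  # ['壹玖肆玖', '二〇二一']
```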
Converting Chinese numerals into Arabic numerals
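Create a dictionary that maps each Chinese numeral character, including the financial (uppercase) forms, to its value, then convert each extracted year character by character: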
# Map each Chinese numeral character (common and financial forms) to its value
CN_NUM = {
    '〇': 0, '一': 1, '二': 2, '三': 3, '四': 4, '五': 5, '六': 6, '七': 7, '八': 8, '九': 9, '零': 0,
    '壹': 1, '贰': 2, '叁': 3, '肆': 4, '伍': 5, '陆': 6, '柒': 7, '捌': 8, '玖': 9, '貮': 2, '两': 2,
}
# Replace each Chinese digit with its Arabic counterpart
for i in range(len(time1)):
    new_str = ''
    for j in time1[i]:
        new_str += str(CN_NUM[j])
    time1[i] = new_str
print(time1)  # ['1949', '1990', '1996']
The Chinese years from the example above have now been converted into Arabic numerals: '1949', '1990', '1996'
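The nested loop can also be written with `str.maketrans` and `str.translate` from the standard library. This is an alternative to the dictionary lookup, not the author's code; `CN_DIGITS`, `AR_DIGITS` and `cn_digits_to_arabic` are hypothetical names:

```python
# Same mapping as CN_NUM, expressed as two equal-length strings for str.maketrans
CN_DIGITS = '〇一二三四五六七八九零壹贰叁肆伍陆柒捌玖貮两'
AR_DIGITS = '0123456789' + '0' + '123456789' + '22'

# Translation table: each Chinese numeral character -> the matching ASCII digit
TABLE = str.maketrans(CN_DIGITS, AR_DIGITS)


def cn_digits_to_arabic(s: str) -> str:
    """Convert a run of Chinese digits such as '一九四九' into '1949'.

    Characters not in the table are passed through unchanged (no KeyError).
    """
    return s.translate(TABLE)


converted = [cn_digits_to_arabic(y) for y in ['一九四九', '一九九零', '一九九六']]
print(converted)  # ['1949', '1990', '1996']
```

Unlike the dictionary lookup, `str.translate` leaves unknown characters as they are instead of raising `KeyError`, so it is more forgiving but also less strict about validating its input.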
Complete code
# 2. Convert years written in Chinese digits in a sentence into Arabic numerals
import re

m0 = "在一九四九年新中国成立比一九九零年低百分之五点二的人在一九九六年击败俄军，取得实质独立"
pattrern1 = '[零一二三四五六七八九]{4,}'
pattrern2 = '[〇一二三四五六七八九零壹贰叁肆伍陆柒捌玖貮两]{4,}'
time1 = re.findall(pattrern1, m0)  # years, still in Chinese digits

# Map each Chinese numeral character (common and financial forms) to its value
CN_NUM = {
    '〇': 0, '一': 1, '二': 2, '三': 3, '四': 4, '五': 5, '六': 6, '七': 7, '八': 8, '九': 9, '零': 0,
    '壹': 1, '贰': 2, '叁': 3, '肆': 4, '伍': 5, '陆': 6, '柒': 7, '捌': 8, '玖': 9, '貮': 2, '两': 2,
}
for i in range(len(time1)):
    new_str = ''
    for j in time1[i]:
        new_str += str(CN_NUM[j])
    time1[i] = new_str
print(time1)  # ['1949', '1990', '1996']
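The comment at the top of the complete code talks about changing the years in the sentence, while the listing only produces a list of converted years. If the goal is to rewrite the sentence itself, `re.sub` with a replacement function is one way to do it. A minimal sketch, assuming the broader character class of `pattrern2`; `replace_cn_years` and the sample sentence are hypothetical:

```python
import re

# Chinese numeral character -> digit value (same mapping as CN_NUM above)
CN_NUM = {
    '〇': 0, '一': 1, '二': 2, '三': 3, '四': 4, '五': 5, '六': 6, '七': 7, '八': 8, '九': 9, '零': 0,
    '壹': 1, '贰': 2, '叁': 3, '肆': 4, '伍': 5, '陆': 6, '柒': 7, '捌': 8, '玖': 9, '貮': 2, '两': 2,
}

# Assumed pattern: runs of 4+ Chinese digits, common or financial forms
YEAR_PATTERN = re.compile('[〇一二三四五六七八九零壹贰叁肆伍陆柒捌玖貮两]{4,}')


def replace_cn_years(sentence: str) -> str:
    """Hypothetical helper: rewrite every matched run in place with Arabic digits."""
    return YEAR_PATTERN.sub(
        lambda m: ''.join(str(CN_NUM[ch]) for ch in m.group()),
        sentence,
    )


sample = '新中国成立于一九四九年，香港于一九九七年回归'
print(replace_cn_years(sample))  # 新中国成立于1949年，香港于1997年回归
```

Note that the pattern matches any run of four or more Chinese digits, not only years, so a long digit-by-digit number elsewhere in the text would be converted as well.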