Python: A Simple Method for Converting Chinese Numerals into Arabic Numerals

  • 2021-09-24 23:00:59
  • OfStack

Extracting numbers from text with regular expressions

Here we extract the years written in Chinese numerals from the sample text below.


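import re

# Sample sentence with three years in Chinese numerals. Roughly: "In 1949 New China was founded;
# in 1990 the ratio was 5.2 percent lower; in 1996 the Russian army was defeated and substantial independence was achieved"
m0 = "一九四九年新中国成立，一九九〇年比例低百分之五点二，一九九六年击败俄军，取得实质独立"
# Runs of 4 or more lowercase Chinese numerals (both zero forms 〇 and 零 included)
pattern1 = '[〇零一二三四五六七八九]{4,}'
# Wider variant that also accepts the uppercase (financial) numerals and 两
pattern2 = '[〇零一二三四五六七八九壹贰貳叁肆伍陆柒捌玖两]{4,}'
time1 = re.findall(pattern1, m0)  # Chinese-numeral strings still to be converted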

The extracted years are '一九四九', '一九九〇' and '一九九六' (that is, 1949, 1990 and 1996).
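The {4,} quantifier is what separates the years from the other numerals in the sentence: 百分之五点二 ("5.2 percent") also contains Chinese digits, but only in runs of one character. The comparison below is a small sketch; the sample sentence `text` and the combined character class `DIGITS` are assumptions for illustration, not from the original.

```python
import re

# Assumed sample sentence: three years plus the short numeral run 五点二 ("5.2")
text = "一九四九年新中国成立，一九九〇年比例低百分之五点二，一九九六年击败俄军"

# Lowercase Chinese numerals, including both zero forms 〇 and 零
DIGITS = "〇零一二三四五六七八九"

years = re.findall(f"[{DIGITS}]{{4,}}", text)  # only runs of 4+ numerals: year candidates
any_run = re.findall(f"[{DIGITS}]+", text)     # every run of numerals, however short

print(years)    # ['一九四九', '一九九〇', '一九九六']
print(any_run)  # ['一九四九', '一九九〇', '五', '二', '一九九六']
```

Without the minimum length, the stray 五 and 二 from 五点二 would be picked up as well.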


Convert Chinese numerals into Arabic numerals

Build a dictionary that maps each Chinese numeral character to its digit, then convert every matched string one character at a time:


# Map each Chinese numeral character (lowercase and uppercase/financial forms) to its digit
CN_NUM = {
    '〇': 0, '一': 1, '二': 2, '三': 3, '四': 4, '五': 5, '六': 6, '七': 7, '八': 8, '九': 9, '零': 0,
    '壹': 1, '贰': 2, '叁': 3, '肆': 4, '伍': 5, '陆': 6, '柒': 7, '捌': 8, '玖': 9, '貳': 2, '两': 2,
}

# Replace each matched string with its digit-by-digit Arabic form
for i in range(len(time1)):
    new_str = ''
    for j in time1[i]:
        new_str += str(CN_NUM[j])
    time1[i] = new_str
print(time1)

This prints ['1949', '1990', '1996']: the Chinese-numeral years from the example have been converted into Arabic numerals.
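Python's built-in `str.maketrans` and `str.translate` can do the same character-by-character substitution without an explicit loop. This is a swapped-in alternative, not the article's method; the table name `CN_TO_DIGIT` is hypothetical and this sketch covers only the lowercase numerals.

```python
# Hypothetical translation table: each lowercase Chinese numeral -> its Arabic digit
CN_TO_DIGIT = str.maketrans({
    "〇": "0", "零": "0", "一": "1", "二": "2", "三": "3", "四": "4",
    "五": "5", "六": "6", "七": "7", "八": "8", "九": "9",
})

years = ["一九四九", "一九九〇", "一九九六"]
converted = [y.translate(CN_TO_DIGIT) for y in years]
print(converted)  # ['1949', '1990', '1996']
```

One difference from the dictionary loop: `translate` leaves any character missing from the table unchanged, whereas the `CN_NUM[j]` lookup raises a `KeyError`.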

Complete code


# Convert years written in Chinese numerals inside a sentence into Arabic numerals
import re

# Roughly: "In 1949 New China was founded; in 1990 the ratio was 5.2 percent lower; in 1996 ..."
m0 = "一九四九年新中国成立，一九九〇年比例低百分之五点二，一九九六年击败俄军，取得实质独立"
# Runs of 4+ lowercase Chinese numerals; pattern2 also accepts the uppercase (financial) numerals
pattern1 = '[〇零一二三四五六七八九]{4,}'
pattern2 = '[〇零一二三四五六七八九壹贰貳叁肆伍陆柒捌玖两]{4,}'
time1 = re.findall(pattern1, m0)  # Chinese-numeral strings still to be converted

# Map each Chinese numeral character to its digit
CN_NUM = {
    '〇': 0, '一': 1, '二': 2, '三': 3, '四': 4, '五': 5, '六': 6, '七': 7, '八': 8, '九': 9, '零': 0,
    '壹': 1, '贰': 2, '叁': 3, '肆': 4, '伍': 5, '陆': 6, '柒': 7, '捌': 8, '玖': 9, '貳': 2, '两': 2,
}

# Replace each matched string with its digit-by-digit Arabic form
for i in range(len(time1)):
    new_str = ''
    for j in time1[i]:
        new_str += str(CN_NUM[j])
    time1[i] = new_str
print(time1)  # ['1949', '1990', '1996']
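To rewrite the years inside the sentence itself rather than collecting them in a list, the pattern and the dictionary can be combined with `re.sub` and a replacement function. This is a minimal sketch: the helper `convert_chinese_years` and the name `YEAR_PATTERN` are hypothetical, and building the character class from the dictionary keys is a design choice of this sketch, not of the original code.

```python
import re

# Chinese numeral character -> digit, lowercase plus uppercase (financial) forms
CN_NUM = {
    "〇": 0, "零": 0, "一": 1, "二": 2, "三": 3, "四": 4, "五": 5, "六": 6, "七": 7, "八": 8, "九": 9,
    "壹": 1, "贰": 2, "貳": 2, "叁": 3, "肆": 4, "伍": 5, "陆": 6, "柒": 7, "捌": 8, "玖": 9, "两": 2,
}
# Character class built from the dict keys, so the pattern and the mapping cannot drift apart
YEAR_PATTERN = re.compile("[" + "".join(CN_NUM) + "]{4,}")

def convert_chinese_years(sentence: str) -> str:
    """Replace every run of 4+ Chinese numerals with its digit-by-digit Arabic form."""
    return YEAR_PATTERN.sub(
        lambda m: "".join(str(CN_NUM[ch]) for ch in m.group()),
        sentence,
    )

print(convert_chinese_years("壹玖肆玖年新中国成立，一九九六年击败俄军"))
# 1949年新中国成立，1996年击败俄军
```

Like the loop above, this converts digit by digit, so it suits years; numbers written with place-value characters such as 十 or 百 would need a different approach.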

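Summary

A regular expression picks out runs of four or more Chinese numerals, and a character-to-digit dictionary converts each run into Arabic numerals, turning a year such as 一九四九 into 1949.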

