Converting between full-width and half-width characters in Python

2020-05-17 05:49:58
OfStack

Preface

Almost every programmer who processes text will sooner or later run into input that mixes full-width and half-width characters, so it is handy to have a program that converts quickly from one form to the other. Because there is a fixed mapping between full-width and half-width characters, the conversion is not complicated.

The specific rules are:

Full-width characters occupy Unicode code points 65281 to 65374 (hexadecimal 0xFF01 to 0xFF5E).

Half-width characters occupy code points 33 to 126 (hexadecimal 0x21 to 0x7E).

The space is a special case: the full-width space is 12288 (0x3000) and the half-width space is 32 (0x20).

Apart from the space, full-width and half-width characters appear in the same order in Unicode, so half-width + 65248 = full-width.

Non-space characters can therefore be converted by simply adding or subtracting this offset, with the space handled separately.
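The rules above can be checked with a few lines of code. The following is a minimal Python 3 sketch (the article's own listings target Python 2); the name OFFSET and the use of unicodedata.east_asian_width are illustrative choices, not part of the original article.

```python
import unicodedata

# Constant distance between a half-width character and its full-width form.
OFFSET = 0xFF01 - 0x21  # 65248 == 0xFEE0

for code in range(0x21, 0x7F):  # half-width printable ASCII, 33..126
    half = chr(code)
    full = chr(code + OFFSET)
    # In the Unicode East Asian Width property, "Na" means narrow
    # and "F" means full-width.
    assert unicodedata.east_asian_width(half) == "Na"
    assert unicodedata.east_asian_width(full) == "F"

# The space does not follow the offset rule and needs its own mapping.
assert ord("\u3000") - ord(" ") != OFFSET
print(hex(OFFSET))
```

If any of the assertions failed, the simple add-or-subtract approach would not be safe for the whole range.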

The code uses the following Python 2 built-in functions:

chr() takes an integer in range(256), that is, 0 to 255, and returns the corresponding character.

unichr() works the same way, except that it returns a Unicode character.

ord() is the counterpart of chr() and unichr(): it takes a character (a string of length 1) as its argument and returns the corresponding ASCII value or Unicode code point.
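In Python 3 every str is Unicode, so chr() covers the whole Unicode range and unichr() no longer exists. For readers on Python 3, a small sketch of the same building blocks (the variable names are only illustrative):

```python
# ord() maps a one-character string to its code point; chr() is the inverse.
code = ord("A")
print(code)  # 65

# Adding 65248 to the code point gives the full-width counterpart.
full_a = chr(code + 65248)
print(full_a, ord(full_a))

# chr() and ord() round-trip for any valid code point,
# including the full-width space.
assert ord(chr(0x3000)) == 0x3000
```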

First, print the mapping relationship:


for i in xrange(33, 127):
    print i, chr(i), i + 65248, unichr(i + 65248)

The output is:


33 ! 65281 ！
34 " 65282 ＂
35 # 65283 ＃
36 $ 65284 ＄
37 % 65285 ％
38 & 65286 ＆
39 ' 65287 ＇
40 ( 65288 （
41 ) 65289 ）
42 * 65290 ＊
43 + 65291 ＋
44 , 65292 ，
45 - 65293 －
46 . 65294 ．
47 / 65295 ／
48 0 65296 ０
49 1 65297 １
50 2 65298 ２
51 3 65299 ３
52 4 65300 ４
53 5 65301 ５
54 6 65302 ６
55 7 65303 ７
56 8 65304 ８
57 9 65305 ９
58 : 65306 ：
59 ; 65307 ；
60 < 65308 ＜
61 = 65309 ＝
62 > 65310 ＞
63 ? 65311 ？
64 @ 65312 ＠
65 A 65313 Ａ
66 B 65314 Ｂ
67 C 65315 Ｃ
68 D 65316 Ｄ
69 E 65317 Ｅ
70 F 65318 Ｆ
71 G 65319 Ｇ
72 H 65320 Ｈ
73 I 65321 Ｉ
74 J 65322 Ｊ
75 K 65323 Ｋ
76 L 65324 Ｌ
77 M 65325 Ｍ
78 N 65326 Ｎ
79 O 65327 Ｏ
80 P 65328 Ｐ
81 Q 65329 Ｑ
82 R 65330 Ｒ
83 S 65331 Ｓ
84 T 65332 Ｔ
85 U 65333 Ｕ
86 V 65334 Ｖ
87 W 65335 Ｗ
88 X 65336 Ｘ
89 Y 65337 Ｙ
90 Z 65338 Ｚ
91 [ 65339 ［
92 \ 65340 ＼
93 ] 65341 ］
94 ^ 65342 ＾
95 _ 65343 ＿
96 ` 65344 ｀
97 a 65345 ａ
98 b 65346 ｂ
99 c 65347 ｃ
100 d 65348 ｄ
101 e 65349 ｅ
102 f 65350 ｆ
103 g 65351 ｇ
104 h 65352 ｈ
105 i 65353 ｉ
106 j 65354 ｊ
107 k 65355 ｋ
108 l 65356 ｌ
109 m 65357 ｍ
110 n 65358 ｎ
111 o 65359 ｏ
112 p 65360 ｐ
113 q 65361 ｑ
114 r 65362 ｒ
115 s 65363 ｓ
116 t 65364 ｔ
117 u 65365 ｕ
118 v 65366 ｖ
119 w 65367 ｗ
120 x 65368 ｘ
121 y 65369 ｙ
122 z 65370 ｚ
123 { 65371 ｛
124 | 65372 ｜
125 } 65373 ｝
126 ~ 65374 ～

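Converting full-width to half-width:


def full2half(s):
    n = []
    s = s.decode('utf-8')  # s is a UTF-8 encoded byte string
    for char in s:
        num = ord(char)
        if num == 0x3000:
            # the full-width space maps to the ordinary space
            num = 32
        elif 0xFF01 <= num <= 0xFF5E:
            # every other full-width character is shifted by the offset
            num -= 0xfee0
        num = unichr(num)
        n.append(num)
    return u''.join(n)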
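For readers on Python 3, where unichr() and the decode step are not needed, the same conversion can be sketched with str.translate. This is a minimal sketch rather than the article's own code; the names FULL_TO_HALF and full2half_py3 are hypothetical.

```python
# Translation table: full-width code point -> half-width code point.
FULL_TO_HALF = {code: code - 0xFEE0 for code in range(0xFF01, 0xFF5F)}
FULL_TO_HALF[0x3000] = 0x20  # the full-width space is handled separately


def full2half_py3(text: str) -> str:
    """Return text with full-width ASCII variants replaced by half-width ones."""
    return text.translate(FULL_TO_HALF)


print(full2half_py3("Ｐｙｔｈｏｎ\u3000３"))
```

Characters that are not in the table, such as Chinese characters, pass through unchanged.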

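Converting half-width to full-width:


def half2full(s):
    n = []
    s = s.decode('utf-8')  # s is a UTF-8 encoded byte string
    for char in s:
        num = ord(char)
        if num == 32:
            # the ordinary space maps to the full-width space
            num = 0x3000
        elif 0x21 <= num <= 0x7E:
            # every other printable ASCII character is shifted by the offset
            num += 0xfee0
        num = unichr(num)
        n.append(num)
    return u''.join(n)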
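Because the mapping is one-to-one, converting to full-width and back should restore the original text. The following Python 3 sketch checks this property; it is not the article's code, and the names HALF_TO_FULL, FULL_TO_HALF and half2full_py3 are hypothetical.

```python
# Translation table: half-width code point -> full-width code point.
HALF_TO_FULL = {code: code + 0xFEE0 for code in range(0x21, 0x7F)}
HALF_TO_FULL[0x20] = 0x3000  # the space is handled separately

# Inverting the table gives the opposite direction.
FULL_TO_HALF = {full: half for half, full in HALF_TO_FULL.items()}


def half2full_py3(text: str) -> str:
    """Return text with printable ASCII replaced by full-width variants."""
    return text.translate(HALF_TO_FULL)


sample = "Hello, World! 123"
widened = half2full_py3(sample)
print(widened)

# Converting back restores the original string.
assert widened.translate(FULL_TO_HALF) == sample
```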

The approach above is very simple, but in practice you often do not want to convert every character. In Chinese text, for example, we usually want all letters and digits converted to half-width while common punctuation marks stay full-width, so a blanket conversion is not suitable.

The solution is to use custom mapping tables.


#!/usr/bin/env python
# -*- coding: utf-8 -*-

# Each FH_* table holds (full-width, half-width) pairs.
FH_SPACE = FHS = ((u"\u3000", u" "),)  # U+3000 is the full-width space
FH_NUM = FHN = (
    (u"０", u"0"), (u"１", u"1"), (u"２", u"2"), (u"３", u"3"), (u"４", u"4"),
    (u"５", u"5"), (u"６", u"6"), (u"７", u"7"), (u"８", u"8"), (u"９", u"9"),
)
FH_ALPHA = FHA = (
    (u"ａ", u"a"), (u"ｂ", u"b"), (u"ｃ", u"c"), (u"ｄ", u"d"), (u"ｅ", u"e"),
    (u"ｆ", u"f"), (u"ｇ", u"g"), (u"ｈ", u"h"), (u"ｉ", u"i"), (u"ｊ", u"j"),
    (u"ｋ", u"k"), (u"ｌ", u"l"), (u"ｍ", u"m"), (u"ｎ", u"n"), (u"ｏ", u"o"),
    (u"ｐ", u"p"), (u"ｑ", u"q"), (u"ｒ", u"r"), (u"ｓ", u"s"), (u"ｔ", u"t"),
    (u"ｕ", u"u"), (u"ｖ", u"v"), (u"ｗ", u"w"), (u"ｘ", u"x"), (u"ｙ", u"y"), (u"ｚ", u"z"),
    (u"Ａ", u"A"), (u"Ｂ", u"B"), (u"Ｃ", u"C"), (u"Ｄ", u"D"), (u"Ｅ", u"E"),
    (u"Ｆ", u"F"), (u"Ｇ", u"G"), (u"Ｈ", u"H"), (u"Ｉ", u"I"), (u"Ｊ", u"J"),
    (u"Ｋ", u"K"), (u"Ｌ", u"L"), (u"Ｍ", u"M"), (u"Ｎ", u"N"), (u"Ｏ", u"O"),
    (u"Ｐ", u"P"), (u"Ｑ", u"Q"), (u"Ｒ", u"R"), (u"Ｓ", u"S"), (u"Ｔ", u"T"),
    (u"Ｕ", u"U"), (u"Ｖ", u"V"), (u"Ｗ", u"W"), (u"Ｘ", u"X"), (u"Ｙ", u"Y"), (u"Ｚ", u"Z"),
)
FH_PUNCTUATION = FHP = (
    # \u201d, \u2019 and \u2018 are curly quotation marks
    (u"．", u"."), (u"，", u","), (u"！", u"!"), (u"？", u"?"), (u"\u201d", u'"'),
    (u"\u2019", u"'"), (u"\u2018", u"`"), (u"＠", u"@"), (u"＿", u"_"), (u"：", u":"),
    (u"；", u";"), (u"＃", u"#"), (u"＄", u"$"), (u"％", u"%"), (u"＆", u"&"),
    # \u2010 (HYPHEN) is an assumption: the character is unreadable in the source
    (u"（", u"("), (u"）", u")"), (u"\u2010", u"-"), (u"＝", u"="), (u"＊", u"*"),
    (u"＋", u"+"), (u"－", u"-"), (u"／", u"/"), (u"＜", u"<"), (u"＞", u">"),
    # \uffe5 is the full-width yen sign
    (u"［", u"["), (u"\uffe5", u"\\"), (u"］", u"]"), (u"＾", u"^"), (u"｛", u"{"),
    (u"｜", u"|"), (u"｝", u"}"), (u"～", u"~"),
)
FH_ASCII = HAC = lambda: ((fr, to) for m in (FH_ALPHA, FH_NUM, FH_PUNCTUATION) for fr, to in m)

# Each HF_* table is the reverse direction: (half-width, full-width) pairs.
HF_SPACE = HFS = ((u" ", u"\u3000"),)
HF_NUM = HFN = lambda: ((h, z) for z, h in FH_NUM)
HF_ALPHA = HFA = lambda: ((h, z) for z, h in FH_ALPHA)
HF_PUNCTUATION = HFP = lambda: ((h, z) for z, h in FH_PUNCTUATION)
HF_ASCII = ZAC = lambda: ((h, z) for z, h in FH_ASCII())


def convert(text, *maps, **ops):
    """ Full-width / half-width conversion
    args:
        text: unicode string to convert
        maps: conversion maps (tuples of pairs, dicts, or callables returning pairs)
        skip: characters to leave unconverted, as a tuple or string
    return: converted unicode string
    """

    if "skip" in ops:
        skip = ops["skip"]
        if isinstance(skip, basestring):
            skip = tuple(skip)

        def replace(text, fr, to):
            return text if fr in skip else text.replace(fr, to)
    else:
        def replace(text, fr, to):
            return text.replace(fr, to)

    for m in maps:
        if callable(m):
            m = m()
        elif isinstance(m, dict):
            m = m.items()
        for fr, to in m:
            text = replace(text, fr, to)
    return text
 
 
if __name__ == '__main__':
    text = u"成田空港-【ＪＲ特急成田エクスプレス号・横浜行，2站】-東京-【ＪＲ新幹線はやぶさ号・新青森行,6站 】-新青森-【ＪＲ特急スーパー白鳥号・函館行，4站 】-函館"
    print convert(text, FH_ASCII, {u"【": u"[", u"】": u"]", u",": u"，", u".": u"。", u"?": u"？", u"!": u"！"}, skip=u"，。？！“”")
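For readers on Python 3, the selective conversion described earlier (letters and digits to half-width, punctuation left alone) can also be sketched with str.translate and a restricted table. This is an illustrative sketch, not a port of the listing above; the names ALNUM_TABLE and narrow_alnum are hypothetical.

```python
import string

# Table covering only ASCII letters and digits: full-width -> half-width.
ALNUM_TABLE = {
    ord(ch) + 0xFEE0: ord(ch)
    for ch in string.ascii_letters + string.digits
}


def narrow_alnum(text: str) -> str:
    """Narrow full-width letters and digits; leave punctuation untouched."""
    return text.translate(ALNUM_TABLE)


# The full-width comma and exclamation mark are not in the table,
# so they are expected to stay as they are.
print(narrow_alnum("Ｐｙｔｈｏｎ３，你好！"))
```

A table-based approach makes a single pass over the text, whereas the listing above calls text.replace once per mapping pair; for short strings the difference is unlikely to matter.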

Special note: half-width (English) quotation marks do not distinguish between opening and closing quotes, so full-width quotes that have been converted to half-width cannot be restored exactly.
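The effect on quotation marks can be seen in a small Python 3 sketch. The two tables below are illustrative only and assume the curly quotes U+201C and U+201D as the opening and closing forms.

```python
# Both curly double quotes collapse to the same straight quote.
QUOTES_TO_ASCII = {0x201C: 0x22, 0x201D: 0x22}

original = "\u201cquoted\u201d"
narrowed = original.translate(QUOTES_TO_ASCII)
print(narrowed)

# A reverse table can only pick one target for the straight quote,
# so the opening quote is not restored.
ASCII_TO_QUOTES = {0x22: 0x201D}
restored = narrowed.translate(ASCII_TO_QUOTES)
assert restored != original
```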

Conclusion

That covers how to convert between full-width and half-width characters in Python. I hope this article is of some help in your study or work. If you have any questions, feel free to leave a comment.