Traversing a Python string that contains Chinese characters

  • 2020-05-27 06:23:15
  • OfStack

Traversing a string in Python (including Chinese characters)


s = " China china"
for j in s:
  print j

First, which encoding is your string a actually in? It may not be the GBK you assume.


>>> a='中国'
>>> a

If the result shows six bytes, the string is UTF-8; if it shows four bytes, it is GBK.
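The byte counts above can be checked without relying on the terminal's encoding. This is a minimal sketch that assumes the text is the two characters 中国; the names text, utf8_bytes and gbk_bytes are illustrative only, and the explicit encode() calls stand in for whatever encoding your shell applies.

```python
# -*- coding: utf-8 -*-
# Encode the same two Chinese characters both ways and compare sizes.
text = u"中国"

utf8_bytes = text.encode("utf-8")  # 3 bytes per Chinese character
gbk_bytes = text.encode("gbk")     # 2 bytes per Chinese character

print(len(utf8_bytes))  # 6
print(len(gbk_bytes))   # 4
```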

In addition, neither a UTF-8 nor a GBK string can be traversed like this, because the loop takes out one byte at a time. The interpreter treats a as a sequence of len(a) bytes.

Next comes the traversal problem.

Most Linux shells default to UTF-8, in which one Chinese character takes 3 bytes, so you need to read 3 bytes at a time. You can try:


>>> a[:3]

Out comes the character "中".

The Windows command prompt defaults to cp936, which is GBK. There one Chinese character takes two bytes, so you read two bytes at a time (a[:2]).

Another way to traverse is to convert the string to unicode, so that every Chinese or English character counts as one element and you can use your for i in a loop as written. The advantage is that Chinese and English characters alike are one unit each, whereas in UTF-8 and GBK only the English letters occupy a single byte.


s = u" China china"
for j in s:
  print j

The output is as follows:


中
国
c
h
i
n
a

Thank you for reading. I hope this helps, and thank you for supporting this site!

