Traversing a Python string that contains Chinese characters
- 2020-05-27 06:23:15
- OfStack
Traversing a string in Python (including Chinese characters)
s = " China china"
for j in s:
print j
First, which encoding is your string 'a' actually in? It may not be the GBK you assume.
>>> a = '中国'
>>> a
If the output shows six bytes, the string is UTF-8; if it shows four bytes, it is GBK.
Either way, neither a UTF-8 nor a GBK byte string can be traversed with a plain for loop, because the loop takes out one byte at a time. The interpreter treats a as a sequence of len(a) bytes, not as a sequence of characters.
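The byte counts above can be checked directly. The following is a minimal sketch, not the article's own code: it is written for Python 3 (an assumption, since the snippets in this article are Python 2), where a bytes object plays the role of the Python 2 str, and it assumes the sample string is "中国".

```python
# Python 3 sketch: bytes here behaves like the Python 2 str in the article.
text = "中国"

utf8_bytes = text.encode("utf-8")  # 3 bytes per Chinese character
gbk_bytes = text.encode("gbk")     # 2 bytes per Chinese character

print(len(utf8_bytes), utf8_bytes)
print(len(gbk_bytes), gbk_bytes)

# Iterating over bytes yields one byte (an int) at a time, not one character.
byte_values = [b for b in utf8_bytes]
print(byte_values)
```

The lengths 6 and 4 correspond to the "six" and "four" mentioned above.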
Next comes the traversal problem.
Linux shells mostly default to UTF-8, in which one Chinese character occupies 3 bytes, so you need to read 3 bytes at a time. You can try:
>>> a[:3]
This gives the character "中" (use print a[:3] to see it rendered rather than as escaped bytes).
On Chinese-language Windows, the command prompt defaults to cp936, which is GBK. There one Chinese character occupies two bytes, so read two bytes at a time (a[:2]).
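Reading a fixed number of bytes per character can be sketched as follows. This is an illustrative Python 3 example under stated assumptions: the helper name chunks is hypothetical, the sample string is assumed to be "中国", and the approach only works when every character in the string has the same byte width.

```python
# Python 3 sketch: split a byte string into fixed-width pieces and decode each.
utf8_bytes = "中国".encode("utf-8")
gbk_bytes = "中国".encode("gbk")


def chunks(data, width):
    """Hypothetical helper: consecutive slices of `width` bytes."""
    return [data[i:i + width] for i in range(0, len(data), width)]


# 3 bytes per character in UTF-8, 2 bytes per character in GBK.
utf8_chars = [piece.decode("utf-8") for piece in chunks(utf8_bytes, 3)]
gbk_chars = [piece.decode("gbk") for piece in chunks(gbk_bytes, 2)]

print(utf8_chars)
print(gbk_chars)
```

As soon as ASCII letters are mixed in, the widths differ (one byte per English letter), so fixed-width slicing would cut characters in the wrong place.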
There is another way to traverse: convert the string to unicode, so that every Chinese and English character counts as one element, and then use the for i in a loop. The advantage is that Chinese and English characters are each exactly one character, whereas in UTF-8 and GBK only the English letters occupy a single byte.
s = u" China china"
for j in s:
print j
The output is as follows:
中
国
c
h
i
n
a