Python string encode and decode: notes on solving the garbled-text problem

  • 2020-04-02 09:23:17
  • OfStack

Have you ever run into an error like "UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)"? This article takes a look at this problem.

Python represents strings internally as unicode, so encoding conversions normally use unicode as the intermediate form: first decode the string from its original encoding into unicode, then encode the unicode into the target encoding. (The examples in this article use Python 2 syntax.)

decode converts a string in some other encoding into unicode. For example, str1.decode('gb2312') converts the gb2312-encoded string str1 into unicode.

encode converts unicode into a string in some other encoding. For example, str2.encode('gb2312') converts the unicode string str2 into a gb2312-encoded string.

So when transcoding, first work out which encoding the string is in, then decode it into unicode, and then encode it into the target encoding.
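The decode-then-encode route described above can be sketched as follows. This is a minimal illustration, not code from the original article: the variable names are made up, the text is written with escape sequences so the result does not depend on the encoding of the source file, and the snippet is written so that it should run on both Python 2.7 and Python 3.

```python
# Sketch: transcode a gb2312 byte string to utf-8 by going through unicode.
text = u'\u4e2d\u6587'                    # two Chinese characters, as unicode

gb_bytes = text.encode('gb2312')          # unicode -> gb2312 byte string
as_unicode = gb_bytes.decode('gb2312')    # gb2312 byte string -> unicode
utf8_bytes = as_unicode.encode('utf-8')   # unicode -> utf-8 byte string

print(repr(gb_bytes))
print(repr(utf8_bytes))
```

The two printed byte strings differ even though they represent the same text, which is why the source encoding has to be known before decoding.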

By default, a string literal in the source code has the same encoding as the source file itself.

s = '中文'

This string is utf8 if it appears in a utf8 file and gb2312 if it appears in a gb2312 file. To convert its encoding, first use decode to convert it to unicode, then use encode to convert it to the target encoding. When no encoding is specified, the source file is typically created in the system default encoding.

If the string is defined like this instead:

s = u'中文'

then the string is unicode, Python's internal encoding, regardless of the encoding of the source file. In this case, converting the encoding only requires calling encode directly with the target encoding.

If a string is already unicode, decoding it again will fail, so it is usually necessary to check first whether it is unicode:

isinstance(s, unicode)  # checks whether s is unicode

Likewise, calling encode on a str that is not unicode results in an error.
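The check above can be wrapped in a small helper. This is only a sketch: is_unicode and text_type are names invented for illustration, and text_type is used in place of the Python 2 name unicode so that the snippet should also run on Python 3, where the text type is called str.

```python
# Sketch: tell unicode text apart from an encoded byte string.
text_type = type(u'')   # `unicode` on Python 2, `str` on Python 3

def is_unicode(value):
    """Return True if value is unicode text rather than encoded bytes."""
    return isinstance(value, text_type)

print(is_unicode(u'\u4e2d\u6587'))        # unicode text
print(is_unicode(b'\xd6\xd0\xce\xc4'))    # gb2312-encoded byte string
```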

How do you get the system default encoding?

#!/usr/bin/env python
# coding=utf-8
import sys
print sys.getdefaultencoding()

On English Windows XP, this program outputs: ascii

In some IDEs, string output is always garbled or even raises an error. This happens because the IDE's output console cannot display the string's encoding, not because of the program itself.

For example, run the following code in UliPad:

s = u"中文"
print s

This raises: UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128). The reason is that UliPad's console output window on English Windows XP writes its output in ASCII (the default encoding on an English system is ASCII), while the string in the code above is unicode, so the output fails.

If the last line is changed to: print s.encode('gb2312')

then the two characters "中文" are output correctly.

If the last line is changed to: print s.encode('utf8')

then the output is: \xe4\xb8\xad\xe6\x96\x87, which is what the console window shows when it outputs the utf8-encoded string as ASCII.
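The same behaviour can be reproduced without UliPad. The sketch below assumes that printing unicode to an ASCII-only console amounts to an implicit encode to ASCII, and performs that encode explicitly; error_message is an illustrative name, and the snippet is written so that it should run on both Python 2.7 and Python 3.

```python
# Sketch: reproduce the error by encoding to ASCII explicitly,
# then show the byte strings produced by gb2312 and utf-8.
text = u'\u4e2d\u6587'   # two Chinese characters, as unicode

error_message = None
try:
    text.encode('ascii')           # ASCII cannot represent these characters
except UnicodeEncodeError as exc:
    error_message = str(exc)
    print(error_message)

print(repr(text.encode('gb2312')))
print(repr(text.encode('utf-8')))
```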

unicode(str, 'gb2312') is the same as str.decode('gb2312'): both convert the gb2312-encoded str into unicode.
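A quick way to check this equivalence is sketched below. text_type is an illustrative stand-in for the Python 2 name unicode (it is str on Python 3), chosen so that the snippet should run on either version.

```python
# Sketch: the constructor form and the decode method give the same result.
text_type = type(u'')                 # `unicode` on Python 2, `str` on Python 3
gb_bytes = b'\xd6\xd0\xce\xc4'        # gb2312-encoded byte string

via_constructor = text_type(gb_bytes, 'gb2312')
via_decode = gb_bytes.decode('gb2312')

print(via_constructor == via_decode)
```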

Use str.__class__ to view the encoded form of str.

After all that theory, here is a cure-all to finish :)


#!/usr/bin/env python
# coding=utf-8
s = "中文"

if isinstance(s, unicode):
    # s = u"中文"
    print s.encode('gb2312')
else:
    # s = "中文"
    print s.decode('utf-8').encode('gb2312')
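The same idea can be packaged as a reusable function. This is a sketch under stated assumptions rather than part of the original recipe: to_encoding, target_encoding and source_encoding are hypothetical names, the default source encoding of utf-8 mirrors the script above, and text_type replaces the Python 2 name unicode so that the code should also run on Python 3.

```python
# Sketch: encode a value to the target encoding whether it arrives
# as unicode text or as an already-encoded byte string.
text_type = type(u'')   # `unicode` on Python 2, `str` on Python 3

def to_encoding(value, target_encoding, source_encoding='utf-8'):
    """Return value as a byte string in target_encoding."""
    if not isinstance(value, text_type):
        # Byte string: decode it to unicode first.
        value = value.decode(source_encoding)
    return value.encode(target_encoding)

print(repr(to_encoding(u'\u4e2d\u6587', 'gb2312')))
print(repr(to_encoding(b'\xe4\xb8\xad\xe6\x96\x87', 'gb2312')))
```

Both calls should produce the same gb2312 byte string. The function still relies on the caller knowing the source encoding of any byte string passed in; it cannot detect it.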

