Setting the source file encoding in Python

2020-06-19 10:42:35
OfStack

If you want to write Chinese in a Python 2 .py file, you must add a line declaring the file encoding; otherwise Python 2 defaults to ASCII. (Python 3 no longer has this problem: its default source file encoding is UTF-8.)

The encoding comment must be placed on line 1 or line 2. Generally, the first two lines of a Python file should read:


#!/usr/bin/python
# -*- coding: UTF-8 -*-

Line 1 specifies the Python interpreter and line 2 specifies the file encoding, which can be declared in any of the following ways:

1. With an equals sign:


#!/usr/bin/python
# coding=<encoding name>

2. With a colon, the most common form (which most editors recognize correctly):


#!/usr/bin/python
# -*- coding: <encoding name> -*-

3. vim style:


#!/usr/bin/python
# vim: set fileencoding=<encoding name> :
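All three forms work because the interpreter looks for a comment matching one pattern on the first two lines. As a rough sketch (written for Python 3; the helper name declared_encoding is made up for illustration), the regular expression given in PEP 263 can be applied by hand to see which encoding name each form yields:

```python
import re

# Pattern from PEP 263: a comment containing "coding:" or "coding=" and a name.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def declared_encoding(source_lines):
    """Return the encoding named on line 1 or 2, or None if there is none.

    Hypothetical helper for illustration; it does not validate the name.
    """
    for line in source_lines[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1)
    return None


if __name__ == "__main__":
    headers = [
        "# coding=gbk",
        "# -*- coding: UTF-8 -*-",
        "# vim: set fileencoding=utf-8 :",
    ]
    for header in headers:
        print(declared_encoding(["#!/usr/bin/python", header]))
```

The sketch only extracts the name as written; it does not check that the name is a codec Python knows about.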

The encoding declaration in the header serves the following purposes:

If you have Chinese comments in your code, you need this declaration.

More advanced editors, such as Emacs, which I use, read the header declaration to decide which encoding to use for the file.

The interpreter uses the declared encoding to decode unicode literals such as u"Life is Short" when it creates the unicode objects.

Setting the default encoding:


import sys  # import the sys module; this is not the first time sys is loaded
reload(sys)  # reload the sys module to restore setdefaultencoding
sys.setdefaultencoding('utf8')  # call the setdefaultencoding function

Of particular note here is reload(sys) on line 2. It must not be omitted; without it the code will not run correctly. Why reload the module instead of calling the function directly after the import? Because setdefaultencoding is deleted from the sys module once the system has called it during startup, it is no longer available through a plain import. The sys module therefore has to be reloaded once before setdefaultencoding can be used to change the interpreter's default character encoding from code.

Under the Lib folder of the Python installation directory there is a file called site.py. In it you can find the call chain main() -> setencoding() -> sys.setdefaultencoding(encoding). Because site.py is loaded automatically every time the Python interpreter starts, main() runs every time, and the setdefaultencoding function is deleted as soon as it has been called.
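One way to see the effect of this clean-up is to ask the running interpreter. The following is a minimal sketch (describe_default_encoding is a name invented here). On Python 2 it should report that setdefaultencoding is missing unless sys has been reloaded; on Python 3 the function does not exist at all and the default encoding is UTF-8.

```python
import sys


def describe_default_encoding():
    """Return (default encoding, whether sys.setdefaultencoding is reachable)."""
    return sys.getdefaultencoding(), hasattr(sys, "setdefaultencoding")


if __name__ == "__main__":
    encoding, can_set = describe_default_encoding()
    print("default encoding: %s" % encoding)
    print("setdefaultencoding available: %s" % can_set)
```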

As for sys.defaultencoding (the value returned by sys.getdefaultencoding()), it is used whenever a decode happens without an explicitly specified codec. For example, I have the following code:


#! /usr/bin/env python 
# -*- coding: utf-8 -*- 
s = '中文'  # note that s is of type str, not unicode
s.encode('gb18030')

This code re-encodes s as gb18030, which is a unicode -> str conversion. Because s itself is of type str, Python first decodes s to unicode automatically and then encodes the result as gb18030. Since that decoding is done implicitly and we did not specify a codec, Python uses the one given by sys.defaultencoding. In many cases sys.defaultencoding is ASCII, and if s is not encoded that way, an error occurs. In the situation above, my sys.defaultencoding is ASCII, but both s and the file are encoded in UTF-8, so there is an error:


UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)
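The implicit decode can be made visible. Python 3 never performs it, so the sketch below (an illustration, not the original Python 2 code) writes the hidden step out by hand: it takes the UTF-8 bytes that a Python 2 str would hold and decodes them with the default codec. The name implicit_decode_error is hypothetical.

```python
# -*- coding: utf-8 -*-
# The bytes a Python 2 str holding '中文' contains when the file is saved as UTF-8.
raw = "中文".encode("utf-8")


def implicit_decode_error(data, default_encoding="ascii"):
    """Decode data with the given default codec; return the error text, or None."""
    try:
        data.decode(default_encoding)
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(implicit_decode_error(raw))           # ASCII default: fails
    print(implicit_decode_error(raw, "utf-8"))  # matching codec: no error
```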

In this case, there are two ways to correct the error:

1. Explicitly specify the encoding of s:


#! /usr/bin/env python 
# -*- coding: utf-8 -*- 
s = '中文'
s.decode('utf-8').encode('gb18030')

2. Change sys.defaultencoding to match the encoding of the file:


#! /usr/bin/env python 
# -*- coding: utf-8 -*- 
import sys 
reload(sys)  # after initialization, Python 2.5 deletes sys.setdefaultencoding, so sys must be reloaded
sys.setdefaultencoding('utf-8')
s = '中文'  # named s rather than str, to avoid shadowing the built-in type
s.encode('gb18030')
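Of the two fixes, the first carries over directly to Python 3, where setdefaultencoding no longer exists. As a small sketch under that assumption (the helper name reencode is hypothetical), the bytes are decoded with their real encoding and then encoded with the target one:

```python
# -*- coding: utf-8 -*-


def reencode(data, source_encoding, target_encoding):
    """Decode bytes with their actual encoding, then encode them in the target one."""
    return data.decode(source_encoding).encode(target_encoding)


if __name__ == "__main__":
    utf8_bytes = "中文".encode("utf-8")
    gb_bytes = reencode(utf8_bytes, "utf-8", "gb18030")
    # Decoding the result as gb18030 gives back the original text.
    print(gb_bytes.decode("gb18030") == "中文")
```

Naming both codecs at the call site keeps the conversion independent of whatever default the interpreter happens to use.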