Reading and writing Chinese text files in Python 3

2020-07-21 09:05:24
OfStack

Python represents strings internally as Unicode, so converting between encodings normally uses Unicode as the intermediate form: first decode the bytes in the source encoding into a Unicode string, then encode that Unicode string into the target encoding.

Python 3 removed the separate unicode type. The string type (str) now holds Unicode characters and is the basic text type, while encoded data is represented by the bytes type. The decode and encode methods are used the same way as before:


        decode                 encode
bytes ---------> str (Unicode) ---------> bytes


u = '中文'  # A string object (str) named u
data = u.encode('gb2312')  # Encode u as gb2312 to get a bytes object (named data to avoid shadowing the built-in str)
u1 = data.decode('gb2312')  # Decode the bytes as gb2312 to get back a str equal to u
u2 = data.decode('utf-8', errors='replace')  # Decoding as utf-8 does not restore the original string; without errors='replace' it raises UnicodeDecodeError
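
The decode-then-encode path described above can be wrapped in a small helper. This is only a minimal sketch: the function name `transcode` and the sample text are hypothetical and do not come from the original article.

```python
def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Convert bytes from encoding `src` to encoding `dst`, going through str."""
    text = data.decode(src)   # bytes -> str (Unicode)
    return text.encode(dst)   # str -> bytes in the target encoding


gb_bytes = "中文".encode("gb2312")
utf8_bytes = transcode(gb_bytes, "gb2312", "utf-8")

# Both values are bytes; only the byte sequences differ.
print(type(gb_bytes).__name__, type(utf8_bytes).__name__)
print(utf8_bytes.decode("utf-8") == "中文")
```

The helper never converts directly between two byte encodings; str is always the intermediate step.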

File reading problem

When we read a file, the encoding the file was saved with determines the encoding of the content we read from it. For example, create a new text file test.txt in Notepad and edit its content. When saving, note that the encoding is a choice; suppose we choose gb2312. We can then read the file's content with Python as follows:


f = open('test.txt', 'rb')  # Open in binary mode so that read() returns raw bytes
s = f.read()  # In text mode ('r'), Python decodes with a system-dependent default encoding, and the read can fail if that does not match the file
f.close()
# Suppose the file was saved with gb2312 encoding
u = s.decode('gb2312')  # Decode the bytes using the file's encoding to get a str
# Now we can convert the content to various encodings
s_utf8 = u.encode('utf-8')  # Convert to utf-8 encoded bytes
s_gbk = u.encode('gbk')  # Convert to gbk encoded bytes
s_utf16 = u.encode('utf-16')  # Convert to utf-16 encoded bytes
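
In Python 3 the built-in open() also accepts an encoding argument, so the decoding step can happen during the read. The following is a minimal sketch, assuming the file is saved as gb2312; the temporary file path and the sample text are illustrative stand-ins for the test.txt created in Notepad.

```python
import os
import tempfile

# Create a sample file encoded as gb2312 (stands in for the Notepad file).
path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "w", encoding="gb2312") as f:
    f.write("中文")

# Text mode with an explicit encoding: read() returns str directly.
with open(path, "r", encoding="gb2312") as f:
    text = f.read()

# Binary mode: read() returns the raw bytes as stored on disk.
with open(path, "rb") as f:
    raw = f.read()

print(type(text).__name__, type(raw).__name__, len(raw))
```

Passing the encoding explicitly avoids depending on the system default, which differs between platforms.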

Reading files with codecs

Python provides the codecs module for reading files. The open() function in this module lets you specify the encoding:


import codecs
f = codecs.open('text.text', 'r+', encoding='utf-8')  # You must know the file's encoding in advance; utf-8 is used here
content = f.read()  # If the encoding passed to open() does not match the file's actual encoding, an error is raised here
f.write('The message you want to write')
f.close()
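
When the encoding is not known ahead of time, one possible workaround is to try a few candidate encodings in order and keep the first one that decodes without error. This is a sketch under stated assumptions: `read_with_fallback`, the candidate list, and the demo file are hypothetical, and a successful decode does not prove that the guess is correct.

```python
import codecs
import os
import tempfile


def read_with_fallback(path, encodings=("utf-8", "gb2312")):
    """Return (text, encoding) for the first candidate that decodes the file."""
    for enc in encodings:
        try:
            with codecs.open(path, "r", encoding=enc) as f:
                return f.read(), enc
        except UnicodeDecodeError:
            continue  # wrong guess, try the next candidate
    raise ValueError("none of the candidate encodings matched")


# Demo file written as gb2312: decoding it as utf-8 fails, gb2312 succeeds.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with codecs.open(demo_path, "w", encoding="gb2312") as f:
    f.write("中文")

text, used = read_with_fallback(demo_path)
print(used)
```

The order of candidates matters: stricter encodings such as utf-8 should come first, because more permissive ones may accept bytes that were not written in them.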