Conversion between unicode and str in Python 2, and the difference from str in Python 3

  • 2021-07-26 08:22:55
  • OfStack

Strings in python2 are divided into unicode and str types


    str to unicode: use decode() (decoding)
    unicode to str: use encode() (encoding)
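
As a minimal sketch of these two directions, the lines below use explicit b"" and u"" prefixes so that they should behave the same on Python 2.7 and Python 3.3+. The sample bytes are an assumption chosen for illustration (the UTF-8 form of one Chinese character).

```python
# -*- coding: utf-8 -*-
raw = b"\xe6\xb8\xa3"             # a byte string (str in Python 2, bytes in Python 3)

text = raw.decode("utf-8")        # decoding: bytes -> unicode text
round_trip = text.encode("utf-8") # encoding: unicode text -> bytes

print(text == u"\u6e23")
print(round_trip == raw)
```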

When returning data to the front end, you need to convert unicode to str first, because str in Python 2 is really a sequence of bytes, and bytes are what travel over the network. If the front end expects JSON, use json.dumps() to serialize the data before returning it. When the data is nested, the inner values may not convert to str directly; in that case they can be converted with eval() first and then serialized with json.dumps(). Keep in mind that eval() executes whatever expression it is given, so it is only suitable for trusted input; ast.literal_eval() parses plain literals without that risk. The JSON that json.dumps() returns is itself just a string.
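
The flow described above might look roughly like this in Python 3. This is only a sketch: the payload is made up, and ast.literal_eval() is swapped in for eval() as a safer stand-in, so it is not the original author's exact method.

```python
import ast
import json

# Hypothetical payload: the inner value arrived as the text of a dict,
# not as a dict object.
payload = {"user": "lowmanisbusy", "detail": "{'tag': '\u6e23', 'count': 1}"}

# literal_eval only parses Python literals; it does not run arbitrary code.
payload["detail"] = ast.literal_eval(payload["detail"])

body = json.dumps(payload)                          # non-ASCII escaped as \uXXXX
body_raw = json.dumps(payload, ensure_ascii=False)  # non-ASCII kept as-is

print(type(body).__name__)
print(body)
print(body_raw)
```

Either form is a plain str; it still has to be encoded to bytes before it goes onto the network.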

In Python 2 a string literal is of type str by default. To get a string of type unicode, declare it as follows:


my_str = u"lowmanisbusy"  # prefix the string literal with u

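The default source encoding in Python 2 is ASCII, so a source file that contains Chinese text needs an encoding declaration at the top (for example # -*- coding: utf-8 -*-). When defining a Chinese string, also add the "u" prefix so that the string is of type unicode and is stored as Unicode characters rather than as raw bytes: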


my_zh_str = u" Love rat is not only ugly "  # prefix the string literal with u

Unicode: Unicode assigns a unique number (a code point) to every character in the world, usually written in hexadecimal. For example, the Unicode number of the simplified Chinese character "渣" (slag) is 6E23, which is written "\u6e23" in Python 2. However, Unicode only defines the number of each character; it does not define how that number is stored as bytes. Encodings such as UTF-8 appeared to fill that gap: UTF-8 is one implementation of Unicode and still uses the same unique numbers. GBK is a separate Chinese encoding with its own byte values, whose characters are mapped to their Unicode numbers when decoded. My own simple understanding is that these encodings are all built around the Unicode numbering.
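
To make the distinction between a character's number and its stored bytes concrete, here is a small Python 3 sketch using the same character (number 6E23). The variable names are illustrative only.

```python
ch = "\u6e23"                    # the character whose Unicode number is 6E23

code_point = ord(ch)             # the number Unicode assigns to the character
utf8_bytes = ch.encode("utf-8")  # how UTF-8 stores that number
gbk_bytes = ch.encode("gbk")     # how GBK stores the same character

print(hex(code_point))           # 0x6e23
print(utf8_bytes)                # b'\xe6\xb8\xa3'
print(len(utf8_bytes), len(gbk_bytes))  # 3 2

# Decoding with the matching codec recovers the same character either way.
print(utf8_bytes.decode("utf-8") == gbk_bytes.decode("gbk"))
```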

In python3, strings are divided into two types: str and bytes


  str to bytes: use encode() (encoding)
  bytes to str: use decode() (decoding)
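
One practical consequence in Python 3 is that the two types do not mix implicitly. The sketch below (sample values are arbitrary) shows that combining them fails until one side is converted explicitly.

```python
text = "abc"                 # str: a sequence of characters
raw = text.encode("utf-8")   # bytes: a sequence of byte values

try:
    text + raw               # mixing the two types is rejected
    mixed_ok = True
except TypeError:
    mixed_ok = False

joined = text + raw.decode("utf-8")   # convert first, then combine

print(type(text).__name__, type(raw).__name__)
print(mixed_ok, joined)
```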

Note that bytes in Python 3 is equivalent to str in Python 2, and that Python 3 has no separate unicode type: its str takes over the role that unicode played in Python 2. This comes down to the default encoding. The default encoding is UTF-8 in Python 3 and ASCII in Python 2. ASCII contains 128 characters: the English letters, Arabic numerals, punctuation marks, control characters and so on, but no Chinese. Chinese characters are logographic, and each one needs several bytes to represent, so ASCII cannot express Chinese. Therefore, unless the encoding is re-specified in Python 2, a Chinese string literal of type str cannot appear in the code (the text can be held in a unicode string instead), because the CPython 2 interpreter cannot decode it.

The relationship and differences between ASCII, Unicode and UTF-8 are not covered in detail here; you can look into them yourself. In short, UTF-8 is one implementation of Unicode, and my personal understanding is a relationship like this: utf-8 <---> unicode <---> byte. In the end, data is always transmitted in binary, byte by byte.
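
A short Python 3 sketch of why ASCII is not enough for Chinese text: the same character that UTF-8 stores in several bytes cannot be encoded as ASCII at all. The flag and variable names are illustrative.

```python
import sys

ch = "\u6e23"                      # a Chinese character

print(sys.getdefaultencoding())    # utf-8 on Python 3

try:
    ch.encode("ascii")             # ASCII only covers code points 0..127
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

utf8_len = len(ch.encode("utf-8")) # number of bytes UTF-8 needs

print(ascii_ok, utf8_len)
```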

By default, a string in Python 3 is of type str, and a web framework typically encodes str to bytes automatically when it returns the response to the front end.

To convert bytes in one encoding into bytes in another encoding, first decode them to str using the original encoding, then encode that str to bytes using the new encoding.

For example, if the variable my_bt holds bytes encoded as gbk and needs to be converted to utf-8, do the following:


my_str = my_bt.decode("gbk")    # decode with the original encoding
my_bt = my_str.encode("utf-8")  # re-encode with the new encoding
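
The two steps can be wrapped in a small helper. This is a sketch only: transcode is a hypothetical name, and the sample GBK input is produced here just so the example is self-contained.

```python
def transcode(data, source="gbk", target="utf-8"):
    """Re-encode bytes from one encoding to another (hypothetical helper)."""
    # Decode with the encoding the bytes were written in, then encode anew.
    return data.decode(source).encode(target)


my_bt = "\u6e23".encode("gbk")   # sample input: GBK-encoded bytes
converted = transcode(my_bt)

print(converted)                 # b'\xe6\xb8\xa3'
print(converted.decode("utf-8") == my_bt.decode("gbk"))
```

Decoding with the wrong source encoding either raises UnicodeDecodeError or silently produces the wrong characters, so the original encoding has to be known in advance.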

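Because Python 3 has no separate unicode type, the u prefix is redundant there. Since Python 3.3 it is accepted for compatibility with Python 2 code, but the result is an ordinary str, so nothing is gained by defining a string in the following way in Python 3: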


my_str = u" Love rat is not only ugly "
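
A quick check in Python 3.3 or later, as a sketch with arbitrary sample text, shows that a literal with the u prefix is the same object type and value as one without it.

```python
with_prefix = u"lowmanisbusy"
without_prefix = "lowmanisbusy"

print(type(with_prefix).__name__)      # str
print(with_prefix == without_prefix)   # True
```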

P.S. In Python 2, a unicode object (here my_unicode) can be turned into a str like this:


my_unicode.encode('unicode-escape').decode('string_escape')
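
The string_escape codec exists only in Python 2. For comparison, a related round trip in Python 3 with the unicode_escape codec might look like the sketch below; it is an illustration of the escape form, not a drop-in replacement for the line above.

```python
text = "\u6e23"

escaped = text.encode("unicode_escape")       # bytes holding the escape sequence
restored = escaped.decode("unicode_escape")   # back to the original character

print(escaped)            # b'\\u6e23'
print(restored == text)   # True
```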
