Detailed Explanation of Character Coding Knowledge Points in python Source File

  • 2021-09-12 01:47:57
  • OfStack

By default, Python source files are processed in UTF-8 encoding. In this encoding, characters from most languages in the world can be used in string literals, variable or function names, and comments at the same time--although the standard library uses only regular ASCII characters as variable or function names, and any portable code should follow this convention. To display these characters correctly, your editor must recognize the UTF-8 encoding and must use a font that supports all characters in the open file.

1. If you don't use the default encoding, declare the encoding used by the file, and write the first line of the file as a special comment.

The syntax is as follows:


# -*- coding: encoding -*-

encoding can be any one codecs supported by Python.

For example, to declare that utf-8 encoding is used, your source code file should be written as:


# -*- coding: utf-8 -*-

2. One exception to the rule on Line 1 is that the source code begins with the line UNIX "shebang". In this case, the encoding declaration will be written on the second line of the file.

For example:


#!/usr/bin/env python3
# -*- coding: utf-8 -*-

Extension of knowledge points:

Definition of code

We start with "SOS" (international help signal), and its Morse code code is:

"…--…", why choose S, O and S as distress signals? Because it is simple, easy to identify and not easy to send mistakes!
So, the character encoding is:

& acute; Given a series of characters, each character is assigned a numerical value, which represents the corresponding character. This numerical value is the code of the character. For example, if we assign the value 0x41 to the character 'A', then 0x41 is the encoding of the character 'A'. Character coding is the expression and storage of characters.

Character encoding needs to deal with two things

(1) Specify how many bytes a character in a character set is represented;

(2) Formulate the character coding table of the character set, that is, the (binary) value corresponding to each character in the character set.


Related articles: