Reading a File by Bytes or Characters with Python's read() Method

  • 2021-07-09 08:33:24
  • OfStack

The file object provides the read() method, which reads the contents of a file by bytes or by characters depending on the mode in which the file was opened: if b (binary) mode is used, read() counts in bytes; if b mode is not used, it counts in characters. When calling this method you can pass in an integer argument that specifies the maximum number of bytes or characters to read in that call.
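
The difference between the two modes can be sketched as follows. This is a minimal illustration, not part of the original programs: the temporary file, its contents, and the use of the with statement and the encoding argument are assumptions made only to keep the sketch self-contained.

```python
import os
import tempfile

# Create a small UTF-8 sample file (hypothetical content for illustration).
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("héllo")

# Text mode: read(2) returns at most 2 characters as a str.
with open(path, "r", encoding="utf-8") as f:
    text_chunk = f.read(2)

# Binary mode: read(2) returns at most 2 bytes as a bytes object.
with open(path, "rb") as f:
    byte_chunk = f.read(2)

print(repr(text_chunk))  # 'hé'
print(repr(byte_chunk))  # b'h\xc3'
```

Because "é" occupies two bytes in UTF-8, the same read(2) call yields two whole characters in text mode but only "h" plus the first half of "é" in binary mode.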

For example, the following program reads the contents of the entire file in a loop:


f = open("read_test.py", 'r', True)
while True:
  # Read one character at a time
  ch = f.read(1)
  # If nothing was read, the end of the file has been reached; exit the loop
  if not ch: break
  # Output ch
  print(ch, end='')
f.close()

The program above reads the file one character at a time (because it does not use b mode) and prints each character as soon as it is read.

As the program above shows, it is good practice to call close() as soon as the program has finished reading or writing a file, so that resources are not leaked. To make sure the file is closed even if an exception is raised while reading, put the close() call in a finally block. For example, the program above can be rewritten as follows:


f = open("test.txt", 'r', True)
try:
  while True:
    # Read one character at a time
    ch = f.read(1)
    # If no data was read, exit the loop
    if not ch: break
    # Output ch
    print(ch, end='')
finally:
  # Close the file whether or not an exception occurred
  f.close()

To keep the focus on the topic and the programs short, this chapter calls close() directly and does not use finally blocks.
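
As an alternative to try/finally, Python's with statement closes the file automatically when the block exits, even if an exception is raised. The sketch below is a hedged illustration rather than one of the original programs: the temporary file and its contents are assumptions made so that it runs on its own.

```python
import os
import tempfile

# Create a small sample file (hypothetical content for illustration).
path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("abc")

chars = []
# The with statement calls f.close() when the block is left.
with open(path, "r", encoding="utf-8") as f:
    while True:
        # Read one character at a time
        ch = f.read(1)
        # An empty string means the end of the file was reached
        if not ch:
            break
        chars.append(ch)

print("".join(chars))  # abc
print(f.closed)        # True
```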

If read() is called without an argument, it reads the entire contents of the file. For example:


f = open("test.txt", 'r', True)
# Read the entire file at once
print(f.read())
f.close()

The two programs above raise a question: when open() is used to open a text file, which character set does the program use? Unless an encoding is specified, open() uses the default encoding of the current platform. On a Simplified Chinese Windows system, for example, that default is typically GBK. In that case the test.txt file read by the program above must also be saved in GBK; otherwise the program may raise a UnicodeDecodeError.
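
To check which encoding a given machine uses by default, something like the following can be run. It is a small sketch; the temporary file name is hypothetical, and the printed values depend on the platform and its settings, so no particular output is shown.

```python
import locale
import os
import tempfile

# The locale's preferred encoding, without changing the current locale.
default_encoding = locale.getpreferredencoding(False)
print("Locale preferred encoding:", default_encoding)

# The encoding that open() actually picked when none was specified.
path = os.path.join(tempfile.mkdtemp(), "probe.txt")
with open(path, "w") as f:
    used_encoding = f.encoding
print("Encoding chosen by open():", used_encoding)
```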

If the character set of the file to be read does not match the default character set of the current platform, there are two solutions:

  • Read the file in binary mode, then convert the result to a string with the decode() method of bytes.
  • Open the file with the open() function of the codecs module, which lets you specify the character set when opening the file.

The following program uses binary mode to read text files:


# Open the file in binary mode
f = open("read_test3.py", 'rb', True)
# Read the entire file, then call the decode() method of bytes to convert the bytes to a string
print(f.read().decode('utf-8'))
f.close()

The program above passes the rb mode to open(), which means the file is read in binary mode. In this mode the read() method of the file object returns a bytes object, and the program calls the decode() method of that object to convert it to a string. Because the read_test3.py file is saved as UTF-8, the program must explicitly specify the UTF-8 character set when calling decode().
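
Why the character set passed to decode() has to match can be seen in a short sketch. The sample string is an assumption, and ASCII is used here only as a stand-in for a mismatched character set because it is guaranteed to reject these bytes; other mismatched character sets may instead decode silently into the wrong characters.

```python
# UTF-8 bytes for a two-character Chinese string (hypothetical sample data).
data = "中文".encode("utf-8")

# Decoding with the matching character set restores the original string.
text = data.decode("utf-8")
print(len(data), len(text))  # 6 2

# Decoding with a mismatched character set fails.
try:
    data.decode("ascii")
    error_raised = False
except UnicodeDecodeError:
    error_raised = True
print(error_raised)  # True
```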

The following program uses the open() function of the codecs module to open the file, which allows the character set to be specified explicitly:


import codecs
# Open the file and decode its contents as UTF-8
f = codecs.open("read_test4.py", 'r', 'utf-8', buffering=True)
while True:
  # Read one character at a time
  ch = f.read(1)
  # If no data was read, exit the loop
  if not ch: break
  # Output ch
  print(ch, end='')
f.close()

The program above explicitly specifies the UTF-8 character set when calling codecs.open(), so it reads the file correctly regardless of the platform's default character set.
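
In Python 3 the built-in open() function also accepts an encoding argument, so the same result can be had without the codecs module. The following is a minimal sketch of that alternative, not one of the original programs; the file name and contents are assumptions.

```python
import os
import tempfile

# Write a UTF-8 sample file (hypothetical content for illustration).
path = os.path.join(tempfile.mkdtemp(), "utf8_sample.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("naïve café")

# Read it back, naming the character set explicitly instead of
# relying on the platform default.
with open(path, "r", encoding="utf-8") as f:
    content = f.read()

print(content)  # naïve café
```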

