An in-depth look at how Python reads and writes files

  • 2021-12-11 08:11:01
  • OfStack

open

Python provides very convenient functions for reading and writing files, and open is the first step in all of them. Reading or writing a file through open works the same way as putting an elephant into a refrigerator:


f = open("test.txt", 'w')   # Step 1: open the refrigerator door (the file)
f.write("this is content")  # Step 2: put the elephant (the file content) in
f.close()                   # Step 3: close the refrigerator door, or the elephant may run away

open is defined as

file=open(path,mode='r',buffering=-1,encoding=None)

where:

  • path is the file path.
  • mode is the read/write mode; the default is r, that is, read-only.
  • buffering controls the buffer. Because memory reads and writes are faster than a peripheral device, data passes through a buffer first; in most cases there is no need to set this, so it can stay at the default of -1.
  • encoding is the text encoding.
  • The return value file is a file object.
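As a minimal sketch of the encoding parameter, the snippet below writes a short non-ASCII string as UTF-8 and reads it back both as text and as raw bytes. The scratch file name and the sample string are assumptions made for illustration.

```python
import os
import tempfile

# Hypothetical scratch file inside a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "encoding_demo.txt")

# Write text with an explicit encoding instead of the platform default
with open(path, "w", encoding="utf-8") as f:
    f.write("éléphant")

# Reading with the same encoding gives back the original string
with open(path, "r", encoding="utf-8") as f:
    text = f.read()

# Binary mode skips decoding and returns the bytes stored on disk
with open(path, "rb") as f:
    raw = f.read()

print(len(text), len(raw))  # characters vs. bytes
```

The two lengths differ because each é takes two bytes in UTF-8, which is why the encoding used for reading should match the one used for writing.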

The possible values of mode are:

text     r    r+    w    w+    a    a+
binary   rb   rb+   wb   wb+   ab   ab+

Here b stands for binary, r for read, w for write, and a for append. In any mode, a + means that the file can be both read and written. Writing overwrites the original file, while appending starts writing at the end of the original file. If the file does not exist, the write and append modes create a new file.
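The difference between the modes can be checked with a small experiment. This is only a sketch: the scratch file path is hypothetical, and the variable names are chosen for illustration.

```python
import os
import tempfile

# Hypothetical scratch file inside a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

with open(path, "w") as f:       # w creates the file if it does not exist
    f.write("first\n")

with open(path, "a") as f:       # a starts writing at the end of the file
    f.write("second\n")

with open(path, "r+") as f:      # r+ allows both reading and writing
    original = f.read()
    f.seek(0)
    f.write("FIRST")             # overwrites the first five characters in place

with open(path) as f:
    after_rplus = f.read()

with open(path, "w") as f:       # opening with w again discards the old content
    f.write("replaced\n")

with open(path) as f:
    final = f.read()

print(repr(original), repr(after_rplus), repr(final))
```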

File object

open creates a file object. Besides close, which closes the file, the two most commonly used groups of functions are the read family and the write family, for reading and writing respectively, with the following differences

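  • read / write: read or write the whole file; read(size) reads at most size characters (or bytes in binary mode).
  • readline: reads one line at a time. Because write takes a string directly, there is no need for a writeline.
  • readlines / writelines: readlines reads the file line by line and stores the lines in a list of strings; writelines writes a list of strings to the file.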

For example


>>> f = open('test.txt','w')
>>> f.writelines(['a','b','c\n','d'])
>>> f.close()
>>> f = open('test.txt','r')
>>> f.readlines()
['abc\n', 'd']      # writelines does not add \n automatically
>>> f.close()
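The remaining read functions can be combined on one file object, since each call continues from where the previous one stopped. A minimal sketch, assuming a hypothetical scratch file with three short lines:

```python
import os
import tempfile

# Hypothetical scratch file inside a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "lines_demo.txt")

with open(path, "w") as f:
    # writelines adds no newlines, so they are supplied by hand
    f.writelines(["alpha\n", "beta\n", "gamma\n"])

with open(path) as f:
    head = f.read(3)              # at most 3 characters
    rest_of_line = f.readline()   # continues to the end of the current line
    remaining = f.readlines()     # all remaining lines as a list of strings

print(repr(head), repr(rest_of_line), remaining)
```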

On my computer, reading a 500 MB txt file takes more than 1 s, and reading a 2 GB file will probably raise an error. In that case you can use the seek function to specify an offset and then read or write the file at that position. Its signature is seek(offset, whence=0)

where:

  • offset is the offset.
  • whence is the positioning mode: 0 means absolute positioning from the start of the file, 1 means positioning relative to the current position, and 2 means positioning from the end.

From the point of view of seek, opening a file in r mode places the position at the start of the file, while opening it in a mode places the position at the end.

tell returns the current offset, so seek and tell together act as the setter and getter of the file position.
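The three positioning modes of seek, together with tell, can be tried on a ten-byte file. This sketch uses binary mode on purpose, because text files only support a restricted set of seeks; the scratch file and its contents are assumptions for illustration.

```python
import os
import tempfile

# Hypothetical scratch file inside a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "seek_demo.bin")

with open(path, "wb") as f:
    f.write(b"0123456789")

with open(path, "rb") as f:
    start = f.tell()          # a file opened for reading starts at offset 0
    f.seek(3)                 # whence=0: absolute position 3
    a = f.read(2)             # reads offsets 3 and 4
    pos = f.tell()            # position after the read
    f.seek(2, 1)              # whence=1: skip 2 bytes from the current position
    b = f.read(1)
    f.seek(-2, 2)             # whence=2: 2 bytes before the end of the file
    c = f.read()

with open(path, "ab") as f:
    append_start = f.tell()   # a file opened for appending starts at the end

print(start, a, pos, b, c, append_start)
```

For a file too large to read in one call, the same idea applies: seek to the part that is needed and read a bounded number of bytes from there.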

After you finish working with a file, call close to write the string held in the buffer to the hard disk; if you are worried about an accident before then, call flush to force the write.
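The effect of the buffer can be observed by checking the file size before and after flush. This is a sketch only: the scratch file is hypothetical, and the size seen before flush depends on the buffer, so it is typically 0 for a small write but is not guaranteed. The additional os.fsync call asks the operating system to commit the data to the disk itself.

```python
import os
import tempfile

# Hypothetical scratch file inside a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "flush_demo.txt")

f = open(path, "w")
f.write("12345")
size_before = os.path.getsize(path)  # the data may still be in Python's buffer

f.flush()                            # hand the buffered data to the operating system
os.fsync(f.fileno())                 # ask the operating system to write it to disk
size_after = os.path.getsize(path)   # all five characters are now in the file

f.close()
print(size_before, size_after)
```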

In addition, the member variables of the file object are as follows

  • name: file name
  • mode: read/write mode
  • encoding: encoding
  • errors: error handling mode
  • closed: whether the file has been closed
  • buffer: the buffer

In addition, there are three predicate functions

  • readable(): whether the file can be read
  • writable(): whether the file can be written
  • seekable(): whether an offset can be specified
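These attributes and predicates can be inspected directly on an open file. A minimal sketch, assuming a hypothetical scratch file opened for writing with an explicit encoding:

```python
import os
import tempfile

# Hypothetical scratch file inside a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "attrs_demo.txt")

f = open(path, "w", encoding="utf-8")

info = {
    "name": f.name,          # the path passed to open
    "mode": f.mode,          # the mode passed to open
    "encoding": f.encoding,  # the text encoding in use
    "errors": f.errors,      # how encoding errors are handled
    "closed": f.closed,      # False while the file is open
}
can = (f.readable(), f.writable(), f.seekable())

f.close()
closed_after = f.closed      # True once close has been called

print(info, can, closed_after)
```

A file opened with w is writable and seekable but not readable, which is what the three predicates report.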

The with ... as statement

When writing to a file, if you forget close or flush, some data may be left in memory, resulting in an incomplete file.

The with ... as statement calls the object's __enter__ and __exit__ methods, so close is called for you automatically and there is no risk of forgetting to write it. It is used like this:


with open('text.txt','w') as f:
    f.write("12345")

Looking at file.py, its __exit__ function simply calls close:


def __enter__(self):
    return self

def __exit__(self, type, value, traceback):
    self.close()
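The same two methods are enough to make any object usable in a with statement. The class below is a hypothetical stand-in for a file, written only to show that __exit__ runs, and therefore close is called, even when the body raises an exception.

```python
class Door:
    """Hypothetical resource that only records whether it has been closed."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        # The returned value is what gets bound after "as"
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False  # do not swallow the exception


door = Door()
try:
    with door as d:
        raise ValueError("the elephant ran away")
except ValueError:
    pass

closed_after_error = door.closed
print(closed_after_error)
```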

Underlying implementation: os.open

open is a very convenient function, but that convenience has a cost, since it builds and returns a full file object. By contrast, its underlying implementation os.open returns an integer file descriptor, which is worth considering when frequent file reads and writes need to be fast.

The os call that opens a file is


fd = os.open(path, flags, mode=511, dir_fd=None)

where:

  • path is the file path.
  • flags are the open flags, for example os.O_RDONLY for read-only and os.O_WRONLY for write-only.
  • mode gives the file permissions, such as octal 777, which lets anyone read, write and execute. Note that the default 511 is a decimal number equal to octal 777, so it grants the same permissions, which are then masked by the process umask. This is a Linux concept and may be covered separately in a future Linux article.
  • dir_fd controls how a relative path is resolved: if given, it is a file descriptor for a directory, and path is taken relative to that directory. It is rarely used.
  • The return value fd is an integer that identifies the open file.

The values of mode can be found in the deepin and Windows manuals. The commonly used flags are listed below; multiple flags can be combined with |, and this strongly C-flavored style confirms that they come from the operating system.

os.open flag    open mode
os.O_RDONLY     'r'
os.O_WRONLY     'w'
os.O_RDWR       'r+'
os.O_APPEND     'a'
os.O_CREAT      create the file and open it

Related functions include:

  • os.fdopen(fd): creates a file object from fd and returns it.
  • os.read(fd, n): reads at most n bytes from fd and returns them; if the file behind fd has reached its end, returns an empty bytes object.
  • os.write(fd, data): writes data to fd and returns the number of bytes actually written.
  • os.fsync(fd): forces the file behind fd to be written to the hard disk.
  • os.close(fd): closes fd.
  • os.dup(fd): duplicates fd.
  • os.dup2(fd1, fd2): duplicates fd1 onto fd2.
  • os.fstat(fd): returns the status of fd.
  • os.ftruncate(fd, length): truncates the file behind fd; length should be no greater than the file size.
  • os.isatty(fd): returns True if fd is open and connected to a tty(-like) device, otherwise False.
  • os.lseek(fd, pos, how): sets the current position of fd to pos; how is the positioning mode, equivalent to whence in seek above.
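Several of these functions can be chained on a single file descriptor. The sketch below is illustrative only: the scratch file path and the 0o644 permission value are assumptions, and note that os.write expects bytes rather than a str.

```python
import os
import tempfile

# Hypothetical scratch file inside a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "lowlevel_demo.txt")

# Flags are combined with |; 0o644 is an assumed permission choice
fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)

written = os.write(fd, b"this is content")  # number of bytes written
os.lseek(fd, 0, os.SEEK_SET)                # back to the start of the file
first = os.read(fd, 4)                      # at most 4 bytes
size = os.fstat(fd).st_size                 # file size from the descriptor status

os.ftruncate(fd, 4)                         # keep only the first 4 bytes
os.fsync(fd)                                # force the change to the hard disk
os.close(fd)

# A descriptor can be wrapped in a regular file object with os.fdopen
fd = os.open(path, os.O_RDONLY)
with os.fdopen(fd, "r") as f:               # closing f also closes fd
    text = f.read()

print(written, first, size, text)
```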
