Python I/O tips for efficient file handling
- 2020-05-26 09:23:41
- OfStack
How do I read and write text files?
The actual case
If a text file is stored in a particular encoding (UTF-8, GBK, BIG5), how do you read and write it in Python 2.x and Python 3.x, respectively?
The solution
The meaning of the string types changed between Python 2 and Python 3:
python2 | python3 |
---|---|
str | bytes |
unicode | str |
In Python 2.x, encode the unicode string before writing it to the file, and decode the byte string after reading it back
>>> f = open('py2.txt', 'w')
>>> s = u'你好'
>>> f.write(s.encode('gbk'))
>>> f.close()
>>> f = open('py2.txt', 'r')
>>> t = f.read()
>>> print t.decode('gbk')
你好
In Python 3.x, the t in the mode passed to open selects text mode, and the encoding argument specifies the encoding
>>> f = open('py3.txt', 'wt', encoding='utf-8')
>>> f.write('你好')
2
>>> f.close()
>>> f = open('py3.txt', 'rt', encoding='utf-8')
>>> s = f.read()
>>> s
'你好'
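As a further illustration in Python 3, here is a minimal sketch that writes a short non-ASCII string as GBK and reads it back. The file name gbk_demo.txt and the temporary directory are assumptions made for this example. It also suggests why the encoding must match: decoding these GBK bytes as UTF-8 fails instead of returning the text.

```python
import os
import tempfile

text = '你好'

# Hypothetical file inside a throwaway directory, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), 'gbk_demo.txt')

# Text mode: str goes in, GBK-encoded bytes end up on disk.
with open(path, 'wt', encoding='gbk') as f:
    written = f.write(text)  # number of characters written

# Binary mode exposes the raw encoded bytes.
with open(path, 'rb') as f:
    raw = f.read()

# Reading with the matching encoding recovers the original string.
with open(path, 'rt', encoding='gbk') as f:
    decoded = f.read()

# Reading with a mismatched encoding raises UnicodeDecodeError for this data.
try:
    with open(path, 'rt', encoding='utf-8') as f:
        f.read()
    mismatch_failed = False
except UnicodeDecodeError:
    mismatch_failed = True
```

Using with blocks closes each file even if an error occurs, which avoids the explicit f.close() calls.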
How do I set the file buffer?
The actual case
Writing file contents to a disk device goes through system calls, and this kind of I/O is slow. To reduce the number of I/O operations, files are usually buffered: the system call is made only once enough data has accumulated.
How do I set the buffering behavior of a file object in Python?
The solution
Full buffering: set the buffering argument of open to an integer n greater than 1, where n is the buffer size in bytes
>>> f = open('demo2.txt', 'w', buffering=2048)
>>> f.write('+' * 1024)
>>> f.write('+' * 1023)
# Once more than 2048 bytes have been written, the data is flushed to the file
>>> f.write('-' * 2)
>>> f.close()
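One way to observe full buffering in Python 3 is to check the file size on disk between writes. This is a minimal sketch, assuming CPython's buffered writer. The file name full_buffer_demo.bin and the use of binary mode (to avoid the extra text layer) are choices made for this example, not part of the original.

```python
import os
import tempfile

# Hypothetical file in a throwaway directory.
path = os.path.join(tempfile.mkdtemp(), 'full_buffer_demo.bin')

f = open(path, 'wb', buffering=2048)

# 1024 bytes fit in the 2048-byte buffer, so nothing reaches the disk yet.
f.write(b'+' * 1024)
size_after_small_write = os.path.getsize(path)

# This write does not fit in the remaining buffer space, which forces a flush.
f.write(b'-' * 4096)
size_after_large_write = os.path.getsize(path)

# Closing the file flushes whatever is still buffered.
f.close()
final_size = os.path.getsize(path)
```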
Line buffering: set the buffering argument of open to 1
>>> f = open('demo3.txt', 'w', buffering=1)
>>> f.write('abcd')
>>> f.write('1234')
# The data is written to the file as soon as a \n is written
>>> f.write('\n')
>>> f.close()
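Line buffering can be observed in the same way. The sketch below assumes Python 3 text mode on CPython; the file name line_buffer_demo.txt is hypothetical.

```python
import os
import tempfile

# Hypothetical file in a throwaway directory.
path = os.path.join(tempfile.mkdtemp(), 'line_buffer_demo.txt')

f = open(path, 'w', buffering=1)  # buffering=1 selects line buffering in text mode

# Without a newline, the text stays in the buffer.
f.write('abcd')
f.write('1234')
size_before_newline = os.path.getsize(path)

# Writing a newline triggers a flush of everything buffered so far.
f.write('\n')
size_after_newline = os.path.getsize(path)

f.close()
```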
Unbuffered: set the buffering argument of open to 0 (Python 3 allows this only in binary mode)
>>> f = open('demo4.txt', 'wb', buffering=0)
>>> f.write(b'a')
>>> f.write(b'b')
>>> f.close()
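For the unbuffered case, a minimal Python 3 sketch might look like the following. The file name unbuffered_demo.bin is an assumption; the point is that every write reaches the file immediately, and that text mode rejects buffering=0.

```python
import io
import os
import tempfile

# Hypothetical file in a throwaway directory.
path = os.path.join(tempfile.mkdtemp(), 'unbuffered_demo.bin')

# In binary mode, buffering=0 returns the raw file object with no buffer layer.
f = open(path, 'wb', buffering=0)
is_raw = isinstance(f, io.FileIO)

# Each write goes straight to the operating system.
f.write(b'a')
size_after_first = os.path.getsize(path)
f.write(b'b')
size_after_second = os.path.getsize(path)
f.close()

# Text mode cannot be unbuffered in Python 3: open raises ValueError.
try:
    open(path, 'r', buffering=0)
    text_mode_rejected = False
except ValueError:
    text_mode_rejected = True
```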
How do I map files to memory?
The actual case
On some embedded devices, registers are mapped into the memory address space, and we can map a region of /dev/mem to access them
If multiple processes map the same file, the mapping can also be used for inter-process communication
The solution
Use the mmap() function from the mmap module in the standard library, which takes an open file descriptor as an argument
Create the following file
[root@iZ28i253je0Z ~]# dd if=/dev/zero of=demo.bin bs=1024 count=1024
1024+0 records in
1024+0 records out
1048576 bytes (1.0 MB) copied, 0.00380084 s, 276 MB/s
# View the file contents in hexadecimal format
[root@iZ28i253je0Z ~]# od -x demo.bin
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
4000000
>>> import mmap
>>> import os
>>> f = open('demo.bin','r+b')
# Gets the file descriptor
>>> f.fileno()
3
>>> m = mmap.mmap(f.fileno(),0,access=mmap.ACCESS_WRITE)
>>> type(m)
<type 'mmap.mmap'>
# Content can be retrieved through an index
>>> m[0]
'\x00'
>>> m[10:20]
'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
# Modify the content
>>> m[0] = '\x88'
View the file again
[root@iZ28i253je0Z ~]# od -x demo.bin
0000000 0088 0000 0000 0000 0000 0000 0000 0000
0000020 0000 0000 0000 0000 0000 0000 0000 0000
*
4000000
Modify a slice
>>> m[4:8] = '\xff' * 4
View the file again
[root@iZ28i253je0Z ~]# od -x demo.bin
0000000 0088 0000 ffff ffff 0000 0000 0000 0000
0000020 0000 0000 0000 0000 0000 0000 0000 0000
*
4000000
Map only part of the file, starting at an offset that is a multiple of the page size
>>> m = mmap.mmap(f.fileno(),mmap.PAGESIZE * 8,access=mmap.ACCESS_WRITE,offset=mmap.PAGESIZE * 4)
>>> m[:0x1000] = '\xaa' * 0x1000
Viewing the file again with od -x demo.bin should show aaaa words filling the first 0x1000 bytes of the mapped region, which starts at file offset mmap.PAGESIZE * 4.
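The session above is written for Python 2. In Python 3, indexing an mmap object returns an int and slices take bytes. A minimal sketch under those assumptions follows; the file name mmap_demo.bin and the file size are hypothetical, and mmap.ALLOCATIONGRANULARITY is used for the offset because the offset must be a multiple of it.

```python
import mmap
import os
import tempfile

# Hypothetical zero-filled file in a throwaway directory.
path = os.path.join(tempfile.mkdtemp(), 'mmap_demo.bin')
unit = mmap.ALLOCATIONGRANULARITY
with open(path, 'wb') as f:
    f.write(b'\x00' * (unit * 4))

with open(path, 'r+b') as f:
    # Length 0 maps the whole file.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE) as m:
        first_byte = m[0]        # an int in Python 3
        m[0] = 0x88              # modify a single byte
        m[4:8] = b'\xff' * 4     # modify a slice
        m.flush()

    # Map only one unit of the file, starting two units in.
    offset = unit * 2
    with mmap.mmap(f.fileno(), unit, access=mmap.ACCESS_WRITE,
                   offset=offset) as part:
        part[:16] = b'\xaa' * 16  # index 0 here is file offset `offset`
        part.flush()

# Read the file back normally to confirm the changes landed on disk.
with open(path, 'rb') as f:
    data = f.read()
```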
How do I access the state of a file?
The actual case
In some projects, we need to get file status, for example:
- Type of file (regular file, directory, symbolic link, device file...)
- File access permissions
- Last access/modification/inode status change time of the file
- Size of a regular file
- ...
The solution
The examples below use an entry named files in the current directory.
The system calls
Three functions in the standard library's os module, stat, fstat and lstat, wrap the corresponding system calls and return the file status
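The difference between the three calls can be sketched as follows in Python 3. This assumes a POSIX system where symbolic links can be created; the names data.txt and data.lnk are hypothetical. os.stat follows symbolic links, os.lstat describes the link itself, and os.fstat takes a file descriptor.

```python
import os
import stat
import tempfile

# Hypothetical regular file and a symbolic link pointing to it.
workdir = tempfile.mkdtemp()
target = os.path.join(workdir, 'data.txt')
link = os.path.join(workdir, 'data.lnk')
with open(target, 'w') as f:
    f.write('hello')
os.symlink(target, link)

st_follow = os.stat(link)    # follows the link to the regular file
st_link = os.lstat(link)     # describes the link itself
with open(target) as f:
    st_fd = os.fstat(f.fileno())  # same information, via a descriptor

# The stat module decodes the file type stored in st_mode.
follow_is_regular = stat.S_ISREG(st_follow.st_mode)
link_is_symlink = stat.S_ISLNK(st_link.st_mode)
dir_is_directory = stat.S_ISDIR(os.stat(workdir).st_mode)
same_file = st_fd.st_ino == st_follow.st_ino
```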
Check a file's access permissions; a result greater than 0 means the permission is set
>>> import os, stat
>>> s = os.stat('files')
>>> s.st_mode & stat.S_IRUSR
256
>>> s.st_mode & stat.S_IXGRP
0
>>> s.st_mode & stat.S_IXOTH
0
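To make the bit tests reproducible, the following Python 3 sketch sets known permissions first. It assumes a POSIX system; the temporary file and the mode 0o640 are choices made for this example.

```python
import os
import stat
import tempfile

# Hypothetical temporary file with permissions set to rw-r-----.
fd, path = tempfile.mkstemp()
os.close(fd)
os.chmod(path, 0o640)

mode = os.stat(path).st_mode

# Masking st_mode with a permission constant yields nonzero if the bit is set.
owner_can_read = bool(mode & stat.S_IRUSR)
group_can_exec = bool(mode & stat.S_IXGRP)
others_can_read = bool(mode & stat.S_IROTH)

# S_IMODE strips the file-type bits; filemode renders an ls-style string.
permission_bits = stat.S_IMODE(mode)
mode_string = stat.filemode(mode)

os.remove(path)
```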
Gets the modification time of the file
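A minimal Python 3 sketch of reading the timestamps from the result of os.stat. The temporary file and the fixed values passed to os.utime are assumptions, used only so that the result is predictable.

```python
import os
import tempfile

# Hypothetical temporary file.
fd, path = tempfile.mkstemp()
os.close(fd)

# Set known access and modification times (seconds since the epoch).
os.utime(path, (1000000000, 1500000000))

st = os.stat(path)
atime = st.st_atime  # last access time
mtime = st.st_mtime  # last modification time
ctime = st.st_ctime  # on Unix, last inode status change time

os.remove(path)
```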
Converts the obtained timestamp
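One possible way to convert such a timestamp in Python 3 is sketched below. The timestamp value is the one shown in this article's os.path.getmtime output; the format string is an assumption.

```python
import time
from datetime import datetime, timezone

timestamp = 1473996947.3384445  # seconds since the epoch

# Timezone-aware conversion to UTC.
as_utc = datetime.fromtimestamp(timestamp, tz=timezone.utc)
utc_text = as_utc.strftime('%Y-%m-%d %H:%M:%S')

# Conversion to the machine's local time zone (the result depends on the system).
local_text = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(timestamp))
```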
Gets the size of a normal file
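A minimal Python 3 sketch: the size in bytes is available as the st_size field of the stat result. The temporary file and its 1234-byte content are assumptions for this example.

```python
import os
import tempfile

# Hypothetical temporary file with a known amount of data.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(b'x' * 1234)

size_in_bytes = os.stat(path).st_size

os.remove(path)
```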
Shortcut functions
Some functions in the standard library's os.path module are more concise to use
File type determination
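The os.path predicates can be sketched like this in Python 3, assuming a POSIX system where symbolic links can be created. The names data.txt, data.lnk and missing are hypothetical.

```python
import os
import tempfile

# Hypothetical directory containing a regular file and a symbolic link to it.
workdir = tempfile.mkdtemp()
file_path = os.path.join(workdir, 'data.txt')
link_path = os.path.join(workdir, 'data.lnk')
with open(file_path, 'w') as f:
    f.write('hello')
os.symlink(file_path, link_path)

checks = {
    'dir_is_dir': os.path.isdir(workdir),
    'file_is_file': os.path.isfile(file_path),
    'file_is_link': os.path.islink(file_path),
    'link_is_link': os.path.islink(link_path),
    # isfile follows symbolic links, so a link to a file also counts.
    'link_is_file': os.path.isfile(link_path),
    'missing_exists': os.path.exists(os.path.join(workdir, 'missing')),
}
```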
The three file timestamps (access, modification, and status change)
>>> os.path.getatime('files')
1473996947.3384445
>>> os.path.getmtime('files')
1473996947.3384445
>>> os.path.getctime('files')
1473996947.3384445
Get file size
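A minimal Python 3 sketch using os.path.getsize, which returns the same value as the st_size field of os.stat. The temporary file and its content are assumptions for this example.

```python
import os
import tempfile

# Hypothetical temporary file with a known amount of data.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(b'abcdef' * 10)

size = os.path.getsize(path)
size_from_stat = os.stat(path).st_size

os.remove(path)
```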
How do I use temporary files?
The actual case
In one project, we collect data from a sensor. Once 1 GB of data has been collected, we analyze it and keep only the analysis results. Holding that much temporary data in memory would consume a lot of memory, so we can store it in temporary files (external storage) instead.
Temporary files are unnamed and are automatically deleted when closed
The solution
Use TemporaryFile and NamedTemporaryFile from the tempfile module in the standard library
>>> from tempfile import TemporaryFile, NamedTemporaryFile
# The temporary file can only be accessed through the object f
>>> f = TemporaryFile()
>>> f.write('abcdef' * 100000)
# Access temporary data
>>> f.seek(0)
>>> f.read(100)
'abcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcd'
>>> ntf = NamedTemporaryFile()
# To keep the file from being deleted when the object is closed, create it with NamedTemporaryFile(delete=False)
# ntf.name returns the path of the temporary file in the file system
>>> ntf.name
'/tmp/tmppNvBu2'
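The session above is Python 2 style. In Python 3, both classes open the file in binary mode ('w+b') by default, so bytes must be written. A minimal sketch under that assumption follows; the data written is made up for illustration.

```python
import os
from tempfile import NamedTemporaryFile, TemporaryFile

# Anonymous temporary file: reachable only through the object f.
with TemporaryFile() as f:
    f.write(b'abcdef' * 100000)
    f.seek(0)              # rewind before reading the data back
    head = f.read(10)

# Named temporary file, deleted automatically when closed (the default).
with NamedTemporaryFile() as auto:
    auto_path = auto.name
    existed_while_open = os.path.exists(auto_path)
gone_after_close = not os.path.exists(auto_path)

# With delete=False the file survives closing and must be removed by hand.
with NamedTemporaryFile(delete=False) as ntf:
    ntf.write(b'result')
    kept_path = ntf.name
exists_after_close = os.path.exists(kept_path)
with open(kept_path, 'rb') as f:
    kept_data = f.read()
os.remove(kept_path)
```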
Conclusion
That covers these tips for efficient I/O handling in Python. I hope this article helps you in your study or work. If you have any questions, feel free to leave a comment.