Python I/O tips for efficient handling

  • 2020-05-26 09:23:41
  • OfStack

How do I read and write text files?

Use case

If a text file is stored in a specific encoding (UTF-8, GBK, BIG5), how do you read and write it in Python 2.x and Python 3.x, respectively?

The solution

The semantics of the string types changed between the two versions:

Python 2    Python 3
str         bytes
unicode     str

In Python 2.x, encode a unicode string before writing it to the file, and decode the byte string after reading it from the file:


>>> f = open('py2.txt', 'w')
>>> s = u' hello '
>>> f.write(s.encode('gbk'))
>>> f.close()
>>> f = open('py2.txt', 'r')
>>> t = f.read()
>>> print t.decode('gbk')
 hello 

In Python 3.x, the t in the mode argument of open() selects text mode, and the encoding argument specifies the encoding:


>>> f = open('py3.txt', 'wt', encoding='utf-8')
>>> f.write(' hello ')
7
>>> f.close()
>>> f = open('py3.txt', 'rt', encoding='utf-8')
>>> s = f.read()
>>> s
' hello '
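For comparison with the Python 2 pattern (encode before writing, decode after reading), here is a minimal Python 3 sketch for a GBK file. It assumes a scratch file named gbk.txt in a temporary directory (a hypothetical name) and shows that text mode with encoding='gbk' and binary mode plus a manual .decode('gbk') give the same result.

```python
import os
import tempfile

# Scratch directory so the sketch does not touch real files (not cleaned up here).
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, 'gbk.txt')  # hypothetical file name

# Text mode: the str is encoded with the given codec when written.
with open(path, 'wt', encoding='gbk') as f:
    f.write('你好')

# Binary mode returns the raw bytes stored on disk.
with open(path, 'rb') as f:
    raw = f.read()

# Manual decode: the Python 3 counterpart of Python 2's t.decode('gbk').
manual = raw.decode('gbk')

# Text mode with the same encoding decodes automatically.
with open(path, 'rt', encoding='gbk') as f:
    text = f.read()

print(raw, manual == text)
```

Text mode is usually the simpler choice; binary mode is useful when the encoding is only known after inspecting the bytes.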

How do I set a file's buffering?

Use case

Writing file contents to a hard disk requires system calls, and this kind of I/O operation is slow. To reduce the number of I/O operations, files are usually buffered: data is collected in a buffer, and the system call is made only once enough data has accumulated.

How do I set the buffering of a file object in Python?

The solution

Full buffering: set the buffering argument of open() to an integer n greater than 1; n is the buffer size in bytes.


>>> f = open('demo2.txt', 'w', buffering=2048)
>>> f.write('+' * 1024)
>>> f.write('+' * 1023)
#  1024 + 1023 = 2047 bytes are still buffered; once the total exceeds 2048 bytes, the data is written to the file
>>> f.write('-' * 2)
>>> f.close()
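The session above is Python 2. A minimal Python 3 sketch of the same experiment follows; it uses binary mode (where buffering=2048 directly sets the BufferedWriter's buffer size) and a hypothetical scratch file, and it checks the on-disk size with os.path.getsize() after each write.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo2.bin')  # hypothetical file name

sizes = []
# buffering=2048 gives the BufferedWriter a 2048-byte buffer.
with open(path, 'wb', buffering=2048) as f:
    f.write(b'+' * 1024)
    sizes.append(os.path.getsize(path))  # data only in the buffer so far
    f.write(b'+' * 1023)
    sizes.append(os.path.getsize(path))  # 2047 bytes buffered, still nothing on disk
    f.write(b'-' * 2)                    # does not fit: the buffered data is flushed first
    sizes.append(os.path.getsize(path))
sizes.append(os.path.getsize(path))      # closing the file flushes whatever remains

print(sizes)
```

The exact flush points are a CPython implementation detail, so treat the intermediate sizes as illustrative rather than guaranteed.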

Line buffering: set the buffering argument of open() to 1.


>>> f = open('demo3.txt', 'w', buffering=1)
>>> f.write('abcd')
>>> f.write('1234')
#  The data is written to the file only once a \n is written
>>> f.write('\n')
>>> f.close()
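A minimal Python 3 sketch of line buffering, assuming a hypothetical scratch file demo3.txt: in text mode, buffering=1 enables line buffering, so the file stays empty on disk until a newline is written.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo3.txt')  # hypothetical file name

sizes = []
# buffering=1 means line buffering; it is only supported in text mode.
with open(path, 'w', buffering=1) as f:
    f.write('abcd')
    f.write('1234')
    sizes.append(os.path.getsize(path))  # no newline yet, so nothing flushed
    f.write('\n')
    sizes.append(os.path.getsize(path))  # the newline triggers a flush

print(sizes)
```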

Unbuffered: set the buffering argument of open() to 0. Python 3 only allows this in binary mode.


>>> f = open('demo4.txt', 'wb', buffering=0)
>>> f.write(b'a')
>>> f.write(b'b')
>>> f.close()
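A minimal Python 3 sketch of unbuffered output, assuming a hypothetical scratch file demo4.bin. Text mode rejects buffering=0 with a ValueError; in binary mode, open() returns a raw FileIO object, so every write() reaches the operating system immediately.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo4.bin')  # hypothetical file name

# Python 3 refuses unbuffered text I/O.
try:
    open(path, 'w', buffering=0)
    text_mode_allowed = True
except ValueError:
    text_mode_allowed = False

sizes = []
# In binary mode, buffering=0 gives a raw FileIO: each write is a system call.
with open(path, 'wb', buffering=0) as f:
    f.write(b'a')
    sizes.append(os.path.getsize(path))
    f.write(b'b')
    sizes.append(os.path.getsize(path))

print(text_mode_allowed, sizes)
```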

How do I map files to memory?

Use case

When accessing some binary files, we want to map them into memory so that they can be accessed randomly.
On some embedded devices, hardware registers are mapped into the memory address space, so we can map a region of /dev/mem to access those registers.
If multiple processes map the same file, it can also be used for inter-process communication.

The solution

Use the mmap() function of the standard library mmap module, which takes an open file descriptor as its first argument.

First create a 1 MB file filled with zeros:


[root@iZ28i253je0Z ~]# dd if=/dev/zero of=demo.bin bs=1024 count=1024
1024+0 records in
1024+0 records out
1048576 bytes (1.0 MB) copied, 0.00380084 s, 276 MB/s
#  View the file contents in hexadecimal
[root@iZ28i253je0Z ~]# od -x demo.bin 
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
4000000

>>> import mmap
>>> import os
>>> f = open('demo.bin','r+b')
#  Get the file descriptor
>>> f.fileno()
3
>>> m = mmap.mmap(f.fileno(),0,access=mmap.ACCESS_WRITE)
>>> type(m)
<type 'mmap.mmap'>
#  Content can be accessed by index or slice
>>> m[0]
'\x00'
>>> m[10:20]
'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
#  Modify a single byte
>>> m[0] = '\x88'

Check the file:


[root@iZ28i253je0Z ~]# od -x demo.bin 
0000000 0088 0000 0000 0000 0000 0000 0000 0000
0000020 0000 0000 0000 0000 0000 0000 0000 0000
*
4000000

Modify a slice:


>>> m[4:8] = '\xff' * 4

Check the file again:


[root@iZ28i253je0Z ~]# od -x demo.bin 
0000000 0088 0000 ffff ffff 0000 0000 0000 0000
0000020 0000 0000 0000 0000 0000 0000 0000 0000
*
4000000
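
Map only part of the file: here 8 pages starting 4 pages into it. The offset must be a multiple of mmap.ALLOCATIONGRANULARITY (which equals mmap.PAGESIZE on Linux).


>>> m = mmap.mmap(f.fileno(), mmap.PAGESIZE * 8, access=mmap.ACCESS_WRITE, offset=mmap.PAGESIZE * 4)
>>> m[:0x1000] = '\xaa' * 0x1000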

Check the result again with od -x demo.bin.
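The session above is Python 2, where indexing an mmap returns one-character strings. A minimal Python 3 sketch of the same steps follows; it assumes a scratch file in a temporary directory instead of demo.bin, and it uses mmap.ALLOCATIONGRANULARITY for the offset so the offset is valid on Windows as well as Linux.

```python
import mmap
import os
import tempfile

# Hypothetical stand-in for demo.bin: 1 MiB of zero bytes.
path = os.path.join(tempfile.mkdtemp(), 'demo.bin')
with open(path, 'wb') as f:
    f.write(b'\x00' * 1024 * 1024)

with open(path, 'r+b') as f:
    # Length 0 maps the whole file; ACCESS_WRITE writes changes through to it.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE) as m:
        first = m[0]          # indexing returns an int in Python 3
        m[0] = 0x88           # so single-byte assignment takes an int
        m[4:8] = b'\xff' * 4  # slice assignment takes bytes of the same length

    # Map 8 pages starting at an offset that is a multiple of ALLOCATIONGRANULARITY.
    offset = mmap.ALLOCATIONGRANULARITY * 4
    with mmap.mmap(f.fileno(), mmap.PAGESIZE * 8,
                   access=mmap.ACCESS_WRITE, offset=offset) as m:
        m[:0x1000] = b'\xaa' * 0x1000

# Read the file normally to confirm the changes reached it.
with open(path, 'rb') as f:
    data = f.read()

print(first, data[:8], data[offset:offset + 4])
```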



How do I access the state of a file?

Use case

In some projects, we need to get file status, for example:

  • the file type (regular file, directory, symbolic link, device file, ...)
  • the file's access permissions
  • the times of last access, last modification, and last inode status change
  • the size of a regular file
  • ...

The solution

The current directory has the following files



System calls

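The os module in the standard library provides three system calls for getting file status: stat, fstat and lstat. stat follows symbolic links, lstat reports on the link itself, and fstat takes an open file descriptor instead of a path.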
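A minimal Python 3 sketch comparing the three calls. It assumes a platform with symbolic-link support (for example Linux or macOS) and creates hypothetical sample files a.txt and a.link in a scratch directory.

```python
import os
import stat
import tempfile

# Hypothetical sample files in a scratch directory.
workdir = tempfile.mkdtemp()
target = os.path.join(workdir, 'a.txt')
link = os.path.join(workdir, 'a.link')
with open(target, 'w') as f:
    f.write('hello')
os.symlink(target, link)  # requires symlink support

st = os.stat(link)    # follows the link, so it describes a.txt
lst = os.lstat(link)  # does not follow, so it describes the link itself
with open(target) as f:
    fst = os.fstat(f.fileno())  # works on an open file descriptor

print(stat.S_ISREG(st.st_mode), stat.S_ISLNK(lst.st_mode), fst.st_size)
```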



Check the access permissions by AND-ing st_mode with a permission constant from the stat module; a non-zero result means the permission is set.


>>> s.st_mode & stat.S_IRUSR
256
>>> s.st_mode & stat.S_IXGRP
0
>>> s.st_mode & stat.S_IXOTH
0
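A self-contained version of the permission checks, as a minimal sketch assuming a POSIX system. It creates a hypothetical file perm.txt, sets its mode to 0o640 so the results are predictable, and also uses stat.filemode() to get an ls -l style string.

```python
import os
import stat
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'perm.txt')  # hypothetical file
with open(path, 'w'):
    pass
os.chmod(path, 0o640)  # owner rw-, group r--, others ---

s = os.stat(path)
can_owner_read = bool(s.st_mode & stat.S_IRUSR)  # S_IRUSR is 0o400 == 256
can_group_exec = bool(s.st_mode & stat.S_IXGRP)
can_other_exec = bool(s.st_mode & stat.S_IXOTH)
mode_string = stat.filemode(s.st_mode)           # like the first column of ls -l

print(can_owner_read, can_group_exec, can_other_exec, mode_string)
```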

Get the file's modification time:
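A minimal sketch of reading the modification time from os.stat(). It creates a hypothetical file m.txt and sets its times explicitly with os.utime() so the result is predictable; the value 1473996947 is simply the integer part of the timestamp that appears in this article's os.path.getmtime output.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'm.txt')  # hypothetical file
with open(path, 'w') as f:
    f.write('hello')

# Set (access time, modification time) explicitly for a predictable result.
os.utime(path, (1473996947, 1473996947))

s = os.stat(path)
print(s.st_mtime)  # float seconds since the epoch
```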



Convert the timestamp to a readable time:
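A minimal sketch of two standard ways to convert such a timestamp, using the value from this article's os.path.getmtime output as sample input. time.localtime() depends on the machine's time zone, so the UTC conversion with datetime is the deterministic one.

```python
import time
from datetime import datetime, timezone

ts = 1473996947.3384445  # sample timestamp, as returned by st_mtime / getmtime

# time module: struct_time in local time, then format it.
local_str = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(ts))

# datetime module: a time-zone-aware datetime in UTC.
utc_dt = datetime.fromtimestamp(ts, tz=timezone.utc)

print(local_str, utc_dt.strftime('%Y-%m-%d %H:%M:%S'))
```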



Get the size of a regular file:
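A minimal sketch of reading st_size, assuming a hypothetical file size.bin. st_size is a byte count, and it only represents "content size" for regular files, so the sketch checks the file type with stat.S_ISREG() first.

```python
import os
import stat
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'size.bin')  # hypothetical file
with open(path, 'wb') as f:
    f.write(b'abcdef' * 100)

s = os.stat(path)
# Only trust st_size as the content size when the path is a regular file.
size = s.st_size if stat.S_ISREG(s.st_mode) else None
print(size)
```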



Convenience functions

Some functions in the standard library's os.path module are more concise to use.

Determine the file type:
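A minimal sketch of the os.path type checks. It assumes a platform with symbolic-link support and creates hypothetical sample entries (a regular file, a directory and a symlink) in a scratch directory. Note that isfile() follows symbolic links, while islink() does not.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
regular = os.path.join(workdir, 'a.txt')   # hypothetical names
directory = os.path.join(workdir, 'subdir')
link = os.path.join(workdir, 'a.link')

with open(regular, 'w') as f:
    f.write('hello')
os.mkdir(directory)
os.symlink(regular, link)  # requires symlink support

results = {
    'isfile': (os.path.isfile(regular), os.path.isfile(directory)),
    'isdir': (os.path.isdir(directory), os.path.isdir(regular)),
    'islink': (os.path.islink(link), os.path.islink(regular)),
    'isfile_via_link': os.path.isfile(link),  # follows the link to a.txt
}
print(results)
```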



The file's three timestamps (last access, last modification, last status change):


>>> os.path.getatime('files')
1473996947.3384445
>>> os.path.getmtime('files')
1473996947.3384445
>>> os.path.getctime('files')
1473996947.3384445

Get the file size:
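A minimal sketch of os.path.getsize(), which is a shortcut for os.stat(path).st_size. The file names are hypothetical; the sketch also shows that a missing path raises OSError (FileNotFoundError is a subclass of it), so callers may want to handle that case.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'files.txt')  # hypothetical file
with open(path, 'wb') as f:
    f.write(b'hello')

size = os.path.getsize(path)  # same value as os.stat(path).st_size

missing = os.path.join(os.path.dirname(path), 'no-such-file')  # hypothetical
try:
    os.path.getsize(missing)
    missing_raises = False
except OSError:
    missing_raises = True

print(size, missing_raises)
```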



How do I use temporary files?

Use case

In one project we collect data from a sensor. After each 1 GB of data is collected, we analyze it and keep only the analysis results. Keeping that much temporary data in memory would consume a lot of memory, so we can store it in temporary files (external storage) instead.

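On POSIX systems a TemporaryFile has no visible name in the file system, and it is deleted automatically when it is closed. A NamedTemporaryFile does have a visible name.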

The solution

Use TemporaryFile and NamedTemporaryFile from the standard library's tempfile module:


>>> from tempfile import TemporaryFile, NamedTemporaryFile
#  The file can only be accessed through the object f
>>> f = TemporaryFile()
>>> f.write('abcdef' * 100000)
#  Read the temporary data back
>>> f.seek(0)
>>> f.read(100)
'abcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcd'

>>> ntf = NamedTemporaryFile()
#  By default the file is deleted when it is closed; create it with NamedTemporaryFile(delete=False) to keep it
#  ntf.name is the path of the temporary file in the file system
>>> ntf.name
'/tmp/tmppNvBu2'
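The session above is Python 2. In Python 3, TemporaryFile and NamedTemporaryFile open in 'w+b' mode by default, so they take bytes. A minimal Python 3 sketch follows; the data b'sensor data' is a hypothetical placeholder, and with delete=False the sketch removes the file itself.

```python
import os
from tempfile import TemporaryFile, NamedTemporaryFile

# TemporaryFile: default mode is 'w+b', so write bytes.
with TemporaryFile() as f:
    f.write(b'abcdef' * 100000)
    f.seek(0)
    head = f.read(100)
# Leaving the with block closes f, which deletes the file.

# delete=False keeps the named file after it is closed.
with NamedTemporaryFile(delete=False) as ntf:
    ntf.write(b'sensor data')  # hypothetical payload
    name = ntf.name
existed_after_close = os.path.exists(name)

with open(name, 'rb') as f:
    content = f.read()
os.remove(name)  # cleanup is our responsibility when delete=False

print(head[:12], existed_after_close, content)
```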

Conclusion

That covers these tips for efficient I/O handling in Python. I hope this article helps you in your study or work; if you have any questions, feel free to leave a message.

