Four different ways to read files using Python

  • 2020-06-01 10:14:17
  • OfStack

Preface

Python can read files in several ways, but when the file is large, the choice of method has a noticeable effect on both run time and memory use. The details follow.

Scenario

Read a large 2.9 GB file line by line.

CPU: i7 6820HQ; RAM: 32 GB

Methods

Each method splits the string once for every line it reads.

The following methods all open the file with the with ... as statement.

The with statement is suited to working with resources: it ensures that the necessary cleanup is performed whether or not an exception occurs during use. Examples include closing a file automatically after use and acquiring and releasing a lock automatically in threaded code.
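The cleanup guarantee can be checked with a small experiment. This is a minimal sketch, not part of the original measurements: the temporary file and the name demo_path are made up for illustration, and the ValueError simply simulates a failure during processing.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
# demo_path is a hypothetical sample file, not the 2.9 GB file from the article.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("a|b|c\n")
    demo_path = tmp.name

try:
    with open(demo_path, "r") as fh:
        fh.readline()
        # Simulate an error in the middle of processing.
        raise ValueError("simulated failure while processing")
except ValueError:
    pass

# The with block closed the file even though an exception was raised.
closed_after_error = fh.closed
print(closed_after_error)

os.remove(demo_path)
```

Without the with statement, the same guarantee would need an explicit try/finally block that calls fh.close().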

Method 1: the most common way to read a file


with open(file, 'r') as fh:
    for line in fh.readlines():
        line.split("|")

Result: 15.4346568584 seconds

The system monitor shows that memory use soared from 4.8 GB to 8.4 GB. fh.readlines() stores every line it reads in memory, so this method is only suitable for small files.
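The memory behaviour can be reproduced on a small scale with the standard tracemalloc module. This is a rough sketch under stated assumptions: the helper peak_memory, the two reader functions, and the 50,000-line temporary file are invented for illustration, and the comparison is against iterating over the file object directly (the approach method 3 below uses). The absolute numbers will differ from what a system monitor reports.

```python
import os
import tempfile
import tracemalloc


def peak_memory(read_func, path):
    """Return the peak number of bytes Python allocated while read_func(path) ran."""
    tracemalloc.start()
    try:
        read_func(path)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def read_with_readlines(path):
    # readlines() builds a list holding every line of the file.
    with open(path, "r") as fh:
        for line in fh.readlines():
            line.split("|")


def read_by_iteration(path):
    # Iterating over the file object keeps only the current line and a buffer.
    with open(path, "r") as fh:
        for line in fh:
            line.split("|")


# Hypothetical sample data: 50,000 short pipe-delimited lines.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    for i in range(50_000):
        tmp.write(f"{i}|field|value\n")
    sample_path = tmp.name

peak_readlines = peak_memory(read_with_readlines, sample_path)
peak_iteration = peak_memory(read_by_iteration, sample_path)
os.remove(sample_path)

print(f"readlines(): {peak_readlines} bytes, iteration: {peak_iteration} bytes")
```

The peak for readlines() grows with the size of the file, while the peak for iteration stays roughly constant.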

Method 2


with open(file, 'r') as fh:
    line = fh.readline()
    while line:
        line.split("|")
        # Read the next line; an empty string at end of file ends the loop
        line = fh.readline()

Result: 22.3531990051 seconds

Memory use barely changes, because only one line is held in memory at a time, but the run time is clearly longer than with method 1, which makes this approach inefficient when further data processing is needed.

Method 3


with open(file) as fh:
    for line in fh:
        line.split("|")

Result: 13.9956979752 seconds

Memory is almost unchanged and faster than method 2.

for line in fh treats the file object fh as an iterable, which automatically uses buffered I/O and memory management, so you don't have to worry about large files. This is a very Pythonic approach!
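The iterable behaviour can be seen directly in a short experiment. This is a minimal sketch using a made-up three-line temporary file (demo_path is a hypothetical name): it shows that a file object is its own iterator and hands out one line per step.

```python
import os
import tempfile

# Hypothetical sample file with three pipe-delimited lines.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("a|b\nc|d\ne|f\n")
    demo_path = tmp.name

with open(demo_path) as fh:
    # A file object is its own iterator: iter(fh) returns fh itself.
    is_own_iterator = iter(fh) is fh
    # next() yields exactly one line per call.
    first_line = next(fh)
    # A for loop continues from where next() stopped.
    remaining = [line.rstrip("\n").split("|") for line in fh]

os.remove(demo_path)
print(is_own_iterator, repr(first_line), remaining)
```

Because lines are produced one at a time, the loop body never needs the whole file in memory, which matches the flat memory use observed for this method.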

Method 4: the fileinput module


import fileinput

for line in fileinput.input(file):
    line.split("|")

Result: 26.1103110313 seconds

Memory use increased by 200-300 MB, and this was the slowest of the four methods.
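For context, fileinput is designed to treat several files (or standard input) as one continuous stream, and it tracks line numbers along the way. The sketch below illustrates that bookkeeping with two made-up temporary files; whether this extra work explains the slower timing above is an assumption, not something the measurements establish.

```python
import fileinput
import os
import tempfile

# Two hypothetical sample files that will be read as a single stream.
paths = []
for content in ("a|b\nc|d\n", "e|f\n"):
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write(content)
        paths.append(tmp.name)

records = []
with fileinput.input(files=paths) as stream:
    for line in stream:
        # lineno() counts across all files; filelineno() restarts for each file.
        fields = line.rstrip("\n").split("|")
        records.append((stream.lineno(), stream.filelineno(), fields))

for path in paths:
    os.remove(path)

print(records)
```

When only one file is read and line numbers are not needed, iterating over the file object as in method 3 avoids this extra layer.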

Conclusion

The results above are for reference only. Method 3 is generally recognized as the best way to read large files, but the actual numbers depend on the performance of the machine and the complexity of the data processing.
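Since the results depend on the machine, it is worth repeating the comparison locally. The harness below is a sketch, not the script used for the timings above (the article does not show how they were measured): the function names method_1 to method_4 and benchmark, the use of time.perf_counter, and the 20,000-line temporary file are all assumptions made for illustration.

```python
import fileinput
import os
import tempfile
import time


def method_1(path):
    # readlines(): loads every line into a list first.
    count = 0
    with open(path, "r") as fh:
        for line in fh.readlines():
            line.split("|")
            count += 1
    return count


def method_2(path):
    # readline() in a while loop: one explicit call per line.
    count = 0
    with open(path, "r") as fh:
        line = fh.readline()
        while line:
            line.split("|")
            count += 1
            line = fh.readline()
    return count


def method_3(path):
    # Iterate over the file object directly.
    count = 0
    with open(path) as fh:
        for line in fh:
            line.split("|")
            count += 1
    return count


def method_4(path):
    # fileinput module.
    count = 0
    with fileinput.input(files=[path]) as stream:
        for line in stream:
            line.split("|")
            count += 1
    return count


def benchmark(path):
    """Run each method once and return {name: (line_count, seconds)}."""
    results = {}
    for func in (method_1, method_2, method_3, method_4):
        start = time.perf_counter()
        lines = func(path)
        results[func.__name__] = (lines, time.perf_counter() - start)
    return results


# Hypothetical sample data; point sample_path at a real large file to compare properly.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    for i in range(20_000):
        tmp.write(f"{i}|field|value\n")
    sample_path = tmp.name

results = benchmark(sample_path)
os.remove(sample_path)

for name, (lines, seconds) in results.items():
    print(f"{name}: {lines} lines in {seconds:.4f} seconds")
```

On a file this small the differences are mostly noise; a file closer in size to the one used in the article, and several repeated runs, give more meaningful numbers.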

