How to Speed Up Python When It Frequently Writes Files
- 2021-07-03 00:26:57
- OfStack
Problem background: A batch of files needs to be processed. Every file goes through the same function, and that function is quite time-consuming.
Is there a way to speed this up? Certainly. For example, you can split the files into several batches and launch a separate copy of your Python script for each batch; running several Python programs at the same time makes the job finish sooner.
Is there an easier way? For example, can a single program split the work across several processes that run at the same time?
General idea: Split the list of file paths into several sublists. How many sublists depends on how many CPU cores you have. For example, with a 32-core CPU the theoretical speed-up is up to 32 times.
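As a minimal sketch of the splitting step, the helper below cuts a list into near-equal contiguous slices. The name split_into_chunks and the sample file names are hypothetical and used only for illustration; os.cpu_count() is one way to pick the number of slices instead of hard-coding it.

```python
import math
import os


def split_into_chunks(items, num_chunks):
    """Split items into at most num_chunks contiguous slices of near-equal size."""
    if num_chunks < 1:
        raise ValueError("num_chunks must be at least 1")
    if not items:
        return []
    # Round up so that no item is left over after the last slice.
    chunk_size = math.ceil(len(items) / num_chunks)
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


if __name__ == "__main__":
    # Hypothetical file names, used only for illustration.
    paths = ["img_%03d.JPEG" % i for i in range(10)]
    workers = os.cpu_count() or 1  # cpu_count() may return None
    for chunk in split_into_chunks(paths, workers):
        print(len(chunk), chunk[0], "...", chunk[-1])
```

Because the slice size is rounded up, the last slice may be shorter than the others, and there may be fewer slices than requested.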
The code is as follows:
# -*- coding: utf-8 -*-
import numpy as np
from glob import glob
import math
import os
import multiprocessing

label_path = '/home/ying/data/shiyongjie/distortion_datasets/new_distortion_dataset/train/label.txt'
file_path = '/home/ying/data/shiyongjie/distortion_datasets/new_distortion_dataset/train/distortion_image'
save_path = '/home/ying/data/shiyongjie/distortion_datasets/new_distortion_dataset/train/flow_field'
r_d_max = 128
image_index = 0

# Read the label file: each line holds an image file name followed by its label value
with open(label_path) as txt_file:
    file_list = txt_file.readlines()

# Map each image file name to its label
file_label = {}
for line in file_list:
    fields = line.split()
    file_label[fields[0]] = fields[1]

eps = 1e-32
H = 256
W = 256
def generate_flow_field(image_list):
    for image_file_path in image_list:
        pixel_flow = np.zeros(shape=(256, 256, 2))  # laid out like the sampling grid used in PyTorch
        image_file_name = os.path.basename(image_file_path)
        # print(image_file_name)
        k = float(file_label[image_file_name]) * (-1) * 1e-7
        # print(k)
        r_u_max = r_d_max / (1 + k * r_d_max ** 2)  # theoretical length of the diagonal after distortion correction
        scale = r_u_max / 128  # compress that length to the 256-pixel size, which gives a scale; 128*sqrt(2) might be more intuitive here
        for i_u in range(256):
            for j_u in range(256):
                x_u = float(i_u - 128)
                y_u = float(128 - j_u)
                theta = math.atan2(y_u, x_u)
                r = math.sqrt(x_u ** 2 + y_u ** 2)
                r = r * scale  # the actual r, i.e. before resizing to 256 x 256, which goes into the formula
                r_d = (1.0 - math.sqrt(1 - 4.0 * k * r ** 2)) / (2 * k * r + eps)  # corresponding r in the original (distorted) image
                x_d = int(round(r_d * math.cos(theta)))
                y_d = int(round(r_d * math.sin(theta)))
                i_d = int(x_d + W / 2.0)
                j_d = int(H / 2.0 - y_d)
                if 0 <= i_d < W and 0 <= j_d < H:  # assign only when the distorted point lies inside the original image
                    value1 = (i_d - 128.0) / 128.0
                    value2 = (j_d - 128.0) / 128.0
                    pixel_flow[j_u, i_u, 0] = value1  # for each corrected pixel, store the normalized position of the matching pixel in the distorted image
                    pixel_flow[j_u, i_u, 1] = value2
        # Save as a NumPy array
        saved_image_file_path = os.path.join(save_path, image_file_name.split('.')[0] + '.npy')
        pixel_flow = pixel_flow.astype('f2')  # convert the data to float16 to save space
        # print(saved_image_file_path)
        # print(pixel_flow)
        np.save(saved_image_file_path, pixel_flow)
    return
if __name__ == '__main__':
    file_list = glob(file_path + '/*.JPEG')
    m = 32
    n = int(math.ceil(len(file_list) / float(m)))  # round up
    result = []
    pool = multiprocessing.Pool(processes=m)  # 32 processes
    for i in range(0, len(file_list), n):
        result.append(pool.apply_async(generate_flow_field, (file_list[i: i+n],)))
    pool.close()
    pool.join()
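The expression for r_d in the inner loop appears to come from inverting the relation the code itself uses for r_u_max, namely r_u = r_d / (1 + k * r_d ** 2). A short derivation under that assumption, where r_u is the scaled radius r in the code:

```latex
\begin{aligned}
r_u &= \frac{r_d}{1 + k\,r_d^{2}} \\
k\,r_u\,r_d^{2} - r_d + r_u &= 0 \\
r_d &= \frac{1 - \sqrt{1 - 4k\,r_u^{2}}}{2k\,r_u}
\end{aligned}
```

Of the two roots of the quadratic, the one with the minus sign is the one that tends to r_u as k tends to 0, which matches the expression in the code. The eps term in the code only guards the division at the image center, where r_u = 0.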
In the code above, the function generate_flow_field(image_list) takes a list, operates on every item in it, and saves the result of each operation.
So you only need to cut the full set of files into sublists that are as equal in size as possible, and then start one process for each sublist.
The main function above:
if __name__ == '__main__':
    file_list = glob(file_path + '/*.JPEG')  # collect all JPEG files into one list
    m = 32  # assume the CPU has 32 cores
    n = int(math.ceil(len(file_list) / float(m)))  # number of files each core needs to handle
    result = []
    pool = multiprocessing.Pool(processes=m)  # open a pool of 32 worker processes
    for i in range(0, len(file_list), n):
        result.append(pool.apply_async(generate_flow_field, (file_list[i: i+n],)))  # hand each sublist to the function defined above
    pool.close()  # no more tasks will be submitted to the pool
    pool.join()  # wait for all worker processes to finish
Two lines do most of the work. The first is
pool = multiprocessing.Pool(processes=m)  # open a pool of 32 worker processes
which creates the process pool.
The second is
result.append(pool.apply_async(generate_flow_field, (file_list[i: i+n],)))  # hand each sublist to the function defined above
which uses apply_async() to submit generate_flow_field to the pool with the argument file_list[i: i+n].
Because apply_async() returns without waiting for the task to finish, the worker processes in the pool work on their sublists at the same time, which is what makes the whole job faster.
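One detail worth knowing: apply_async() returns an AsyncResult handle, and an exception raised inside a worker only surfaces when get() is called on that handle. The sketch below is not the article's code; square_all and run_chunks are hypothetical stand-ins that show the submit-then-collect pattern on a tiny input.

```python
import multiprocessing


def square_all(numbers):
    """Stand-in for the real per-chunk work (hypothetical, for illustration only)."""
    return [n * n for n in numbers]


def run_chunks(chunks, processes=2):
    """Submit every chunk with apply_async, then collect the results in order."""
    with multiprocessing.Pool(processes=processes) as pool:
        # apply_async returns immediately with an AsyncResult handle.
        handles = [pool.apply_async(square_all, (chunk,)) for chunk in chunks]
        # get() blocks until that task has finished and re-raises any
        # exception that occurred in the worker process.
        return [handle.get() for handle in handles]


if __name__ == "__main__":
    print(run_chunks([[1, 2], [3, 4], [5]]))  # prints [[1, 4], [9, 16], [25]]
```

If the return values are not needed, calling get() on each handle is still a cheap way to make sure a failed chunk does not go unnoticed.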
Extension:
Python file handling: write methods and write buffering for better speed and efficiency
A file object returned by Python's open() can be written to in the following ways:
write(str): writes the string str to the file
writelines(sequence of strings): writes multiple lines to the file, taking an iterable of strings as its argument
f = open('blogCblog.txt', 'w')  # first create a file object, opened in 'w' mode
f.writelines('123456')  # use the writelines() method to write to the file
f.close()  # close the file so the buffered data is flushed to disk
After running the code above, the file blogCblog.txt contains 123456. This works because a string is itself an iterable of one-character strings. Note that the file is opened in 'w' (write) mode. Now look at the following code:
f = open('blogCblog.txt', 'w')  # first create a file object, opened in 'w' mode
f.writelines(123456)  # use the writelines() method to write to the file
Running this code raises a TypeError, because the argument passed to writelines() is not an iterable object.
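A minimal sketch of both cases side by side, assuming the goal is to write a list of numbers one per line. The file name numbers.txt is hypothetical, and a temporary directory is used so the example does not touch real files.

```python
import os
import tempfile

numbers = [1, 2, 3, 4, 5, 6]

# A temporary directory keeps the example from touching real files.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "numbers.txt")  # hypothetical file name

    with open(path, "w") as f:
        try:
            f.writelines(123456)  # an int is not iterable, so this fails
        except TypeError as exc:
            error_message = str(exc)

        # Convert each item to a string first; writelines() adds no
        # newlines by itself, so they are appended explicitly.
        f.writelines(str(n) + "\n" for n in numbers)

    with open(path) as f:
        content = f.read()

print(content)
```

Passing all lines to writelines() in one call, inside a with block, also keeps the number of separate write calls low and guarantees the file is closed when the block ends.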
That covers how to speed up Python when it frequently writes files, along with some extended material. Thank you for reading.