How to Speed Up Python When It Writes Files Frequently

  • 2021-07-03 00:26:57
  • OfStack

Problem background: there is a batch of files to process, and each file must be passed through the same processing function, which is quite time-consuming.

Is there a way to speed this up? Of course: you could split the files into several batches and launch a separate Python script for each batch; running several Python programs at the same time gives a speedup.

Is there an easier way? For example, can a single program split the work across multiple processes and handle them in parallel?

General idea: split the list of file paths into several sub-lists. How many depends on how many CPU cores you have; for example, with a 32-core CPU you can get up to a 32x speedup in theory.

The code is as follows:


# -*- coding: utf-8 -*-
import math
import os
import multiprocessing
from glob import glob

import numpy as np

label_path = '/home/ying/data/shiyongjie/distortion_datasets/new_distortion_dataset/train/label.txt'
file_path = '/home/ying/data/shiyongjie/distortion_datasets/new_distortion_dataset/train/distortion_image'
save_path = '/home/ying/data/shiyongjie/distortion_datasets/new_distortion_dataset/train/flow_field'

r_d_max = 128
eps = 1e-32
H = 256
W = 256

# Build a file-name -> label mapping from the label file
with open(label_path) as txt_file:
  file_list = txt_file.readlines()

file_label = {}
for line in file_list:
  line = line.split()
  file_label[line[0]] = line[1]

def generate_flow_field(image_list):
  for image_file_path in image_list:
    pixel_flow = np.zeros(shape=(H, W, 2))  # laid out the way pytorch expects its sampling grid
    image_file_name = os.path.basename(image_file_path)
    k = float(file_label[image_file_name]) * (-1) * 1e-7
    r_u_max = r_d_max / (1 + k * r_d_max ** 2)  # theoretical diagonal radius after distortion correction
    scale = r_u_max / 128  # scale that compresses this length to the 256x256 output; writing it as 128*sqrt(2) may be more intuitive
    for i_u in range(256):
      for j_u in range(256):
        x_u = float(i_u - 128)
        y_u = float(128 - j_u)
        theta = math.atan2(y_u, x_u)
        r = math.sqrt(x_u ** 2 + y_u ** 2)
        r = r * scale  # the actual r, i.e. before the resize to 256x256, which is what the formula expects
        r_d = (1.0 - math.sqrt(1 - 4.0 * k * r ** 2)) / (2 * k * r + eps)  # the corresponding r in the original (distorted) image
        x_d = int(round(r_d * math.cos(theta)))
        y_d = int(round(r_d * math.sin(theta)))
        i_d = int(x_d + W / 2.0)
        j_d = int(H / 2.0 - y_d)
        if 0 <= i_d < W and 0 <= j_d < H:  # assign a value only when the distorted point falls inside the original image
          value1 = (i_d - 128.0) / 128.0
          value2 = (j_d - 128.0) / 128.0
          pixel_flow[j_u, i_u, 0] = value1  # the mesh stores normalized coordinates, so the corrected image can be sampled pixel by pixel
          pixel_flow[j_u, i_u, 1] = value2
    # Save as a numpy array
    saved_image_file_path = os.path.join(save_path, image_file_name.split('.')[0] + '.npy')
    pixel_flow = pixel_flow.astype('f2')  # convert the data to float16 to save space
    np.save(saved_image_file_path, pixel_flow)
  return

if __name__ == '__main__':
  file_list = glob(file_path + '/*.JPEG')
  m = 32
  n = int(math.ceil(len(file_list) / float(m)))  # round up
  result = []
  pool = multiprocessing.Pool(processes=m)  # a pool of 32 worker processes
  for i in range(0, len(file_list), n):
    result.append(pool.apply_async(generate_flow_field, (file_list[i: i + n],)))
  pool.close()
  pool.join()

In the above code, the function

generate_flow_field(image_list)

takes a list, operates on it, and saves the result of each operation.

So you only need to cut the files to be processed into lists of roughly equal size, and open one process for each list.

The main function above:


if __name__ == '__main__':
  file_list = glob(file_path + '/*.JPEG')  # collect all the JPEG files into one list
  m = 32  # assume the CPU has 32 cores
  n = int(math.ceil(len(file_list) / float(m)))  # how many files each core needs to handle
  result = []
  pool = multiprocessing.Pool(processes=m)  # open a pool of 32 processes
  for i in range(0, len(file_list), n):
    result.append(pool.apply_async(generate_flow_field, (file_list[i: i + n],)))  # each sub-list is handled by the function defined above
  pool.close()  # after submitting all the work, close the pool
  pool.join()  # wait for all the workers to finish

Two lines of code do the main work. One is


pool = multiprocessing.Pool(processes=m)  # open a pool of 32 processes

which opens a process pool.
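Hardcoding m = 32 only fits one machine; here is a minimal sketch of sizing the pool from the actual core count instead (os.cpu_count() is part of the standard library from Python 3.4):

import os
import multiprocessing

m = os.cpu_count() or 1  # number of logical cores; fall back to 1 if it cannot be detected
pool = multiprocessing.Pool(processes=m)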

The other line is


result.append(pool.apply_async(generate_flow_field, (file_list[i: i+n],)))  # each sub-list is handled by the function defined above

apply_async() submits generate_flow_field to the pool without blocking, passing in the argument file_list[i: i+n].

Because apply_async() returns immediately, all the worker processes run at the same time, which is where the speedup comes from.
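apply_async() returns an AsyncResult object; the worker's return value, and any exception it raised, only surface when you call .get() on it. A minimal self-contained sketch (work is a hypothetical stand-in for generate_flow_field):

import multiprocessing

def work(chunk):
  return len(chunk)  # stand-in for generate_flow_field; reports how many items it handled

if __name__ == '__main__':
  items = list(range(10))
  with multiprocessing.Pool(processes=2) as pool:
    results = [pool.apply_async(work, (items[i: i + 5],)) for i in range(0, len(items), 5)]
    counts = [r.get() for r in results]  # .get() blocks and re-raises any worker exception
  print(counts)  # [5, 5]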

Extension:

Python file processing: write modes and the write buffer, for better speed and efficiency

A file object returned by Python's open() can be written to in the following ways:

write(str): write str to the file

writelines(sequence_of_strings): write multiple lines to the file; the parameter is an iterable of strings


f = open('blogCblog.txt', 'w')  # create a file object, opened in 'w' mode
f.writelines('123456')  # use the writelines() method to write to the file
f.close()

After running the above, you can see that the blogCblog.txt file contains 123456. Note that the mode is 'w' (write mode), and that this works because a string is itself an iterable of characters. Now look at the following code:


f = open('blogCblog.txt', 'w')  # create a file object, opened in 'w' mode
f.writelines(123456)  # try to pass an integer to writelines()

Running the above code raises a TypeError, because the argument passed to writelines() is not an iterable.
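As for the write cache mentioned in the heading: open() buffers writes in memory and flushes them to disk in blocks, so frequent small writes go faster when you batch them into fewer calls or enlarge the buffer via the buffering parameter of open(). A minimal sketch, with the 1 MiB buffer size chosen arbitrarily:

lines = ('line %d\n' % i for i in range(100000))

with open('blogCblog.txt', 'w', buffering=1024 * 1024) as f:  # 1 MiB write buffer
  f.writelines(lines)  # one writelines() call instead of 100000 write() calls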

That covers how to speed up Python when it writes files frequently, along with the extended knowledge above; thank you for reading.

