Summary of various methods for quickly listing files in Linux

  • 2020-06-15 10:57:47
  • OfStack

Preface

Recently I ran into a difficult problem at work: I needed to read out all the files in a certain directory on an Ubuntu system. Because the server stored so many files, this process was very slow, so finding a fast way to get the file list became my top priority for a couple of days. After half a day of experimenting I found a relatively fast method, and I record the results here.

Multiple implementation methods

I tried several approaches, both programmatic and non-programmatic.

1, walk

Python's os.walk function recursively visits every directory under a given root and is the most common way to list all files, but it is a little slow. The implementation is simple, so I won't go into detail.
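
For reference, a minimal sketch of the os.walk approach might look like the following. The function name list_files_walk is made up for illustration; os.walk and os.path.join are standard-library calls.

```python
import os


def list_files_walk(root):
    """Return the paths of all non-directory entries below root (hypothetical helper)."""
    files = []
    # os.walk yields (dirpath, dirnames, filenames) for every directory under root.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            files.append(os.path.join(dirpath, name))
    return files
```

Calling list_files_walk("src") would return paths that start with src, in the order in which os.walk visits the directories.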

2, os.scandir

Python's os.scandir function is officially described as a faster way to read a directory. In my test it was faster than walk, but it still did not meet the requirement. It also requires writing the recursion yourself. The code is as follows:


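import os


def scan_path(file_path, level=3):
    """Recursively collect file paths, descending at most `level` levels below file_path."""
    files = []
    if level >= 0:
        # os.scandir yields DirEntry objects; the with block closes the iterator.
        with os.scandir(file_path) as entries:
            for entry in entries:
                if entry.is_dir():
                    files.extend(scan_path(entry.path, level - 1))
                else:
                    files.append(entry.path)
    return files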

When neither approach was fast enough, I started thinking about non-programmatic approaches. In principle Python is reasonably efficient: it may not reach the speed of C or C++, but in my view it is fast enough compared with Java and C#. So I did not try other programming languages and switched to native Linux tools instead.

3, ls

The ls command was the first thing that came to mind, used as follows (-l -R can also be written as -lR):


ls -l -R src > list.txt

This command lists everything under the src directory, but it is not fast enough, and the output mixes directory information with file information, so it is untidy and needs post-processing.
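
As an illustration of that post-processing, here is a rough sketch that turns ls -lR output into a flat list of file paths. The helper name files_from_ls_lr is hypothetical. It assumes the default GNU ls long format (three date fields before the name) and keeps only regular files, so it will not cope with other time styles or with file names that contain newlines.

```python
import os


def files_from_ls_lr(text):
    """Extract regular-file paths from `ls -lR` output (hypothetical helper)."""
    files = []
    current_dir = ""
    previous_blank = True  # the very first line is a directory header such as "src:"
    for line in text.splitlines():
        if not line.strip():
            previous_blank = True
            continue
        if previous_blank and line.endswith(":"):
            # A header line names the directory whose entries follow.
            current_dir = line[:-1]
        elif line.startswith("-"):
            # Assumed layout: mode, links, owner, group, size, 3 date fields, name.
            fields = line.split(None, 8)
            if len(fields) == 9:
                files.append(os.path.join(current_dir, fields[8]))
        previous_blank = False
    return files
```

Lines starting with d (directories), l (symbolic links) and total are skipped, which removes the directory information mentioned above.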

4, tree

The tree command is designed to display the file system as a tree, but with the right set of options it can also list all directories and files:


tree -afi -L 3 -o 2.txt --noreport src

-a lists all files (including hidden ones), -f prints the full path of each entry (absolute or relative, the same as with find), -i does not draw the tree's indentation lines, -L sets how many directory levels to descend, -o writes the output to a file, and --noreport omits the summary at the end.

5, find

The find command itself is a file lookup command, but if used correctly, it can quickly list files in a directory with the following command:


find src > 1.txt

This command is fast enough to do the job. Every result begins with src exactly as it was given on the command line: if src is an absolute path the results are absolute paths, and if src is a relative path the results start with that relative path.

6, locate

A quick Google search showed that locate is similar to find and can also look up files, so I guessed that it could do this job as well. I tried it and, sure enough, the command is written the same way.


locate src > 1.txt

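The difference is that the result is always an absolute path, whether src is a relative or an absolute path. Note that locate does not walk the directory: it matches the pattern against a prebuilt database of path names (refreshed by updatedb), so recently created files may be missing and any path containing src is matched.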

Timing the commands with the time command showed that find and locate take basically the same time, with locate sometimes slightly faster, while tree is a little slower.
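
A comparison like this can also be scripted. The sketch below is not the measurement used above, which relied on the shell's time command. It is a rough Python equivalent in which the names time_command and compare_listing_commands are invented for illustration, and the output is discarded rather than written to a file.

```python
import subprocess
import time


def time_command(cmd):
    """Run cmd (a list of arguments) and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    # Discard the output so that only the command itself is measured.
    subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, check=False)
    return time.perf_counter() - start


def compare_listing_commands(target):
    """Time find, tree and locate on target, skipping tools that are not installed."""
    commands = {
        "find": ["find", target],
        "tree": ["tree", "-afi", "--noreport", target],
        "locate": ["locate", target],
    }
    timings = {}
    for name, cmd in commands.items():
        try:
            timings[name] = time_command(cmd)
        except FileNotFoundError:
            continue  # the tool is not available on this machine
    return timings
```

Repeated runs tend to be faster than the first one because the kernel caches directory data, so it is worth running each command several times before comparing the numbers.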

Conclusion

All of the methods above can list every file in a directory. walk and scandir are the most convenient to integrate with an application, but they are a bit slow. The find and locate commands are faster, and tree is powerful but somewhat slower than find and locate. To integrate these three with Python, the program has to assemble the shell command and run it through a pipe mechanism such as os.popen. Each method has its merits, so choose according to your needs.
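
One possible way to do that integration is sketched below. It uses subprocess.Popen rather than os.popen, which is a substitution on my part, and the helper name list_with_find and its files_only parameter are hypothetical. It assumes the find command is installed and that file names do not contain newlines.

```python
import subprocess


def list_with_find(src, files_only=False):
    """Yield the paths printed by `find src`, one at a time (hypothetical helper)."""
    cmd = ["find", src]
    if files_only:
        cmd += ["-type", "f"]  # restrict the output to regular files
    # Passing a list avoids shell quoting problems with unusual directory names.
    with subprocess.Popen(
        cmd, stdout=subprocess.PIPE, text=True, errors="surrogateescape"
    ) as proc:
        for line in proc.stdout:
            yield line.rstrip("\n")
```

Because the function is a generator, the program can start processing paths while find is still running, for example with for path in list_with_find("src", files_only=True).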
