How to compare directories with Python

  • 2021-01-19 22:18:09
  • OfStack

To compare individual files, you can use the difflib module. The filecmp module can also compare individual files, but difflib produces a more readable report. If we only want to know whether a file is identical in both directories, without caring what the differences are, filecmp is the better choice.
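The difference between the two modules on single files can be sketched as follows. This is a minimal example, not a definitive implementation: the helper names files_identical and diff_report are made up for illustration, and the files are assumed to be text in the default encoding.

```python
import difflib
import filecmp
from pathlib import Path


def files_identical(path_a, path_b):
    """Return True if the two files have the same contents."""
    # shallow=False forces a byte-by-byte comparison instead of
    # trusting the os.stat() signature (type, size, mtime).
    return filecmp.cmp(path_a, path_b, shallow=False)


def diff_report(path_a, path_b):
    """Return a unified diff of two text files as a single string."""
    lines_a = Path(path_a).read_text().splitlines(keepends=True)
    lines_b = Path(path_b).read_text().splitlines(keepends=True)
    diff = difflib.unified_diff(
        lines_a, lines_b, fromfile=str(path_a), tofile=str(path_b)
    )
    return "".join(diff)
```

files_identical answers only yes or no, while diff_report shows which lines changed. An empty string from diff_report means the two files have the same lines.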

To compare directories, use the filecmp module.

A simple interactive demonstration in IPython follows:


In [1]: import filecmp

In [2]: ls

Chapter_01/ Chapter_02/

In [3]: dirobj = filecmp.dircmp('Chapter_01','Chapter_02')

Print a report of the differences between the two directories


In [4]: dirobj.report()

diff Chapter_01 Chapter_02

Only in Chapter_01 : ['ip.py', 'os_mem.py', 'pid.py']

Only in Chapter_02 : ['d1.py', 'd2.py', 'diff.py', 'diff.zip', 'dns_parser.py', 'join.py', 'pydiff.py', 'report.html']

Identical files : ['c01.py']

Print a partial report, covering the two directories and their immediate common subdirectories (the output here is the same as the previous one because there are no subdirectories)


In [5]: dirobj.report_partial_closure()

diff Chapter_01 Chapter_02

Only in Chapter_01 : ['ip.py', 'os_mem.py', 'pid.py']

Only in Chapter_02 : ['d1.py', 'd2.py', 'diff.py', 'diff.zip', 'dns_parser.py', 'join.py', 'pydiff.py', 'report.html']

Identical files : ['c01.py']

Print the full report, recursing into all common subdirectories


In [6]: dirobj.report_full_closure()

diff Chapter_01 Chapter_02

Only in Chapter_01 : ['ip.py', 'os_mem.py', 'pid.py']

Only in Chapter_02 : ['d1.py', 'd2.py', 'diff.py', 'diff.zip', 'dns_parser.py', 'join.py', 'pydiff.py', 'report.html']

Identical files : ['c01.py']

Take a look at the type of the return value. It is a little surprising: the report is printed, and the method itself returns None


In [7]: type(dirobj.report_full_closure())

diff Chapter_01 Chapter_02

Only in Chapter_01 : ['ip.py', 'os_mem.py', 'pid.py']

Only in Chapter_02 : ['d1.py', 'd2.py', 'diff.py', 'diff.zip', 'dns_parser.py', 'join.py', 'pydiff.py', 'report.html']

Identical files : ['c01.py']

Out[7]: NoneType

Trying to convert the return value to a string therefore only gives 'None'


In [8]: str(dirobj.report_full_closure())

diff Chapter_01 Chapter_02

Only in Chapter_01 : ['ip.py', 'os_mem.py', 'pid.py']

Only in Chapter_02 : ['d1.py', 'd2.py', 'diff.py', 'diff.zip', 'dns_parser.py', 'join.py', 'pydiff.py', 'report.html']

Identical files : ['c01.py']

Out[8]: 'None'
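Because the report methods print to standard output and return None, one way to get the report as a string is to redirect standard output while the method runs. The sketch below assumes that approach; the function name report_as_text is hypothetical.

```python
import contextlib
import filecmp
import io


def report_as_text(dir_a, dir_b):
    """Return the text that dircmp.report_full_closure() would print."""
    buffer = io.StringIO()
    # Everything printed inside the with-block goes to buffer
    # instead of the terminal.
    with contextlib.redirect_stdout(buffer):
        filecmp.dircmp(dir_a, dir_b).report_full_closure()
    return buffer.getvalue()
```

With the directories from the session above, report_as_text('Chapter_01', 'Chapter_02') would return the same lines as In [6], ready to be written to a file or searched.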

List the entries in the left directory


In [9]: dirobj.left_list

Out[9]: ['c01.py', 'ip.py', 'os_mem.py', 'pid.py']

List the entries in the right directory


In [10]: dirobj.right_list

Out[10]: 

['c01.py',

 'd1.py',

 'd2.py',

 'diff.py',

 'diff.zip',

 'dns_parser.py',

 'join.py',

 'pydiff.py',

 'report.html']

A list of files that exist only in the right directory


In [11]: dirobj.right_only

Out[11]: 

['d1.py',

 'd2.py',

 'diff.py',

 'diff.zip',

 'dns_parser.py',

 'join.py',

 'pydiff.py',

 'report.html']

Common subdirectories


In [12]: dirobj.common_dirs

Out[12]: []

Common files


In [13]: dirobj.common_files

Out[13]: ['c01.py']

Common names that could not be compared (for example, a file on one side and a directory on the other)


In [14]: dirobj.common_funny

Out[14]: []

Identical files


In [15]: dirobj.same_files

Out[15]: ['c01.py']

Files present in both directories that could not be compared


In [16]: dirobj.funny_files

Out[16]: []
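The attributes shown above can also be used from a script instead of reading the printed report. The following is a minimal sketch that walks the common subdirectories through the subdirs attribute and gathers every name that differs. The function name collect_differences and the layout of the result dictionary are assumptions made for this example.

```python
import filecmp


def collect_differences(dir_a, dir_b):
    """Walk two directory trees and gather the names that differ."""
    result = {
        "left_only": [],
        "right_only": [],
        "diff_files": [],
        "funny_files": [],
    }

    def visit(node, prefix):
        # Each key is also the name of a dircmp attribute.
        for key in result:
            for name in getattr(node, key):
                result[key].append(prefix + name)
        # subdirs maps each common subdirectory name to its own dircmp.
        for name, sub in node.subdirs.items():
            visit(sub, prefix + name + "/")

    visit(filecmp.dircmp(dir_a, dir_b), "")
    return result
```

Note that dircmp compares files shallowly, using the os.stat() signature (type, size, modification time), so two files with different contents but the same size and timestamp may be counted as identical. For such cases a byte-by-byte check with filecmp.cmp(a, b, shallow=False) is safer.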

One command I often use in MATLAB, visdiff, compares files or directories. Python provides basically the same functionality. MATLAB's command is simpler to use and its output seems a bit more detailed, but Python is still worth considering. First, Python is free; second, it probably starts up much faster than MATLAB. In addition, Python code is concise: a few lines can be pieced together to do exactly what you need.

