Comparing directories with Python
- 2021-01-19 22:18:09
- OfStack
To compare individual files, you can use the difflib module. The filecmp module can also compare individual files, but difflib produces a more readable report. If you only need to know whether two files are identical, without caring what the differences are, filecmp is the simpler choice.
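The two approaches to single-file comparison can be sketched as follows. This is a minimal illustration, not a definitive recipe: the helper names files_identical and diff_report and the sample file names are made up for this example, and the files are assumed to be text files in the default encoding.

```python
import difflib
import filecmp
import tempfile
from pathlib import Path


def files_identical(path_a, path_b):
    """Return True if the two files have the same contents (hypothetical helper)."""
    # shallow=False forces a content comparison instead of trusting
    # the os.stat() signature (file type, size, modification time).
    return filecmp.cmp(path_a, path_b, shallow=False)


def diff_report(path_a, path_b):
    """Return a unified diff of two text files as one string (hypothetical helper)."""
    lines_a = Path(path_a).read_text().splitlines(keepends=True)
    lines_b = Path(path_b).read_text().splitlines(keepends=True)
    diff = difflib.unified_diff(
        lines_a, lines_b, fromfile=str(path_a), tofile=str(path_b)
    )
    return "".join(diff)


if __name__ == "__main__":
    # Sample files created only for the demonstration.
    with tempfile.TemporaryDirectory() as tmp:
        old = Path(tmp, "old.txt")
        new = Path(tmp, "new.txt")
        old.write_text("alpha\nbeta\n")
        new.write_text("alpha\ngamma\n")
        print(files_identical(old, new))  # yes/no answer from filecmp
        print(diff_report(old, new))      # line-by-line report from difflib
```

filecmp answers only the yes/no question, while difflib shows which lines changed.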
To compare directories, use the filecmp module.
A short interactive session in IPython shows how:
In [1]: import filecmp
In [2]: ls
Chapter_01/ Chapter_02/
In [3]: dirobj = filecmp.dircmp('Chapter_01','Chapter_02')
Print a report of the differences between the two directories:
In [4]: dirobj.report()
diff Chapter_01 Chapter_02
Only in Chapter_01 : ['ip.py', 'os_mem.py', 'pid.py']
Only in Chapter_02 : ['d1.py', 'd2.py', 'diff.py', 'diff.zip', 'dns_parser.py', 'join.py', 'pydiff.py', 'report.html']
Identical files : ['c01.py']
Print the partial-closure report, which also covers the immediate common subdirectories (the output is the same as the previous one because there are no subdirectories):
In [5]: dirobj.report_partial_closure()
diff Chapter_01 Chapter_02
Only in Chapter_01 : ['ip.py', 'os_mem.py', 'pid.py']
Only in Chapter_02 : ['d1.py', 'd2.py', 'diff.py', 'diff.zip', 'dns_parser.py', 'join.py', 'pydiff.py', 'report.html']
Identical files : ['c01.py']
Print the full report, which recurses into all common subdirectories:
In [6]: dirobj.report_full_closure()
diff Chapter_01 Chapter_02
Only in Chapter_01 : ['ip.py', 'os_mem.py', 'pid.py']
Only in Chapter_02 : ['d1.py', 'd2.py', 'diff.py', 'diff.zip', 'dns_parser.py', 'join.py', 'pydiff.py', 'report.html']
Identical files : ['c01.py']
Check the type of the return value. It looks a little odd at first: the report methods print to standard output and return None.
In [7]: type(dirobj.report_full_closure())
diff Chapter_01 Chapter_02
Only in Chapter_01 : ['ip.py', 'os_mem.py', 'pid.py']
Only in Chapter_02 : ['d1.py', 'd2.py', 'diff.py', 'diff.zip', 'dns_parser.py', 'join.py', 'pydiff.py', 'report.html']
Identical files : ['c01.py']
Out[7]: NoneType
Converting the return value to a string therefore only gives 'None':
In [8]: str(dirobj.report_full_closure())
diff Chapter_01 Chapter_02
Only in Chapter_01 : ['ip.py', 'os_mem.py', 'pid.py']
Only in Chapter_02 : ['d1.py', 'd2.py', 'diff.py', 'diff.zip', 'dns_parser.py', 'join.py', 'pydiff.py', 'report.html']
Identical files : ['c01.py']
Out[8]: 'None'
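Because the report methods print their output and return None, the text has to be captured from standard output if you want it as a string. One possible way is contextlib.redirect_stdout. The sketch below assumes that approach; the helper name report_as_text and the sample directory and file names are made up for this example.

```python
import contextlib
import filecmp
import io
import tempfile
from pathlib import Path


def report_as_text(dcmp, full=False):
    """Return the printed report of a filecmp.dircmp object as a string.

    Hypothetical helper: it temporarily redirects sys.stdout to a buffer.
    """
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        if full:
            dcmp.report_full_closure()  # recurse into common subdirectories
        else:
            dcmp.report()               # top-level directories only
    return buffer.getvalue()


if __name__ == "__main__":
    # Sample directories created only for the demonstration.
    with tempfile.TemporaryDirectory() as tmp:
        left = Path(tmp, "left")
        right = Path(tmp, "right")
        left.mkdir()
        right.mkdir()
        (left / "only_left.py").write_text("pass\n")
        (right / "only_right.py").write_text("pass\n")
        text = report_as_text(filecmp.dircmp(str(left), str(right)))
        print(type(text))
        print(text)
```

With the directories from the session above, the call would be report_as_text(dirobj).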
List the files and subdirectories in the left directory:
In [9]: dirobj.left_list
Out[9]: ['c01.py', 'ip.py', 'os_mem.py', 'pid.py']
List the files and subdirectories in the right directory:
In [10]: dirobj.right_list
Out[10]:
['c01.py',
'd1.py',
'd2.py',
'diff.py',
'diff.zip',
'dns_parser.py',
'join.py',
'pydiff.py',
'report.html']
Files and subdirectories that exist only in the right directory:
In [11]: dirobj.right_only
Out[11]:
['d1.py',
'd2.py',
'diff.py',
'diff.zip',
'dns_parser.py',
'join.py',
'pydiff.py',
'report.html']
Common subdirectories
In [12]: dirobj.common_dirs
Out[12]: []
Common files
In [13]: dirobj.common_files
Out[13]: ['c01.py']
Names present in both directories that could not be compared (the type differs, or os.stat() reported an error)
In [14]: dirobj.common_funny
Out[14]: []
Identical files
In [15]: dirobj.same_files
Out[15]: ['c01.py']
Files present in both directories that could not be compared
In [16]: dirobj.funny_files
Out[16]: []
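The attributes shown above describe only the top-level directories. To gather the same information for a whole tree, one option is to walk the subdirs attribute, which maps each common subdirectory name to its own dircmp object. The following is a minimal sketch under that assumption; the function name collect_differences, the dictionary layout, and the sample file names are made up for this example. Note that dircmp by default treats two files as equal when their os.stat() signatures match, without reading the contents.

```python
import filecmp
import tempfile
from pathlib import Path

# dircmp attributes gathered by the hypothetical helper below.
KEYS = ("left_only", "right_only", "diff_files", "funny_files")


def collect_differences(dcmp, prefix=""):
    """Recursively gather differing names from a filecmp.dircmp object.

    Returns a dict mapping each attribute in KEYS to a list of paths
    relative to the two top-level directories.
    """
    result = {key: [prefix + name for name in getattr(dcmp, key)] for key in KEYS}
    # subdirs maps names in common_dirs to dircmp objects for those directories.
    for name, sub_dcmp in dcmp.subdirs.items():
        nested = collect_differences(sub_dcmp, prefix + name + "/")
        for key in KEYS:
            result[key].extend(nested[key])
    return result


if __name__ == "__main__":
    # Sample trees created only for the demonstration.
    with tempfile.TemporaryDirectory() as tmp:
        left = Path(tmp, "left")
        right = Path(tmp, "right")
        (left / "sub").mkdir(parents=True)
        (right / "sub").mkdir(parents=True)
        (left / "sub" / "gone.py").write_text("x = 1\n")
        (right / "new.py").write_text("y = 2\n")
        differences = collect_differences(filecmp.dircmp(str(left), str(right)))
        for key, names in differences.items():
            print(key, names)
```

Returning plain lists rather than printing makes the result easy to filter or feed into further processing, which the report methods do not allow directly.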
One of the commands I often use in MATLAB is visdiff, which compares files or directories. Python provides roughly the same functionality. MATLAB's command is simpler to use and its output seems a bit more detailed, but Python is still worth considering. First, Python is free; second, it probably starts up much faster than MATLAB. In addition, Python code is concise, so a few lines are usually enough to put together exactly the functionality you need.