How can python quickly find the difference between data in two electronic tables
- 2020-06-01 10:17:20
Recently, I just came into contact with python. I want to find some small tasks to practice, and I hope I can continuously improve my problem-solving ability in practice.
There will be such a scenario in the company: the content of a spreadsheet is used by two or three departments or more departments, and these employees will irregularly update the data of their departments in the maintenance of these spreadsheets. After a long time, the data will start to fight with each other, which is not conducive to management. How do you quickly find differences between two or more spreadsheets?
1. Excel's own method (baidu if you are interested)
2. python write a small script
#!/usr/bin/env python # -*- coding: utf-8 -*- # The import module openpyxl import openpyxl from openpyxl.styles import PatternFill from openpyxl.styles import colors from openpyxl.styles import Font, Color # read excel file # The strings in parentheses are the two you want to compare excel , note the use of" / " wb_a = openpyxl.load_workbook('d:/BAKFILE/d046532/Desktop/check excel/test.xlsx') wb_b = openpyxl.load_workbook('d:/BAKFILE/d046532/Desktop/check excel/test2.xlsx') # define 1 Method to get something in the table 1 The contents of the column, return 1 A list of # Here, in my chart: IP Is only 1 Sexual, so I use it to distinguish the difference between the data, and IP this 1 In my table is no." G "Column def getIP(wb): sheet = wb.get_active_sheet() ip =  for cellobj in sheet['G']: ip.append(cellobj.value) return ip # To obtain ip The list of ip_a = getIP(wb_a) ip_b = getIP(wb_b) # Converts two lists into a collection aa = set(ip_a) bb = set(ip_b) # Find the different rows of the two lists and convert them to lists difference = list(aa ^ bb) # Print out the elements in the list # To this 1 Step, the different data in the two tables have been found for i in difference: print (i) # Highlight different rows print (" Start the first 1 table " + "----" *10) a = wb_a.get_active_sheet()['G'] for cellobj in a: if cellobj.value in difference: print (cellobj.value) cellobj.font = Font(color=colors.BLACK, italic=True ,bold = True) cellobj.fill = PatternFill("solid", fgColor="DDDDDD") print (" Start the first 2 table " + "----" *10) b = wb_b.get_active_sheet()['G'] for cellobj in b: if cellobj.value in difference: print (cellobj.value) cellobj.font = Font(color=colors.BLACK, italic=True ,bold = True) cellobj.fill = PatternFill("solid", fgColor="DDDDDD") wb_a.save('d:/BAKFILE/d046532/Desktop/a.xlsx') wb_b.save('d:/BAKFILE/d046532/Desktop/b.xlsx')
In this way, two copies of excel are saved and the different data differences between the two tables are highlighted in this copy with the cell fill color and font color
1. How to add these different data into a table to form a complete set table
2. How to optimize and streamline code