How can python quickly find the difference between data in two electronic tables

  • 2020-06-01 10:17:20
  • OfStack

Recently, I just came into contact with python. I want to find some small tasks to practice, and I hope I can continuously improve my problem-solving ability in practice.

There will be such a scenario in the company: the content of a spreadsheet is used by two or three departments or more departments, and these employees will irregularly update the data of their departments in the maintenance of these spreadsheets. After a long time, the data will start to fight with each other, which is not conducive to management. How do you quickly find differences between two or more spreadsheets?

Solutions:

1. Excel's own method (baidu if you are interested)

2. python write a small script


#!/usr/bin/env python
# -*- coding: utf-8 -*-

# The import module  openpyxl 
import openpyxl
from openpyxl.styles import PatternFill
from openpyxl.styles import colors
from openpyxl.styles import Font, Color

# read excel file 
# The strings in parentheses are the two you want to compare excel , note the use of" / " 
wb_a = openpyxl.load_workbook('d:/BAKFILE/d046532/Desktop/check excel/test.xlsx')
wb_b = openpyxl.load_workbook('d:/BAKFILE/d046532/Desktop/check excel/test2.xlsx')
# define 1 Method to get something in the table 1 The contents of the column, return 1 A list of 
# Here, in my chart: IP Is only 1 Sexual, so I use it to distinguish the difference between the data, and IP this 1 In my table is no." G "Column 
def getIP(wb):
  sheet = wb.get_active_sheet()
  ip = []
  for cellobj in sheet['G']:
    ip.append(cellobj.value)

  return ip
# To obtain ip The list of 
ip_a = getIP(wb_a)
ip_b = getIP(wb_b)
# Converts two lists into a collection 
aa = set(ip_a)
bb = set(ip_b)
# Find the different rows of the two lists and convert them to lists 
difference = list(aa ^ bb)
# Print out the elements in the list 
# To this 1 Step, the different data in the two tables have been found 
for i in difference:
  print (i)

# Highlight different rows 
print (" Start the first 1 table " + "----" *10)
a = wb_a.get_active_sheet()['G']
for cellobj in a:
  if cellobj.value in difference:
    print (cellobj.value)
    cellobj.font = Font(color=colors.BLACK, italic=True ,bold = True)
    cellobj.fill = PatternFill("solid", fgColor="DDDDDD")
print (" Start the first 2 table " + "----" *10)
b = wb_b.get_active_sheet()['G']
for cellobj in b:
  if cellobj.value in difference:
    print (cellobj.value)
    cellobj.font = Font(color=colors.BLACK, italic=True ,bold = True)
    cellobj.fill = PatternFill("solid", fgColor="DDDDDD")

wb_a.save('d:/BAKFILE/d046532/Desktop/a.xlsx')
wb_b.save('d:/BAKFILE/d046532/Desktop/b.xlsx')

In this way, two copies of excel are saved and the different data differences between the two tables are highlighted in this copy with the cell fill color and font color

Unresolved:

1. How to add these different data into a table to form a complete set table

2. How to optimize and streamline code


Related articles: