Python handles Excel file instance code

  • 2020-06-07 04:41:47
  • OfStack

Due to work requirements, it is necessary to verify the validity of part 1 query content, which is stored in Excel, text content is Title of the page, and URL of the page is associated with each Cell in HyperLink format.

So the instinct is to use Python to read the Excel file and then do text analysis, and then do HttpRequest once for each link to determine whether the current link is valid or not by analyzing the contents of HttpResponse.

So I searched on the Internet and found that the most popular plug-in was xlrd. However, in the actual use, I found that the value 1 of hyperlink_map was always None, so I didn't have time to analyze why. Finally, after searching, I found an Python library called xlwings, which can be used effectively.

xlwings: Python For Excel

The specific code is as follows:


# -*- coding=utf-8 -*-
import xlwings as xw
import urllib
import sys

type = sys.getfilesystemencoding() 

def get_html(url):
  page = urllib.urlopen(url)
  html = page.read()
  return unzip(html)

## Debug Find out no matter what to do Decode ", the final result is all messy code 
##  Later, it was found that the corresponding web page was compressed, so the content of the obtained web page needs to be extracted manually 
def unzip(data):
  import gzip
  import StringIO
  data = StringIO.StringIO(data)
  gz = gzip.GzipFile(fileobj=data)
  data = gz.read()
  gz.close()
  return data

wb = xw.Book(r"C:\Users\hasee\Desktop\Test.xlsx")
main_data = wb.sheets["Sheet2"]
##  By getting Last Cell To determine the current Sheet The number of valid rows and columns 
rownum = main_data.range('A1').current_region.last_cell.row
colnum = main_data.range('A1').current_region.last_cell.column

##  positioning column The corresponding column 
col_dict = {"2":"B","3":"C","4":"D","5":"E","6":"F"}

for row in range(1, rownum + 1):
  for col in range(2, colnum + 1):
    query = main_data.range(row, 1).value
    cell = main_data.range(row, col)
    link = cell.hyperlink
    html = get_html(link)
    if "error-container" in html:
      print "%s,%s,%s,%s" % (query, col_dict.get(str(col))+str(row), cell.value, cell.hyperlink)
      ##  To which an invalid link belongs Cell Dye and write directly to the file 
      cell.color = (253,218,4)

Related articles: