Testing against pre-release environments: using Python to check web pages for daily links

  • 2020-04-02 13:41:02
  • OfStack

At large Internet companies you typically run across separate test, pre-release (daily), and production environments, so that production stays isolated from testing. In such a setup it is all too easy for links pointing at the test environment to end up on production pages, and these are usually caught by some link-checking tool. I chased exactly this kind of problem two days ago: a developer accidentally published a daily url to production, and because our tests had no automatic monitoring, it was not found in time. Since I happened to be learning Python, I decided to write a simple monitor with it when I got home.

The general idea: write a Python script that extracts all the urls from a page and checks whether any of them are daily links, then register the script in crontab as a timed task that runs every 10 minutes. If an illegal link is found, send a warning email to the people involved. The script is about 100 lines and fairly easy to follow; the code is pasted below.
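For example, the 10-minute timer is a single crontab line; the script path /home/admin/checkdailyurl.py used here is only a placeholder:

*/10 * * * * python /home/admin/checkdailyurl.py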

The original plan was to use BeautifulSoup, but considering how much trouble installing a third-party library can be, I fell back on sgmllib, which ships with Python 2 (it was removed in Python 3). The mail function is not implemented; fill it in for your own SMTP server (a sketch is given after the script).
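For comparison, if installing the library is acceptable, the link extraction could look roughly like this with BeautifulSoup. This is only a sketch of the approach the script does not actually take, and extract_urls is a hypothetical helper, not something from the script:

# Hedged sketch: the same <a>/<link> extraction with BeautifulSoup (bs4)
# instead of sgmllib; extract_urls is a made-up name for illustration.
import urllib2
from bs4 import BeautifulSoup

def extract_urls(checkurl):
    '''Return the href of every <a> and <link> tag on the page.'''
    content = urllib2.urlopen(checkurl).read()
    soup = BeautifulSoup(content, 'html.parser')
    return [tag['href'] for tag in soup.find_all(['a', 'link'], href=True)]

The full monitoring script: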
 
#!/usr/bin/env python
#coding:UTF-8

import urllib2
from sgmllib import SGMLParser
import smtplib
import time
#from email.mime.text import MIMEText
#from bs4 import BeautifulSoup
#import re

class UrlParser(SGMLParser):
    '''Collect the href values of every <a> and <link> tag.'''

    def reset(self):
        # SGMLParser calls reset() from __init__; keeping the list here
        # (rather than as a class attribute) makes it per-instance
        SGMLParser.reset(self)
        self.urls = []

    def do_a(self, attrs):
        '''parse tag a'''
        for name, value in attrs:
            if name == 'href':
                self.urls.append(value)

    def do_link(self, attrs):
        '''parse tag link'''
        for name, value in attrs:
            if name == 'href':
                self.urls.append(value)

def checkUrl(checkurl, isDetail):
    '''Check the page source at checkurl for illegal (daily) urls.'''
    parser = UrlParser()
    page = urllib2.urlopen(checkurl)
    content = page.read()
    #content = unicode(content, "gb2312").encode("utf8")
    parser.feed(content)
    urls = parser.urls

    dailyUrls = []
    detailUrl = ""
    for url in urls:
        if 'daily' in url:
            dailyUrls.append(url)
        # on the main page, remember the first detail link for a second pass
        if not detailUrl and not isDetail and 'www.bc5u.com' in url:
            detailUrl = url

    page.close()
    parser.close()

    if isDetail:
        return dailyUrls
    else:
        return dailyUrls, detailUrl

def sendMail():
    '''Send a reminder email.'''
    pass

def log(content):
    '''Log execution.'''
    logFile = 'checkdailyurl.log'
    f = open(logFile, 'a')
    f.write(time.strftime("%Y-%m-%d %X", time.localtime()) + ' ' + content + '\n')
    f.flush()
    f.close()

def main():
    '''Entry method.'''
    # check the main page (urlopen needs an explicit scheme)
    url = "http://www.bc5u.com"

    dailyUrls, detailUrl = checkUrl(url, False)
    if dailyUrls:
        # daily links found: send an alert mail
        sendMail()
        log('check: find daily url')
    else:
        # no daily links found, nothing to do
        log('check: not find daily url')

    # check the detail page picked up from the main page
    if detailUrl:
        dailyUrls = checkUrl(detailUrl, True)
        if dailyUrls:
            # daily links found: send an alert mail
            sendMail()
            log('check: find daily url')
        else:
            log('check: not find daily url')

if __name__ == '__main__':
    main()
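The script leaves sendMail as a stub. A minimal sketch using smtplib and email.mime.text (already present, commented out, in the imports) might look like the following; the server, sender, and recipient addresses are placeholders to replace with your own SMTP details:

# Hedged sketch: fill in your own SMTP host, account, and recipients.
from email.mime.text import MIMEText
import smtplib

def sendMail():
    '''Send a reminder email about the daily urls that were found.'''
    msg = MIMEText('daily url found on the production page', 'plain', 'utf-8')
    msg['Subject'] = 'daily url alert'
    msg['From'] = 'monitor@example.com'       # placeholder sender
    msg['To'] = 'dev-team@example.com'        # placeholder recipient

    server = smtplib.SMTP('smtp.example.com', 25)  # your SMTP server
    #server.login('user', 'password')              # only if the server needs auth
    server.sendmail(msg['From'], [msg['To']], msg.as_string())
    server.quit()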
