Using Python to check web pages for daily (test-environment) links across test, pre-release, and production
- 2020-04-02 13:41:02
- OfStack
At large Internet companies you typically work with at least three environments: test, pre-release, and production, with production kept isolated from the others. In this setup it is easy for links pointing at the test environment ("daily" links) to slip into production pages, so teams usually rely on link-checking tools to catch them and avoid the risk. I ran into exactly this problem a couple of days ago: a developer accidentally published daily URLs to production, and because there was no automated monitoring on the test side, the issue was not caught in time. Since I happened to be learning Python, I decided to write a simple monitor with it when I got home.
The general idea is a Python script that extracts all the URLs from a page and checks whether any of them are daily (test-environment) links, run from crontab as a timed task every 10 minutes. If an illegal link is found, it sends a warning email to the people concerned. The script is about 100 lines and fairly easy to follow; the code is pasted below.
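For the 10-minute timed task mentioned above, a crontab entry along these lines would do; the interpreter and script paths here are placeholders, not values from the original post:

```
# Hypothetical crontab entry: run the checker every 10 minutes,
# appending stdout and stderr to a log file (paths are examples only)
*/10 * * * * /usr/bin/python /home/admin/checkdailyurl.py >> /home/admin/checkdailyurl.cron.log 2>&1
```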
I originally intended to use BeautifulSoup, but considering the hassle of installing a third-party library, I used sgmllib instead, which ships with Python 2's standard library. The mail function is left unimplemented; fill it in according to your own SMTP server.
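Since the script below leaves `sendMail` as a stub, here is one way it could be filled in with the standard-library `smtplib` and `email` modules. This is a sketch under assumptions: the host, sender, and receiver values are placeholders, and the `dry_run` flag is my own addition so the message can be inspected without a real SMTP server.

```python
# -*- coding: utf-8 -*-
from email.mime.text import MIMEText
import smtplib


def send_mail(smtp_host, sender, receivers, subject, body, dry_run=False):
    """Build and send a plain-text warning email.

    smtp_host, sender, and receivers are placeholders to be replaced
    with your own SMTP server and addresses (not from the original script).
    """
    msg = MIMEText(body, 'plain', 'utf-8')
    msg['Subject'] = subject
    msg['From'] = sender
    msg['To'] = ', '.join(receivers)
    if dry_run:
        # Return the serialized message instead of sending (useful for testing)
        return msg.as_string()
    server = smtplib.SMTP(smtp_host)
    try:
        server.sendmail(sender, receivers, msg.as_string())
    finally:
        server.quit()
    return msg.as_string()
```

Calling it with `dry_run=True` returns the raw RFC 2822 message, which makes the function easy to test without network access.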
#!/usr/bin/env python
# coding: UTF-8
import urllib2
from sgmllib import SGMLParser
import smtplib
import time


class UrlParser(SGMLParser):
    def reset(self):
        # SGMLParser.__init__ calls reset(), so initialising the list here
        # gives each parser instance its own urls (a class-level list would
        # be shared across instances)
        SGMLParser.reset(self)
        self.urls = []

    def do_a(self, attrs):
        '''Parse an <a> tag and collect its href value.'''
        for name, value in attrs:
            if name == 'href':
                self.urls.append(value)

    def do_link(self, attrs):
        '''Parse a <link> tag and collect its href value.'''
        for name, value in attrs:
            if name == 'href':
                self.urls.append(value)


def checkUrl(checkurl, isDetail):
    '''Fetch checkurl and look for illegal (daily) URLs in its source.'''
    parser = UrlParser()
    page = urllib2.urlopen(checkurl)
    content = page.read()
    parser.feed(content)
    urls = parser.urls
    dailyUrls = []
    detailUrl = ""
    for url in urls:
        if 'daily' in url:
            dailyUrls.append(url)
        if not detailUrl and not isDetail and 'www.bc5u.com' in url:
            detailUrl = url
    page.close()
    parser.close()
    if isDetail:
        return dailyUrls
    else:
        return dailyUrls, detailUrl


def sendMail():
    '''Send a warning email; implement with your own SMTP server.'''
    pass


def log(content):
    '''Append a timestamped line to the log file.'''
    logFile = 'checkdailyurl.log'
    f = open(logFile, 'a')
    f.write(time.strftime("%Y-%m-%d %X", time.localtime()) + content + '\n')
    f.flush()
    f.close()


def main():
    '''Entry point.'''
    # check the main page
    url = "http://www.bc5u.com"
    dailyUrls, detailUrl = checkUrl(url, False)
    if dailyUrls:
        # daily links found: send a warning email
        sendMail()
        log('check: find daily url')
    else:
        # no daily links found, nothing to do
        log('check: not find daily url')
    # check the detail page
    dailyUrls = checkUrl(detailUrl, True)
    if dailyUrls:
        # daily links found: send a warning email
        log('check: find daily url')
        sendMail()
    else:
        # no daily links found, nothing to do
        log('check: not find daily url')


if __name__ == '__main__':
    main()
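Note that `sgmllib` (and `urllib2`) exist only in Python 2; the module was removed in Python 3. If you are on Python 3, a rough equivalent of the `UrlParser` class above can be written with the standard-library `html.parser` instead. The sample HTML and the daily URL below are made up for illustration:

```python
from html.parser import HTMLParser


class UrlParser(HTMLParser):
    """Collect href values from <a> and <link> tags (Python 3 sketch)."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # html.parser delivers every start tag here, so filter for the
        # two tags the original do_a / do_link handlers covered
        if tag in ('a', 'link'):
            for name, value in attrs:
                if name == 'href':
                    self.urls.append(value)


parser = UrlParser()
parser.feed('<a href="http://daily.example.com/x">t</a>'
            '<link href="/style.css" rel="stylesheet">')
# same substring check the original script uses to flag illegal links
daily = [u for u in parser.urls if 'daily' in u]
```

After feeding the sample page, `parser.urls` holds both hrefs and `daily` holds only the test-environment link.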