Python example: fetching and parsing web pages
- 2020-04-02 14:09:09
- OfStack
This article uses an example to show how Python can fetch and parse web pages. The script checks whether a Q&A result appears on the search result pages of so.com and Baidu for each query. It is shared here for your reference.
The main code is as follows:
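#!/usr/bin/python
# coding=utf-8
# Python 2 script (it uses urllib2 and the print statement).
# Reads one query per line from stdin, fetches the so.com and Baidu result
# pages, and prints a tab-separated 0/1 flag per query and per engine.
import sys
import re
import time
import urllib2
from urllib import quote

maxline = 2000  # process at most this many queries

# A wenda.so.com link; group(1) captures the value of its src parameter.
wenda = re.compile(r'href="http://wenda\.so\.com/q/.+?src=(.+?)"')
# A Baidu result link whose text is " Know more about the problem ".
baidu = re.compile(r'<a href="http://www\.baidu\.com/link\?url=.+".*?> Know more about the problem .*?</a>')

wenda_file = open("wendapage.txt", "w")  # raw so.com result pages
baidu_file = open("baidupage.txt", "w")  # raw Baidu result pages

for line in sys.stdin:
    if maxline == 0:
        break
    query = line.strip()
    time.sleep(1)  # wait one second between queries

    # so.com: look for a wenda.so.com Q&A link in the result page.
    recall_url = "http://www.so.com/s?&q=" + quote(query)
    response = urllib2.urlopen(recall_url)
    html = response.read()
    wenda_file.write(html)
    m = wenda.search(html)
    if m:
        # A src value of "110" is reported as 0, any other value as 1.
        if m.group(1) == "110":
            print query + "\twenda\t0"
        else:
            print query + "\twenda\t1"
    else:
        print query + "\twenda\t0"

    # Baidu: look for the Q&A link in the result page.
    recall_url = "http://www.baidu.com/s?wd=" + quote(query) + "&ie=utf-8"
    response = urllib2.urlopen(recall_url)
    html = response.read()
    baidu_file.write(html)
    m = baidu.search(html)
    if m:
        print query + "\tbaidu\t1"
    else:
        print query + "\tbaidu\t0"

    maxline = maxline - 1

wenda_file.close()
baidu_file.close()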
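The matching logic can be tried offline, without sending any requests. The following is a minimal Python 3 sketch and not the author's code: the helper names `build_urls`, `wenda_flag`, `baidu_flag` and `report_lines` are hypothetical, the patterns assume that the `.` and `?` in the URLs are meant literally, and the tab-separated output format is an assumption. The helpers take HTML that has already been decoded to `str` (in Python 3, `response.read()` returns bytes).

```python
import re
from urllib.parse import quote

# Patterns modelled on the script above, written as raw strings.
WENDA = re.compile(r'href="http://wenda\.so\.com/q/.+?src=(.+?)"')
BAIDU = re.compile(
    r'<a href="http://www\.baidu\.com/link\?url=.+".*?>'
    r' Know more about the problem .*?</a>'
)


def build_urls(query):
    """Return the (so.com, Baidu) search URLs for one percent-encoded query."""
    q = quote(query)
    return (
        "http://www.so.com/s?&q=" + q,
        "http://www.baidu.com/s?wd=" + q + "&ie=utf-8",
    )


def wenda_flag(html):
    """Return 1 if a wenda.so.com link is found and its src is not "110"."""
    m = WENDA.search(html)
    return 1 if m and m.group(1) != "110" else 0


def baidu_flag(html):
    """Return 1 if the Baidu Q&A link is found, else 0."""
    return 1 if BAIDU.search(html) else 0


def report_lines(query, so_html, baidu_html):
    """Build the two tab-separated report lines for one query."""
    return [
        "%s\twenda\t%d" % (query, wenda_flag(so_html)),
        "%s\tbaidu\t%d" % (query, baidu_flag(baidu_html)),
    ]
```

Keeping URL building and matching in small functions like these makes it possible to check the regular expressions against saved pages (for example the files the script writes) before running 2000 live queries.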
I hope this article helps you with your Python programming.