Python example: fetching and parsing web pages

  • 2020-04-02 14:09:09
  • OfStack

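This article uses an example to show how Python can fetch and parse web pages. The script reads queries from standard input, fetches the first page of search results for each query from so.com and from Baidu, and checks each page for question-and-answer links. It is shared here for your reference.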

The main code is as follows:


#!/usr/bin/python
#coding=utf-8
# Python 2 script (uses urllib2 and the print statement).

import sys
import re
import time
import urllib2
from urllib import quote

maxline = 2000  # maximum number of queries to process

# Captures the src value of a wenda.so.com link in the so.com result page.
wenda = re.compile(r'href="http://wenda\.so\.com/q/.+?\?src=(.+?)"')
# Matches the Baidu result-page link to more related questions.
# The link text is shown here in translation; it must match the text on the live page exactly.
baidu = re.compile(r'<a href="http://www\.baidu\.com/link\?url=.+".*?>Know more about the problem.*?</a>')

f1 = open("wendapage.txt", "w")  # raw so.com result pages
f2 = open("baidupage.txt", "w")  # raw Baidu result pages

for line in sys.stdin:
  if maxline == 0:
    break
  query = line.strip()
  time.sleep(1)  # pause between queries

  # so.com: print 1 if a wenda link with a src other than "110" is found, else 0
  recall_url = "http://www.so.com/s?&q=" + quote(query)
  html = urllib2.urlopen(recall_url).read()
  f1.write(html)
  m = wenda.search(html)
  if m and m.group(1) != "110":
    print query + "\twenda\t1"
  else:
    print query + "\twenda\t0"

  # Baidu: print 1 if the related-questions link is found, else 0
  recall_url = "http://www.baidu.com/s?wd=" + quote(query) + "&ie=utf-8"
  html = urllib2.urlopen(recall_url).read()
  f2.write(html)
  if baidu.search(html):
    print query + "\tbaidu\t1"
  else:
    print query + "\tbaidu\t0"
  maxline = maxline - 1

f1.close()
f2.close()
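
The parsing step can be tried offline, without sending any requests. The following is a minimal sketch, assuming the wenda pattern is meant to capture the value after `?src=` in a wenda.so.com link. The helper names `build_so_url` and `wenda_flag` and the sample HTML fragments are made up for illustration and were not captured from the live site. Percent-encoding the query with `quote` keeps spaces and non-ASCII characters valid in the URL.

```python
import re

try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2

# Assumed form of the wenda pattern: capture the src value of a wenda link.
WENDA = re.compile(r'href="http://wenda\.so\.com/q/.+?\?src=(.+?)"')


def build_so_url(query):
    """Build the so.com search URL with the query percent-encoded."""
    return "http://www.so.com/s?&q=" + quote(query)


def wenda_flag(html):
    """Return 1 if a wenda link with a src other than "110" is present, else 0."""
    m = WENDA.search(html)
    if m and m.group(1) != "110":
        return 1
    return 0


# Hypothetical result-page fragments, for illustration only.
sample_hit = '<a href="http://wenda.so.com/q/1378912345?src=140">answer</a>'
sample_110 = '<a href="http://wenda.so.com/q/1378912345?src=110">answer</a>'
sample_miss = '<a href="http://example.com/">nothing here</a>'

print(build_so_url("python regex"))
for name, html in [("hit", sample_hit), ("110", sample_110), ("miss", sample_miss)]:
    # Same tab-separated layout as the script: label, source, flag.
    print("%s\twenda\t%d" % (name, wenda_flag(html)))
```

Keeping the matching logic in a small function like this makes it possible to check the regular expression against saved pages (such as the files the script writes) before running it over thousands of queries.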

I hope this article helps you with your Python programming.

