Python crawler tutorial: sharing the code of a Diandian "beauty" picture crawler

  • 2020-04-02 14:01:51
  • OfStack

Continuing to tinker with crawlers: today I am posting a script that crawls the pictures under the "beauty" tag on Diandian (diandian.com). The code is my own original work.


# -*- coding: utf-8 -*-

#---------------------------------------
#   Program: Diandian "beauty" picture crawler
#   Version: 0.2
#   Author: zippera
#   Date: 2013-07-26
#   Language: Python 2.7
#   Note: the number of pages to download can be set
#---------------------------------------
 
import urllib2
import urllib
import re
 
 
 
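# Capture the image URL (starting with "ht", i.e. http) in each feed-big-img block.
pat = re.compile('<div class="feed-big-img">\n.*?imgsrc="(ht.*?)".*?')
# Tag page URL; %E7%BE%8E%E5%A5%B3 is the URL-encoded tag name. The page number is appended.
nexturl1 = "http://www.diandian.com/tag/%E7%BE%8E%E5%A5%B3?page="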
 
 
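count = 1

# As written, only page 1 is fetched; raise the bound to download more pages.
while count < 2:

  print "Page " + str(count) + "\n"
  myurl = nexturl1 + str(count)
  myres = urllib2.urlopen(myurl)
  mypage = myres.read()
  ucpage = mypage.decode("utf-8") # decode the response bytes to unicode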
 
  mat = pat.findall(ucpage)

  if mat:
    cnt = 1
    # File name: the last 10 word characters plus the extension at the end of the URL.
    fnp = re.compile(r'(\w{10}\.\w+)$')
    for item in mat:
      print "Page" + str(count) + " No." + str(cnt) + " url: " + item + "\n"
      cnt += 1
      fnr = fnp.findall(item)
      if fnr:
        fname = fnr[0]
        urllib.urlretrieve(item, fname) # save the image in the current directory

  else:
    print "no data"
    
  count += 1
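
To see what the two regular expressions in the script actually match, here is a small offline sketch. The HTML fragment, the img.example.com URLs and the helper names extract_image_urls and file_name_for are made up for illustration; the real diandian.com markup may differ.

```python
import re

# Same two patterns as in the crawler script.
IMG_PATTERN = re.compile('<div class="feed-big-img">\n.*?imgsrc="(ht.*?)".*?')
NAME_PATTERN = re.compile(r'(\w{10}\.\w+)$')

# Hypothetical page fragment, not taken from diandian.com.
SAMPLE_HTML = (
    '<div class="feed-big-img">\n'
    '  <a href="#"><img imgsrc="http://img.example.com/post/abcde12345.jpg" /></a>\n'
    '</div>\n'
    '<div class="feed-big-img">\n'
    '  <a href="#"><img imgsrc="http://img.example.com/post/short.png" /></a>\n'
    '</div>\n'
)


def extract_image_urls(html):
    """Return every image URL captured by IMG_PATTERN."""
    return IMG_PATTERN.findall(html)


def file_name_for(url):
    """Return the local file name for a URL, or None if the pattern does not match."""
    found = NAME_PATTERN.findall(url)
    return found[0] if found else None


for url in extract_image_urls(SAMPLE_HTML):
    print("%s -> %s" % (url, file_name_for(url)))
```

Note that the file-name pattern needs at least 10 word characters directly before the extension, so an image whose name is shorter than that (like short.png in the sample) is matched by the first pattern but never saved.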

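How to use it: create a new folder, save the crawler script in it as name.py, and run python name.py (with Python 2.7) from inside that folder. The images are downloaded into the current working directory, so they end up in the same folder.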
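
The script targets Python 2.7, and the urllib2 module it imports does not exist in Python 3. A rough Python 3 adaptation might look like the sketch below. The function names page_url, plan_downloads and crawl, and the last_page and dest_dir parameters, are assumptions introduced here and are not part of the original script. Whether the site still serves the markup the pattern expects has not been checked.

```python
import os
import re
import urllib.request

BASE_URL = "http://www.diandian.com/tag/%E7%BE%8E%E5%A5%B3?page="
IMG_PATTERN = re.compile('<div class="feed-big-img">\n.*?imgsrc="(ht.*?)".*?')
NAME_PATTERN = re.compile(r'(\w{10}\.\w+)$')


def page_url(page):
    """Build the tag-page URL for a 1-based page number."""
    return BASE_URL + str(page)


def plan_downloads(html):
    """Return (image_url, file_name) pairs for URLs that yield a file name."""
    pairs = []
    for url in IMG_PATTERN.findall(html):
        match = NAME_PATTERN.search(url)
        if match:
            pairs.append((url, match.group(1)))
    return pairs


def crawl(last_page=1, dest_dir="."):
    """Download images from pages 1..last_page into dest_dir (needs network)."""
    for page in range(1, last_page + 1):
        with urllib.request.urlopen(page_url(page)) as response:
            html = response.read().decode("utf-8")
        pairs = plan_downloads(html)
        if not pairs:
            print("Page %d: no data" % page)
            continue
        for number, (url, name) in enumerate(pairs, start=1):
            print("Page %d No.%d url: %s" % (page, number, url))
            urllib.request.urlretrieve(url, os.path.join(dest_dir, name))
```

Calling crawl(1) would mirror the original loop, which stops after the first page. Keeping the URL and file-name logic in plan_downloads, separate from the network calls, makes it possible to check the parsing on saved HTML without touching the site.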

