Python crawler tutorial: sharing the code of a Diandian "beauty" picture crawler
- 2020-04-02 14:01:51
- OfStack
Still tinkering with crawlers: today I'm posting a script that downloads the pictures listed under the "beauty" tag on Diandian (diandian.com). The code is my own original work.
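The tag does not appear literally in the crawled URL. It is percent-encoded as UTF-8, so 美女 ("beauty") becomes %E7%BE%8E%E5%A5%B3, and the page number is appended as a query parameter. A minimal Python 3 sketch of building those listing URLs; the helper name tag_page_url is hypothetical, and the URL layout is copied from the script's nexturl1 (the site itself may no longer respond):

```python
from urllib.parse import quote

# URL layout copied from the script's nexturl1 value.
BASE = "http://www.diandian.com/tag/{tag}?page={page}"


def tag_page_url(tag, page):
    """Build the listing URL for one page of a tag (hypothetical helper).

    quote() percent-encodes the tag using UTF-8 by default.
    """
    return BASE.format(tag=quote(tag), page=page)


print(tag_page_url("美女", 1))
# http://www.diandian.com/tag/%E7%BE%8E%E5%A5%B3?page=1
```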
# -*- coding: utf-8 -*-
#---------------------------------------
# Program: Diandian "beauty" picture crawler
# Version: 0.2
# Author: zippera
# Date: 2013-07-26
# Language: Python 2.7
# The number of pages to download is adjustable (see the while loop)
#---------------------------------------
import urllib2
import urllib
import re

# Capture the src URL of the <img> on the line right after each
# "feed-big-img" div ('.' does not match newlines, hence the explicit \n)
pat = re.compile(r'<div class="feed-big-img">\n.*?img src="(ht.*?)"')
# Local file name: the last 10 word characters of the URL plus the extension
fnp = re.compile(r'(\w{10}\.\w+)$')
# Listing URL of the "beauty" tag; the page number is appended to it
nexturl1 = "http://www.diandian.com/tag/%E7%BE%8E%E5%A5%B3?page="
count = 1
while count < 2:  # downloads page 1 only; raise the limit for more pages
    print "Page " + str(count) + "\n"
    myurl = nexturl1 + str(count)
    myres = urllib2.urlopen(myurl)
    mypage = myres.read()
    ucpage = mypage.decode("utf-8")  # decode the raw bytes to unicode
    mat = pat.findall(ucpage)
    if mat:
        cnt = 1
        for item in mat:
            print "Page " + str(count) + " No." + str(cnt) + " url: " + item + "\n"
            cnt += 1
            fnr = fnp.findall(item)
            if fnr:
                fname = fnr[0]
                urllib.urlretrieve(item, fname)  # saved in the current directory
    else:
        print "no data"
    count += 1
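The script runs only on Python 2 (print statements, urllib2). As a hedged Python 3 sketch of its parsing step, here are the same two patterns written as raw strings so the backslashes survive, wrapped in small functions that can be tried without touching the network. The names extract_image_urls and filename_from_url are hypothetical, and the sample HTML is made up to show the markup shape the pattern assumes (the <img> on the line directly after the div):

```python
import re

# Same patterns as the Python 2 script. '.' does not match a newline,
# so the <img> must sit on the line right after the div.
IMG_PAT = re.compile(r'<div class="feed-big-img">\n.*?img src="(ht.*?)"')
# File name = the last 10 word characters before the dot, plus the extension.
NAME_PAT = re.compile(r'(\w{10}\.\w+)$')


def extract_image_urls(html):
    """Return every image URL captured by IMG_PAT, in page order."""
    return IMG_PAT.findall(html)


def filename_from_url(url):
    """Return the local file name for url, or None if the pattern fails."""
    m = NAME_PAT.search(url)
    return m.group(1) if m else None


# Hypothetical markup, invented for illustration only.
sample = (
    '<div class="feed-big-img">\n'
    '  <a href="#"><img src="http://img.example.com/photo/abcde12345.jpg" /></a>\n'
    '</div>'
)
for url in extract_image_urls(sample):
    print(url, "->", filename_from_url(url))
# http://img.example.com/photo/abcde12345.jpg -> abcde12345.jpg
```

Note that a URL whose name part has fewer than 10 word characters before the dot yields no file name, and the original script silently skips such images.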
How to use it: create a new folder, save the code in it as name.py, and run python name.py (Python 2.7) from inside that folder; the images are downloaded into that folder.
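If you would rather not run the script from inside the target folder, the download step can take an explicit output directory instead. A minimal Python 3 sketch, assuming the same file-naming rule as the script; local_path and download are hypothetical helper names, and "beauty_pics" is only an example folder name:

```python
import os
import re
import urllib.request

# File-naming rule from the script: last 10 word characters + extension.
NAME_PAT = re.compile(r'(\w{10}\.\w+)$')


def local_path(url, out_dir):
    """Return where url would be saved inside out_dir, or None if unnamed."""
    m = NAME_PAT.search(url)
    return os.path.join(out_dir, m.group(1)) if m else None


def download(url, out_dir="beauty_pics"):
    """Save url into out_dir (created if missing); return the path or None."""
    path = local_path(url, out_dir)
    if path is None:
        return None
    os.makedirs(out_dir, exist_ok=True)
    urllib.request.urlretrieve(url, path)
    return path
```

Keeping the path computation separate from the network call makes it easy to check where each file will land before downloading anything.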