Printing the tree structure of a Scrapy spider's crawl in Python
- 2020-05-07 19:57:17
- OfStack
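This article shows how to print, in Python, the tree structure of the pages a Scrapy spider has crawled. The details are as follows:
The following script parses a Scrapy log and prints each crawled URL indented beneath the page that referred to it, which makes the structure of the crawl easy to see. It is simple to call: it reads the log from the files named on the command line, or from standard input if none are given.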
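#!/usr/bin/env python
import fileinput
import re
from collections import defaultdict

def print_urls(allurls, referer, indent=0):
    # Print every URL reached from `referer`, indented by its depth.
    for url in allurls[referer]:
        print(' ' * indent + url)
        if url in allurls:
            print_urls(allurls, url, indent + 2)

def main():
    # Matches the "<GET url> (referer: ref)" part of a Scrapy log line.
    log_re = re.compile(r'<GET (.*?)> \(referer: (.*?)\)')
    allurls = defaultdict(list)  # referer -> URLs crawled from it
    for line in fileinput.input():
        m = log_re.search(line)
        if m:
            url, ref = m.groups()
            allurls[ref].append(url)
    # Start requests have no referer, which Scrapy logs as "None".
    print_urls(allurls, 'None')

if __name__ == '__main__':
    main()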
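To make the behaviour concrete, here is a minimal sketch that applies the same regular expression and the same recursion to a few log lines held in a string. The log lines and example.com URLs are made up, modelled on Scrapy's DEBUG "Crawled" messages, and the helper names build_tree and render_tree are assumptions for illustration only; they are not part of Scrapy or of the script above.

```python
import re
from collections import defaultdict

# Same pattern as the script above: captures the crawled URL and its referer.
LOG_RE = re.compile(r'<GET (.*?)> \(referer: (.*?)\)')

# Hypothetical log lines, modelled on Scrapy's DEBUG "Crawled" messages.
SAMPLE_LOG = """\
[scrapy.core.engine] DEBUG: Crawled (200) <GET http://example.com/> (referer: None)
[scrapy.core.engine] DEBUG: Crawled (200) <GET http://example.com/a> (referer: http://example.com/)
[scrapy.core.engine] DEBUG: Crawled (200) <GET http://example.com/b> (referer: http://example.com/)
[scrapy.core.engine] DEBUG: Crawled (200) <GET http://example.com/a/1> (referer: http://example.com/a)
"""

def build_tree(lines):
    """Map each referer to the list of URLs crawled from it (hypothetical helper)."""
    tree = defaultdict(list)
    for line in lines:
        m = LOG_RE.search(line)
        if m:
            url, ref = m.groups()
            tree[ref].append(url)
    return tree

def render_tree(tree, referer='None', indent=0):
    """Return the indented lines for everything reached from `referer`."""
    out = []
    for url in tree.get(referer, []):
        out.append(' ' * indent + url)
        out.extend(render_tree(tree, url, indent + 2))
    return out

tree = build_tree(SAMPLE_LOG.splitlines())
rendered = render_tree(tree)
print('\n'.join(rendered))
```

For this sample, the start URL is printed at the left margin, its two children are indented by two spaces, and http://example.com/a/1 is indented by four spaces beneath http://example.com/a. Returning a list of lines rather than printing inside the recursion is only a choice made here so that the result is easy to inspect.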
I hope this article helps with your Python programming.