How to randomly assign a User-Agent to each request when collecting data with Scrapy in Python

  • 2020-05-07 19:56:47
  • OfStack

This example shows how to assign a random User-Agent to each request when collecting data with Scrapy in Python. It is shared here for your reference. The details are as follows:

This method sends a different User-Agent with each request, which helps prevent websites from blocking Scrapy spiders based on their User-Agent.

First, add the following code to the settings.py file to replace the default User-Agent middleware:

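DOWNLOADER_MIDDLEWARES = {
    # Enable the custom middleware
    'scraper.random_user_agent.RandomUserAgentMiddleware': 400,
    # Disable the built-in one (on Scrapy versions before 1.0 the path is
    # 'scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware')
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
}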
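
The middleware in the next step imports USER_AGENT_LIST from scraper.settings, but the article does not show that setting. A minimal sketch of what it might look like in settings.py follows; the user-agent strings are illustrative placeholders rather than values from the original, so substitute whichever ones suit your crawl.

```python
# settings.py (sketch): the strings below are illustrative placeholders,
# not values taken from the original article.
USER_AGENT_LIST = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/13.1 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:75.0) Gecko/20100101 Firefox/75.0",
]
```

The list must not be empty, because random.choice raises IndexError on an empty sequence.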

Next, create the custom User-Agent middleware (scraper/random_user_agent.py):
import logging
import random

from scraper.settings import USER_AGENT_LIST

logger = logging.getLogger(__name__)


class RandomUserAgentMiddleware(object):
    def process_request(self, request, spider):
        # Pick a random user agent for this request
        ua = random.choice(USER_AGENT_LIST)
        if ua:
            # setdefault leaves an already-set User-Agent header untouched
            request.headers.setdefault('User-Agent', ua)
        # logger.debug('>>>> UA %s', request.headers)
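
As a possible variation (not part of the original article), the middleware could read the list through Scrapy's settings object instead of importing scraper.settings directly. The sketch below assumes the same custom setting name, USER_AGENT_LIST, and relies on the from_crawler hook that Scrapy calls when building a middleware.

```python
import random


class RandomUserAgentMiddleware:
    """Sketch: picks a random User-Agent from the USER_AGENT_LIST setting."""

    def __init__(self, user_agents):
        self.user_agents = list(user_agents)

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls from_crawler (when defined) to create the middleware.
        # USER_AGENT_LIST is the custom setting name used in this article.
        return cls(crawler.settings.getlist('USER_AGENT_LIST'))

    def process_request(self, request, spider):
        if self.user_agents:
            # setdefault keeps a User-Agent that is already set on the request
            request.headers.setdefault('User-Agent',
                                       random.choice(self.user_agents))
        # Returning None lets Scrapy continue processing the request
        return None
```

With this version an empty or missing list simply leaves the request unchanged instead of raising an error, and the middleware no longer depends on the project's module name.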

I hope this article helps with your Python programming.

