How to randomly assign a User-Agent to each request when scraping with Scrapy in Python

2020-05-07 19:56:47
OfStack

This example shows how to randomly assign a User-Agent to each request when collecting data with Scrapy in Python. The details are as follows:

This method sends each request with a randomly chosen User-Agent, which makes it harder for websites to block Scrapy spiders based on the User-Agent header.

First, add the following code to settings.py. It registers the custom middleware and disables Scrapy's default User-Agent middleware:

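DOWNLOADER_MIDDLEWARES = {
    # Enable the custom middleware that picks a random User-Agent
    'scraper.random_user_agent.RandomUserAgentMiddleware': 400,
    # Disable Scrapy's built-in User-Agent middleware (on Scrapy versions
    # before 1.0 the path is 'scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware')
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
}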
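The middleware shown later in this article imports USER_AGENT_LIST from scraper.settings, but the article never shows that list. A minimal sketch of what it might look like in the same settings.py, assuming a plain list of strings; the three values are illustrative placeholders only and should be replaced with User-Agent strings that suit your target sites:

```python
# settings.py (continued)
# Assumed definition: the article imports this name but does not define it.
# The strings below are examples only, not a recommended set.
USER_AGENT_LIST = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/13.1 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:76.0) Gecko/20100101 Firefox/76.0",
]
```

A longer and more varied list makes the random choice more useful, since with only a few entries the same value repeats often.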

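Next, create the custom User-Agent middleware. The path used in the settings above implies that it lives in scraper/random_user_agent.py: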

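import logging
import random

from scraper.settings import USER_AGENT_LIST

logger = logging.getLogger(__name__)


class RandomUserAgentMiddleware(object):

    def process_request(self, request, spider):
        # Pick one User-Agent string at random from the list in settings.py
        ua = random.choice(USER_AGENT_LIST)
        if ua:
            # setdefault leaves a User-Agent already set on the request untouched
            request.headers.setdefault('User-Agent', ua)
        # logger.debug('>>>> UA %s', request.headers)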
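As a possible variation, the list can be read through Scrapy's settings object instead of being imported from the settings module directly. The sketch below is not the article's method: the class name SettingsRandomUserAgentMiddleware is hypothetical, and it assumes USER_AGENT_LIST is defined in settings.py. It also overwrites the header instead of using setdefault, so that retried or redirected requests, which carry over the headers of the original request, get a newly picked value as well.

```python
import random


class SettingsRandomUserAgentMiddleware:
    """Hypothetical variant: reads USER_AGENT_LIST from the crawler settings."""

    def __init__(self, user_agents):
        self.user_agents = list(user_agents)

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls from_crawler, when it is defined, to build the middleware.
        # Settings.getlist returns the setting as a list (empty if it is unset).
        return cls(crawler.settings.getlist('USER_AGENT_LIST'))

    def process_request(self, request, spider):
        if self.user_agents:
            # Overwrite rather than setdefault, so a header copied from an
            # earlier attempt is replaced with a fresh random choice.
            request.headers['User-Agent'] = random.choice(self.user_agents)
        # Returning None lets Scrapy continue processing the request.
        return None
```

To try it, point the DOWNLOADER_MIDDLEWARES entry at this class instead of RandomUserAgentMiddleware. If the list is empty or missing, the middleware leaves the request untouched.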

I hope this article helps with your Python programming.