How to randomly assign a User-Agent to each request when collecting data with Scrapy in Python
- 2020-05-07 19:56:47
- OfStack
This example shows how to assign a random User-Agent to each request when collecting data with Scrapy in Python. It is shared here for your reference; the details are as follows:
With this approach, each request can carry a different User-Agent header, which helps prevent websites from blocking Scrapy spiders based on the User-Agent.
First, add the following to settings.py. It registers the custom middleware and disables Scrapy's built-in UserAgentMiddleware (mapping a middleware to None disables it):
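DOWNLOADER_MIDDLEWARES = {
    'scraper.random_user_agent.RandomUserAgentMiddleware': 400,
    # Disable Scrapy's built-in User-Agent middleware. The old path
    # 'scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware'
    # is deprecated in current Scrapy releases.
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
}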
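The middleware imports USER_AGENT_LIST from scraper.settings, but the article does not show that list. A minimal sketch of what it might look like in settings.py, assuming the project package is named scraper as in the middleware path; the strings are illustrative examples of browser User-Agent values, not a recommended set:

```python
# settings.py (continued): illustrative User-Agent strings only;
# replace them with the values you actually want to rotate through.
USER_AGENT_LIST = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/605.1.15 '
    '(KHTML, like Gecko) Version/13.1 Safari/605.1.15',
    'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:76.0) Gecko/20100101 Firefox/76.0',
]
```

The more varied the list, the less uniform the spider's traffic looks; an empty list would make random.choice raise an IndexError in the middleware.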
Next, create the custom User-Agent middleware. Given the path registered above, it lives in scraper/random_user_agent.py:
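import random

from scraper.settings import USER_AGENT_LIST


class RandomUserAgentMiddleware:
    def process_request(self, request, spider):
        # Pick a random User-Agent string for this request
        ua = random.choice(USER_AGENT_LIST)
        if ua:
            # Set the header only if the request does not already have one
            request.headers.setdefault('User-Agent', ua)
        # Uncomment to log the headers being sent (scrapy.log is deprecated):
        # spider.logger.debug('>>>> UA %s', request.headers)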
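Importing USER_AGENT_LIST straight from scraper.settings ties the middleware to one project module. A common alternative, sketched here as an assumption rather than the author's method, is to read the list through the crawler's settings in from_crawler, the hook Scrapy calls when it builds a middleware. The class name SettingsUserAgentMiddleware is hypothetical; crawler.settings.getlist is Scrapy's settings API.

```python
import random


class SettingsUserAgentMiddleware:
    """Hypothetical variant: reads USER_AGENT_LIST from the crawler settings."""

    def __init__(self, user_agents):
        self.user_agents = list(user_agents)

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this hook to build the middleware; getlist returns
        # the setting as a list (empty if USER_AGENT_LIST is not defined).
        return cls(crawler.settings.getlist('USER_AGENT_LIST'))

    def process_request(self, request, spider):
        if self.user_agents:
            # Keep any User-Agent the request already set explicitly
            request.headers.setdefault('User-Agent', random.choice(self.user_agents))
        # Returning None lets Scrapy continue processing the request
        return None
```

To use it, register its module path in DOWNLOADER_MIDDLEWARES in place of scraper.random_user_agent.RandomUserAgentMiddleware. Because of setdefault, a request created with its own headers={'User-Agent': ...} keeps that value, and an empty or missing list simply leaves the header unset instead of raising an error.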