Making Scrapy identify as HTTP/1.1 when crawling with Python

  • 2020-05-07 19:57:15
  • OfStack

This example shows how to make Scrapy send requests that identify as HTTP/1.1 when crawling with Python. The details are as follows:

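Add the following line to your project's settings.py file. Scrapy documents this setting as the client factory used for HTTP/1.0 connections, so it takes effect when the HTTP/1.0 download handler is in use: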

DOWNLOADER_HTTPCLIENTFACTORY = 'myproject.downloader.HTTPClientFactory'
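The value of the setting is a dotted import path: everything before the last dot names a module, and the last part names a class inside it. The sketch below shows roughly how such a path can be resolved with the standard library. Note that load_dotted_path is a hypothetical helper written for illustration, not Scrapy's own function, and collections.OrderedDict stands in for myproject.downloader.HTTPClientFactory so that the example runs without a Scrapy project.

```python
import importlib
from collections import OrderedDict


def load_dotted_path(path):
    """Resolve 'package.module.Name' to the object called Name (illustrative helper)."""
    # Split on the last dot: module path on the left, attribute name on the right.
    module_path, _, attr_name = path.rpartition(".")
    module = importlib.import_module(module_path)
    return getattr(module, attr_name)


# Stand-in path; in the article the path would be 'myproject.downloader.HTTPClientFactory'.
loaded = load_dotted_path("collections.OrderedDict")
print(loaded is OrderedDict)
```

This is why the file name and class name in the next step have to match the path given in settings.py.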

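Save the following code to a separate file, myproject/downloader.py, so that it matches the module path used in the setting above:

from scrapy.core.downloader.webclient import ScrapyHTTPClientFactory, ScrapyHTTPPageGetter


class PageGetter(ScrapyHTTPPageGetter):
    def sendCommand(self, command, path):
        # Write the request line as HTTP/1.1 instead of the HTTP/1.0 that Twisted's client sends by default.
        # command and path are expected to be bytes on Python 3, so a bytes template is used.
        self.transport.write(b'%s %s HTTP/1.1\r\n' % (command, path))


class HTTPClientFactory(ScrapyHTTPClientFactory):
    # Build connections with the custom page getter defined above.
    protocol = PageGetter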
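To see what the override actually puts on the wire, the following minimal sketch reproduces the same request-line formatting without Scrapy or Twisted installed. FakeTransport, DemoPageGetter, and build_request_line are hypothetical names introduced only for this illustration; the real transport is supplied by Twisted at connection time.

```python
def build_request_line(command, path, version=b"HTTP/1.1"):
    """Return the first line of an HTTP request as bytes."""
    return b"%s %s %s\r\n" % (command, path, version)


class FakeTransport:
    """Hypothetical stand-in for a Twisted transport; it only records writes."""

    def __init__(self):
        self.written = b""

    def write(self, data):
        self.written += data


class DemoPageGetter:
    """Illustrative only: mirrors the sendCommand override without Scrapy."""

    def __init__(self, transport):
        self.transport = transport

    def sendCommand(self, command, path):
        # Same idea as the article's PageGetter: announce HTTP/1.1 in the request line.
        self.transport.write(build_request_line(command, path))


transport = FakeTransport()
DemoPageGetter(transport).sendCommand(b"GET", b"/index.html")
print(transport.written)  # b'GET /index.html HTTP/1.1\r\n'
```

Only the version token in the request line changes; the headers and body are still produced by the parent class.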

I hope this article helps with your Python programming.

