An example of using Cookies in Django to implement an anti-crawler mechanism

  • 2021-11-02 01:20:17
  • OfStack

As we know, the HTTP requests that Django receives carry Cookie information. The function of a Cookie is to identify the current user, which the following example illustrates:

The browser sends a request to the server (Django). After the server responds, the connection is closed (the session ends). The next time the user sends a request, the server has no way to tell who this user is. Without the Cookie mechanism, a feature such as user login could only be implemented by querying the database, and the user would have to log in again every time the page is refreshed. That would mean a lot of redundant work for developers, and even a simple login feature would put a heavy load on the server.

A Cookie carries data from the browser to the server so that the server can recognize the current user. The server-side counterpart of the Cookie is the Session, which stores basic information about the current user, such as name, age and gender. Because a Cookie lives in the browser while its data is provided by the server, storing user information directly in the Cookie would make it easy to leak; in addition, a Cookie cannot exceed 4KB and cannot store non-ASCII text such as Chinese directly. Therefore, a mechanism is needed that keeps user data in some storage area on the server, and that mechanism is the Session.
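
As an illustration, the following is a minimal sketch of how Django's built-in Session can keep such user data on the server side; the view names and the stored fields here are hypothetical:

from django.http import HttpResponse

def login(request):
    # The data is kept on the server; the browser only receives a session-id Cookie
    # (named "sessionid" by default and backed by Django's session storage)
    request.session['name'] = 'Tom'
    request.session['age'] = 20
    return HttpResponse('Logged in')

def profile(request):
    # On later requests Django uses the session-id Cookie to look the data up again
    name = request.session.get('name', 'guest')
    return HttpResponse('Hello %s' % name)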

In a word, Cookies and Sessions exist to make up for the stateless nature of the HTTP protocol, allowing the browser and the server to maintain a long-lived conversation.

Cookies not only make up for the stateless HTTP protocol; they can also be used to implement anti-crawler mechanisms. With the development of big data and artificial intelligence, crawler technology has become more and more sophisticated, so websites set up anti-crawler mechanisms to protect the security of their data and their load capacity.

Since Cookies are passed from the browser to the server over the HTTP protocol, the Cookie object can be obtained from the request object request of a view function. Django provides the following ways to manipulate Cookies:


#  Get a Cookie; it is read in the same way as a Python dictionary
request.COOKIES['uuid']
request.COOKIES.get('uuid')

#  Add a Cookie to the response content and return the Cookie to the browser
response = HttpResponse('Hello world')
response.set_cookie('key', 'value')
return response

#  Delete a Cookie in the response content
response = HttpResponse('Hello world')
response.delete_cookie('key')
return response
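
For context, these fragments would normally live inside complete view functions. The following is a minimal sketch, with hypothetical view names and a hypothetical Cookie name uuid:

from django.http import HttpResponse

def read_cookie(request):
    # Read the Cookie sent by the browser; get() avoids a KeyError if it is missing
    uuid = request.COOKIES.get('uuid', 'unknown')
    return HttpResponse('Current uuid: %s' % uuid)

def write_cookie(request):
    # Attach a Cookie to the response; the browser stores it and sends it back later
    response = HttpResponse('Hello world')
    response.set_cookie('uuid', '1234')
    return response

def remove_cookie(request):
    # Deleting tells the browser to expire the Cookie immediately
    response = HttpResponse('Hello world')
    response.delete_cookie('uuid')
    return response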

Operating on a Cookie comes down to getting, adding, and deleting it. Adding Cookie information is done with the set_cookie method, which is defined in the response base class HttpResponseBase and accepts the following parameters:

  • key: The Cookie's key, similar to a dictionary key.
  • value: The Cookie's value, similar to a dictionary value.
  • max_age: The lifetime of the Cookie, in seconds.
  • expires: The expiry time of the Cookie, as a date.
  • path: The path for which the Cookie is valid; the default is the root directory (the site's home page).
  • domain: The domain for which the Cookie is valid.
  • secure: The transmission mode; if False the Cookie is sent over HTTP, otherwise only over HTTPS.
  • httponly: Whether the Cookie may only be transmitted over the HTTP protocol, i.e. it cannot be read by JavaScript.
  • samesite: The SameSite policy, with the optional values lax or strict, mainly to prevent CSRF attacks.
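
For example, a call that uses several of these parameters might look like the following sketch; the Cookie name, value, and path are purely illustrative, and expires could be used instead of max_age to give an absolute date:

from django.http import HttpResponse

def set_cookie_view(request):
    response = HttpResponse('Hello world')
    response.set_cookie(
        'uuid', '1234',
        max_age=300,       # the Cookie expires 300 seconds after it is set
        path='/user/',     # the browser only sends the Cookie back for URLs under /user/
        secure=True,       # transmit over HTTPS only
        httponly=True,     # the Cookie cannot be read by JavaScript
        samesite='Lax',    # 'Lax' or 'Strict', to reduce the risk of CSRF
    )
    return response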

A common anti-crawler approach is to set the parameters max_age, expires, and path. max_age or expires limits how long the Cookie is valid, so a crawler cannot keep harvesting the site's data over a long period; path hides where the Cookie is generated, making it harder for crawler developers to find and crack.
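
As a rough sketch of this idea, one view can plant a short-lived Cookie on a non-obvious path and another can refuse to serve data when that Cookie is missing; the view names, the Cookie name token and its fixed value below are hypothetical, and a real deployment would generate the value dynamically:

from django.http import HttpResponse, HttpResponseForbidden

def entry_page(request):
    # A page that normal browsing passes through; it plants a short-lived Cookie
    response = HttpResponse('Welcome')
    # max_age keeps the Cookie valid only briefly; path restricts it to URLs
    # under /data/, which hides where it was generated
    response.set_cookie('token', 'abc123', max_age=60, path='/data/')
    return response

def data_page(request):
    # Refuse requests that do not carry the expected Cookie,
    # e.g. a crawler that hits this URL directly
    if request.COOKIES.get('token') != 'abc123':
        return HttpResponseForbidden('Forbidden')
    return HttpResponse('The protected data')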

I hope this gives readers a basic understanding of how Cookies can be used to implement anti-crawler measures.

