How to crawl POST request data with Python

  • 2021-06-28 13:12:21
  • OfStack

Why do I do this?

While chatting with a classmate, I learned that he wanted to crawl the data behind a POST request on a website.

Observation

The POST requests on this website carry their parameters in two places: (1) the actual parameters are placed in the query, that is, appended to the URL; (2) an empty JSON object is sent as the body. As for why an empty JSON object is added, my guess is that it is an anti-crawler measure. A POST request that has both query parameters and an empty-object body is a rather unusual design.
I first ran some experiments on the apizza website to work out the rule above, and found that the site's request parameters are sent in raw form. It is not easy to find the rule by writing code directly, without such a tool.

Source code


import requests
import json
headers = {
    'Accept':'application/json, text/javascript, */*; q=0.01',
    'X-Requested-With':'XMLHttpRequest',
    'User-Agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36',
    'Content-Type':'application/json',
    'Accept-Encoding':'gzip, deflate',
    'Accept-Language':'zh-CN,zh;q=0.8',
    'Cache-Control':'no-cache',
  }
# The body parameter is an empty JSON object
data = json.dumps({})
page = 0
# Replace the placeholder with the real site address; the parameters go in the query string
url = 'Web site address followed by parameters ?param1=1&param1=' + str(page)
response = requests.post(url=url, data=data, headers=headers)
print(response.url)
print(response.text)
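
As a variation on the script above, the query string can be left to requests through `params=` instead of being concatenated by hand. The sketch below only prepares the request without sending it, so the final URL and body can be inspected offline. The host `example.com` and the parameter names `param1` and `page` are placeholders, not the real site's.

```python
import requests


def build_request(page):
    """Prepare (but do not send) a POST with query parameters and an empty JSON body."""
    request = requests.Request(
        "POST",
        "https://example.com/api/list",       # placeholder address (assumption)
        params={"param1": 1, "page": page},   # hypothetical parameter names
        json={},                              # empty JSON object sent as the body
    )
    return request.prepare()


prepared = build_request(page=0)
print(prepared.url)                      # URL with the query string appended
print(prepared.body)                     # serialized empty JSON object
print(prepared.headers["Content-Type"])  # set automatically by json=
```

To actually send it, the prepared request can be passed to `requests.Session().send(prepared)`.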

Summary

Things to check in a request tool before writing the crawler:

  • Request method: POST, GET, or something else
  • Parameter type: form-data, raw, or other
  • Parameter location: for a POST request, in the query, in the body, or both
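
The parameter type and location checks above map roughly onto three keyword arguments of requests: `params=` for the query, `data=` for a form body, and `json=` for a raw JSON body. The following is a minimal sketch that prepares each variant without sending anything. The URL and the field name `q` are made up for illustration.

```python
import requests

URL = "https://example.com/api"  # placeholder address (assumption)


def prepare(**kwargs):
    """Build a POST request object without sending it."""
    return requests.Request("POST", URL, **kwargs).prepare()


in_query = prepare(params={"q": "python"})   # parameters in the query string
as_form = prepare(data={"q": "python"})      # URL-encoded form body
as_raw_json = prepare(json={"q": "python"})  # raw JSON body

for name, req in [("query", in_query), ("form", as_form), ("raw json", as_raw_json)]:
    print(name, req.url, req.headers.get("Content-Type"), req.body)
```

Note that `data=` with a dict produces a URL-encoded form; a multipart form-data body would need the `files=` argument instead.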

PS: sending an HTTP POST request with Python requests

Sending an HTTP POST request with a JSON body and request headers using Python requests:


#!/usr/bin/env python
# -*- coding: utf-8 -*-
import requests
import json
url = 'http://official-account/app/messages/group'
body = {"type": "text", "content": " Test Text ", "tag_id": "20717"}
headers = {'content-type': "application/json", 'Authorization': 'APP appid = 4abf1a,token = 9480295ab2e2eddb8'}
# print(type(body))
# print(type(json.dumps(body)))
# One detail: if the body must be sent as JSON, it has to be serialized first,
# for example with data=json.dumps(body)
response = requests.post(url, data=json.dumps(body), headers=headers)
# Alternatively, pass the dict through the json parameter instead of data
# (supported since requests 2.4.2)
# response = requests.post(url, json=body, headers=headers)
# Response body
print(response.text)
# Response status code
print(response.status_code)
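
The two ways of sending a JSON body mentioned in the comments above should produce equivalent requests. As a rough check, the sketch below prepares both variants without any network access and compares the decoded bodies. The URL is a placeholder and the `decoded` helper is introduced here only for illustration.

```python
import json

import requests

url = "https://example.com/app/messages/group"  # placeholder address (assumption)
body = {"type": "text", "content": "Test Text", "tag_id": "20717"}

# Variant 1: serialize by hand and set the Content-Type header explicitly
manual = requests.Request(
    "POST",
    url,
    data=json.dumps(body),
    headers={"Content-Type": "application/json"},
).prepare()

# Variant 2: let requests serialize the dict and set the header
automatic = requests.Request("POST", url, json=body).prepare()


def decoded(prepared):
    """Return the prepared body as a Python object (the body may be str or bytes)."""
    raw = prepared.body
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    return json.loads(raw)


print(decoded(manual) == decoded(automatic))
print(automatic.headers["Content-Type"])
```

With `json=`, the Content-Type header does not need to be set by hand, which removes one place where the header and the body can drift apart.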
