How to use PHP cURL to obtain cookies and simulate a login

  • 2020-10-31 21:42:02
  • OfStack

I wanted to extract some data from Google search results and found that Google is very hostile to software that scrapes its data. We used to be able to get the data by faking the User-Agent, but that no longer works. Packet captures show that Google checks for cookies: a request without cookies is answered directly with a 302 redirect, in fact with dozens of 302 redirects in a row, so no data can be captured at all.
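If you want to confirm the redirect loop described above from PHP, a request that follows redirects up to a fixed cap makes it visible. The following is a sketch, not part of the original script; the helper name probe_redirects and the default cap of 10 are assumptions. When the server keeps redirecting past CURLOPT_MAXREDIRS, curl_exec() returns false and curl_errno() reports CURLE_TOO_MANY_REDIRECTS, while CURLINFO_REDIRECT_COUNT gives the number of redirects actually followed.

```php
<?php
// Hypothetical helper (not from the original article): request $url without
// any cookies, following at most $maxRedirs redirects, and report the outcome.
function probe_redirects(string $url, int $maxRedirs = 10): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,       // capture the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,       // follow 3xx Location headers
        CURLOPT_MAXREDIRS      => $maxRedirs, // give up after this many redirects
    ]);
    $body = curl_exec($ch);

    return [
        'ok'        => $body !== false,
        // true when the server kept redirecting past $maxRedirs
        'too_many'  => curl_errno($ch) === CURLE_TOO_MANY_REDIRECTS,
        'redirects' => curl_getinfo($ch, CURLINFO_REDIRECT_COUNT),
        // last response code received (0 if none was received)
        'status'    => curl_getinfo($ch, CURLINFO_RESPONSE_CODE),
    ];
}

// Example (performs a real network request):
// var_dump(probe_redirects('http://www.google.com.hk/search?q=qq'));
```

A result with too_many set to true and redirects equal to the cap is the symptom described above.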
So before sending the search request, you need to obtain the cookies and save them first, then send the search request again with the saved cookies; the data can then be captured normally. This works the same way as simulating a forum login: first POST the login form, get the cookies and save them, then access the site using those cookies.
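The forum-login variant of this pattern can be sketched as follows. This is a hedged illustration, not the author's code: the function names, the URLs and the form field names (username, password) are hypothetical and must be replaced with whatever the real login form uses. Both CURLOPT_COOKIEJAR and CURLOPT_COOKIEFILE point at the same file, so each request sends the cookies saved so far and writes any new ones back.

```php
<?php
// Hedged sketch of "POST the login form, save the cookies, reuse them".
// Function names, URLs and form field names are hypothetical.

// Build the cURL options for one request; pass $postFields to send a POST.
function cookie_request_options(string $cookieFile, ?array $postFields = null): array
{
    $options = [
        CURLOPT_RETURNTRANSFER => true,        // return the body as a string
        CURLOPT_FOLLOWLOCATION => true,        // follow the usual post-login redirect
        CURLOPT_MAXREDIRS      => 5,           // but never loop forever
        CURLOPT_COOKIEFILE     => $cookieFile, // send cookies saved earlier (a missing file is fine)
        CURLOPT_COOKIEJAR      => $cookieFile, // save received cookies back to the same file
    ];
    if ($postFields !== null) {
        $options[CURLOPT_POST]       = true;
        $options[CURLOPT_POSTFIELDS] = http_build_query($postFields); // urlencoded form body
    }
    return $options;
}

// Perform one request. $ch is released when the function returns,
// and releasing the handle is what writes the cookie jar file.
function cookie_request(string $url, string $cookieFile, ?array $postFields = null): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, cookie_request_options($cookieFile, $postFields));
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    return $body;
}

// Usage (hypothetical forum URLs and field names):
// $jar = sys_get_temp_dir() . '/forum_cookie.txt';
// cookie_request('http://forum.example.com/login.php', $jar,
//     ['username' => 'me', 'password' => 'secret']);            // step 1: log in
// echo cookie_request('http://forum.example.com/my.php', $jar); // step 2: reuse cookies
```

Keeping the option building in a separate function makes it easy to check what each step sends without touching the network.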
The PHP code is as follows:

<?php
header('Content-Type: text/html; charset=utf-8');

$cookie_file = __DIR__ . '/cookie.txt';
// Alternatively, use a unique temporary file:
//$cookie_file = tempnam(sys_get_temp_dir(), 'cookie');

// Step 1: request the home page once to obtain the cookies and save them
$url = "http://www.google.com.hk";
$ch = curl_init($url); // Initialize the cURL session
curl_setopt($ch, CURLOPT_HEADER, 0); // Do not include the response headers in the output
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // Return the response as a string instead of printing it
curl_setopt($ch, CURLOPT_COOKIEJAR,  $cookie_file); // Save the received cookies to this file
curl_exec($ch);
curl_close($ch); // The cookies are written to $cookie_file when the handle is released

// Step 2: send the search request again using the cookies saved above
$url = "http://www.google.com.hk/search?oe=utf8&ie=utf8&source=uds&hl=zh-CN&q=qq";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file); // Send the cookies obtained above
$response = curl_exec($ch);
if ($response === false) {
    die('Request failed: ' . curl_error($ch));
}
curl_close($ch);

echo $response;
?>
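The file written through CURLOPT_COOKIEJAR is a plain-text, Netscape-style cookie file, so you can open it to check what the first request actually received. The parser below is a sketch under stated assumptions: the name parse_cookie_jar is hypothetical, and it assumes the seven tab-separated fields libcurl writes per cookie (domain, include-subdomains flag, path, secure flag, expiry as a Unix timestamp, name, value), with HttpOnly cookies marked by a #HttpOnly_ prefix on the domain.

```php
<?php
// Hypothetical helper (not part of the original script): parse the
// Netscape-format cookie file that CURLOPT_COOKIEJAR writes.
function parse_cookie_jar(string $contents): array
{
    $cookies = [];
    foreach (preg_split('/\R/', $contents) as $line) {
        $httpOnly = false;
        if (strncmp($line, '#HttpOnly_', 10) === 0) {
            // HttpOnly cookies are written with this prefix on the domain
            $httpOnly = true;
            $line = substr($line, 10);
        } elseif ($line === '' || $line[0] === '#') {
            continue; // skip blank lines and comments
        }
        $fields = explode("\t", $line);
        if (count($fields) !== 7) {
            continue; // not a cookie line
        }
        [$domain, $subdomains, $path, $secure, $expires, $name, $value] = $fields;
        $cookies[] = [
            'domain'     => $domain,
            'subdomains' => $subdomains === 'TRUE',
            'path'       => $path,
            'secure'     => $secure === 'TRUE',
            'expires'    => (int) $expires, // Unix timestamp, 0 for a session cookie
            'name'       => $name,
            'value'      => $value,
            'httponly'   => $httpOnly,
        ];
    }
    return $cookies;
}

// Example: print_r(parse_cookie_jar(file_get_contents($cookie_file)));
```

If the parsed list is empty after step 1, the second request will be sent without cookies and will most likely run into the redirects again.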
