Methods and Techniques for Scraping Remote Sites with curl

  • 2020-12-19 20:57:19
  • OfStack

Reasons for choosing curl

Here is an easy-to-understand comparison of curl and file_get_contents:
file_get_contents is essentially a convenience wrapper that combines several built-in file functions, such as file_exists, fopen, fread, and fclose. It is mainly intended for local files; support for network files was added for convenience.
curl is a library designed specifically for network interaction. It provides a large number of customization options, so it tends to behave more reliably than file_get_contents across different environments.

Usage

1. Enable curl support

Because the curl extension is not enabled by default after PHP is installed, php.ini needs to be edited. Find the line ;extension=php_curl.dll, remove the leading semicolon, and restart the web server. (The .dll name applies to Windows; since PHP 7.2 the line can also be written as extension=curl.)
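A quick way to confirm that the extension is active after the restart is to check for it at runtime. The sketch below uses only built-in functions (extension_loaded, function_exists, curl_version); the helper name curl_is_available is made up for this illustration.

```php
<?php
// Hypothetical helper: returns true when the curl extension is usable.
function curl_is_available(): bool
{
    return extension_loaded('curl') && function_exists('curl_init');
}

if (curl_is_available()) {
    // curl_version() returns an array describing the linked libcurl build
    $info = curl_version();
    echo 'curl is enabled, libcurl version ' . $info['version'] . PHP_EOL;
} else {
    echo 'curl is not enabled; check the extension line in php.ini' . PHP_EOL;
}
```

If the second message appears, php.ini was probably not the file actually loaded; php --ini on the command line lists the configuration files in use.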

2. Fetch data with curl


// Initialize a cURL handle
$curl = curl_init();
// Set the URL to fetch
curl_setopt($curl, CURLOPT_URL, 'http://www.cmx8.cn');
// Include the response headers in the output
curl_setopt($curl, CURLOPT_HEADER, 1);
// Return the result as a string instead of printing it directly
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
// Run cURL and request the page
$data = curl_exec($curl);
// Close the cURL handle
curl_close($curl);
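The snippet above does not check whether the request succeeded. As a minimal sketch, the same calls can be wrapped in a function that reports failures; fetch_page and its $error parameter are hypothetical names introduced here, while curl_error, curl_getinfo and CURLINFO_HTTP_CODE are part of PHP's curl extension.

```php
<?php
// Hypothetical helper: fetch a URL and return the body, or null on failure.
// On failure, $error receives a short description of what went wrong.
function fetch_page(string $url, ?string &$error = null): ?string
{
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);

    $body = curl_exec($curl);
    if ($body === false) {
        // Transport-level failure (DNS, connection, unsupported protocol, ...)
        $error = curl_error($curl);
        curl_close($curl);
        return null;
    }

    // The request completed; inspect the HTTP status code
    $status = curl_getinfo($curl, CURLINFO_HTTP_CODE);
    curl_close($curl);
    if ($status >= 400) {
        $error = 'HTTP status ' . $status;
        return null;
    }

    return $body;
}
```

A caller could then write $data = fetch_page('http://www.cmx8.cn', $err); and stop early when $data is null, instead of passing an empty result on to the matching step.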

3. Extract key data with regular expressions


// $data is the value returned by curl_exec(), i.e. the fetched page content
preg_match_all("/<li class=\"item\">(.*?)<\/li>/", $data, $out, PREG_SET_ORDER);
foreach ($out as $key => $value) {
    // $value is an array: index 0 holds the full matched text, index 1 holds the captured group
    echo 'Full match: ' . $value[0] . PHP_EOL;
    echo 'Captured group: ' . $value[1] . PHP_EOL;
}
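To make the shape of $out concrete, here is a small offline sketch that runs the same pattern against a made-up HTML string (the sample markup is invented for this illustration and stands in for a fetched page). It also adds the s modifier, an optional change that lets . match newlines when a list item spans several lines.

```php
<?php
// Invented sample markup standing in for the $data returned by curl_exec()
$data = '<ul><li class="item">First</li><li class="item">Second</li></ul>';

// With PREG_SET_ORDER, each element of $out is one match:
// [0] is the full matched text, [1] is the first captured group
preg_match_all('/<li class="item">(.*?)<\/li>/s', $data, $out, PREG_SET_ORDER);

$items = array();
foreach ($out as $match) {
    // Keep only the text inside each list item
    $items[] = trim(strip_tags($match[1]));
}

print_r($items);
```

Here $items ends up holding the two strings First and Second. For heavily nested or irregular markup, regular expressions become fragile, and an HTML parser such as PHP's DOMDocument may be a better fit.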

Tips

1. Setting timeouts

curl_setopt($ch, $option, $value) can be used to set several timeout options, mainly:

CURLOPT_TIMEOUT: the maximum number of seconds that cURL is allowed to run.
CURLOPT_TIMEOUT_MS: the maximum number of milliseconds that cURL is allowed to run. Added in cURL 7.16.2; available from PHP 5.2.3.
CURLOPT_CONNECTTIMEOUT: the number of seconds to wait while trying to connect. If set to 0, wait indefinitely.
CURLOPT_CONNECTTIMEOUT_MS: the number of milliseconds to wait while trying to connect. If set to 0, wait indefinitely. Added in cURL 7.16.2; available from PHP 5.2.3.
CURLOPT_DNS_CACHE_TIMEOUT: the number of seconds to keep DNS information in memory. The default is 120 seconds.


curl_setopt($ch, CURLOPT_TIMEOUT, 60);      // A timeout in whole seconds is usually enough
curl_setopt($ch, CURLOPT_NOSIGNAL, 1);      // Note: this must be set for millisecond timeouts to work
curl_setopt($ch, CURLOPT_TIMEOUT_MS, 200);  // Timeout in milliseconds; added in cURL 7.16.2, available from PHP 5.2.3
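The options listed above can be combined in a single curl_setopt_array() call, and a timeout can be told apart from other failures through curl_errno(). The following is a sketch under assumptions: fetch_with_timeout and its default values are hypothetical, and 28 is the numeric libcurl error code for an operation that timed out.

```php
<?php
// Hypothetical helper (not from the original article): fetch a URL with
// millisecond timeouts and report whether a failure was a timeout.
function fetch_with_timeout(string $url, int $connectMs = 2000, int $totalMs = 5000): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER    => 1,
        CURLOPT_NOSIGNAL          => 1,          // set alongside millisecond timeouts
        CURLOPT_CONNECTTIMEOUT_MS => $connectMs, // limit for the connection phase
        CURLOPT_TIMEOUT_MS        => $totalMs,   // limit for the whole request
    ));

    $body  = curl_exec($ch);
    $errno = curl_errno($ch);
    curl_close($ch);

    return array(
        'body'      => $body === false ? null : $body,
        'timed_out' => $errno === 28, // 28 is libcurl's "operation timed out" code
    );
}
```

Keeping the connection limit shorter than the overall limit lets an unreachable host fail quickly while still giving a slow but responsive server time to send its page.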

2. Submit data via POST and keep cookies


// The following example is excerpted for reference:
// it uses curl to simulate logging in to a Discuz forum (suitable for DZ 7.0)

!extension_loaded('curl') && die('The curl extension is not loaded.');    

$discuz_url = 'http://www.lxvoip.com';// BBS address     
$login_url = $discuz_url .'/logging.php?action=login';// Login page address     
$get_url = $discuz_url .'/my.php?item=threads'; // My post     

$post_fields = array();    
// The following two items need not be modified     
$post_fields['loginfield'] = 'username';    
$post_fields['loginsubmit'] = 'true';    
// Username and password must be filled in     
$post_fields['username'] = 'lxvoip';    
$post_fields['password'] = '88888888';    
// Security question     
$post_fields['questionid'] = 0;    
$post_fields['answer'] = '';    
//@todo Verification code     
$post_fields['seccodeverify'] = '';    

// Fetch the login form and extract its formhash
$ch = curl_init($login_url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$contents = curl_exec($ch);
curl_close($ch);
preg_match('/<input\s*type="hidden"\s*name="formhash"\s*value="(.*?)"\s*\/>/i', $contents, $matches);
if (!empty($matches)) {
    $formhash = $matches[1];
    // Submit the formhash together with the other login fields
    $post_fields['formhash'] = $formhash;
} else {
    die('formhash not found.');
}

// POST the login data and save the cookies that come back
$cookie_file = dirname(__FILE__) . '/cookie.txt';
//$cookie_file = tempnam('/tmp', 'cookie');
$ch = curl_init($login_url);    
curl_setopt($ch, CURLOPT_HEADER, 0);    
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);    
curl_setopt($ch, CURLOPT_POST, 1);    
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_fields);    
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file);    
curl_exec($ch);    
curl_close($ch);    

// Use the cookies saved above to fetch a page that requires login
$ch = curl_init($get_url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);  // return the page as a string so var_dump() below can show it
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file);    
$contents = curl_exec($ch);    
curl_close($ch);    

var_dump($contents);
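One detail about the login example above: when CURLOPT_POSTFIELDS is given an array, cURL sends the body as multipart/form-data, whereas a URL-encoded string is sent as application/x-www-form-urlencoded, which is what a browser login form normally submits. The sketch below shows one way to build the options with http_build_query(); build_login_options is a hypothetical helper, and the field values are the placeholders from the example.

```php
<?php
// Hypothetical helper: build cURL options for a form login that both
// stores new cookies and sends previously stored ones.
function build_login_options(array $fields, string $cookieFile): array
{
    return array(
        CURLOPT_RETURNTRANSFER => 1,
        CURLOPT_POST           => 1,
        // A string body is sent as application/x-www-form-urlencoded
        CURLOPT_POSTFIELDS     => http_build_query($fields),
        CURLOPT_COOKIEJAR      => $cookieFile, // where cookies are written
        CURLOPT_COOKIEFILE     => $cookieFile, // where cookies are read from
    );
}

$options = build_login_options(
    array('username' => 'lxvoip', 'loginsubmit' => 'true'),
    sys_get_temp_dir() . '/cookie.txt'
);

// Show the encoded body that would be posted
echo $options[CURLOPT_POSTFIELDS] . PHP_EOL;
```

The options would be applied with curl_setopt_array($ch, $options) before curl_exec($ch). Pointing CURLOPT_COOKIEJAR and CURLOPT_COOKIEFILE at the same file lets later requests reuse the session established by the login.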

