PHP cURL concurrency best practices, with code

  • 2020-05-19 04:23:18
  • OfStack

This article explores two concrete implementations of concurrent cURL requests in PHP and compares their performance.

1. The classic cURL concurrency mechanism and its existing problems

The classic cURL concurrency implementation is easy to find on the web, for example the following one from the PHP online manual:

 
function classic_curl($urls, $delay) {
    $queue = curl_multi_init();
    $map = array();

    foreach ($urls as $url) {
        // create cURL resources
        $ch = curl_init();

        // set URL and other appropriate options
        curl_setopt($ch, CURLOPT_URL, $url);

        curl_setopt($ch, CURLOPT_TIMEOUT, 1);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_HEADER, 0);
        curl_setopt($ch, CURLOPT_NOSIGNAL, true);

        // add handle
        curl_multi_add_handle($queue, $ch);
        $map[$url] = $ch;
    }

    $active = null;

    // execute the handles
    do {
        $mrc = curl_multi_exec($queue, $active);
    } while ($mrc == CURLM_CALL_MULTI_PERFORM);

    while ($active > 0 && $mrc == CURLM_OK) {
        if (curl_multi_select($queue, 0.5) != -1) {
            do {
                $mrc = curl_multi_exec($queue, $active);
            } while ($mrc == CURLM_CALL_MULTI_PERFORM);
        }
    }

    // process the responses only after every transfer has finished
    $responses = array();
    foreach ($map as $url => $ch) {
        $responses[$url] = callback(curl_multi_getcontent($ch), $delay);
        curl_multi_remove_handle($queue, $ch);
        curl_close($ch);
    }

    curl_multi_close($queue);
    return $responses;
}

First, all URLs are pushed into the concurrent queue, then the concurrent transfers run, and only after every request has returned does the subsequent processing (such as data analysis) begin. In practice, because of network conditions, some URLs return well before others, but the classic cURL approach must wait for the slowest URL before it can process anything, and waiting means the CPU sits idle. If the URL queue is short, this idle time is still acceptable; if the queue is long, the waste becomes unacceptable.
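The cost of waiting can be illustrated with a toy timeline model. This is only a sketch under stated assumptions: the function names, the arrival times, and the per-response processing cost are all hypothetical, and the model assumes downloads keep progressing while a response is being processed, which is a simplification of how curl_multi behaves.

```php
<?php
// Toy model (illustrative names, not from the article).
// $arrivals: time in ms at which each response finishes downloading.
// $cost: processing time in ms per response.

function classic_total_ms(array $arrivals, int $cost): int
{
    if (!$arrivals) {
        return 0;
    }
    // Processing starts only after the slowest response has arrived.
    return max($arrivals) + count($arrivals) * $cost;
}

function rolling_total_ms(array $arrivals, int $cost): int
{
    sort($arrivals);
    $clock = 0;
    foreach ($arrivals as $arrival) {
        // Process each response as soon as it has arrived and the CPU is free.
        $clock = max($clock, $arrival) + $cost;
    }
    return $clock;
}

// Three responses arriving at 100 ms, 200 ms and 900 ms, 100 ms of processing each.
echo classic_total_ms(array(100, 200, 900), 100), "\n"; // 1200
echo rolling_total_ms(array(100, 200, 900), 100), "\n"; // 1000
```

In this example the rolling schedule processes the two early responses while the slow one is still in flight, so only the last response's processing is left once it arrives.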

2. Improved Rolling cURL concurrency mode

On closer analysis, the classic cURL approach leaves room for optimization: process each URL's response as soon as that request completes, and keep waiting for the remaining URLs while processing, rather than starting only after the slowest one has returned. This avoids leaving the CPU idle. Without further ado, here is a concrete implementation:
 
function rolling_curl($urls, $delay) {
    $queue = curl_multi_init();
    $map = array();

    foreach ($urls as $url) {
        $ch = curl_init();

        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_TIMEOUT, 1);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_HEADER, 0);
        curl_setopt($ch, CURLOPT_NOSIGNAL, true);

        curl_multi_add_handle($queue, $ch);
        // key by handle id; (int) works for resource handles (PHP 7) and
        // CurlHandle objects (PHP 8+), whereas (string) fails on the latter
        $map[(int) $ch] = $url;
    }

    $responses = array();
    do {
        while (($code = curl_multi_exec($queue, $active)) == CURLM_CALL_MULTI_PERFORM) ;

        if ($code != CURLM_OK) { break; }

        // a request was just completed -- find out which one
        while ($done = curl_multi_info_read($queue)) {

            // get the info and content returned on the request
            $info = curl_getinfo($done['handle']);
            $error = curl_error($done['handle']);
            $results = callback(curl_multi_getcontent($done['handle']), $delay);
            $responses[$map[(int) $done['handle']]] = compact('info', 'error', 'results');

            // remove the curl handle that just completed
            curl_multi_remove_handle($queue, $done['handle']);
            curl_close($done['handle']);
        }

        // Block for data in / output; error handling is done by curl_multi_exec
        if ($active > 0) {
            curl_multi_select($queue, 0.5);
        }

    } while ($active);

    curl_multi_close($queue);
    return $responses;
}


3. Performance comparison between the two concurrent implementations

The performance comparison before and after the improvement was run on a Linux host. The concurrent queue used for the test was:

http://item.taobao.com/item.htm?id=14392877692
http://item.taobao.com/item.htm?id=16231676302
http://item.taobao.com/item.htm?id=17037160462
http://item.taobao.com/item.htm?id=5522416710
http://item.taobao.com/item.htm?id=16551116403
http://item.taobao.com/item.htm?id=14088310973

A brief description of the experimental design and the format of the results: to make the results reliable, each group of experiments was repeated 20 times. In a single run, both mechanisms were given the same set of URLs, and the elapsed time in seconds was measured for Classic (the classic concurrency mechanism) and Rolling (the improved one). The faster mechanism is the Winner, and the time saved (Excellence, in seconds) and the performance improvement ratio (Excel. %) are computed. To stay close to real requests while keeping the experiment simple, the returned content is only matched against a simple regular expression, with no other complex processing. In addition, to determine how the cost of the result-processing callback affects the comparison, usleep is used to simulate heavier real-world processing logic (such as extraction, word segmentation, or writing to a file or database).

The callback function used in the performance test is:
 
function callback($data, $delay) {
    preg_match_all('/<h3>(.+)<\/h3>/iU', $data, $matches);
    usleep($delay);
    return compact('data', 'matches');
}
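
The measurement procedure described above could be scripted roughly as follows. This is a minimal sketch, not the harness used for the article's results: the function names and result keys are hypothetical, and the improvement ratio is assumed to be the time saved divided by the slower run's time, which the article does not spell out.

```php
<?php
// Sketch only: all names below are illustrative.

// Time a single call in seconds.
function time_call(callable $fn): float
{
    $start = microtime(true);
    $fn();
    return microtime(true) - $start;
}

// Turn one pair of timings into Winner / time saved / improvement ratio.
// Assumption: ratio = time saved / slower time. A tie is reported as Classic.
function summarize_run(float $classic, float $rolling): array
{
    $winner = $rolling < $classic ? 'Rolling' : 'Classic';
    $saved = abs($classic - $rolling);
    $slower = max($classic, $rolling);
    $percent = $slower > 0 ? $saved / $slower * 100 : 0.0;
    return compact('classic', 'rolling', 'winner', 'saved', 'percent');
}

// Repeat the comparison $rounds times with the same two callables.
function compare_runs(callable $classic, callable $rolling, int $rounds = 20): array
{
    $rows = array();
    for ($i = 0; $i < $rounds; $i++) {
        $rows[] = summarize_run(time_call($classic), time_call($rolling));
    }
    return $rows;
}
```

With the functions from this article in scope, one group of experiments with a 5 ms callback delay might be run as compare_runs(function () use ($urls) { classic_curl($urls, 5000); }, function () use ($urls) { rolling_curl($urls, 5000); }), where $urls holds the test queue.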

With no delay in the data processing callback: Rolling cURL is slightly better, but the improvement is not significant.
With a 5 ms delay in the data processing callback: Rolling cURL wins, with a performance improvement of about 40%.
Based on the comparison above, Rolling cURL is the better choice when processing a queue of URLs concurrently. When the number of URLs is large (1000+), the maximum length of the concurrent queue can be capped, at 20 for example: each time a URL returns and has been processed, a URL that has not yet been requested is added to the queue. Code written this way is more robust and will not stall or crash because concurrency is too high. For a detailed implementation, see: http://code.google.com/p/rolling-curl/
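
The capped queue described above can be sketched as follows. This is a hedged illustration, not the code of the referenced rolling-curl project: the function name rolling_curl_window, the $window parameter, and the single-argument $callback are assumptions made for this example.

```php
<?php
// Sketch only: rolling_curl_window() and its parameters are illustrative names.
function rolling_curl_window(array $urls, callable $callback, int $window = 20): array
{
    $queue = curl_multi_init();
    $map = array();                 // handle id => URL for in-flight requests
    $responses = array();
    $pending = array_values($urls); // URLs not yet requested

    // Start the next pending URL, if there is one.
    $add = function () use (&$pending, &$map, $queue) {
        if (!$pending) {
            return;
        }
        $url = array_shift($pending);
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_TIMEOUT, 1);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_NOSIGNAL, true);
        curl_multi_add_handle($queue, $ch);
        $map[(int) $ch] = $url;
    };

    // Fill the window with at most $window requests.
    for ($i = 0; $i < $window; $i++) {
        $add();
    }

    while ($map) {
        $code = curl_multi_exec($queue, $active);
        if ($code != CURLM_OK) {
            break;
        }

        $handled = 0;
        while ($done = curl_multi_info_read($queue)) {
            $ch = $done['handle'];
            $id = (int) $ch;
            $responses[$map[$id]] = array(
                'error' => curl_error($ch),
                'results' => $callback(curl_multi_getcontent($ch)),
            );
            curl_multi_remove_handle($queue, $ch);
            curl_close($ch);
            unset($map[$id]);
            $add(); // a slot is free again: request one more URL
            $handled++;
        }

        if ($handled === 0) {
            if ($active == 0) {
                break; // nothing running and nothing reported: avoid spinning
            }
            // Wait for activity; back off briefly if select fails.
            if (curl_multi_select($queue, 0.5) === -1) {
                usleep(1000);
            }
        }
    }

    curl_multi_close($queue);
    return $responses;
}
```

Because a new request is added only when a finished one is removed, at most $window transfers are in flight at any time, regardless of how long $urls is.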
