Speed comparison of curl_init() and curl_multi_init() (multithreading) in PHP, explained in detail

  • 2021-10-27 06:47:12
  • OfStack

This article compares the speed of PHP's curl_init() with that of the multithreaded curl_multi_init(). It is shared here for your reference.

curl_init() plays a very important role in PHP, especially when crawling web page content or file information. For example, the earlier article "php uses curl to obtain header detection and turns on GZip compression" showed how powerful curl_init() is.

curl_init() processes requests one at a time, in a single-threaded fashion. To process several requests at once, PHP provides curl_multi_init(), which is usually described as the multithreaded way of working. (Strictly speaking, curl_multi runs the transfers concurrently inside a single PHP thread using non-blocking I/O; this article keeps the common "multithreaded" label.)

Speed comparison of curl_init() and curl_multi_init()

Can the multithreading of curl_multi_init() speed up the fetching of web pages? Today I will test this with an experiment.

The test is very simple: fetch the page www.webkaka.com five times in a row, once with curl_init() and once with curl_multi_init(), record the time each takes, and draw a conclusion by comparing the two.

First, use curl_init() to fetch www.webkaka.com five times in a row in a single thread.

The program code is as follows:


<?php
// Start time in whole milliseconds. microtime(true) returns seconds as a float,
// which avoids gluing the two parts of the microtime() string together.
$start = round(microtime(true) * 1000);
echo $start;
echo "<br>";
for ($i = 1; $i <= 5; $i++) {
    $szUrl = 'http://www.webkaka.com/';
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $szUrl);
    curl_setopt($curl, CURLOPT_HEADER, 0);         // do not include response headers in the output
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1); // return the body instead of printing it
    curl_setopt($curl, CURLOPT_ENCODING, '');      // empty string: accept every encoding curl supports (e.g. gzip)
    $data = curl_exec($curl);
    curl_close($curl); // frees the handle on PHP 7; has no effect since PHP 8.0
    echo $data;
    echo "<br>";
    $now = round(microtime(true) * 1000);
    echo $now;
    echo "<br>";
    echo $now - $start; // milliseconds elapsed since the start
}
?>
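
The timing logic can also be pulled out into a small helper so that both tests measure time in the same way. The following is only a minimal sketch: measure_ms() is a hypothetical name that does not come from the original test, and the usleep() call merely stands in for a real curl request.

```php
<?php
// Hypothetical helper (not part of the original test): runs $task once and
// returns the elapsed wall-clock time in milliseconds.
function measure_ms(callable $task): float
{
    $start = microtime(true); // current time in seconds, as a float
    $task();
    return (microtime(true) - $start) * 1000.0;
}

// Example: time a 50 ms pause, which stands in for a curl request here.
$elapsed = measure_ms(function () {
    usleep(50000);
});
printf("elapsed: %.1f ms\n", $elapsed);
```

Passing the work as a callable keeps the measurement code identical for the single-threaded and the multithreaded variant.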

Then, use curl_multi_init() to fetch www.webkaka.com five times in a row with multithreading.

The code is as follows:


<?php
echo date("Y-m-d H:i:s"); // "i" is minutes ("m" would print the month)
echo " ";
echo floor(fmod(microtime(true), 1) * 1000); // millisecond part of the current second
echo "<br>";
$start = round(microtime(true) * 1000); // start time in whole milliseconds
echo $start;
echo "<br>";
$urls = array(
    'http://www.webkaka.com',
    'http://www.webkaka.com',
    'http://www.webkaka.com',
    'http://www.webkaka.com',
    'http://www.webkaka.com',
);
print_r(async_get_url($urls)); // [0] => body of the first URL, [1] => body of the second URL, ...
echo "<br>";
echo date("Y-m-d H:i:s");
echo " ";
echo floor(fmod(microtime(true), 1) * 1000);
echo "<br>";
$end = round(microtime(true) * 1000);
echo $end;
echo "<br>";
echo $end - $start; // total elapsed milliseconds

/**
 * Fetch several URLs concurrently with curl_multi.
 * Returns the bodies in input order, with false for a request that reported an error.
 */
function async_get_url($url_array, $wait_usec = 0)
{
  if (!is_array($url_array))
    return false;
  $wait_usec = intval($wait_usec);
  $data  = array();
  $handle = array();
  $running = 0;
  $mh = curl_multi_init(); // multi curl handle
  $i = 0;
  foreach ($url_array as $url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the body, don't print it
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)');
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // follow 302 redirects
    curl_setopt($ch, CURLOPT_MAXREDIRS, 7);
    curl_multi_add_handle($mh, $ch); // add the curl handle to the multi curl handle
    $handle[$i++] = $ch;
  }
  /* Execute: this loop polls continuously until every transfer has finished */
  do {
    curl_multi_exec($mh, $running);
    if ($wait_usec > 0) /* optional pause between polls */
      usleep($wait_usec); // 250000 = 0.25 sec
  } while ($running > 0);
  /* Read data */
  foreach ($handle as $i => $ch) {
    $content = curl_multi_getcontent($ch);
    $data[$i] = (curl_errno($ch) == 0) ? $content : false;
  }
  /* Remove the handles */
  foreach ($handle as $ch) {
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch); // frees the handle on PHP 7; has no effect since PHP 8.0
  }
  curl_multi_close($mh);
  return $data;
}
?>

To reduce the effect of chance, I ran each test five times (using CTRL+F5 to force a refresh). The results are as follows:

curl_init():

            Run 1   Run 2   Run 3   Run 4   Run 5   Average
Time (ms)   3724    3615    2540    1957    2794    2926

curl_multi_init():

            Run 1   Run 2   Run 3   Run 4   Run 5   Average
Time (ms)   4275    2912    3691    4198    3891    3793

The results show that the two methods do not differ much: the averages are about 870 milliseconds apart (2926 ms versus 3793 ms). Many people assume that multithreading takes far less time than a single thread, but that is not what happened here. In this data, multithreading actually took slightly longer than single threading. Note, however, that for some tasks multithreaded processing is not necessarily chosen in pursuit of speed.
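
As a quick check of the arithmetic, the averages and their difference can be recomputed from the ten measurements in the tables above. The sketch below uses only those numbers; the function name average_ms() is a hypothetical choice made for this illustration.

```php
<?php
// Measured times in milliseconds, copied from the two tables above.
$single = [3724, 3615, 2540, 1957, 2794]; // curl_init()
$multi  = [4275, 2912, 3691, 4198, 3891]; // curl_multi_init()

// Hypothetical helper: arithmetic mean of a list of run times.
function average_ms(array $runs): float
{
    return array_sum($runs) / count($runs);
}

$avgSingle = average_ms($single);
$avgMulti  = average_ms($multi);
$diff      = $avgMulti - $avgSingle;

printf(
    "curl_init(): %.0f ms, curl_multi_init(): %.0f ms, difference: %.0f ms\n",
    $avgSingle,
    $avgMulti,
    $diff
);
// prints "curl_init(): 2926 ms, curl_multi_init(): 3793 ms, difference: 867 ms"
```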

About curl_multi_init()

Generally speaking, the reason to reach for curl_multi_init() is to request multiple URLs at the same time rather than one after another; otherwise, curl_init() is enough.

However, when using curl_multi you may run into problems such as excessive CPU usage and web pages that appear to hang. See the article "PHP uses curl_multi_select to solve the problem of curl_multi web pages hanging".
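
One common way to avoid the high CPU usage is to let curl_multi_select() wait for network activity between calls to curl_multi_exec(), instead of polling in a tight loop. The following is a minimal sketch of that idea, not the code from the article mentioned above; the function name wait_for_all() and the 1-second timeout are assumptions made for illustration.

```php
<?php
// Hypothetical helper: drives all transfers on the multi handle $mh to
// completion while sleeping inside curl_multi_select() between polls.
function wait_for_all($mh): void
{
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running > 0 && $status === CURLM_OK) {
            // Block until some transfer has activity, for at most 1 second.
            if (curl_multi_select($mh, 1.0) === -1) {
                // select reported a failure: pause briefly so the loop does not spin.
                usleep(1000);
            }
        }
    } while ($running > 0 && $status === CURLM_OK);
}
```

A caller would add its child handles with curl_multi_add_handle(), call wait_for_all($mh), and then read the results as usual.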

The steps to use curl_multi are summarized as follows:

Step 1: Call curl_multi_init().
Step 2: Call curl_multi_add_handle() in a loop. Note that its second argument is a child handle created by curl_init().
Step 3: Keep calling curl_multi_exec() until all transfers have finished.
Step 4: Call curl_multi_getcontent() in a loop, as needed, to get the results.
Step 5: Call curl_multi_remove_handle(), and call curl_close() for each child handle.
Step 6: Call curl_multi_close().
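
The six steps can be sketched in one compact function. This is an illustration under assumptions rather than a definitive implementation: fetch_all() is a hypothetical name, the 30-second timeout is an arbitrary choice, and the loop waits in curl_multi_select() so that it does not spin.

```php
<?php
// Hypothetical function: fetches all URLs concurrently and returns the
// response bodies under the same keys as the input array.
function fetch_all(array $urls): array
{
    // Step 1: create the multi handle.
    $mh = curl_multi_init();

    // Step 2: create one child handle per URL and add it to the multi handle.
    $handles = [];
    foreach ($urls as $key => $url) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // needed by curl_multi_getcontent()
        curl_setopt($ch, CURLOPT_TIMEOUT, 30);
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    // Step 3: keep calling curl_multi_exec() until no transfer is running.
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running > 0) {
            curl_multi_select($mh, 1.0); // wait for activity, at most 1 second
        }
    } while ($running > 0 && $status === CURLM_OK);

    // Steps 4 and 5: read each result, then detach the child handle.
    $results = [];
    foreach ($handles as $key => $ch) {
        $results[$key] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
        if (PHP_MAJOR_VERSION < 8) {
            curl_close($ch); // no effect since PHP 8.0, where handles are freed automatically
        }
    }

    // Step 6: close the multi handle.
    curl_multi_close($mh);

    return $results;
}

// Example call (performs real network requests, so it is left commented out):
// print_r(fetch_all(['http://www.webkaka.com/', 'http://www.webkaka.com/']));
```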

Explanation of each function:

curl_multi_init()
Initializes a curl batch handle resource.

curl_multi_add_handle()
Adds a separate curl handle resource to the curl batch session. curl_multi_add_handle() takes two arguments: the first is a curl batch handle resource and the second is a separate curl handle resource.

curl_multi_exec()
Runs the transfers of a curl batch handle. curl_multi_exec() takes two arguments: the first is a batch handle resource, and the second is a reference that receives the number of individual curl handles still being processed.

curl_multi_remove_handle()
Removes a handle resource from the curl batch handle resource. curl_multi_remove_handle() takes two arguments: the first is a curl batch handle resource and the second is a separate curl handle resource.

curl_multi_close()
Closes a batch handle resource.

curl_multi_getcontent()
If CURLOPT_RETURNTRANSFER is set, returns the retrieved output as a string.

curl_multi_info_read()
Gets information about the transfers of the curl batch handle currently being processed.
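
Of these functions, curl_multi_info_read() is the only one the test code in this article does not use. It reports the result code of each finished transfer, which makes it possible to tell which requests failed. The sketch below shows one way to drain its message queue; read_finished() and the shape of the returned array are hypothetical choices made for this illustration.

```php
<?php
// Hypothetical helper: reads all pending messages from the multi handle $mh
// and returns one entry per finished transfer.
function read_finished($mh): array
{
    $finished = [];
    // curl_multi_info_read() returns false once no more messages are queued.
    while (($info = curl_multi_info_read($mh)) !== false) {
        if ($info['msg'] === CURLMSG_DONE) {
            $ch = $info['handle'];
            $finished[] = [
                'url'   => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL),
                'ok'    => $info['result'] === CURLE_OK,
                'error' => curl_strerror($info['result']), // human-readable result code
            ];
        }
    }
    return $finished;
}
```

It would typically be called after the curl_multi_exec() loop has finished and before the child handles are removed.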

Example

For an example, see how the async_get_url() function earlier in this article is written.


I hope this article helps everyone with their PHP programming.

