An analysis of using curl_multi in PHP

  • 2020-07-21 07:18:01
  • OfStack

I'm sure many of you have been frustrated by the poorly documented curl_multi family of functions in the PHP manual. The examples it gives are too simple to be of practical use.
•curl_multi_add_handle
•curl_multi_close
•curl_multi_exec
•curl_multi_getcontent
•curl_multi_info_read
•curl_multi_init
•curl_multi_remove_handle
•curl_multi_select
In general, the obvious reason to reach for these functions is to request multiple URLs at the same time rather than one by one; otherwise you might as well call curl_exec in a loop yourself.

The steps are summarized as follows:
Step 1: Call curl_multi_init
Step 2: Call curl_multi_add_handle in a loop
Note that in this step the second parameter to curl_multi_add_handle is an individual handle returned by curl_init.
Step 3: Keep calling curl_multi_exec until all transfers have finished
Step 4: Call curl_multi_getcontent in a loop as needed to get the results
Step 5: Call curl_multi_remove_handle and then curl_close for each individual handle
Step 6: Call curl_multi_close

Here is a simple example found online, which its author describes as "dirty" (I'll explain why it is dirty later):


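<?php
/*
Here's a quick and dirty example for curl-multi from PHP, tested on PHP 5.0.0RC1 CLI / FreeBSD 5.2.1
*/
$connomains = array(
    "http://www.cnn.com/",
    "http://www.canada.com/",
    "http://www.yahoo.com/"
);
$mh = curl_multi_init();
foreach ($connomains as $i => $url) {
    $conn[$i] = curl_init($url);
    // Return the response body instead of printing it
    curl_setopt($conn[$i], CURLOPT_RETURNTRANSFER, 1);
    curl_multi_add_handle($mh, $conn[$i]);
}
// Keep driving the transfers until none are active (this is a busy loop)
do { $n = curl_multi_exec($mh, $active); } while ($active);
foreach ($connomains as $i => $url) {
    $res[$i] = curl_multi_getcontent($conn[$i]);
    // Step 5: detach each handle from the multi handle before closing it
    curl_multi_remove_handle($mh, $conn[$i]);
    curl_close($conn[$i]);
}
// Step 6: release the multi handle
curl_multi_close($mh);
print_r($res);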

That is pretty much the whole process, but the Achilles' heel of this simple code is the do loop: it spins in a busy loop for the entire duration of the requests, which can easily push CPU usage to 100%.

Now let's improve it with another poorly documented function, curl_multi_select. Although the C curl library documents select, the interface and usage in PHP differ from those in C.

Replace the do loop above with the following:

do {
    $mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);
while ($active and $mrc == CURLM_OK) {
    if (curl_multi_select($mh) != -1) {
        do {
            $mrc = curl_multi_exec($mh, $active);
        } while ($mrc == CURLM_CALL_MULTI_PERFORM);
    }
}

The return value of curl_multi_exec tells us whether it needs to be called again right away: as long as it returns CURLM_CALL_MULTI_PERFORM there is still work to do, so we keep calling it. When there is nothing to do for the moment, the code enters the select stage and waits in curl_multi_select until there is activity on one of the connections. The advantage is that no CPU is wasted while waiting.
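Putting the select-based loop together with the six steps gives something like the sketch below. The function name fetch_all is hypothetical, and the short usleep is an assumption on my part: if curl_multi_select keeps returning -1 (which is reported on some PHP/libcurl builds), the loop shown above would spin without ever calling curl_multi_exec again, so this version always goes back to curl_multi_exec and only sleeps briefly in that case.

```php
<?php
// Hypothetical helper (not from the PHP manual): fetch several URLs
// concurrently and return the bodies keyed like the input array.
function fetch_all(array $urls)
{
    $mh = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    $active = 0;
    do {
        $mrc = curl_multi_exec($mh, $active);
        if ($active && $mrc === CURLM_OK) {
            // Wait (at most 1 second) for activity on any connection.
            if (curl_multi_select($mh, 1.0) === -1) {
                // Assumption: select failed, so pause briefly instead of spinning.
                usleep(1000);
            }
        }
    } while ($mrc === CURLM_CALL_MULTI_PERFORM || ($active && $mrc === CURLM_OK));

    $results = [];
    foreach ($handles as $key => $ch) {
        $results[$key] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
    return $results;
}
```

Calling fetch_all with the $connomains array from the earlier example should return the same kind of array that print_r($res) displayed there, without the busy loop.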

In addition, there are some details that may sometimes be encountered:
Control the timeout of each request by using curl_setopt before curl_multi_add_handle:
curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
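As a rough sketch of where that call goes, the options can be applied in a small factory function before the handle is added to the multi handle. The name make_handle, the example URL, and the timeout values are all made up for illustration; CURLOPT_CONNECTTIMEOUT is an extra option not mentioned above that limits only the connection phase.

```php
<?php
// Hypothetical helper: create an individual handle with timeouts already set.
function make_handle($url, $timeout = 10)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);  // seconds allowed for connecting
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);  // seconds allowed for the whole transfer
    return $ch;
}

$mh = curl_multi_init();
$ch = make_handle('http://www.example.com/', 10); // placeholder URL
// The options must be in place before the handle joins the multi handle.
curl_multi_add_handle($mh, $ch);
```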

Check whether a request failed by calling curl_error($conn[$i]) before curl_multi_getcontent.
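Since curl_error may not behave the same way for multi-driven handles on every PHP version (an assumption worth checking on your own setup), another option is to read each transfer's result code from curl_multi_info_read, which appears in the function list above but is not used in the examples. The sketch below is one possible way to do that; collect_results and the shape of its return value are hypothetical, and curl_strerror requires PHP 5.5 or later.

```php
<?php
// Hypothetical helper: call it after the transfer loop has finished.
// $handles maps your own keys to the handles that were added to $mh.
function collect_results($mh, array $handles)
{
    // Drain the message queue; each CURLMSG_DONE message reports one finished transfer.
    $codes = [];
    while (($info = curl_multi_info_read($mh)) !== false) {
        if ($info['msg'] === CURLMSG_DONE) {
            $key = array_search($info['handle'], $handles, true);
            if ($key !== false) {
                $codes[$key] = $info['result']; // a CURLE_* code
            }
        }
    }

    $results = [];
    foreach ($handles as $key => $ch) {
        $code = isset($codes[$key]) ? $codes[$key] : null;
        if ($code === CURLE_OK) {
            $results[$key] = ['ok' => true, 'body' => curl_multi_getcontent($ch)];
        } else {
            // No message at all means the transfer never completed.
            $message = ($code === null) ? 'no completion message' : curl_strerror($code);
            $results[$key] = ['ok' => false, 'error' => $message];
        }
    }
    return $results;
}
```

The handles still need curl_multi_remove_handle and curl_close afterwards, as in steps 5 and 6.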

Note: use PHP's curl_multi functions with caution, because some combinations of curl and PHP versions have bugs. Code that works after debugging on your machine may well not work on another one.

For example, today I found that in PHP 5.2.2 with curl/7.16.2, if the CURLOPT_USERAGENT option is set to some value, the HTTP header actually sent turns into a string of binary data.

It turns out that the strip_tags function in this version of PHP does not handle binary data very well, and that is how we found this bug.

