Detailed simulation of concurrency using curl multithreading

  • 2020-06-15 07:50:20
  • OfStack

First, take a look at the curl multithreaded function in php:


# curl_multi_add_handle
# curl_multi_close
# curl_multi_exec
# curl_multi_getcontent
# curl_multi_info_read
# curl_multi_init
# curl_multi_remove_handle
# curl_multi_select

In general, the obvious reason to reach for these functions is to request multiple URLs at the same time rather than one after another; otherwise you might as well just call curl_exec in a loop yourself.
The steps are summarized as follows:
Step 1: Call curl_multi_init
Step 2: Loop to curl_multi_add_handle
Note in this step that the second parameter to curl_multi_add_handle is a handle returned by curl_init.
Step 3: Keep calling curl_multi_exec
Step 4: Loop to curl_multi_getcontent as needed to get the results
Step 5: Call curl_multi_remove_handle and then curl_close for each child handle
Step 6: Call curl_multi_close
Here is a simple example found online, which its author describes as "quick and dirty" (I'll explain why it's dirty shortly):

/*
Here's a quick and dirty example for curl-multi from PHP, tested on PHP 5.0.0RC1 CLI / FreeBSD 5.2.1
*/
$connomains = array(
    "http://www.cnn.com/",
    "http://www.canada.com/",
    "http://www.yahoo.com/"
);
$mh = curl_multi_init();
foreach ($connomains as $i => $url) {
    $conn[$i] = curl_init($url);
    curl_setopt($conn[$i], CURLOPT_RETURNTRANSFER, 1);
    curl_multi_add_handle($mh, $conn[$i]);
}
do {
    $n = curl_multi_exec($mh, $active);
} while ($active);
foreach ($connomains as $i => $url) {
    $res[$i] = curl_multi_getcontent($conn[$i]);
    curl_close($conn[$i]);
}
print_r($res);

This mostly works, but the Achilles' heel of this simple code is the do loop: it spins without pause for the entire duration of the requests, which easily drives CPU usage to 100%.

Now let's improve it by using curl_multi_select, a function with almost no documentation. Although C's libcurl documents select-based usage, the interface and usage in PHP differ from C's.
Change the do section above to the following:

do {
    $mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);
while ($active and $mrc == CURLM_OK) {
    if (curl_multi_select($mh) != -1) {
        do {
            $mrc = curl_multi_exec($mh, $active);
        } while ($mrc == CURLM_CALL_MULTI_PERFORM);
    }
}

Since $active does not become false until all URL data has been received, the return value of curl_multi_exec is used to determine whether there is still data to process. While there is, curl_multi_exec is called repeatedly; when there is none for the moment, the code enters the select stage and waits. The advantage is that no CPU is wasted spinning.
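Putting the pieces together, the whole flow can be wrapped in a reusable function. This is only a sketch under my own naming (multi_request is a hypothetical helper, not a library function); note also that on some PHP/libcurl combinations curl_multi_select() has been reported to return -1 immediately, so the short usleep() in that branch keeps the loop from degenerating back into a busy spin:

```php
<?php
// Sketch: fetch several URLs concurrently with curl_multi, using the
// select-based loop from the article. multi_request() is a made-up name.
function multi_request(array $urls, int $timeout = 10): array
{
    $mh = curl_multi_init();
    $handles = [];
    foreach ($urls as $i => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout); // per-request timeout
        curl_multi_add_handle($mh, $ch);
        $handles[$i] = $ch;
    }

    $active = null;
    do {
        $mrc = curl_multi_exec($mh, $active);
    } while ($mrc == CURLM_CALL_MULTI_PERFORM);

    while ($active && $mrc == CURLM_OK) {
        if (curl_multi_select($mh) == -1) {
            // select failed (or always returns -1 on this build): back off
            // briefly instead of spinning at full speed.
            usleep(100);
        }
        do {
            $mrc = curl_multi_exec($mh, $active);
        } while ($mrc == CURLM_CALL_MULTI_PERFORM);
    }

    $results = [];
    foreach ($handles as $i => $ch) {
        // Check for a transfer error before trusting the content.
        $results[$i] = curl_error($ch) === ''
            ? curl_multi_getcontent($ch)
            : false;
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
    return $results;
}
```

A failed or timed-out request is reported as false in the result array, so the caller can tell failures apart from genuinely empty responses.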

In addition, a few details you may run into:
Control the timeout of each request with curl_setopt before curl_multi_add_handle:
curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
Check curl_error($conn[$i]) before calling curl_multi_getcontent.
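Both details can be sketched in one minimal script (the URL is just a placeholder, and the simplified wait loop is my own shorthand for the select-based loop shown earlier):

```php
<?php
// Sketch: set CURLOPT_TIMEOUT before curl_multi_add_handle, and check
// curl_error() before trusting curl_multi_getcontent().
$urls = ["http://www.example.com/"];
$mh = curl_multi_init();
$conn = [];
foreach ($urls as $i => $url) {
    $conn[$i] = curl_init($url);
    curl_setopt($conn[$i], CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($conn[$i], CURLOPT_TIMEOUT, 5); // abort any request after 5 seconds
    curl_multi_add_handle($mh, $conn[$i]);
}
$active = null;
do {
    curl_multi_exec($mh, $active);
    curl_multi_select($mh); // wait for activity instead of busy-looping
} while ($active);
$res = [];
foreach ($conn as $i => $ch) {
    // A timed-out handle may still "have content" (usually empty), so
    // inspect curl_error() first to avoid mistaking failures for empty pages.
    $res[$i] = curl_error($ch) === '' ? curl_multi_getcontent($ch) : false;
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);
```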

I then used the improved example above (good enough for my purposes; I no longer saw 100% CPU usage) to test one interface of the "hotspot" service, which simulates concurrency by reading and writing data to memcache. For confidentiality reasons, the relevant data and results are not posted.

I ran the simulation three times: first 10 threads requesting 1000 times each, then 100 threads requesting 1000 times, and finally 1000 threads requesting 100 times (already quite laborious; I dared not go beyond 1000 threads).
It seems that simulating concurrency with curl multithreading has its limits.

I also suspect that multithreading delays may introduce large errors into the results. Comparing against the data collected, there is little difference between the initialization time and the set time; the difference lies in the get method, so that factor can be largely ruled out.
