A discussion of the efficiency and stability of file_get_contents and curl

  • 2020-06-07 04:05:36
  • OfStack

I have built many products that scrape content from other websites. I got used to the convenient file_get_contents function, but I kept running into failed requests. Even though I set a timeout following the example in the PHP manual, most of the time it did not take effect:


$config['context'] = stream_context_create(array(
    'http' => array(
        'method'  => 'GET',
        'timeout' => 5, // This timeout is erratic and often doesn't work
    ),
));
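Two details may explain part of the erratic behaviour. First, the HTTP wrapper only reads options stored under the key exactly `'http'` (a key such as `' http'`, with a stray space, would be ignored). Second, according to the PHP manual the `timeout` option is a read timeout in seconds, given as a float; it limits each wait for data rather than the total duration of the request, so a server that keeps sending data slowly can hold a request open well past 5 seconds. A minimal sketch that builds the context and checks what PHP actually stored, using `stream_context_get_options()`:

```php
<?php
// Build an HTTP stream context. The option array must sit under the key
// 'http' (the same key also applies to https:// URLs).
$context = stream_context_create([
    'http' => [
        'method'  => 'GET',
        // Read timeout in seconds (float). It bounds each wait for data,
        // not the total time the request may take.
        'timeout' => 5.0,
    ],
]);

// Inspect the options PHP registered for this context.
$options = stream_context_get_options($context);
var_dump($options['http']['timeout']); // float(5)
```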

When this happens, the server's error log fills up with a heap of similar errors, which is a real headache:
file_get_contents(http://***): failed to open stream...
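These warnings end up in the log, but the calling code only sees `false`. A minimal sketch, assuming you want to handle the failure in code rather than only in the log: check the return value for `false` and read the warning text with `error_get_last()`. The helper name `fetch_or_explain` is hypothetical.

```php
<?php
/**
 * Hypothetical helper: fetch a URL (or path) with file_get_contents() and,
 * on failure, return the warning text PHP produced.
 *
 * @param resource|null $context optional stream context
 * @return array{0: string|false, 1: string|null} [body, error message]
 */
function fetch_or_explain(string $url, $context = null): array
{
    error_clear_last();                                 // drop any stale error
    $body = @file_get_contents($url, false, $context);  // @ hides the warning
    if ($body === false) {
        $err = error_get_last();                        // still recorded despite @
        return [false, $err['message'] ?? 'unknown error'];
    }
    return [$body, null];
}
```

With a real URL you would pass the stream context from the example above as the second argument.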
As a last resort, I installed the curl extension and wrote a replacement function:

function curl_file_get_contents($durl){
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $durl);
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);              // give up after 5 seconds in total
    curl_setopt($ch, CURLOPT_USERAGENT, _USERAGENT_);  // constant defined elsewhere, e.g. a browser User-Agent string
    curl_setopt($ch, CURLOPT_REFERER, _REFERER_);      // constant defined elsewhere, the Referer header to send
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);       // return the response instead of printing it
    $r = curl_exec($ch);                               // response body, or false on failure
    curl_close($ch);
    return $r;
}

With this function, I have had no failures apart from genuine network problems.
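When a genuine network problem does occur, `curl_exec()` returns `false`, so `curl_file_get_contents()` simply returns `false` without saying why. A minimal sketch, assuming PHP's curl extension is loaded, of a variant that also reports curl's error message and the HTTP status code; the name `curl_fetch` and its return array are hypothetical. It additionally sets `CURLOPT_CONNECTTIMEOUT`, which limits only connection setup, while `CURLOPT_TIMEOUT` caps the whole transfer.

```php
<?php
/**
 * Hypothetical variant of curl_file_get_contents() that separates network
 * failures from successful responses.
 *
 * @return array{body: string|false, status: int, error: string}
 */
function curl_fetch(string $url, int $timeout = 5): array
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,     // return the body instead of printing it
        CURLOPT_CONNECTTIMEOUT => $timeout, // limit for establishing the connection
        CURLOPT_TIMEOUT        => $timeout, // limit for the whole transfer
    ]);
    $body   = curl_exec($ch);
    $error  = $body === false ? curl_error($ch) : '';       // e.g. connection refused
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);  // 0 if no response arrived
    // The handle is released when $ch goes out of scope.
    return ['body' => $body, 'status' => $status, 'error' => $error];
}
```

A caller can then log `$result['error']` for network failures and treat a non-200 `$result['status']` separately from a failed connection.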
Below are timings from tests someone else ran, fetching ES30en.com five times with each method (in seconds):

| Run | file_get_contents (s) | curl (s) |
|-----|-----------------------|------------|
| 1 | 2.31319094 | 0.68719101 |
| 2 | 2.30374217 | 0.64675593 |
| 3 | 2.21512604 | 0.64326 |
| 4 | 3.30553889 | 0.81983113 |
| 5 | 2.30124092 | 0.63956594 |
That is a large gap: averaged over the five runs, file_get_contents took about 2.49 s and curl about 0.69 s, so curl was roughly 3.6 times faster. In my experience the two also differ in stability, not just speed. If your scraping needs to be reliable, I recommend the curl_file_get_contents function above: it is stable and fast, and it can also send a browser-like User-Agent and Referer so the target site treats the request as coming from a browser.
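The source does not say how these timings were measured. A minimal sketch of one way to reproduce such a comparison, assuming wall-clock timing with `microtime(true)` over five runs per method; `time_runs` is a hypothetical helper, and the commented usage needs network access and a real URL in place of the placeholder.

```php
<?php
/**
 * Hypothetical timing helper: call $fn $runs times and return the elapsed
 * wall-clock seconds of each call, measured with microtime(true).
 *
 * @return float[]
 */
function time_runs(callable $fn, int $runs = 5): array
{
    $times = [];
    for ($i = 0; $i < $runs; $i++) {
        $start   = microtime(true);
        $fn();
        $times[] = microtime(true) - $start;
    }
    return $times;
}

// Usage (needs network access; the URL is a placeholder):
// $url  = 'http://example.com/';
// $fgc  = time_runs(function () use ($url) { return file_get_contents($url); });
// $curl = time_runs(function () use ($url) { return curl_file_get_contents($url); });
// printf("file_get_contents avg: %.3f s\n", array_sum($fgc) / count($fgc));
// printf("curl avg: %.3f s\n", array_sum($curl) / count($curl));
```

Single runs are noisy (note the 3.3 s outlier in run 4), so comparing averages or medians over several runs, against the same URL and from the same machine, gives a fairer picture.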

