Several ways to fetch pages with PHP

  • 2020-06-12 08:46:03
  • OfStack

When building a weather forecast or an RSS subscription application, you often need to fetch a remote file. Typically PHP simulates a browser: it requests a URL over HTTP and gets back HTML source or XML data. That data usually can't be output directly; the content has to be extracted and then formatted so it can be displayed in a friendlier way.
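As a rough illustration of the extract-then-format step, the sketch below pulls the page title out of an HTML string with PHP's DOM extension. The function name extract_title and the sample markup are hypothetical, chosen only for this example; a real application would extract whatever elements it needs.

```php
<?php
// Hypothetical helper: return the text of the first <title> element, or null if none exists.
function extract_title(string $html): ?string
{
    $doc = new DOMDocument();
    // Fetched HTML is often malformed; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $titles = $doc->getElementsByTagName('title');
    if ($titles->length === 0) {
        return null;
    }
    return trim($titles->item(0)->textContent);
}

// Sample markup standing in for a fetched page (an assumption for this sketch).
$html = '<html><head><title> Weather today </title></head><body><p>Sunny</p></body></html>';
echo htmlspecialchars((string) extract_title($html)), "\n";
```

The extracted string is passed through htmlspecialchars before output, in the same way as the fetch examples in this article.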
Below is a brief description of several methods PHP can use to fetch pages, and the principles behind them:
1. Main ways for PHP to fetch pages:
1. file() function
2. file_get_contents() function
3. fopen() -> fread() -> fclose()
4. curl
5. fsockopen() function (socket mode)
6. Plug-ins, such as Snoopy (http://sourceforge.net/projects/snoopy/)

2. Main ways for PHP to parse HTML or XML code:
1. file() function

<?php 
$url='http://t.qq.com'; 
$lines_array=file($url); 
$lines_string=implode('',$lines_array); 
echo htmlspecialchars($lines_string); 
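file() returns false when the request fails, so a slightly more defensive variant checks the result before joining the lines. This is only a sketch; the helper name read_with_file is hypothetical, and the URL is taken from the command line as an assumption of this example.

```php
<?php
// Hypothetical helper: read a URL or path with file() and return it as one string,
// or null when file() reports a failure.
function read_with_file(string $source): ?string
{
    // file() returns an array of lines (line endings kept) or false on failure.
    // The @ operator silences the warning so the caller can handle the failure itself.
    $lines = @file($source);
    if ($lines === false) {
        return null;
    }
    return implode('', $lines);
}

// Usage from the command line: php fetch.php http://t.qq.com
if (PHP_SAPI === 'cli' && isset($argv[1])) {
    $page = read_with_file($argv[1]);
    echo $page === null ? "Fetch failed\n" : htmlspecialchars($page);
}
```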

2. file_get_contents() function
To use file_get_contents and fopen on remote URLs, the host must have allow_url_fopen enabled. In php.ini, set allow_url_fopen = On. When allow_url_fopen is off, neither fopen nor file_get_contents can open remote files.

<?php 
$url='http://t.qq.com'; 
$lines_string=file_get_contents($url); 
echo htmlspecialchars($lines_string); 
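The allow_url_fopen requirement can also be checked at run time, and a stream context lets file_get_contents send a User-Agent header and apply a timeout. The following is a minimal sketch; the helper names, the User-Agent string, and the 5-second default are assumptions made for this example.

```php
<?php
// Hypothetical helper: true when php.ini allows URL wrappers for fopen()/file_get_contents().
function remote_open_allowed(): bool
{
    // ini_get() returns the setting as a string such as "1" or "".
    return filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
}

// Hypothetical helper: fetch $url with a timeout and a User-Agent header, or return null on failure.
function fetch_with_context(string $url, float $timeoutSeconds = 5.0): ?string
{
    $context = stream_context_create([
        'http' => [
            'timeout'    => $timeoutSeconds,          // read timeout in seconds
            'user_agent' => 'Mozilla/5.0 (example)',  // assumed value; any string can be used
        ],
    ]);
    $body = @file_get_contents($url, false, $context);
    return $body === false ? null : $body;
}

// Usage from the command line: php fetch.php http://t.qq.com
if (PHP_SAPI === 'cli' && isset($argv[1])) {
    if (!remote_open_allowed()) {
        echo "allow_url_fopen is off; remote URLs cannot be opened\n";
    } else {
        $body = fetch_with_context($argv[1]);
        echo $body === null ? "Fetch failed\n" : htmlspecialchars($body);
    }
}
```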

3. fopen() -> fread() -> fclose()

<?php 
$url='http://t.qq.com'; 
$handle=fopen($url,"rb"); 
$lines_string=""; 
do {
    // Read up to 1024 bytes at a time; an empty read means there is no more data.
    $data = fread($handle, 1024);
    if (strlen($data) == 0) {
        break;
    }
    $lines_string .= $data;
} while (true);
fclose($handle); 
echo htmlspecialchars($lines_string);
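The same read loop can be written with feof() as the loop condition and with a check for a failed fopen(). This is a sketch under those assumptions, not the only way to do it; the helper name read_in_chunks is hypothetical.

```php
<?php
// Hypothetical helper: read everything from $source in fixed-size chunks,
// or return null when the stream cannot be opened.
function read_in_chunks(string $source, int $chunkSize = 1024): ?string
{
    $handle = @fopen($source, 'rb');
    if ($handle === false) {
        return null;
    }
    $contents = '';
    // feof() becomes true once a read has reached the end of the stream.
    while (!feof($handle)) {
        $data = fread($handle, $chunkSize);
        if ($data === false) {
            break;
        }
        $contents .= $data;
    }
    fclose($handle);
    return $contents;
}

// Usage from the command line: php fetch.php http://t.qq.com
if (PHP_SAPI === 'cli' && isset($argv[1])) {
    $page = read_in_chunks($argv[1]);
    echo $page === null ? "Fetch failed\n" : htmlspecialchars($page);
}
```

When chunk-by-chunk processing is not needed, stream_get_contents($handle) reads the rest of an open stream in one call.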

4. curl
To use curl, the host must have the curl extension enabled. On Windows, enable extension=php_curl.dll in php.ini and copy ssleay32.dll and libeay32.dll to C:\WINDOWS\system32. On Linux, install the curl extension.

<?php 
$url='http://t.qq.com'; 
$ch=curl_init(); 
$timeout=5; 
curl_setopt($ch, CURLOPT_URL, $url); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); 
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout); 
$lines_string=curl_exec($ch); 
curl_close($ch); 
echo htmlspecialchars($lines_string);
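curl_exec() returns false on failure, and curl_error() describes what went wrong. The sketch below wraps the calls above with that check; the helper name fetch_with_curl, the choice to follow redirects, and the timeout values are assumptions of this example.

```php
<?php
// Hypothetical helper: fetch $url with curl. Returns the body, or null on failure
// with a description of the problem placed in $error.
function fetch_with_curl(string $url, int $timeoutSeconds = 5, ?string &$error = null): ?string
{
    if (!function_exists('curl_init')) {
        $error = 'curl extension not loaded';
        return null;
    }
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,                 // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,                 // follow HTTP redirects
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds,      // limit for establishing the connection
        CURLOPT_TIMEOUT        => $timeoutSeconds * 2,  // limit for the whole transfer
    ]);
    $body = curl_exec($ch);
    if ($body === false) {
        $error = curl_error($ch);
        return null;
    }
    // The handle is released when $ch goes out of scope at the end of the function.
    return $body;
}

// Usage from the command line: php fetch.php http://t.qq.com
if (PHP_SAPI === 'cli' && isset($argv[1])) {
    $page = fetch_with_curl($argv[1], 5, $error);
    echo $page === null ? "Fetch failed: $error\n" : htmlspecialchars($page);
}
```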

5. fsockopen() function (socket mode)
Whether socket mode works also depends on the server configuration. You can use phpinfo() to check which communication protocols the server has enabled. For example, my local PHP setup did not have it enabled, so I could only run a quick test over udp.

<?php                                                                                                                                                
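// Open a UDP socket to port 13 (the daytime service) on the local machine.
$fp = fsockopen("udp://127.0.0.1", 13, $errno, $errstr);
if (!$fp) {
    echo "ERROR: $errno - $errstr<br />\n";
} else {
    fwrite($fp, "\n");    // send a datagram to request a reply
    echo fread($fp, 26);  // read up to 26 bytes of the reply
    fclose($fp);
}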
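For fetching a web page rather than testing udp, fsockopen() can open a TCP connection and the HTTP request can be written by hand. The following is a minimal sketch with hypothetical helper names; it assumes plain HTTP on port 80 and uses HTTP/1.0 so that a compliant server should not reply with chunked transfer encoding.

```php
<?php
// Hypothetical helper: build a minimal HTTP/1.0 GET request.
function build_get_request(string $host, string $path = '/'): string
{
    return "GET {$path} HTTP/1.0\r\n"
         . "Host: {$host}\r\n"
         . "Connection: close\r\n"
         . "\r\n";
}

// Hypothetical helper: split a raw response into [header block, body].
function split_response(string $raw): array
{
    $parts = explode("\r\n\r\n", $raw, 2);
    return [$parts[0], $parts[1] ?? ''];
}

// Hypothetical helper: fetch a page body over a plain TCP socket, or return null on failure.
function fetch_with_socket(string $host, string $path = '/', int $port = 80, int $timeout = 5): ?string
{
    // stream_get_transports() lists the socket transports this PHP build supports.
    if (!in_array('tcp', stream_get_transports(), true)) {
        return null;
    }
    $fp = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if (!$fp) {
        return null;
    }
    stream_set_timeout($fp, $timeout);   // limit how long a read may block
    fwrite($fp, build_get_request($host, $path));
    $raw = stream_get_contents($fp);     // read until the server closes the connection
    fclose($fp);
    if ($raw === false) {
        return null;
    }
    [, $body] = split_response($raw);
    return $body;
}

// Usage from the command line: php fetch.php t.qq.com
if (PHP_SAPI === 'cli' && isset($argv[1])) {
    $body = fetch_with_socket($argv[1]);
    echo $body === null ? "Fetch failed\n" : htmlspecialchars($body);
}
```

Unlike curl or file_get_contents, this approach does not follow redirects or handle HTTPS; those would have to be added by hand.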

6. Plug-ins
There are many plug-ins for this available online; Snoopy is one I found. Interested readers can look into it.
