Use PHP to determine whether a web page is gzip-compressed

  • 2020-06-19 09:59:20
  • OfStack

Last night a friend in the group was scraping web pages and found that the pages fetched with file_get_contents were saved locally as garbled text, and that the response headers contained Content-Encoding: gzip.
Yet the same pages displayed normally in a browser.
From experience I realized right away that the site had gzip enabled and that file_get_contents was returning the compressed page rather than the decompressed one. (I wonder whether file_get_contents can be given request parameters so that it fetches the page without gzip compression directly?)
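On the question in parentheses: one thing that can be tried is to pass file_get_contents a stream context whose request headers ask for an uncompressed response. The following is only a sketch. The helper name identityContext and the 10-second timeout are my own assumptions, and a server that compresses unconditionally may simply ignore the header, so this does not replace checking the returned bytes.

```php
<?php
// Hypothetical helper (not from the original article): builds a stream
// context that asks the server not to compress the response.
function identityContext(int $timeoutSeconds = 10)
{
    return stream_context_create([
        'http' => [
            'method'  => 'GET',
            // "identity" means "no content encoding"; servers may ignore it.
            'header'  => "Accept-Encoding: identity\r\n",
            'timeout' => $timeoutSeconds,
        ],
    ]);
}

$context = identityContext();
// Network call left commented out so the sketch runs offline:
// $html = file_get_contents('http://www.miercn.com', false, $context);
```

If the server honors the header, the page arrives uncompressed and no detection step is needed. If it does not, the result is still gzip data.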
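I had recently read that a file's type can be determined from its first two bytes. Friends in the group also pointed out that a gzip-compressed web page (the page in this case is GBK-encoded) begins with the 2 bytes 1F 8B, so those bytes can be used to tell whether a page is gzip-compressed. These two bytes are the gzip format's own signature and do not depend on the page's character encoding.
The code is as follows: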

// www.miercn.com serves gzip-compressed pages, so reading it directly
// with file_get_contents returns compressed bytes that look like garbage.
header('Content-Type: text/html; charset=utf-8');
$url = 'http://www.miercn.com';

// Read only the first 2 bytes. A gzip stream starts with
// hex 1F 8B (decimal 31 139).
$file = fopen($url, 'rb');
if ($file === false) {
    exit('Failed to open ' . $url);
}
$bin = fread($file, 2);
fclose($file);

// Compare the raw bytes with the gzip signature.
$isGzip = ($bin === "\x1f\x8b");

// If the page is gzip-compressed, read it through the compress.zlib://
// wrapper, which decompresses it on the fly.
$url = $isGzip ? 'compress.zlib://' . $url : $url;
$mierHtml = file_get_contents($url);          // fetch the page
$mierHtml = iconv('gbk', 'utf-8', $mierHtml); // the site is GBK-encoded; convert to UTF-8
echo $mierHtml;
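
The code above requests the page twice: once to read the first 2 bytes and once to fetch the content. The same 2-byte check can also be applied to data that has already been downloaded, so a single request is enough. Below is a minimal sketch of that variant. The function names isGzipData and maybeGunzip are my own and not part of PHP, and gzdecode assumes the zlib extension is loaded.

```php
<?php
// Hypothetical helper names; they are illustrative, not from the article.

/** Returns true when $data starts with the gzip signature 0x1F 0x8B. */
function isGzipData(string $data): bool
{
    return strncmp($data, "\x1f\x8b", 2) === 0;
}

/** Decompresses $data if it is gzip-compressed, otherwise returns it unchanged. */
function maybeGunzip(string $data): string
{
    if (!isGzipData($data)) {
        return $data;
    }
    $decoded = gzdecode($data); // requires the zlib extension
    if ($decoded === false) {
        throw new RuntimeException('Data has a gzip header but could not be decompressed');
    }
    return $decoded;
}

// Usage with a single request (network call left commented out):
// $raw  = file_get_contents('http://www.miercn.com');
// $html = iconv('gbk', 'utf-8', maybeGunzip($raw));
```

Checking the bytes rather than trusting the response header alone keeps the logic working even when the header and the body disagree.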

