FormatUrl: a PHP URL completion function for article scraping

  • 2020-05-19 04:20:17
  • OfStack

This is a function I needed for scraping: a URL completion function, also known as FormatUrl.
I wrote it while developing a scraping program. When scraping articles, you often find that links in the page use a "relative path" or an "absolute root path". Unless a link is an "absolute full path", its URL cannot be fetched directly.

This function therefore rewrites the page code, converting every hyperlink (the href of a tags and the src of img tags) in one pass, so that the correct URLs can be collected directly.
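As a rough illustration of the first step, finding the links that need completing, here is a minimal sketch. The sample HTML and the simplified pattern are assumptions made for this example; the function given later in the article uses its own, longer pattern.

```php
<?php
// Sample page fragment (made up for this sketch).
$html = '<a href="/index.htm">Home</a><img src="img/logo.png">';

// Capture the quoted href/src value of every <a> and <img> tag.
// Group 1 is the opening quote, group 2 the value between the quotes.
preg_match_all('/<(?:a|img)\b[^>]*\b(?:href|src)=(["\'])(.*?)\1/i', $html, $m);

print_r($m[2]); // the two link values, in document order
```

Each value collected this way is a candidate for completion against the address of the page it came from.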

Path basics
Relative path: starts with "../" or "./", or has nothing in front of it
Absolute root path: /path/xxx.html
Absolute full path: http://www.xxx.com/path/xxx.html
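The three kinds of path can be told apart with a few string checks. The sketch below is only an illustration; classifyPath is a hypothetical helper name, not part of the function in this article, and it assumes that anything with a scheme followed by "://" counts as an absolute full path.

```php
<?php
// Hypothetical helper for illustration: name the kind of path a link uses.
function classifyPath(string $link): string
{
    // A scheme followed by "://", e.g. http://www.xxx.com/path/xxx.html
    if (preg_match('/^[a-z][a-z0-9+.-]*:\/\//i', $link)) {
        return 'absolute full path';
    }
    // A leading slash, e.g. /path/xxx.html
    if ($link !== '' && $link[0] === '/') {
        return 'absolute root path';
    }
    // "../", "./" or nothing in front
    return 'relative path';
}

$samples = ['../a.html', './a.html', 'a.html', '/path/xxx.html', 'http://www.xxx.com/path/xxx.html'];
foreach ($samples as $link) {
    echo $link, ' => ', classifyPath($link), PHP_EOL;
}
```

Only the first two kinds need completion; an absolute full path can be collected as it is.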
Examples of use:
 
<?php 
$surl="//www.ofstack.com/"; 
$gethtm = '<a href="/index.htm"> Home page </a><a href="Resolvent/index.htm"> The solution </a>'; 
echo formaturl($gethtm,$surl); 
?> 

Output: <a href="//www.ofstack.com/index.htm"> Home page </a><a href="//www.ofstack.com/Resolvent/index.htm"> The solution </a>
-- Demonstration example --
Original path code: http://www.newnew.cn/newnewindex.aspx
Output demo code: http://www.maifp.com/aaa/test.php
Here is the code for the function:
 
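<?php 
// Rewrite the src of every <img> and the href of every <a> in $html
// into an absolute full URL, using $baseUrl as the address of the page.
function formaturl($html, $baseUrl){
    $pattern = "/(<img[^>]+src=\"([^\"]+)\"[^>]*>)|(<a[^>]+href=\"([^\"]+)\"[^>]*>)|(<img[^>]+src='([^']+)'[^>]*>)|(<a[^>]+href='([^']+)'[^>]*>)/i";
    if (preg_match_all($pattern, $html, $regs)) {
        foreach ($regs[0] as $tag) {
            $html = str_replace($tag, lIIIIl($tag, $baseUrl), $html);
        }
    }
    return $html;
}

// Complete the URL of a single tag. The tag is returned unchanged
// when there is nothing to rewrite.
function lIIIIl($tag, $baseUrl){
    // $regs[3] is the raw attribute value, quotes included
    if (!preg_match("/(.*)(href|src)\=(.+?)( |\/\>|\>).*/i", $tag, $regs)) { return $tag; }
    $raw = $regs[3];
    $link = str_replace(array(chr(34), chr(39)), "", $raw); // strip double and single quotes
    if (strlen($link) == 0) { return $tag; }

    $url_parsed = parse_url($baseUrl);
    if (!is_array($url_parsed) || empty($url_parsed["host"])) { return $tag; }
    // A protocol-relative base such as //host/ keeps its "//" prefix
    $scheme = empty($url_parsed["scheme"]) ? "//" : $url_parsed["scheme"]."://";
    $root = $scheme.$url_parsed["host"];
    // Directory of the base path without a trailing slash:
    // "/aaa/test.php" gives "/aaa", "/" gives ""
    $basePath = isset($url_parsed["path"]) ? $url_parsed["path"] : "/";
    $path = substr($basePath, 0, (int)strrpos($basePath, "/"));

    // Drop the fragment part
    $pos = strpos($link, "#");
    if ($pos > 0) { $link = substr($link, 0, $pos); }

    // Determine the type
    $lower = strtolower($link);
    if (preg_match("/^(http|https|ftp):(\/\/|\\\\)(([\w\/\\\+\-~`@:%])+\.)+([\w\/\\\.\=\?\+\-~`@\':!%#]|(&amp;)|&)+/i", $link)) {
        return $tag; // already an absolute full URL: skip
    } elseif (substr($link, 0, 2) == "//" || $link[0] == "#" || substr($lower, 0, 7) == "mailto:" || substr($lower, 0, 11) == "javascript:") {
        return $tag; // protocol-relative link, in-page anchor, mail or script link: skip
    } elseif ($link[0] == "/") {
        $link = $root.$link; // absolute root path
    } elseif (substr($link, 0, 3) == "../") { // relative path that climbs up
        while (substr($link, 0, 3) == "../") {
            $link = substr($link, 3);
            // one directory up, never above the site root
            $path = substr($path, 0, (int)strrpos($path, "/"));
        }
        $link = $root.$path."/".$link;
    } elseif (substr($link, 0, 2) == "./") {
        $link = $root.$path.substr($link, 1);
    } else {
        $link = $root.$path."/".$link; // relative path with nothing in front
    }
    return str_replace($raw, "\"$link\"", $tag);
}
?> 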
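The "../" and "./" handling can also be expressed by splitting the path into segments and dropping the dot segments one by one. The following is a minimal sketch of that alternative technique, not the code of this article. resolveUrl is a hypothetical name, and the sketch assumes that the base URL carries a scheme and a host and that the link has no query string or fragment.

```php
<?php
// Hypothetical helper for illustration; not part of formaturl().
function resolveUrl(string $base, string $link): string
{
    // Links that already carry a scheme (http:, mailto:, ...) are left alone.
    if (preg_match('/^[a-z][a-z0-9+.-]*:/i', $link)) {
        return $link;
    }
    $parts = parse_url($base);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return $link; // assumption: the base must carry a scheme and a host
    }
    $root = $parts['scheme'] . '://' . $parts['host'];
    $basePath = $parts['path'] ?? '/';

    if ($link !== '' && $link[0] === '/') {
        $path = $link; // absolute root path
    } else {
        // relative path: start from the directory of the base path
        $path = substr($basePath, 0, (int) strrpos($basePath, '/') + 1) . $link;
    }

    // Walk the segments: ".." climbs one directory, "." and "" are dropped.
    $out = [];
    foreach (explode('/', $path) as $segment) {
        if ($segment === '..') {
            array_pop($out); // popping an empty list is harmless: never above the root
        } elseif ($segment !== '.' && $segment !== '') {
            $out[] = $segment;
        }
    }
    $trailing = (substr($path, -1) === '/' && $out) ? '/' : '';
    return $root . '/' . implode('/', $out) . $trailing;
}

$base = 'http://www.xxx.com/path/xxx.html';
echo resolveUrl($base, '../img/a.png'), PHP_EOL; // http://www.xxx.com/img/a.png
echo resolveUrl($base, './b.htm'), PHP_EOL;      // http://www.xxx.com/path/b.htm
echo resolveUrl($base, '/index.htm'), PHP_EOL;   // http://www.xxx.com/index.htm
```

Working on segments avoids the platform-dependent results of dirname() for paths such as "/", at the cost of a slightly longer function.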
