PHP code for fixing improperly closed HTML tags, with support for nested and nearby closing

  • 2020-05-17 04:58:54
  • OfStack

fixHtmlTag
version 0.2
This version fixes the issues left over from the previous one, namely nearby closing and nested closing. See the comments in the code for details.
 
<?php 

/** 
* fixHtmlTag 
* 
* HTML tag repair function: repairs HTML whose tags are not closed properly.
*
* Because malformed HTML is ambiguous, two modes are provided for now,
* "nested closing" and "nearby closing". They should be enough.
*
* Both names were coined to explain how this function works;
* only their meaning matters.
* 1. Nested closing (NEST) is the default mode. For example,
*    "<body><div>hello" becomes "<body><div>hello</div></body>".
* 2. Nearby closing (CLOSE) closes an open tag as soon as the next start tag
*    appears. For example, "<p>hello<p>Why isn't it closed" becomes
*    "<p>hello</p><p>Why isn't it closed</p>".
*
* In nested mode (the default, no special arguments required) you can also pass
* the names of tags that should be closed nearby. For example,
* "<body><p>hello</p><p>I'm all right" becomes
* "<body><p>hello</p><p>I'm all right</p></body>".
* When passing parameters, use the keys shown below; settings that keep
* their default values can be omitted.
*
* $param = array(
*     'html' => '', // required
*     'options' => array(
*         'tagArray' => array(),
*         'type' => 'NEST',
*         'length' => null,
*         'lowerTag' => TRUE,
*         'XHtmlFix' => TRUE,
*     )
* );
* fixHtmlTag($param);
* 
* The keys above have the following meanings:
* string $html      The HTML code to repair
* array  $tagArray  In nested mode, the tags that should be closed nearby
* string $type      Mode name; NEST and CLOSE are currently supported. With CLOSE, $tagArray is ignored and every tag is closed nearby
* int    $length    Optional length to truncate to; it counts characters of text, not tags
* bool   $lowerTag  Whether to convert all tag names to lowercase; default TRUE
* bool   $XHtmlFix  Whether to fix tags that do not conform to XHTML, i.e. convert <br> to <br />
* 
* @author IT "Daruma"  <itbudaoweng@gmail.com> 
* @version 0.2 
* @link http://yungbo.com IT "Daruma"  
* @link http://enenba.com/?post=19  so-and-so  
* @param array $param  Parameter array using the keys described above
* @return string $result  The repaired HTML code
* @since 2012-04-14 
*/ 
function fixHtmlTag($param = array()) {
    // Default parameter values
    $html = '';
    $tagArray = array();
    $type = 'NEST';
    $length = null;
    $lowerTag = TRUE;
    $XHtmlFix = TRUE;

    // Extract the top-level keys first, i.e. $html and $options (if provided)
    extract($param);

    // If $options was given, extract its settings as well
    if (isset($options)) {
        extract($options);
    }

    $result = ''; // The repaired HTML that is returned at the end
    $tagStack = array(); // Stack of open tags, driven by array_push() and array_pop()
    $contents = array(); // The input split into tags and text
    $len = 0; // Number of text characters written so far

    // $isClosed defaults to true. When nearby closing applies, it becomes false
    // after a start tag is matched and true again once that tag is closed.
    $isClosed = true;

    // Tag names to be closed nearby are compared in lowercase
    $tagArray = array_map('strtolower', $tagArray);

    // "Legal" single tags, which have no closing tag
    $singleTagArray = array(
        '<meta',
        '<link',
        '<base',
        '<br',
        '<hr',
        '<input',
        '<img'
    );

    // Validate the mode in $type; anything unknown falls back to NEST
    $type = strtoupper($type);
    if (!in_array($type, array('NEST', 'CLOSE'))) {
        $type = 'NEST';
    }

    // Split the original HTML on "<...>" pairs, so that the tags and the text
    // between them end up as separate array elements
    $contents = preg_split("/(<[^>]+?>)/si", $html, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);

    foreach ($contents as $tag) {
        // Whitespace-only pieces are passed through unchanged
        if ('' == trim($tag)) {
            $result .= $tag;
            continue;
        }

        // Match well-formed self-closing tags such as <br />; slashes inside
        // attribute values (e.g. src="/a.png") are allowed
        if (preg_match("/<(\w+)[^>]*?\/>/si", $tag)) {
            $result .= $tag;
            continue;
        }

        // Match a start tag; a single tag is written out but never pushed.
        // Slashes inside attribute values (e.g. href="http://...") are allowed.
        else if (preg_match("/<(\w+)[^>]*?>/si", $tag, $match)) {
            // If the previous start tag is still open and is subject to
            // nearby closing, close it now and pop it off the stack.
            // The empty-stack check avoids writing a bogus "</>".
            if (false === $isClosed && !empty($tagStack)) {
                // CLOSE mode closes whatever tag is open; NEST mode only
                // closes tags that were listed in $tagArray
                if ('CLOSE' == $type || in_array(strtolower(end($tagStack)), $tagArray)) {
                    $result .= '</' . array_pop($tagStack) . '>';
                }
            }

            // If $lowerTag is TRUE, change the tag name to lowercase
            $matchLower = $lowerTag == TRUE ? strtolower($match[1]) : $match[1];
            $tag = str_replace('<' . $match[1], '<' . $matchLower, $tag);

            // A known single tag, or a tag already written as <... />, is
            // complete on its own and must not go onto the stack
            $selfClosed = substr($tag, -2) == '/>';
            if ($selfClosed || in_array('<' . strtolower($match[1]), $singleTagArray)) {
                if ($XHtmlFix == TRUE && !$selfClosed) {
                    $tag = str_replace('>', ' />', $tag); // e.g. <br> becomes <br />
                }
                $result .= $tag;
                unset($matchLower);
                continue;
            }

            // Start a new open tag
            $result .= $tag;
            array_push($tagStack, $matchLower);

            // In CLOSE mode every start tag is marked as not yet closed; in
            // NEST mode only the tags listed in $tagArray are
            if ('CLOSE' == $type || in_array(strtolower($matchLower), $tagArray)) {
                $isClosed = false;
            }
            unset($matchLower);
        }

        // Match an end tag; if it closes the innermost open tag, pop the stack
        else if (preg_match("/<\/(\w+)[^\/>]*?>/si", $tag, $match)) {

            // If $lowerTag is TRUE, change the tag name to lowercase
            $matchLower = $lowerTag == TRUE ? strtolower($match[1]) : $match[1];

            // An end tag that does not match the top of the stack is dropped
            if (end($tagStack) === $matchLower) {
                $isClosed = true; // Matched, so the tag is now closed
                $tag = str_replace('</' . $match[1], '</' . $matchLower, $tag);
                $result .= $tag;
                array_pop($tagStack);
            }
            unset($matchLower);
        }

        // Comments are appended to $result unchanged
        else if (preg_match("/<!--.*?-->/si", $tag)) {
            $result .= $tag;
        }

        // Plain text: append it to $result, truncating once $length is reached
        else {
            if (is_null($length) || $len + mb_strlen($tag) <= $length) {
                $result .= $tag;
                $len += mb_strlen($tag);
            } else {
                // Keep only the characters that still fit, then stop
                $str = mb_substr($tag, 0, $length - $len);
                $result .= $str;
                break;
            }
        }
    }

    // Close any tags that are still open on the stack
    while (!empty($tagStack)) {
        $result .= '</' . array_pop($tagStack) . '>';
    }
    return $result;
}
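
The first real step of fixHtmlTag() is the preg_split() call that breaks the input into tags and the text between them. As a minimal, standalone sketch of just that step (the sample string and the variable names $sample, $tokens and $token are illustrative assumptions, not part of the original function):

```php
<?php
// Sample input: two unclosed <p> tags and one single <br> tag (illustrative only)
$sample = "<p>hello<br><p>Why isn't it closed";

// Same pattern and flags as in fixHtmlTag(): the capture group keeps the
// tags in the result, and PREG_SPLIT_NO_EMPTY drops empty pieces
$tokens = preg_split('/(<[^>]+?>)/si', $sample, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);

foreach ($tokens as $token) {
    echo $token, "\n";
}
// Prints, one per line: <p>, hello, <br>, <p>, Why isn't it closed
```

Each element is then either a tag, which the loop pushes onto or pops off the tag stack, or plain text, which is copied through and counted toward $length.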
