PHP code to fix improperly closed HTML tags, with support for nested and nearby closing
- 2020-05-17 04:58:54
- OfStack
fixHtmlTag
version 0.2
This version fixes the issues left over from the previous version, namely nearby closing and nested closing. See the comments in the code for details.
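fixHtmlTag() does not build a DOM. It splits the input on every <...> pair and then walks the resulting pieces with a tag stack. Below is a minimal sketch of only the splitting step. It assumes PHP 7 or later, and tokenizeHtml() is a hypothetical helper name that does not appear in the original code. It wraps the same preg_split() pattern and flags that the function uses:

```php
<?php
// Hypothetical helper: split an HTML fragment into tag tokens and text tokens
// using the same preg_split() pattern and flags as fixHtmlTag().
function tokenizeHtml(string $html): array
{
    // The capturing group plus PREG_SPLIT_DELIM_CAPTURE keeps the tags themselves
    // in the result; PREG_SPLIT_NO_EMPTY drops the empty strings between adjacent tags.
    return preg_split('/(<[^>]+?>)/si', $html, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
}

// tokenizeHtml('<body><div> hello') returns ['<body>', '<div>', ' hello']
```

The pattern is very simple, so a literal > inside an attribute value, or a < in text that is later followed by a >, will be split in the wrong place.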
<?php
/**
 * fixHtmlTag
 *
 * HTML tag repair function: repairs HTML tags that are not closed correctly.
 *
 * Malformed input can go wrong in too many ways, so only two modes are
 * provided for now: "nested closing" and "nearby closing". These should be
 * enough in practice.
 *
 * Both names are terms I made up to explain how this function works.
 * The examples below show what they mean.
 * 1. Nested closing mode, NEST, is the default mode. HTML such as
 *    "<body><div>hello" is changed to "<body><div>hello</div></body>".
 * 2. Nearby closing mode, CLOSE, changes "<p>hello<p>why isn't it closed?"
 *    to "<p>hello</p><p>why isn't it closed?</p>".
 *
 * In nested mode (the default, which needs no extra arguments) you can pass
 * the names of the tags that should be closed nearby. For example,
 * "<body><p>hello</p><p>I'm all right" is converted to
 * "<body><p>hello</p><p>I'm all right</p></body>".
 * Pass the parameters using the keys below. Any setting you do not need to
 * change can be omitted:
 *
 * $param = array(
 *     'html' => '', // required
 *     'options' => array(
 *         'tagArray' => array(),
 *         'type' => 'NEST',
 *         'length' => null,
 *         'lowerTag' => TRUE,
 *         'XHtmlFix' => TRUE,
 *     )
 * );
 * fixHtmlTag($param);
 *
 * The keys above have the following meanings:
 * string $html      the HTML code to repair
 * array  $tagArray  in NEST mode, the names of the tags to close nearby
 * string $type      mode name; NEST and CLOSE are supported. With CLOSE,
 *                   $tagArray is ignored and every tag is closed nearby
 * int    $length    if set, truncate the output to this many characters;
 *                   only text is counted, not tags
 * bool   $lowerTag  whether to convert all tag names to lowercase, default TRUE
 * bool   $XHtmlFix  whether to rewrite tags that do not follow XHTML,
 *                   i.e. convert <br> to <br />, default TRUE
 *
 * @author  IT "Daruma" <itbudaoweng@gmail.com>
 * @version 0.2
 * @link    http://yungbo.com IT "Daruma"
 * @link    http://enenba.com/?post=19 so-and-so
 * @param   array  $param  parameter array using the keys described above
 * @return  string $result the repaired HTML code
 * @since   2012-04-14
 */
function fixHtmlTag($param = array()) {
    // Default parameter values
    $html = '';
    $tagArray = array();
    $type = 'NEST';
    $length = null;
    $lowerTag = TRUE;
    $XHtmlFix = TRUE;
    // First extract the top-level keys, i.e. $html and $options (if provided)
    extract($param);
    // If options were given, extract the individual settings from them
    if (isset($options)) {
        extract($options);
    }
    $result = '';        // the HTML code that is finally returned
    $tagStack = array(); // tag stack, emulated with array_push() and array_pop()
    $contents = array(); // the input split into tags and text
    $len = 0;            // number of text characters output so far
    // Closed flag $isClosed, TRUE by default. When nearby closing applies, it becomes
    // false after a start tag is matched and true again once that tag is closed.
    $isClosed = true;
    // Compare tag names in lowercase
    $tagArray = array_map('strtolower', $tagArray);
    // "Legal" single (void) tags that never need a closing tag
    $singleTagArray = array(
        '<meta',
        '<link',
        '<base',
        '<br',
        '<hr',
        '<input',
        '<img'
    );
    // Validate the mode $type; anything other than NEST or CLOSE falls back to NEST
    $type = strtoupper($type);
    if (!in_array($type, array('NEST', 'CLOSE'))) {
        $type = 'NEST';
    }
    // Split on every <...> pair, keeping both the tags and the text between them
    $contents = preg_split("/(<[^>]+?>)/si", $html, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
    foreach ($contents as $tag) {
        // Whitespace-only pieces are copied unchanged
        if ('' == trim($tag)) {
            $result .= $tag;
            continue;
        }
        // Standard self-closing tags such as <br /> are copied unchanged
        if (preg_match("/^<(\w+)[^>]*\/>$/s", $tag)) {
            $result .= $tag;
            continue;
        }
        // Start tag. [^>]* rather than [^\/>]* so that tags whose attributes
        // contain "/", e.g. <a href="http://...">, are still recognized as tags
        else if (preg_match("/^<(\w+)[^>]*>$/s", $tag, $match)) {
            // If $lowerTag is TRUE, convert the tag name (only the name) to lowercase
            $matchLower = $lowerTag == TRUE ? strtolower($match[1]) : $match[1];
            $tag = '<' . $matchLower . substr($tag, strlen($match[1]) + 1);
            // Void tags never need closing: apply the XHTML fix *before* output
            // and keep them off the stack, so they neither trigger nearby
            // closing nor leave a stale entry behind
            if (in_array('<' . strtolower($match[1]), $singleTagArray)) {
                if ($XHtmlFix == TRUE) {
                    $tag = substr($tag, 0, -1) . ' />';
                }
                $result .= $tag;
                unset($matchLower);
                continue;
            }
            // If the previous tag is still open and belongs to the nearby-closing
            // type, close it now and pop it off the stack
            if (false === $isClosed && !empty($tagStack)) {
                // CLOSE mode: every tag is closed nearby
                if ('CLOSE' == $type) {
                    $result .= '</' . array_pop($tagStack) . '>';
                }
                // Default NEST mode: only tags listed in $tagArray are closed nearby
                else if (in_array(end($tagStack), $tagArray)) {
                    $result .= '</' . array_pop($tagStack) . '>';
                }
            }
            // Output the new start tag and push it onto the stack
            $result .= $tag;
            array_push($tagStack, $matchLower);
            // CLOSE mode: the new tag is now open
            if ('CLOSE' == $type) {
                $isClosed = false;
            }
            // Default NEST mode: mark it open only if it is listed in $tagArray
            else if (in_array($matchLower, $tagArray)) {
                $isClosed = false;
            }
            unset($matchLower);
        }
        // Closing tag: output it and pop the stack only if it matches the tag on
        // top of the stack; closing tags that do not match are dropped
        else if (preg_match("/^<\/(\w+)[^>]*>$/s", $tag, $match)) {
            // If $lowerTag is TRUE, convert the tag name to lowercase
            $matchLower = $lowerTag == TRUE ? strtolower($match[1]) : $match[1];
            if (end($tagStack) == $matchLower) {
                $isClosed = true; // matched, so the tag is now closed
                $tag = str_replace('</' . $match[1], '</' . $matchLower, $tag);
                $result .= $tag;
                array_pop($tagStack);
            }
            unset($matchLower);
        }
        // Comments are appended to $result unchanged
        else if (preg_match("/<!--.*?-->/s", $tag)) {
            $result .= $tag;
        }
        // Plain text: append it to $result, truncating once $length characters are reached
        else {
            if (is_null($length) || $len + mb_strlen($tag) <= $length) {
                $result .= $tag;
                $len += mb_strlen($tag);
            } else {
                // Keep only the characters still allowed, so that at most
                // $length text characters are output in total
                $str = mb_substr($tag, 0, $length - $len);
                $result .= $str;
                break;
            }
        }
    }
    // Close any tags still open on the stack, innermost first
    while (!empty($tagStack)) {
        $result .= '</' . array_pop($tagStack) . '>';
    }
    return $result;
}
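The two closing strategies are easier to follow without the extra options. Below is a hypothetical, simplified sketch, not the original function. The names closeTags(), $closeNearby and $closeAll were chosen for illustration only, and PHP 7 or later is assumed. There is no lowercasing of the output, no XHTML fix and no truncation. Like fixHtmlTag(), it closes the innermost open tag when any new non-void start tag appears while that tag is marked for nearby closing. Whatever is still open at the end is closed in nested order:

```php
<?php
// Hypothetical, simplified sketch of fixHtmlTag()'s closing logic.
// closeTags(), $closeNearby and $closeAll are illustrative names only.
function closeTags(string $html, array $closeNearby = [], bool $closeAll = false): string
{
    // Void tags never go on the stack (names taken from the original $singleTagArray).
    $void = ['meta', 'link', 'base', 'br', 'hr', 'input', 'img'];
    $tokens = preg_split('/(<[^>]+>)/s', $html, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
    $stack = [];
    $result = '';

    foreach ($tokens as $token) {
        if (preg_match('/^<\/(\w+)[^>]*>$/s', $token, $m)) {
            // Closing tag: keep it only if it closes the innermost open tag.
            if (end($stack) === strtolower($m[1])) {
                array_pop($stack);
                $result .= $token;
            }
        } elseif (preg_match('/^<(\w+)[^>]*>$/s', $token, $m)) {
            $name = strtolower($m[1]);
            $isVoid = in_array($name, $void, true) || substr($token, -2) === '/>';
            // Nearby closing: a new non-void start tag first closes the innermost
            // open tag if it is listed in $closeNearby (any tag when $closeAll).
            if (!$isVoid && $stack && ($closeAll || in_array(end($stack), $closeNearby, true))) {
                $result .= '</' . array_pop($stack) . '>';
            }
            $result .= $token;
            if (!$isVoid) {
                $stack[] = $name;
            }
        } else {
            // Text, comments and doctypes pass through unchanged.
            $result .= $token;
        }
    }

    // Nested closing: close whatever is still open, innermost first.
    while ($stack) {
        $result .= '</' . array_pop($stack) . '>';
    }
    return $result;
}
```

Calling closeTags('<body><p>hello<p>fine', ['p']) corresponds to NEST mode with 'tagArray' => array('p'), and passing true as the third argument corresponds to CLOSE mode. Because void tags are skipped before the nearby-closing check, a <br> inside a <p> does not close the paragraph.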