A Java method to delete HTML comments in bulk
- 2020-04-01 03:10:40
- OfStack
There are many ways to delete comments from HTML text. Here I wrote my own method and am recording it as notes, so anyone who needs it can refer to it.
Comments in HTML text have the following characteristics:
1. They come in pairs: every comment that begins must also end.
2. Comment tags do not nest: after a comment start tag (hereinafter <!--), the next comment end tag (hereinafter -->) is the one that closes it.
3. A single line may contain several comments.
4. A comment may span multiple lines.
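Characteristic 2 is what keeps the processing simple: once a <!-- is found, the first --> after it closes the comment, and any <!-- in between is just comment text. A minimal sketch of that rule follows. HtmlCommentScanner and findComments are hypothetical names, and an unterminated comment is simply ignored, since the article assumes tags always come in pairs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only
final class HtmlCommentScanner {
    private static final String COMMENT_START = "<!--";
    private static final String COMMENT_END = "-->";

    // Returns every complete "<!-- ... -->" comment in the text, in document order.
    // Comments do not nest, so the first "-->" after a "<!--" always closes it.
    static List<String> findComments(String text) {
        List<String> comments = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = text.indexOf(COMMENT_START, from);
            if (start == -1) {
                break; // no more comments
            }
            // Look for the end tag only after the start tag, so "<!-->" does not count as closed
            int end = text.indexOf(COMMENT_END, start + COMMENT_START.length());
            if (end == -1) {
                break; // unterminated comment: assumed not to occur
            }
            comments.add(text.substring(start, end + COMMENT_END.length()));
            from = end + COMMENT_END.length();
        }
        return comments;
    }

    public static void main(String[] args) {
        // Not nested: the inner "<!--" is ordinary comment text
        System.out.println(findComments("x<!-- a <!-- b --> c -->y")); // prints [<!-- a <!-- b -->]
        // Several comments on one line
        System.out.println(findComments("<!--a--><!--b-->")); // prints [<!--a-->, <!--b-->]
        // A comment spanning two lines
        System.out.println(findComments("<!--This is\na div -->"));
    }
}
```

Note how the text " c -->" after the first closing tag is not part of any comment; it is ordinary content.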
The following sample covers the situations that can occur:
<html>
<!--This is a head-->
<head>A Head</head>
<!--This is
a div -->
<div>A Div</div>
<!--This is
a span--><!--span in
a div--><div>a div</div>
<div><span>A span</span></div>
<!--This is a
span--><div>A div</div><!--span in a div-->
<div><span>A span</span></div>
</html>
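To see which situation each line of the sample falls into, a small classifier can compare the positions of the first <!-- and the first --> on a line. This is a minimal sketch; CommentLineCases and classify are hypothetical names. It only looks at the first occurrence of each tag and does not know whether the line starts inside a comment, which is exactly the extra state the approach below has to track.

```java
import java.util.List;

// Hypothetical helper, for illustration only
final class CommentLineCases {
    // Classifies a line by the first comment start tag and first comment end tag it contains
    static String classify(String line) {
        int start = line.indexOf("<!--");
        int end = line.indexOf("-->");
        if (start == -1 && end == -1) {
            return "no tags";
        }
        if (end == -1) {
            return "start tag only";
        }
        if (start == -1) {
            return "end tag only";
        }
        return start < end ? "start before end" : "end before start";
    }

    public static void main(String[] args) {
        // Lines taken from the sample above
        List<String> sample = List.of(
                "<!--This is a head-->",
                "<!--This is",
                "a div -->",
                "a span--><!--span in",
                "span--><div>A div</div><!--span in a div-->");
        for (String line : sample) {
            System.out.println(classify(line) + ": " + line);
        }
    }
}
```

A line classified as "end before start" or "end tag only" can only occur when the line begins inside a comment opened on an earlier line.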
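Approach:
1. Read the text one line at a time, remembering whether the previous line ended inside a comment.
2. If the line starts outside a comment and contains <!-- followed later by -->, delete the comment between the two tags and keep the rest.
3. If the line contains both <!-- and -->, but <!-- comes after -->, the line started inside a comment: keep the content between --> and <!--, and record that a new <!-- has been encountered.
4. If the line contains only <!--, keep the content before the tag and record that a <!-- has been encountered.
5. If the line contains only -->, the line started inside a comment: keep the content after the tag and record that the closing --> has been encountered.
6. Repeat steps 2, 3, 4, and 5 on the rest of the line until it contains no more comment tags. Because comments do not nest, only --> matters while inside a comment, and only <!-- matters outside one.
7. Save what remains. A line with no tags is kept if it lies outside a comment and discarded if it lies inside one.
8. Read the next line.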
import java.io.BufferedReader;
import java.io.IOException;

public class HtmlCommentHandler {
    private static final String COMMENT_START = "<!--";
    private static final String COMMENT_END = "-->";

    // Reads the HTML line by line and returns it with every <!-- ... --> comment removed.
    // A comment may span several lines and a line may contain several comments.
    // Comments do not nest, so the first --> after a <!-- always closes it.
    // Line breaks outside comments are kept and normalized to '\n'.
    public static String readHtmlContentWithoutComment(BufferedReader reader) throws IOException {
        StringBuilder builder = new StringBuilder();
        String line;
        // Whether the current position is inside a comment (possibly opened on an earlier line)
        boolean inComment = false;
        while ((line = reader.readLine()) != null) {
            // Start of the part of the line that has not been processed yet
            int pos = 0;
            // Repeat the steps on the rest of the line until it is used up
            while (pos < line.length()) {
                if (inComment) {
                    // Inside a comment only --> matters; a <!-- here is just comment text
                    // comment-->content
                    int end = line.indexOf(COMMENT_END, pos);
                    if (end == -1) {
                        // No end tag: the rest of the line is comment text
                        break;
                    }
                    pos = end + COMMENT_END.length();
                    inComment = false;
                } else {
                    // Outside a comment only <!-- matters; a stray --> here is ordinary text
                    // content<!--comment
                    int start = line.indexOf(COMMENT_START, pos);
                    if (start == -1) {
                        // No start tag: keep the rest of the line
                        builder.append(line, pos, line.length());
                        break;
                    }
                    // Keep the content before the start tag, then enter the comment
                    builder.append(line, pos, start);
                    pos = start + COMMENT_START.length();
                    inComment = true;
                }
            }
            // The line break belongs to the comment if the line ends inside one
            if (!inComment) {
                builder.append('\n');
            }
        }
        return builder.toString();
    }
}
Of course, there are other ways to do this, such as deleting comments with a regular expression or tracking the start and end tags with a stack.
The code above has been tested and used in practice; I hope it is useful to anyone who needs it.
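For comparison, the regular-expression approach mentioned above can be sketched as follows. This is a swapped-in technique, not the author's code, and it assumes the whole document fits in memory as a single String; RegexCommentRemover and removeComments are hypothetical names. The reluctant quantifier .*? stops at the first --> after each <!--, which matches the non-nesting rule, and Pattern.DOTALL lets a comment span line breaks.

```java
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only
final class RegexCommentRemover {
    // "<!--", then the shortest run of any characters (DOTALL lets "." match line breaks), then "-->"
    private static final Pattern COMMENT = Pattern.compile("<!--.*?-->", Pattern.DOTALL);

    // Removes every complete comment from the whole text at once
    static String removeComments(String html) {
        return COMMENT.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<html>\n<!--This is\na div --><div>A Div</div><!--x--><p>y</p>\n</html>";
        System.out.println(removeComments(html));
    }
}
```

The trade-off is memory: the line-by-line reader only holds the current line and the inComment flag, while the regular expression needs the entire document as one string.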