Removing HTML tags with Java regular expressions
- 2020-06-03 06:38:39
- OfStack
Removing HTML tags with Java regular expressions is mainly useful for displaying content accurately. For example, I once built a feature similar to a blog post editor: whatever was typed into the editor was saved to the back-end database with its style tags included. When only an excerpt is shown, for example the first 50 characters of the text, all HTML tags have to be removed first and the text truncated afterwards. I implemented this with Java regular expressions in the method below; the code is as follows:
Note: the method below removes HTML tags using Java regular expressions.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

private static final String regEx_script = "<script[^>]*?>[\\s\\S]*?<\\/script>"; // regular expression for script blocks, including their content
private static final String regEx_style = "<style[^>]*?>[\\s\\S]*?<\\/style>"; // regular expression for style blocks, including their content
private static final String regEx_html = "<[^>]+>"; // regular expression for any single HTML tag
private static final String regEx_space = "\\s+"; // whitespace: spaces, tabs, carriage returns and newlines
private static final String regEx_w = "<w[^>]*?>[\\s\\S]*?<\\/w[^>]*?>"; // regular expression for elements whose tag name starts with "w", including their content
/**
 * Removes HTML tags from a string.
 *
 * @param htmlStr the HTML string to clean
 * @return the text with all HTML tags removed
 * @author LongJin
 */
public static String delHTMLTag(String htmlStr) {
    Pattern p_w = Pattern.compile(regEx_w, Pattern.CASE_INSENSITIVE);
    Matcher m_w = p_w.matcher(htmlStr);
    htmlStr = m_w.replaceAll(""); // remove w elements and their content
    Pattern p_script = Pattern.compile(regEx_script, Pattern.CASE_INSENSITIVE);
    Matcher m_script = p_script.matcher(htmlStr);
    htmlStr = m_script.replaceAll(""); // remove script blocks
    Pattern p_style = Pattern.compile(regEx_style, Pattern.CASE_INSENSITIVE);
    Matcher m_style = p_style.matcher(htmlStr);
    htmlStr = m_style.replaceAll(""); // remove style blocks
    Pattern p_html = Pattern.compile(regEx_html, Pattern.CASE_INSENSITIVE);
    Matcher m_html = p_html.matcher(htmlStr);
    htmlStr = m_html.replaceAll(""); // remove the remaining HTML tags
    Pattern p_space = Pattern.compile(regEx_space, Pattern.CASE_INSENSITIVE);
    Matcher m_space = p_space.matcher(htmlStr);
    htmlStr = m_space.replaceAll(""); // remove spaces, tabs, carriage returns and newlines
    htmlStr = htmlStr.replaceAll("&nbsp;", ""); // remove &nbsp; entities
    return htmlStr.trim(); // return the plain text
}
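To illustrate the excerpt use case described at the start, here is a minimal, self-contained sketch of stripping tags and then truncating the result to a fixed length. The class name `PreviewSketch` and the methods `stripTags` and `preview` are hypothetical names chosen for this example, not part of the original code. The sketch also makes two assumptions that differ from `delHTMLTag`: the patterns are compiled once as constants, and runs of whitespace are collapsed to a single space rather than removed entirely, which tends to suit space-separated languages.

```java
import java.util.regex.Pattern;

public class PreviewSketch {
    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern SCRIPT =
            Pattern.compile("<script[^>]*?>[\\s\\S]*?</script>", Pattern.CASE_INSENSITIVE);
    private static final Pattern STYLE =
            Pattern.compile("<style[^>]*?>[\\s\\S]*?</style>", Pattern.CASE_INSENSITIVE);
    private static final Pattern TAG = Pattern.compile("<[^>]+>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Hypothetical helper: removes script/style blocks, then every remaining tag.
    public static String stripTags(String html) {
        String text = SCRIPT.matcher(html).replaceAll("");
        text = STYLE.matcher(text).replaceAll("");
        text = TAG.matcher(text).replaceAll("");
        // Assumption: treat &nbsp; as an ordinary space instead of deleting it.
        text = text.replace("&nbsp;", " ");
        // Assumption: collapse whitespace runs to one space (delHTMLTag removes them).
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }

    // Hypothetical helper: plain-text excerpt of at most maxChars UTF-16 chars.
    public static String preview(String html, int maxChars) {
        String text = stripTags(html);
        return text.length() <= maxChars ? text : text.substring(0, maxChars);
    }

    public static void main(String[] args) {
        String html = "<p>Hello <b>world</b></p><script>alert(1)</script>";
        System.out.println(preview(html, 8)); // prints "Hello wo"
    }
}
```

The tags are removed before truncating so that the length limit applies to visible text only; cutting the raw HTML first could leave a half-open tag in the excerpt.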
P.S. This method is for reference only and is shared so that we can learn from each other. If you spot any shortcomings or have questions, comments are welcome.