Removing HTML tags with Java regular expressions

  • 2020-06-03 06:38:39
  • OfStack

Removing HTML tags with Java regular expressions is mainly useful for displaying content cleanly. For example, I once built a blog-post feature in which the style tags typed in the editor were saved to the backend database together with the text. When a page shows only a preview, such as the first 50 characters of a post, all HTML tags have to be removed first, and only then can the first 50 characters be taken. So I implemented the following method with Java regular expressions:
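The order of operations described above (strip the markup first, then truncate) can be sketched as follows. This is a minimal, self-contained illustration, not the author's code: the class `PreviewDemo`, the helper `preview` and its `maxChars` parameter are hypothetical names; only a simple `<[^>]+>` tag pattern is used (no special handling of `<script>` or `<style>`); and whitespace is collapsed into single spaces rather than deleted, so that English words stay separated.

```java
import java.util.regex.Pattern;

public class PreviewDemo {

    // Hypothetical simplified pattern: any tag such as <p>, </b> or <img src="x">
    private static final Pattern TAG = Pattern.compile("<[^>]+>");
    // One or more whitespace characters (spaces, tabs, CR, LF)
    private static final Pattern SPACES = Pattern.compile("\\s+");

    // Hypothetical helper: strip tags first, then cut the plain text to maxChars
    static String preview(String html, int maxChars) {
        String text = TAG.matcher(html).replaceAll("");
        // Collapse whitespace runs to one space so words do not run together
        text = SPACES.matcher(text).replaceAll(" ").trim();
        // Truncate only after the tags are gone, so no tag is cut in half
        return text.length() <= maxChars ? text : text.substring(0, maxChars);
    }

    public static void main(String[] args) {
        String html = "<p>Hello <b>regex</b>\n world</p>";
        System.out.println(preview(html, 50)); // Hello regex world
        System.out.println(preview(html, 5));  // Hello
    }
}
```

Truncating before stripping would both count markup characters toward the limit and risk cutting a tag such as `<a href=...>` in half, leaving a fragment without a closing `>` that the tag pattern can no longer match.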

Note: the method below removes HTML tags using Java regular expressions.


import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HtmlUtils { // enclosing class name is illustrative; the original snippet showed only its members

  private static final String regEx_script = "<script[^>]*?>[\\s\\S]*?<\\/script>"; // a <script> element and its content
  private static final String regEx_style = "<style[^>]*?>[\\s\\S]*?<\\/style>"; // a <style> element and its content
  private static final String regEx_html = "<[^>]+>"; // any remaining HTML tag
  private static final String regEx_space = "\\s+"; // runs of whitespace: spaces, tabs, carriage returns, newlines
  private static final String regEx_w = "<w[^>]*?>[\\s\\S]*?<\\/w[^>]*?>"; // elements whose name starts with "w", e.g. Word's <w:...> markup

  /**
   * Removes HTML markup from a string.
   *
   * @param htmlStr the HTML to clean
   * @return the text with all HTML tags, script/style content and whitespace removed
   * @author LongJin
   */
  public static String delHTMLTag(String htmlStr) {
    Pattern p_w = Pattern.compile(regEx_w, Pattern.CASE_INSENSITIVE);
    Matcher m_w = p_w.matcher(htmlStr);
    htmlStr = m_w.replaceAll(""); // filter <w...> elements
    Pattern p_script = Pattern.compile(regEx_script, Pattern.CASE_INSENSITIVE);
    Matcher m_script = p_script.matcher(htmlStr);
    htmlStr = m_script.replaceAll(""); // filter <script> elements
    Pattern p_style = Pattern.compile(regEx_style, Pattern.CASE_INSENSITIVE);
    Matcher m_style = p_style.matcher(htmlStr);
    htmlStr = m_style.replaceAll(""); // filter <style> elements
    Pattern p_html = Pattern.compile(regEx_html, Pattern.CASE_INSENSITIVE);
    Matcher m_html = p_html.matcher(htmlStr);
    htmlStr = m_html.replaceAll(""); // filter all other HTML tags
    Pattern p_space = Pattern.compile(regEx_space);
    Matcher m_space = p_space.matcher(htmlStr);
    htmlStr = m_space.replaceAll(""); // delete spaces, tabs, carriage returns and newlines
    htmlStr = htmlStr.replace("&nbsp;", "").replace("\u00A0", ""); // filter &nbsp; entities and non-breaking spaces
    return htmlStr.trim(); // return the plain-text string
  }
}
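The `*?` in `regEx_script`, `regEx_style` and `regEx_w` is a lazy (reluctant) quantifier: `[\s\S]*?` consumes as little as possible, so each match ends at the first closing tag it reaches. The sketch below compares the method's script pattern with a greedy variant on input that contains two script blocks in different letter case. The class `LazyVsGreedyDemo` and the `GREEDY` pattern are hypothetical, added only for the comparison.

```java
import java.util.regex.Pattern;

public class LazyVsGreedyDemo {

    // Lazy: [\s\S]*? stops at the first </script> it finds (same regex as regEx_script)
    static final Pattern LAZY =
        Pattern.compile("<script[^>]*?>[\\s\\S]*?<\\/script>", Pattern.CASE_INSENSITIVE);
    // Hypothetical greedy variant for comparison: [\s\S]* runs to the last </script>
    static final Pattern GREEDY =
        Pattern.compile("<script[^>]*?>[\\s\\S]*<\\/script>", Pattern.CASE_INSENSITIVE);

    public static void main(String[] args) {
        String html = "<SCRIPT>a()</SCRIPT>keep me<script>b()</script>";
        System.out.println(LAZY.matcher(html).replaceAll(""));   // keep me
        System.out.println(GREEDY.matcher(html).replaceAll("")); // prints an empty line
    }
}
```

With the greedy `*`, the single match runs from the first `<SCRIPT>` to the last `</script>`, so the visible text between the two blocks is deleted as well. The uppercase `<SCRIPT>` is matched only because the pattern is compiled with `Pattern.CASE_INSENSITIVE`, as in `delHTMLTag`.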

P.S. This method is offered for reference only, so that we can learn from each other. If you find any shortcomings or have questions, comments are welcome.

