ASP.NET regular expression code for removing specified HTML tags

  • 2020-05-10 18:01:57
  • OfStack

Stripping every HTML tag from a piece of content can make it hard to read (for example, when a and img tags disappear). It is better to remove some tags and keep others.

In regular expressions it is easy to test whether a string is present, but testing that a string is absent (a whole string, not a single character or a set of characters) is much less obvious.
 
<(?!((/?\s?li)|(/?\s?ul)|(/?\s?a)|(/?\s?img)|(/?\s?br)|(/?\s?span)|(/?\s?b)))[^>]+> 

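This expression matches any HTML tag other than li, ul, a, img, br, span, and b. Replacing every match with an empty string therefore removes all tags except the ones listed here, which is the requirement above. It took me a long time to work out.
(?!exp) matches a position that is not followed by exp.
/?\s? has to sit inside the lookahead. At first I wrote it directly after <, ahead of the lookahead, but the test failed: an optional prefix outside the lookahead can backtrack and match nothing, so a closing tag such as </li> still passes the check and gets removed.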
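
The effect of where the optional /?\s? prefix sits can be checked directly. The following is a minimal sketch and not part of the original code: the class name LookaheadDemo, its two constants, and its Strip helper are made up for illustration, and the patterns keep only li to stay short.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    // Optional "/" and whitespace INSIDE the negative lookahead.
    public const string Inside = @"<(?!/?\s?li)[^>]+>";

    // Optional "/" and whitespace BEFORE the negative lookahead
    // (the variant that fails for closing tags).
    public const string Before = @"</?\s?(?!li)[^>]+>";

    // Removes every match of the given pattern from the input.
    public static string Strip(string pattern, string html)
    {
        return Regex.Replace(html, pattern, "", RegexOptions.IgnoreCase);
    }
}
```

For the input <p><li>x</li></p>, Strip with Inside leaves <li>x</li>, while Strip with Before leaves <li>x. In the second case the lookahead first rejects li after the slash, the engine then retries with /? matching nothing, and the lookahead sees /li, which does not start with li, so the closing tag is matched and deleted.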

Here is a simple function that joins the tags you want to keep into a regular expression and then removes every other tag.
 
using System.Text.RegularExpressions;

private static string RemoveSpecifyHtml(string ctx) {
    // Tags to keep; every other tag is removed.
    string[] holdTags = { "a", "img", "br", "strong", "b", "span" };
    // Generated pattern:
    // <(?!((/?\s?a)|(/?\s?img)|(/?\s?br)|(/?\s?strong)|(/?\s?b)|(/?\s?span)))[^>]+>
    string regStr = string.Format(@"<(?!((/?\s?{0})))[^>]+>", string.Join(@")|(/?\s?", holdTags));
    Regex reg = new Regex(regStr, RegexOptions.Compiled | RegexOptions.Multiline | RegexOptions.IgnoreCase);

    return reg.Replace(ctx, "");
}

Fix:
With the function above, keeping li also keeps link, and keeping a also keeps every other tag whose name starts with a (address, for example), because the lookahead only checks the beginning of the tag name. The solution is to add a \b (word boundary) assertion after each name.
 
<(?!((/?\s?li\b)|(/?\s?ul\b)|(/?\s?a\b)|(/?\s?img\b)|(/?\s?br\b)|(/?\s?span\b)|(/?\s?b\b)))[^>]+>

using System.Text.RegularExpressions;

private static string RemoveSpecifyHtml(string ctx) {
    // Tags to keep; every other tag is removed.
    string[] holdTags = { "a", "img", "br", "strong", "b", "span", "li" };
    // Generated pattern:
    // <(?!((/?\s?a\b)|(/?\s?img\b)|(/?\s?br\b)|(/?\s?strong\b)|(/?\s?b\b)|(/?\s?span\b)|(/?\s?li\b)))[^>]+>
    // The \b after {0} covers the last tag, which string.Join alone would leave without one.
    string regStr = string.Format(@"<(?!((/?\s?{0}\b)))[^>]+>", string.Join(@"\b)|(/?\s?", holdTags));
    Regex reg = new Regex(regStr, RegexOptions.Compiled | RegexOptions.Multiline | RegexOptions.IgnoreCase);

    return reg.Replace(ctx, "");
}
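
As a variation on the function above, the tag names can be grouped into one alternation so that a single \b applies to every name, including the last one. This is a sketch under stated assumptions and not the original author's code: TagFilterSketch, BuildRegex, and Strip are hypothetical names, at least one tag name is assumed to be passed in, and Regex.Escape is added only as a precaution in case a name ever contains a regex metacharacter.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TagFilterSketch
{
    // Builds a pattern of the form <(?!/?\s?(?:a|img|li)\b)[^>]+>
    // from the tag names to keep (assumes keepTags is not empty).
    public static Regex BuildRegex(params string[] keepTags)
    {
        string names = string.Join("|", keepTags.Select(t => Regex.Escape(t)));
        string pattern = @"<(?!/?\s?(?:" + names + @")\b)[^>]+>";
        return new Regex(pattern, RegexOptions.IgnoreCase);
    }

    // Removes every tag whose name is not in keepTags.
    public static string Strip(string html, params string[] keepTags)
    {
        return BuildRegex(keepTags).Replace(html, "");
    }
}
```

Calling TagFilterSketch.Strip on <div><a href="x">t</a><link rel="y"><li>i</li></div> with "a" and "li" as the tags to keep returns <a href="x">t</a><li>i</li>. The link tag is removed because \b fails between the i and the n of link, so the lookahead no longer treats it as li.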
