ASP.NET code for selectively removing HTML tags with a regular expression
- 2020-05-10 18:01:57
- OfStack
Stripping every HTML tag from a piece of content can make it hard to read (for example when a and img tags are lost). It is better to remove some tags and keep others.
In regular expressions, it is easy to test whether the input contains a given string, but testing that it does not contain a given string (a whole string, not a single character or a set of characters) is much less obvious.
<(?!((/?\s?li)|(/?\s?ul)|(/?\s?a)|(/?\s?img)|(/?\s?br)|(/?\s?span)|(/?\s?b)))[^>]+>
This regular expression matches any HTML tag that is not li/ul/a/img/br/span/b. For the requirement above, that means deleting every tag other than the ones listed here. It took me a long time to work this out.
(?!exp) matches a position that is not followed by exp.
At first I tried putting /?\s? right after the opening <, outside the lookahead, but the test failed.
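To see why the position of /?\s? matters, the sketch below compares the two placements on a small sample. The class name LookaheadPlacementDemo, the helper Strip, and the sample string are made up for illustration, and the Outside pattern is only a guess at what the failed attempt looked like. With /?\s? outside the lookahead, the optional slash can backtrack to match nothing, so the lookahead is tested against "/a" instead of "a" and the closing tag is removed anyway.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadPlacementDemo
{
    // /?\s? inside the lookahead: the optional slash is part of what must NOT follow '<'.
    public const string Inside = @"<(?!(/?\s?a\b))[^>]+>";

    // Assumed failed attempt: /?\s? placed right after '<', outside the lookahead.
    public const string Outside = @"</?\s?(?!a\b)[^>]+>";

    // Removes every match of the given pattern from the input.
    public static string Strip(string pattern, string html)
    {
        return Regex.Replace(html, pattern, "", RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        string html = "<p>See <a href=\"x\">this</a></p>";
        Console.WriteLine(Strip(Inside, html));   // See <a href="x">this</a>
        Console.WriteLine(Strip(Outside, html));  // See <a href="x">this
    }
}
```

The second result has lost its closing </a>, which is the kind of failure that keeping /?\s? inside the lookahead avoids.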
Here is a simple function that joins the tags you want to keep into a regular expression and then removes every tag that is not on that list:
// Requires: using System.Text.RegularExpressions;
private static string RemoveSpecifyHtml(string ctx) {
    // Tags to keep; every other tag is removed.
    string[] holdTags = { "a", "img", "br", "strong", "b", "span" };
    // Generated pattern:
    // <(?!((/?\s?a)|(/?\s?img)|(/?\s?br)|(/?\s?strong)|(/?\s?b)|(/?\s?span)))[^>]+>
    string regStr = string.Format(@"<(?!((/?\s?{0})))[^>]+>", string.Join(@")|(/?\s?", holdTags));
    Regex reg = new Regex(regStr, RegexOptions.Compiled | RegexOptions.Multiline | RegexOptions.IgnoreCase);
    // Replace every matching (unwanted) tag with an empty string.
    return reg.Replace(ctx, "");
}
Correction:
In the example above, keeping li turns out to keep link as well, and keeping a also keeps addr, because a tag only has to start with a kept name to survive. The solution is to add the \b (word boundary) assertion after each tag name.
<(?!((/?\s?li\b)|(/?\s?ul\b)|(/?\s?a\b)|(/?\s?img\b)|(/?\s?br\b)|(/?\s?span\b)|(/?\s?b\b)))[^>]+>
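The effect of \b can be checked on a small sample. This is a minimal sketch: the class name WordBoundaryDemo, the helper Strip, and the sample input are invented for illustration, and the keep-list is reduced to li and a to keep the patterns short. The sample uses article as an example of a real tag whose name starts with a.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordBoundaryDemo
{
    // Keep-list of li and a, first without and then with the \b assertion.
    public const string WithoutBoundary = @"<(?!((/?\s?li)|(/?\s?a)))[^>]+>";
    public const string WithBoundary = @"<(?!((/?\s?li\b)|(/?\s?a\b)))[^>]+>";

    // Removes every match of the given pattern from the input.
    public static string Strip(string pattern, string html)
    {
        return Regex.Replace(html, pattern, "", RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        string html = "<link rel=\"x\"><li>item</li><article>text</article>";
        // Without \b, link and article survive because they start with li and a.
        Console.WriteLine(Strip(WithoutBoundary, html));
        // With \b, only the real li tags survive: <li>item</li>text
        Console.WriteLine(Strip(WithBoundary, html));
    }
}
```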
// Requires: using System.Text.RegularExpressions;
private static string RemoveSpecifyHtml(string ctx) {
    // Tags to keep; every other tag is removed.
    string[] holdTags = { "a", "img", "br", "strong", "b", "span", "li" };
    // Generated pattern:
    // <(?!((/?\s?a\b)|(/?\s?img\b)|(/?\s?br\b)|(/?\s?strong\b)|(/?\s?b\b)|(/?\s?span\b)|(/?\s?li\b)))[^>]+>
    // The \b also sits in the format string so that the last tag in the list gets one too.
    string regStr = string.Format(@"<(?!((/?\s?{0}\b)))[^>]+>", string.Join(@"\b)|(/?\s?", holdTags));
    Regex reg = new Regex(regStr, RegexOptions.Compiled | RegexOptions.Multiline | RegexOptions.IgnoreCase);
    return reg.Replace(ctx, "");
}
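For experimenting outside a page class, the same idea can be packaged as a self-contained console program. This is only a sketch under a few assumptions: the names HtmlTagFilter and RemoveTagsExcept are hypothetical, the keep-list is passed as a parameter instead of being hard-coded, the tag names are collapsed into a single alternation followed by one \b, and Regex.Escape is applied as a precaution. It is a variation on the function above, not a replacement for a real HTML parser.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class HtmlTagFilter
{
    // Hypothetical helper: removes every tag whose name is not in keepTags.
    public static string RemoveTagsExcept(string html, params string[] keepTags)
    {
        // Escape each name so it is treated literally inside the pattern.
        string names = string.Join("|", keepTags.Select(t => Regex.Escape(t)));
        // For keepTags { "a", "b" } this builds: <(?!/?\s?(?:a|b)\b)[^>]+>
        string pattern = @"<(?!/?\s?(?:" + names + @")\b)[^>]+>";
        return Regex.Replace(html, pattern, "", RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        string html = "<div><p>Hello <b>world</b><br/><a href=\"#\">link</a></p></div>";
        // Prints: Hello <b>world</b><br/><a href="#">link</a>
        Console.WriteLine(RemoveTagsExcept(html, "a", "b", "br"));
    }
}
```

Note that [^>]+ stops at the first >, so a tag with a literal > inside an attribute value would be cut short. For untrusted or irregular markup a parser is the safer choice.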