Using Python regular expressions to fix inconsistent fonts in website articles

  • 2020-04-02 09:53:31
  • OfStack

The site's global layout defines the fonts, including size and color. When users post an article, though, the text is often copied from another website, and the font markup is copied along with it. When the article is displayed, fonts defined inside the article take precedence, and the global font applies only where the article defines none. As a result the site looks messy: some articles have very large text, others very small. It would be much better if everything were uniform.

I am not very familiar with HTML and CSS, and I don't know whether there is a way to make the font definitions inside an article ineffective.

So I took the blunt approach: edit the articles and delete every user-defined font. Doing this by hand is a lot of work: preview the page, and if the fonts are inconsistent, open the article and fix it. Fortunately the editor has a "clear format" option, so you can select all the text, click OK, and save. It is still tedious.

If the only goal is to change the fonts, the easiest way is to work on the database directly: read the articles from the database, remove the font-related markup, and write them back.

According to the HTML reference manual, there are two ways to define a font:

1. Use the <font> tag, for example:


<p>
<font size="2" face="Verdana">
This is a paragraph.
</font>
</p>
<p>
<font size="3" face="Times">
This is another paragraph.
</font>
</p>

This method is not recommended; the <font> tag is deprecated in HTML 4.01 and obsolete in HTML5.

2. Use the style attribute, for example:


<p style="font-family:verdana;font-size:80%;color:green">
This is a paragraph with some text in it. This is a paragraph with some text in it. This is a paragraph with some text in it. This is a paragraph with some text in it.
</p>

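So we only need to delete the font definitions, which can be done by substitution with Python's regular expression module, re. Note that the function below removes the entire style attribute, not only its font properties.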


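import re

def format(data):
    '''Delete all <font> tags and style attributes.

    Return the cleaned text, or None if nothing changed.
    '''
    # Non-greedy matches: opening <font ...> tags, closing </font> tags,
    # and double-quoted style="..." attributes.
    p = re.compile(r'<font .*?>|</font>|style=".*?"')
    ret = p.sub('', data)
    if ret != data:
        return ret
    else:
        return None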
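
The pattern above leaves a stray space behind when it removes a style attribute (`<p style="...">` becomes `<p >`), and it does not match a bare `<font>` tag, uppercase tags, or single-quoted attributes. The following is a minimal sketch of a slightly more tolerant variant; the names `FONT_RE` and `strip_fonts` are made up for this illustration and are not part of the original code.

```python
import re

# Hypothetical variant of the pattern above (an assumption, not the original):
# - <font ...> or a bare <font>, in any letter case
# - </font>
# - a style attribute with double or single quotes, including the
#   whitespace in front of it, so no stray space is left in the tag
FONT_RE = re.compile(
    r'<font\b[^>]*>|</font\s*>|\s+style\s*=\s*(?:"[^"]*"|\'[^\']*\')',
    re.IGNORECASE,
)

def strip_fonts(html):
    """Return html with font tags and style attributes removed."""
    return FONT_RE.sub('', html)

print(strip_fonts('<p style="color:green">Hello</p>'))  # <p>Hello</p>
print(strip_fonts("<FONT size='3'>Hi</FONT>"))          # Hi
```

Like the original, this is plain text substitution and not HTML parsing, so it would also delete a literal ` style="..."` that happens to appear in the article body.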

When working with a database from Python, pay attention to how the data is updated; see this article for reference: http://www.cnblogs.com/ma6174/archive/2013/02/21/2920126.html
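
The read, clean, write-back cycle could look roughly like the sketch below. The article does not say which database or schema the site uses, so everything here is an assumption: the standard-library `sqlite3` module stands in for the real database, and the `articles` table with `id` and `content` columns, as well as the function name `clean_articles`, are hypothetical.

```python
import re
import sqlite3

# Same pattern as in the article.
FONT_RE = re.compile(r'<font .*?>|</font>|style=".*?"')

def clean_articles(conn):
    """Strip font markup from every article; return how many rows changed."""
    # Hypothetical schema: articles(id, content).
    rows = conn.execute("SELECT id, content FROM articles").fetchall()
    changed = 0
    for article_id, content in rows:
        cleaned = FONT_RE.sub('', content)
        if cleaned != content:
            # Parameterized query, so quotes inside the HTML are handled
            # by the driver and not spliced into the SQL string.
            conn.execute(
                "UPDATE articles SET content = ? WHERE id = ?",
                (cleaned, article_id),
            )
            changed += 1
    conn.commit()
    return changed

# Demo on an in-memory database with two sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, content TEXT)")
conn.executemany(
    "INSERT INTO articles (content) VALUES (?)",
    [
        ('<p><font size="2" face="Verdana">This is a paragraph.</font></p>',),
        ('<p>No font markup here.</p>',),
    ],
)
print(clean_articles(conn))  # 1
```

Only rows whose content actually changed are written back, which mirrors the original function returning None when nothing was removed.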

