A detailed guide to python-docx, an essential tool for processing Word documents

  • 2021-12-09 09:20:36
  • OfStack

My understanding

Why python-docx? I recently downloaded a large number of online texts, but they are all in HTML, and I am used to doing my text processing in Word, so I tried converting the HTML documents to Word. The first question is how to store the text extracted from HTML in a Word file. I previously converted directly with pandoc, but the result was unsatisfactory: the formatting was lost, which does not work for someone as particular about Word formatting as I am. So I looked for other methods, and after studying python-docx for a few days I found that it suits me very well. My plan is to parse the HTML documents and use python-docx to save them as Word files in the format I want.

My usual Word layout is as follows: A4 pages with the "5678" margins; the main title in a small title Song typeface (FangZheng XiaoBiaoSong) and the other headings in a bold typeface; body text in FangSong_GB2312 with a first-line indent of 2 characters; and page numbers in the footer.

Creating a document with python-docx always follows the same idea. A document, that is, a Document() object, is first divided into sections, controlled through section objects. The content is divided into paragraphs (paragraph objects), and each paragraph is composed of runs (run objects). For each section you can set the page attributes; for each paragraph you can set spacing, indentation, line breaking and pagination; and for each run you can set the font, color and size. You can set the default paragraph and font formatting for the whole document first, and then override it for individual paragraphs and runs.

Below I mainly cover the settings that I use myself.

Installing the library:


pip install python-docx

Libraries used


from docx import Document  # Read and write documents
from docx.shared import Pt, Cm, Inches  # Length units (font size, margins, ...); not all of them are always needed
from docx.oxml.ns import qn  # Qualified XML names; used for East Asian fonts, columns, etc.
from docx.shared import RGBColor  # Font color
from docx.enum.text import WD_ALIGN_PARAGRAPH  # Paragraph alignment
from docx.enum.section import WD_ORIENTATION  # Paper orientation

Setting the overall format

The benefit of setting these defaults first is that everything written to the document afterwards follows this format automatically. Anything that needs to differ can be changed individually when it is written.


# Open the template document (docx_template is its path). A template is not strictly
# required; I could not get python-docx to set page numbers, so I first created
# a blank document with page numbers and use it as the template
document = Document(docx_template)
normal = document.styles['Normal']
# Set the default format of the body text
# Font size: Chinese size 3 (16 pt)
normal.font.size = Pt(16)
# Font: FangSong_GB2312 (仿宋_GB2312); the East Asian font must be set separately
normal.font.name = '仿宋_GB2312'
normal._element.rPr.rFonts.set(qn('w:eastAsia'), '仿宋_GB2312')
# Line spacing fixed at 29 pt, no space before or after paragraphs
normal.paragraph_format.line_spacing = Pt(29)
normal.paragraph_format.space_before = Pt(0)
normal.paragraph_format.space_after = Pt(0)
# Indent the first line by 2 characters: 2 x 16 pt = 32 pt (406400 EMU)
normal.paragraph_format.first_line_indent = Pt(32)
# Turn off widow/orphan control
normal.paragraph_format.widow_control = False
# Set the page size
section = document.sections[0]
section.page_height = Cm(29.7)  # A4 paper height
section.page_width = Cm(21)  # A4 paper width
# Set the margins
section.top_margin = Cm(3.7)
section.bottom_margin = Cm(3.4)
section.left_margin = Cm(2.8)
section.right_margin = Cm(2.6)

Formatting paragraphs individually


doc = Document()  # Create a blank document
p1 = doc.add_paragraph()  # Add an empty paragraph
p1.alignment = WD_ALIGN_PARAGRAPH.CENTER  # Centered; the default is left-aligned. Other options: RIGHT, JUSTIFY (both edges), DISTRIBUTE (distributed)

p1.paragraph_format.line_spacing = 1.5  # 1.5 line spacing; a fixed value in Pt also works, as in the defaults above
p1.paragraph_format.first_line_indent = Inches(0.5)  # First-line indent of 0.5 inch; I prefer 2 characters, i.e. 406400 (EMU)
p1.paragraph_format.left_indent = Inches(0.5)  # Left indent of 0.5 inch; rarely needed
p1.paragraph_format.right_indent = Inches(0.5)  # Right indent of 0.5 inch; rarely needed
p1.paragraph_format.keep_together = False  # Keep the lines of the paragraph on one page (no page break inside it)
p1.paragraph_format.keep_with_next = False  # Keep on the same page as the next paragraph
p1.paragraph_format.page_break_before = True  # Page break before the paragraph
p1.paragraph_format.widow_control = False  # Widow/orphan control
p1.paragraph_format.space_after = Pt(5)  # Space after the paragraph: 5 pt
p1.paragraph_format.space_before = Pt(5)  # Space before the paragraph: 5 pt

run1 = p1.add_run('Hello')  # Write the text "Hello" into the paragraph
run1.font.size = Pt(12)  # Set the font size of this run to 12 pt
run1.font.bold = True  # Bold
run1.font.italic = True  # Italic
run1.font.underline = True  # Underline
run1.font.color.rgb = RGBColor(255, 0, 0)  # Color (red)

Insert a picture


# Add a picture and set its width (the height is scaled proportionally)
doc.add_picture(r"Picture path", width=Cm(10))  # Replace "Picture path" with the path to the image file

Insert table


tab = doc.add_table(rows=5, cols=8, style='Table Grid')  # Create a table with 5 rows and 8 columns in the 'Table Grid' style
tab.cell(0, 0).text = 'Table corner'  # Row 0, column 0 holds the table-corner text
cell = tab.cell(0, 1).merge(tab.cell(0, 3))  # Merge row 0, columns 1 through 3
p = cell.paragraphs[0]
run = p.add_run('Merge')  # Add a run to the cell's first paragraph and write 'Merge'
run.font.size = Pt(10.5)  # Font size 10.5 pt, which corresponds to Chinese size 5 in Word
run.bold = True  # Bold
p.paragraph_format.alignment = WD_ALIGN_PARAGRAPH.CENTER  # Centered
