A detailed explanation of python-docx, an essential tool for processing Word documents
- 2021-12-09 09:20:36
- OfStack
My understanding
Why python-docx? I recently downloaded a large number of online texts, but they are in HTML, and I am used to doing my text processing in Word, so I set out to convert the HTML documents to Word. The first question was how to store the text extracted from the HTML in a Word file. I had previously converted directly with pandoc, but the result was unsatisfactory: the output had no formatting, which does not meet the needs of someone as strict about Word formatting as I am. So I searched for other methods, and the effort paid off: after studying python-docx for a few days, I found that it suits me well. My plan is to parse the HTML documents and then use python-docx to save them as Word files in the format I want. My usual layout is A4 paper with the "5678" margins. The main title is set in a Fangzheng small-title typeface, other headings are in bold typefaces, the body text is FangSong_GB2312 (仿宋_GB2312) with a first-line indent of 2 characters, and the footer shows page numbers.
Creating a document with python-docx follows much the same idea every time. A document, that is, a Document() object, is first divided into sections, which are controlled through the sections objects. Each section contains paragraphs (the paragraphs objects), and each paragraph is composed of runs (the run objects). Page properties are set per section; spacing, indentation, line breaking, and pagination are set per paragraph; and font, color, and size are set per run. You can first set the general paragraph and font format for the whole document, then override it for individual paragraphs and runs.
Below I mainly cover the settings that I actually use.
Installing the library:
pip install python-docx
Libraries used
from docx import Document  # reading and writing documents
from docx.shared import Pt, Cm, Inches  # length units for font sizes, margins, etc.; not all of them are necessarily used
from docx.oxml.ns import qn  # qualified XML names, used when setting East Asian fonts, columns, etc.
from docx.shared import RGBColor  # font color
from docx.enum.text import WD_ALIGN_PARAGRAPH  # paragraph alignment
from docx.enum.section import WD_ORIENTATION  # paper orientation
Setting the general format
The benefit of setting these defaults first is that anything written into the document afterwards automatically follows this format; whatever needs to differ is changed individually when it is written.
docment = Document(docx_tamplate)  # Read the template document. A template is not strictly required, but because I couldn't set page numbers with python-docx, I first built a blank document with page numbers to use as the template
# Set the default format of the body text
# Font size: Chinese size 3 (三号, 16 pt)
docment.styles['Normal'].font.size = Pt(16)
# Font: FangSong_GB2312 (仿宋_GB2312)
docment.styles['Normal'].font.name = '仿宋_GB2312'
docment.styles['Normal']._element.rPr.rFonts.set(qn('w:eastAsia'), '仿宋_GB2312')
# Fixed line spacing of 29 pt, with no space before or after paragraphs
docment.styles['Normal'].paragraph_format.line_spacing = Pt(29)
docment.styles['Normal'].paragraph_format.space_before = Pt(0)
docment.styles['Normal'].paragraph_format.space_after = Pt(0)
# Indent the first line by 2 characters: 2 x 16 pt = 32 pt = 406400 EMU
docment.styles['Normal'].paragraph_format.first_line_indent = Pt(32)
# Turn off widow/orphan control
docment.styles['Normal'].paragraph_format.widow_control = False
# Set the page size
docment.sections[0].page_height = Cm(29.7)  # A4 paper height
docment.sections[0].page_width = Cm(21)  # A4 paper width
# Set Margins
docment.sections[0].top_margin = Cm(3.7)
docment.sections[0].bottom_margin = Cm(3.4)
docment.sections[0].left_margin = Cm(2.8)
docment.sections[0].right_margin = Cm(2.6)
Formatting paragraphs individually
doc = Document()  # create a blank document
p1 = doc.add_paragraph()  # add an empty paragraph
p1.alignment = WD_ALIGN_PARAGRAPH.CENTER  # center the paragraph; the default is left alignment. Others: RIGHT, JUSTIFY (both edges), DISTRIBUTE (distributed)
p1.paragraph_format.line_spacing = 1.5  # line spacing of 1.5 times; a Pt value can be used instead, as in the defaults above
p1.paragraph_format.first_line_indent = Inches(0.5)  # first-line indent of 0.5 inch; I still prefer 2 characters, i.e. 406400
p1.paragraph_format.left_indent = Inches(0.5)  # left indent of 0.5 inch; rarely needed
p1.paragraph_format.right_indent = Inches(0.5)  # right indent of 0.5 inch; rarely needed
p1.paragraph_format.keep_together = False  # if True, the paragraph is not split across pages
p1.paragraph_format.keep_with_next = False  # if True, the paragraph stays on the same page as the next one
p1.paragraph_format.page_break_before = True  # start the paragraph on a new page
p1.paragraph_format.widow_control = False  # widow/orphan control
p1.paragraph_format.space_after = Pt(5)  # 5 pt of space after the paragraph
p1.paragraph_format.space_before = Pt(5)  # 5 pt of space before the paragraph
run1 = p1.add_run('Hello')  # write the text "Hello" into the paragraph
run1.font.size = Pt(12)  # set this run's font size to 12 pt
run1.font.bold = True  # bold
run1.italic = True  # italic
run1.font.underline = True  # underline
run1.font.color.rgb = RGBColor(255, 0, 0)  # font color (red)
Insert a picture
# Add a picture and set its width; the height is scaled proportionally
doc.add_picture(r"picture path", width=Cm(10))
Insert a table
tab = doc.add_table(rows=5, cols=8, style='Table Grid')  # create a table with 5 rows and 8 columns in the 'Table Grid' style
tab.cell(0, 0).text = 'Table corner'  # the cell at row 0, column 0 holds the table corner
cell = tab.cell(0, 1).merge(tab.cell(0, 3))  # merge row 0, column 1 through row 0, column 3
p = cell.paragraphs[0]
run = p.add_run('Merge')  # add a run to the cell's first paragraph and write 'Merge'
run.font.size = Pt(10.5)  # font size; 10.5 pt corresponds to Chinese size 5 (五号) in Word
run.bold = True  # bold
p.paragraph_format.alignment = WD_ALIGN_PARAGRAPH.CENTER  # center the paragraph