Handling PDFs in Python and generating multi-layer PDFs, with example code

  • 2020-05-30 20:33:11
  • OfStack

Python has many libraries for working with PDF files. This article tries two of them under Python 3 to generate and read PDFs. PyPDF2 is better suited to reading PDFs, but I did not find a way to generate a multi-layer PDF with it. ReportLab appears more mature: its Canvas makes it easy to generate a multi-layer PDF, which is useful for scanned images whose content should be searchable.


Generate a two-layer PDF

A two-layer PDF relies on the Canvas concept in ReportLab: draw the text first, then draw the picture on the same page.

from reportlab.lib.pagesizes import letter
from reportlab.lib.units import inch
from reportlab.pdfgen import canvas

image_file = "./42.png"

# Use Canvas to generate the PDF
c = canvas.Canvas('reportlab_canvas.pdf', pagesize=letter)
width, height = letter

# Text layer: draw the string first
c.drawString(3 * inch, 3 * inch, "Hello World")
# Image layer: draw the picture afterwards, so it sits on top of the text
c.drawImage(image_file, 0, 0)

# Finish the page and write the file to disk
c.showPage()
c.save()


Read PDF

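# Legacy PyPDF2 API: PdfFileReader/PdfFileWriter were removed in PyPDF2 3.0
from PyPDF2 import PdfFileWriter, PdfFileReader

output = PdfFileWriter()
input1 = PdfFileReader(open("jquery.pdf", "rb"))

# print document info
print("title = %s" % input1.getDocumentInfo().title)

# print how many pages input1 has:
print("jquery.pdf has %d pages." % input1.getNumPages())

# print page content
page_content = input1.getPage(0).extractText()
print(page_content)

# add page 1 from input1 to output document, unchanged
output.addPage(input1.getPage(0))

# add page 2 from input1, but rotated clockwise 90 degrees
# (assumes jquery.pdf has at least two pages)
output.addPage(input1.getPage(1).rotateClockwise(90))

# finally, write "output" to PyPDF2-output.pdf
with open("PyPDF2-output.pdf", "wb") as outputStream:
    output.write(outputStream)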

However, extracting text from a PDF with PyPDF2 has many problems, as the project's issue list shows. The documentation for extractText also warns about this:

extractText(self)
    Locate all text drawing commands, in the order they are provided in the
    content stream, and extract the text. This works well for some PDF
    files, but poorly for others, depending on the generator used. This will
    be refined in the future. Do not rely on the order of text coming out of
    this function, as it will change if this function is made more
    sophisticated.

    Stability: Added in v1.7, will exist for all future v1.x releases. May
    be overhauled to provide more ordered text in the future.
    @return a unicode string object
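
To see why the order of extracted text can differ from the order on the page, here is a toy sketch using only the standard library. It is not how PyPDF2 is implemented. The content stream is a made-up, simplified example, and the regular expression only handles the plain `x y Td (text) Tj` pattern (no escaped parentheses, no `TJ` arrays, no fonts or encodings).

```python
import re

# Hypothetical, simplified content stream: the generator wrote the lower line
# ("World", y=700) before the upper line ("Hello", y=720).
CONTENT_STREAM = (
    "BT /F1 12 Tf 72 700 Td (World) Tj ET\n"
    "BT /F1 12 Tf 72 720 Td (Hello) Tj ET\n"
)

# Matches "x y Td (text) Tj" and captures the y coordinate and the string.
SHOW_TEXT = re.compile(r"[\d.]+\s+([\d.]+)\s+Td\s+\((.*?)\)\s+Tj")


def text_in_stream_order(stream):
    """Return the strings in the order their Tj operators appear in the stream."""
    return [m.group(2) for m in SHOW_TEXT.finditer(stream)]


def text_in_visual_order(stream):
    """Return the strings top to bottom (in PDF, a larger y is higher on the page)."""
    hits = [(float(m.group(1)), m.group(2)) for m in SHOW_TEXT.finditer(stream)]
    return [text for y, text in sorted(hits, key=lambda hit: -hit[0])]


print(text_in_stream_order(CONTENT_STREAM))  # ['World', 'Hello']
print(text_in_visual_order(CONTENT_STREAM))  # ['Hello', 'World']
```

Reading the stream in order gives "World" before "Hello", even though a viewer shows "Hello" above "World". This matches the caveat in the documentation quoted above: the result depends on how the generating program ordered its drawing commands.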
