Processing PDFs in Python: example code for generating a multi-layer PDF
- 2020-05-30 20:33:11
Python has a large number of libraries for working with PDF. This article tries out two of them, PyPDF and ReportLab, under Python 3 to generate PDF files. PyPDF is better at reading PDFs, but I did not find a way to generate a multi-layer PDF with it. ReportLab looks more mature and can easily generate a multi-layer PDF through its Canvas, which makes it possible to produce a PDF from a scanned image whose content can still be searched.
Generate a two-layer PDF
A two-layer PDF relies on the Canvas concept in ReportLab: draw the text first, then draw the picture over it, and the result is a PDF with two layers.

    from reportlab.lib.pagesizes import letter
    from reportlab.lib.units import inch
    from reportlab.pdfgen import canvas

    image_file = "./42.png"

    # Use Canvas to generate the PDF
    c = canvas.Canvas('reportlab_canvas.pdf', pagesize=letter)
    c.setFillColorRGB(0, 0.77, 0.77)

    # Layer 1: the text
    c.drawString(3 * inch, 3 * inch, "Hello World")

    # Layer 2: the picture, drawn after the text and therefore on top of it
    c.drawImage(image_file, 0, 0)

    # Finish the page and write the file
    c.showPage()
    c.save()
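The example above draws the picture at the page origin without giving it a size, so a scan whose dimensions differ from the page may not cover it exactly. The following is a minimal sketch, not part of the original code, of how a placement rectangle could be computed so the picture fills the page while keeping its aspect ratio. `fit_to_page` is a hypothetical helper name, and `LETTER` simply repeats the point values of ReportLab's `letter` page size so that the sketch runs without ReportLab installed.

```python
# Letter page size in points (1 inch = 72 points); same values as
# reportlab.lib.pagesizes.letter, repeated here so the sketch is standalone.
LETTER = (612.0, 792.0)


def fit_to_page(img_w, img_h, page_w, page_h):
    """Return (x, y, width, height) that scales an image to fit the page.

    The image keeps its aspect ratio and is centered on the page.
    fit_to_page is a hypothetical helper, not a ReportLab function.
    """
    if img_w <= 0 or img_h <= 0:
        raise ValueError("image dimensions must be positive")
    # Use the smaller ratio so the image fits in both directions
    scale = min(page_w / img_w, page_h / img_h)
    width = img_w * scale
    height = img_h * scale
    # Center the scaled image on the page
    x = (page_w - width) / 2
    y = (page_h - height) / 2
    return x, y, width, height


# Example: a scan of 1224 x 1584 pixels placed on a letter page
x, y, w, h = fit_to_page(1224, 1584, *LETTER)
print(x, y, w, h)
```

The result could then be passed to the canvas as `c.drawImage(image_file, x, y, width=w, height=h)`, assuming the pixel size of the picture is known beforehand.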
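PyPDF2, by contrast, is better suited to reading an existing PDF and copying its pages:

    from PyPDF2 import PdfFileWriter, PdfFileReader

    # Note: these camelCase names are the PyPDF2 1.x API;
    # they were removed in PyPDF2 3.0.
    output = PdfFileWriter()

    # Keep the input file open until the output has been written,
    # because the reader may still need it at that point.
    with open("jquery.pdf", "rb") as input_stream:
        input1 = PdfFileReader(input_stream)

        # print document info
        print(input1.getDocumentInfo())

        # print how many pages input1 has
        print("jquery.pdf has %d pages." % input1.getNumPages())

        # print the text content of the first page
        page_content = input1.getPage(0).extractText()
        print(page_content)

        # add page 1 from input1 to the output document, unchanged
        output.addPage(input1.getPage(0))

        # add page 2 from input1, rotated clockwise 90 degrees
        # (assumes the file has at least two pages)
        output.addPage(input1.getPage(1).rotateClockwise(90))

        # finally, write "output" to PyPDF2-output.pdf
        with open("PyPDF2-output.pdf", "wb") as output_stream:
            output.write(output_stream)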
However, PyPDF has many problems extracting the content of a PDF, as the project's issue list shows. The documentation of extractText says as much:
 |  extractText(self)
 |      ##
 |      # Locate all text drawing commands, in the order they are provided in the
 |      # content stream, and extract the text. This works well for some PDF
 |      # files, but poorly for others, depending on the generator used. This will
 |      # be refined in the future. Do not rely on the order of text coming out of
 |      # this function, as it will change if this function is made more
 |      # sophisticated.
 |      #
 |      # Stability: Added in v1.7, will exist for all future v1.x releases. May
 |      # be overhauled to provide more ordered text in the future.
 |      # @return a unicode string object
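Since the spacing and line breaks of the extracted text depend on the generator of the PDF, one cautious way to search it is to normalize whitespace and case before comparing. The following is a minimal sketch using only the standard library. `normalize_text` and `contains_phrase` are hypothetical helper names, and the `sample` string merely stands in for whatever `extractText()` returns for a page. It does not fix text that comes out in the wrong order.

```python
import re


def normalize_text(raw):
    """Collapse runs of whitespace into single spaces and lowercase the text."""
    return re.sub(r"\s+", " ", raw).strip().lower()


def contains_phrase(raw, phrase):
    """Return True if phrase occurs in raw, ignoring case and spacing differences."""
    return normalize_text(phrase) in normalize_text(raw)


# Stand-in for the text extracted from a page (an assumption for illustration)
sample = "Hello\n  World \n from   a PDF"
print(contains_phrase(sample, "hello world"))
```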