Reading PDF files with Python 2.7: a sample method
- 2020-06-12 09:51:18
- OfStack
This article shows, by example, how to read PDF files with Python 2.7. It is shared for your reference; the details are as follows:
The example code in this article uses Python 2.7 and needs the PDFMiner plug-in, which can be downloaded from http://www.unixuser.org/~euske/python/pdfminer/. The installation method is described at that address, so I will not go into detail here. Note that PDFMiner only works with Python 2 and cannot be used with Python 3; for Python 3 use pdfminer3k, which can be downloaded from https://pypi.python.org/pypi/pdfminer3k/. The two plug-ins are used in broadly the same way. Here I take Python 2 as the example and use the PDFMiner plug-in. The code is as follows:
#!/usr/bin/env python
#-*- coding:utf-8 -*-
from pdfminer.pdfparser import PDFParser
from pdfminer.pdfdocument import PDFDocument
from pdfminer.pdfpage import PDFPage
from pdfminer.pdfpage import PDFTextExtractionNotAllowed
from pdfminer.pdfinterp import PDFResourceManager
from pdfminer.pdfinterp import PDFPageInterpreter
from pdfminer.pdfdevice import PDFDevice
from pdfminer.layout import LAParams
from pdfminer.converter import PDFPageAggregator
# Open the PDF file in binary mode; replace algorithm.pdf with the name of your own file.
fp=open("algorithm.pdf","rb")
# Create a parser associated with the file
parser=PDFParser(fp)
# Create the PDF document object
doc=PDFDocument(parser)
# Link the parser and the document object
parser.set_document(doc)
#doc.set_parser(parser)
# Initialize the document (commented out, like the line above; not needed here)
#doc.initialize("")
# Create the PDF resource manager
resource=PDFResourceManager()
# Layout analysis parameters
laparam=LAParams()
# Create an aggregator
device=PDFPageAggregator(resource,laparams=laparam)
# Create the PDF page interpreter
interpreter=PDFPageInterpreter(resource,device)
# Use the document object to get the collection of pages
for page in PDFPage.create_pages(doc):
    # Use the page interpreter to process the page
    interpreter.process_page(page)
    # Use the aggregator to get the layout of the page
    layout=device.get_result()
    for out in layout:
        # Only print the objects that provide a get_text method
        if hasattr(out, "get_text"):
            print out.get_text()
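The last loop keeps only the layout objects that have a get_text method. As a rough sketch of that filtering step, the helper below gathers the text into one string instead of printing it. The name collect_text and the two Fake classes are my own assumptions for illustration and are not part of PDFMiner; they stand in for real layout objects so the snippet can be tried without a PDF file, on either Python 2.7 or Python 3.

```python
# -*- coding: utf-8 -*-
# A sketch only: collect_text is a hypothetical helper, not a PDFMiner function.
# It relies on duck typing, so it runs without PDFMiner installed.


def collect_text(layout):
    """Return the joined text of every object in `layout` that has get_text()."""
    pieces = []
    for out in layout:
        # Skip objects that do not expose get_text()
        if hasattr(out, "get_text"):
            pieces.append(out.get_text())
    return "".join(pieces)


class FakeTextBox(object):
    """Stand-in for a text-bearing layout object (illustration only)."""

    def __init__(self, text):
        self._text = text

    def get_text(self):
        return self._text


class FakeFigure(object):
    """Stand-in for a layout object that holds no text (illustration only)."""


# One fake "page": two text boxes with a non-text object between them
page = [FakeTextBox("Hello\n"), FakeFigure(), FakeTextBox("World\n")]
result = collect_text(page)
```

With the real library, the same helper could presumably be called on the layout returned by device.get_result(), so that the text of each page can be saved or searched rather than only printed.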
I hope this article is helpful for your Python programming.