The Correct Way to Use tesseract-ocr with Python 3.6
- 2020-12-20 03:40:59
- OfStack
Tesseract introduction
Tesseract is a very good OCR engine. The problem at the moment is that there is relatively little up-to-date Chinese-language material about it, and a lot of what exists is outdated or inaccurate.
Tesseract is an open source OCR project supported by Google. The project address is https://github.com/tesseract-ocr/tesseract, where the latest source code can be downloaded.
There are two ways to use Tesseract in practice: 1. the dynamic library libtesseract; 2. the executable tesseract.exe.
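To make the second route concrete, here is a minimal sketch of driving the executable directly from Python. It assumes the usual command form, tesseract followed by the image, an output base name and an optional -l language, with the result written to the output base plus .txt. The helper names build_tesseract_command and ocr_with_executable are hypothetical, and the file names are placeholders.

```python
import subprocess


def build_tesseract_command(image_path, output_base, lang=None, exe="tesseract"):
    """Return the argument list for one tesseract run (nothing is executed here)."""
    cmd = [exe, image_path, output_base]
    if lang:
        # -l selects the language data to use, e.g. "chi_sim" or "eng"
        cmd += ["-l", lang]
    return cmd


def ocr_with_executable(image_path, output_base, lang=None, exe="tesseract"):
    """Run the tesseract executable and return the text it wrote to <output_base>.txt."""
    subprocess.run(build_tesseract_command(image_path, output_base, lang, exe), check=True)
    with open(output_base + ".txt", encoding="utf-8") as f:
        return f.read()
```

With Tesseract installed and reachable, ocr_with_executable('test.png', 'out', lang='eng') should return the recognized text. pytesseract takes this same route: it launches the executable rather than loading libtesseract.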
The environment
Python 3.6.3, pip 9.0.1, tesseract-ocr-setup-3.05.00dev.exe, Windows 10
Installation
1. tesseract-ocr
Tesseract is an open source OCR engine. It was originally developed at HP Labs and later released as open source. Google subsequently improved it, fixed bugs, optimized it and re-released it.
During installation you choose which languages to install; languages you do not need can be skipped. I installed Chinese, English and Japanese. Otherwise the installation process is the same as for any other software.
2. pytesseract
pip install pytesseract
Configure the environment
1. Add tesseract-ocr to the PATH
By default, tesseract-ocr is not added to the system PATH, so using pytesseract fails with:
FileNotFoundError: [WinError 2] The system cannot find the file specified
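A quick way to confirm that this is the cause is to ask Python whether the executable can be found on the PATH at all. The sketch below uses only the standard library. The helper name find_tesseract is hypothetical, and it assumes the executable is called tesseract (tesseract.exe on Windows).

```python
import shutil


def find_tesseract(name="tesseract"):
    """Return the full path of the executable if it is on the PATH, else None."""
    return shutil.which(name)


location = find_tesseract()
if location is None:
    print("tesseract is not on the PATH, so it cannot be launched by name")
else:
    print("tesseract found at", location)
```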
Solutions:
Method 1: Add C:\Program Files (x86)\Tesseract-OCR to the system PATH (the exact path depends on where you installed it).
Method 2: Modify the pytesseract.py file so that its tesseract_cmd variable holds the full path to tesseract.exe.
2. Set the location of the training data
The location of the downloaded training data is not configured by default either, which produces this error:
pytesseract.pytesseract.TesseractError: (1, 'Error opening data file \\Program Files (x86)\\Tesseract-OCR\\tessdata/chi_sim.traineddata')
Solutions:
Set the following environment variable (name, then value):
TESSDATA_PREFIX
C:\Program Files (x86)\Tesseract-OCR\tessdata
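If you would rather not change system-wide settings, both fixes in this section can also be applied from inside a script. The sketch below is one possible approach, not part of the setup described above. It assumes the install directory used in this article, the helper name configure_tesseract is hypothetical, and it relies on pytesseract's tesseract_cmd setting, which holds the command that pytesseract launches.

```python
import os

# Assumed install location, taken from the paths used in this article; adjust as needed.
TESSERACT_DIR = r"C:\Program Files (x86)\Tesseract-OCR"


def configure_tesseract(install_dir=TESSERACT_DIR):
    """Point the current process at a Tesseract install and return the executable path."""
    exe = os.path.join(install_dir, "tesseract.exe")
    # Same value as the TESSDATA_PREFIX environment variable described above,
    # but set only for this process and the programs it starts.
    os.environ["TESSDATA_PREFIX"] = os.path.join(install_dir, "tessdata")
    try:
        import pytesseract
    except ImportError:
        # pytesseract is not installed, so there is nothing more to configure.
        return exe
    # Tell pytesseract to launch this exact file instead of searching the PATH.
    pytesseract.pytesseract.tesseract_cmd = exe
    return exe
```

Call configure_tesseract() once at the start of the script, before the first OCR call.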
Example usage
import pytesseract
from PIL import Image

# Open the image to be recognized
image = Image.open('test.png')
# Run OCR and get the recognized text back as a string
code = pytesseract.image_to_string(image)
print(code)
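Since Chinese language data was installed earlier, a natural next step is to pass a language to image_to_string through its lang parameter. The sketch below assumes the chi_sim and eng data files are present. The helper names join_languages and read_text are hypothetical. Tesseract accepts several languages joined with a plus sign.

```python
def join_languages(languages):
    """Build a Tesseract language string such as 'chi_sim+eng' from a list of codes."""
    return "+".join(languages)


def read_text(image_path, languages=("chi_sim", "eng")):
    """OCR one image with the given languages and return the recognized text."""
    # Imported here so join_languages can be used without the OCR packages installed.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image, lang=join_languages(languages))
```

For example, read_text('test.png') would try Simplified Chinese and English together, while read_text('test.png', languages=['eng']) restricts recognition to English.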
Further reference: https://pypi.python.org/pypi/pytesseract