Python 3.6: The correct way to use tesseract-ocr

  • 2020-12-20 03:40:59
  • OfStack

Tesseract introduction

Tesseract is a very good OCR engine. The problem at the moment is that up-to-date Chinese-language material on it is scarce, while outdated and inaccurate information is plentiful.

Tesseract is an open-source OCR project supported by Google. The project lives at https://github.com/tesseract-ocr/tesseract, where the latest source code can be downloaded.

There are two ways to use Tesseract OCR in practice:
1. The dynamic library, libtesseract
2. The executable, tesseract.exe
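The second route, calling the executable directly, can be sketched without any wrapper library. This is a minimal sketch, assuming that tesseract is reachable on the PATH and that the installed build accepts stdout as the output base; the helper names, test.png, and the eng language code are placeholders for illustration.

```python
import subprocess


def build_tesseract_command(image_path, lang="eng", executable="tesseract"):
    """Build the argument list for: tesseract <image> stdout -l <lang>."""
    # "stdout" as the output base asks tesseract to print the text instead of
    # writing <outputbase>.txt (assumed to be supported by the installed build).
    return [executable, image_path, "stdout", "-l", lang]


def run_tesseract(image_path, lang="eng", executable="tesseract"):
    """Run the executable and return the recognized text as a string."""
    command = build_tesseract_command(image_path, lang, executable)
    result = subprocess.run(command, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, check=True)
    return result.stdout.decode("utf-8")


if __name__ == "__main__":
    # Only the command is printed here; call run_tesseract() once the
    # executable is installed and reachable.
    print(build_tesseract_command("test.png"))
```

The rest of this article takes the same route through pytesseract, which wraps the executable in a similar way.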

The environment

  • Python 3.6.3
  • pip 9.0.1
  • tesseract-ocr-setup-3.05.00dev.exe
  • Windows 10

The installation

1. tesseract-ocr

Tesseract is an open-source OCR engine. It was originally developed at HP Labs and later released as open source. Google then took over its development, fixing bugs, optimizing it, and re-releasing it.

During installation you choose which language packs to install; languages you do not need can be skipped. I installed Chinese, English, and Japanese. Otherwise the installation works like that of any other software.

2. pytesseract

pip install pytesseract
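To confirm that the install succeeded, it can help to check that the modules used later in this article can be found. This is a minimal sketch; the helper name missing_packages is hypothetical, and PIL is the import name of the Pillow imaging library that the example code relies on.

```python
import importlib.util


def missing_packages(names=("pytesseract", "PIL")):
    """Return the names from the given list that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("not installed:", ", ".join(missing))
    else:
        print("pytesseract and Pillow are importable")
```

Note that this only checks the Python side; the tesseract executable itself is a separate installation.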

Configure the environment

1. Add tesseract-ocr to the PATH

By default, tesseract-ocr is not added to the system PATH, so using it raises the error FileNotFoundError: [WinError 2] The system cannot find the file specified.

Solutions:

Method 1: Add C:\Program Files (x86)\Tesseract-OCR to the system PATH (the exact path depends on where you installed it).
Method 2: Modify the pytesseract.py file to specify the location of tesseract.exe.
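Instead of editing pytesseract.py on disk, the same setting can be overridden at runtime through the module-level variable pytesseract.pytesseract.tesseract_cmd. This is a minimal sketch, assuming a default installation under C:\Program Files (x86)\Tesseract-OCR; the helper names locate_tesseract and configure_pytesseract are hypothetical, and the path is an assumption to adjust.

```python
import os
import shutil

# Assumed default install location; adjust to match your installation.
DEFAULT_TESSERACT = r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"


def locate_tesseract(fallback=DEFAULT_TESSERACT):
    """Return the path to the tesseract executable, or None if not found."""
    found = shutil.which("tesseract")  # searches the system PATH
    if found:
        return found
    return fallback if os.path.isfile(fallback) else None


def configure_pytesseract(executable_path):
    """Point pytesseract at an explicit executable instead of relying on PATH."""
    import pytesseract  # imported here so locate_tesseract() works without it
    pytesseract.pytesseract.tesseract_cmd = executable_path


if __name__ == "__main__":
    path = locate_tesseract()
    print(path or "tesseract executable not found")
    # Once a path is found: configure_pytesseract(path)
```

Setting the variable in your own script survives upgrades of pytesseract, whereas an edit to the installed file is overwritten by the next pip install.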

2. Set the location of the training data

The downloaded training data is not on the system path by default either, so the following error is reported:


pytesseract.pytesseract.TesseractError: (1, 'Error opening data file \\Program Files (x86)\\Tesseract-OCR\\tessdata/chi_sim.traineddata')

Solution:

Set the environment variable TESSDATA_PREFIX to:
C:\Program Files (x86)\Tesseract-OCR\tessdata
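The variable can also be set from inside the script before the first OCR call, since child processes started from Python inherit its environment by default. This is a minimal sketch, assuming the directory given above; the helper names are hypothetical. Some Tesseract versions expect the parent directory of tessdata rather than tessdata itself, so if the error persists, try the other form.

```python
import os

# Assumed location, taken from the default installer layout.
TESSDATA_DIR = r"C:\Program Files (x86)\Tesseract-OCR\tessdata"


def set_tessdata_prefix(tessdata_dir=TESSDATA_DIR):
    """Set TESSDATA_PREFIX for this process and any child process it starts."""
    os.environ["TESSDATA_PREFIX"] = tessdata_dir
    return os.environ["TESSDATA_PREFIX"]


def has_language(tessdata_dir, lang):
    """Check that <lang>.traineddata is present in the given directory."""
    return os.path.isfile(os.path.join(tessdata_dir, lang + ".traineddata"))


if __name__ == "__main__":
    prefix = set_tessdata_prefix()
    print("TESSDATA_PREFIX =", prefix)
    print("chi_sim available:", has_language(prefix, "chi_sim"))
```

Checking for chi_sim.traineddata up front gives a clearer message than the TesseractError shown above when a language pack was not selected during installation.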

Example usage


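import pytesseract
from PIL import Image

# Open the image to recognize (test.png must exist in the working directory)
image = Image.open('test.png')
# Run OCR with the default language and get the recognized text as a string
code = pytesseract.image_to_string(image)
print(code)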
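Since Chinese and other language packs were installed earlier, the language can be selected through the lang parameter of image_to_string. This is a minimal sketch, assuming the chi_sim and eng packs are installed; the helper names join_languages and recognize are hypothetical.

```python
def join_languages(languages):
    """Combine language codes in the form tesseract expects, e.g. 'chi_sim+eng'."""
    return "+".join(languages)


def recognize(image_path, languages=("eng",)):
    """Run OCR on an image file with the given languages and return the text."""
    # Imported lazily so join_languages() can be used without the OCR stack.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image, lang=join_languages(languages))


if __name__ == "__main__":
    print(join_languages(["chi_sim", "eng"]))
    # With tesseract and the language packs installed:
    # print(recognize("test.png", languages=("chi_sim", "eng")))
```

Each code passed here must have a matching .traineddata file in the tessdata directory, otherwise the "Error opening data file" error described earlier is raised.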

For more information, see: https://pypi.python.org/pypi/pytesseract
