Top 5 Python Libraries for Extracting Text from Images | by Eugenia Anello | Jul, 2023


Understand and master OCR tools for text localization and recognition

Optical Character Recognition (OCR) is an old but still challenging problem that involves detecting and recognizing text in unstructured data, including images and PDF documents. It has cool applications in banking, e-commerce and content moderation on social media.
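The two stages in that definition, finding where the text is (detection, or localization) and reading it (recognition), both show up in Tesseract's word-level output. The sketch below assumes pytesseract, covered in the first section, and its `image_to_data(..., output_type=Output.DICT)` call, which returns parallel lists keyed by `text`, `conf`, `left`, `top`, `width` and `height`. The helper names `words_with_boxes` and `ocr_words` and the 60-point confidence cutoff are hypothetical choices for illustration, not part of the library.

```python
from typing import Dict, List, Tuple

# One detected word: recognized text plus its box as (left, top, width, height)
Word = Tuple[str, Tuple[int, int, int, int]]


def words_with_boxes(data: Dict[str, list], min_conf: float = 60.0) -> List[Word]:
    """Pair each recognized word with its bounding box.

    `data` is assumed to have the shape returned by
    pytesseract.image_to_data(..., output_type=Output.DICT).
    """
    words: List[Word] = []
    for i, text in enumerate(data["text"]):
        # Tesseract reports conf -1 for non-word rows (page, block, line)
        conf = float(data["conf"][i])
        if text.strip() and conf >= min_conf:
            box = (int(data["left"][i]), int(data["top"][i]),
                   int(data["width"][i]), int(data["height"][i]))
            words.append((text, box))
    return words


def ocr_words(image_path: str, min_conf: float = 60.0) -> List[Word]:
    """Run Tesseract on an image file and keep only confident words."""
    # Lazy imports: needs the Tesseract binary plus `pip install pytesseract pillow`
    import pytesseract
    from pytesseract import Output
    from PIL import Image

    data = pytesseract.image_to_data(Image.open(image_path), lang="eng",
                                     output_type=Output.DICT)
    return words_with_boxes(data, min_conf)
```

Calling `ocr_words('00b5b88720f35a22.jpg')` would return each word of the Rush image together with its pixel box, which could then be drawn on the image with OpenCV's `cv2.rectangle`. Keeping the parsing in a separate pure function makes it easy to check without running Tesseract at all.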

But as with every topic in data science, there is an overwhelming number of resources to sift through when you try to learn how to tackle OCR. This is why I’m writing this tutorial, which can help you get started.

In this article, I’m going to show some Python libraries that let you quickly extract text from images without too much struggle. The explanation of each library is followed by a practical example. The dataset used is taken from Kaggle. To keep things simple, I’m using just one image, taken from the film Rush.

Let’s get started!

Table of contents:

pytesseract
EasyOCR
Keras-OCR
TrOCR
docTR

1. pytesseract

It is one of the most popular Python libraries for optical character recognition. It uses Google’s Tesseract-OCR Engine to extract text from images. Many languages are supported; check here to see whether yours is among them. You only need a few lines of code to convert the image into text:

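```python
# install (shell commands, run from a Jupyter/Colab notebook cell)
!sudo apt install -y tesseract-ocr
!pip install pytesseract

import pytesseract
from pytesseract import Output
from PIL import Image
import cv2

img_path1 = '00b5b88720f35a22.jpg'
# Run Tesseract on the image and return the recognized text as one string
text = pytesseract.image_to_string(Image.open(img_path1), lang='eng')
print(text)
```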

This is the output:
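The `lang='eng'` argument in the call above picks which trained language data Tesseract uses, and several languages can be combined by joining their codes with `+` (for example `'eng+ita'`). The sketch below is a hypothetical helper, `build_lang_arg`, that builds that string and fails early when a requested language is not installed; it assumes `pytesseract.get_languages(config='')` to list the installed languages. On Debian/Ubuntu, extra languages come as separate packages such as `tesseract-ocr-ita`.

```python
from typing import Iterable, List


def build_lang_arg(requested: Iterable[str], installed: Iterable[str]) -> str:
    """Join requested Tesseract language codes with '+', e.g. 'eng+ita'.

    Raises ValueError if nothing is requested or a code is not installed.
    """
    available = set(installed)
    wanted: List[str] = list(dict.fromkeys(requested))  # drop duplicates, keep order
    if not wanted:
        raise ValueError("at least one language code is required")
    missing = [code for code in wanted if code not in available]
    if missing:
        raise ValueError(f"Tesseract language data not installed: {', '.join(missing)}")
    return "+".join(wanted)


def image_to_string_multilang(image_path: str, languages: Iterable[str]) -> str:
    """Hypothetical wrapper: OCR an image with one or more installed languages."""
    # Lazy imports: needs the Tesseract binary plus `pip install pytesseract pillow`
    import pytesseract
    from PIL import Image

    lang = build_lang_arg(languages, pytesseract.get_languages(config=""))
    return pytesseract.image_to_string(Image.open(image_path), lang=lang)
```

Checking the installed languages up front gives a clear error message instead of a Tesseract failure deep inside the OCR call.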
