[image-processing] image processing to improve tesseract OCR accuracy

I am by no means an OCR expert. But I this week had need to convert text out of a jpg.

I started with a colorized, RGB 445x747 pixel jpg. I immediately tried tesseract on this, and the program converted almost nothing. I then went into GIMP and did the following. image>mode>grayscale image>scale image>1191x2000 pixels filters>enhance>unsharp mask with values of radius = 6.8, amount = 2.69, threshold = 0 I then saved as a new jpg at 100% quality.

Tesseract then was able to extract all the text into a .txt file

Gimp is your friend.

Examples related to image-processing

Convert np.array of type float64 to type uint8 scaling values dlib installation on Windows 10 OpenCV - Saving images to a particular folder of choice How do I increase the contrast of an image in Python OpenCV OpenCV & Python - Image too big to display TypeError: Image data can not convert to float Extracting text OpenCV c++ and opencv get and set pixel color to Mat cv2.imshow command doesn't work properly in opencv-python How does one convert a grayscale image to RGB in OpenCV (Python)?

Examples related to ocr

best OCR (Optical character recognition) example in android Tesseract OCR simple example Tesseract running error How to implement and do OCR in a C# project? image processing to improve tesseract OCR accuracy Simple Digit Recognition OCR in OpenCV-Python How to make tesseract to recognize only numbers, when they are mixed with letters? Java OCR implementation Is there any free OCR library for Android? How to recognize vehicle license / number plate (ANPR) from an image?

Examples related to tesseract

Pytesseract : "TesseractNotFound Error: tesseract is not installed or it's not in your path", how do I fix this? How do I resolve a TesseractNotFoundError? best OCR (Optical character recognition) example in android Tesseract OCR simple example Tesseract running error image processing to improve tesseract OCR accuracy How to make tesseract to recognize only numbers, when they are mixed with letters?