[python] Simple Digit Recognition OCR in OpenCV-Python

I am trying to implement a "Digit Recognition OCR" in OpenCV-Python (cv2). It is just for learning purposes. I would like to learn both KNearest and SVM features in OpenCV.

I have 100 samples (i.e. images) of each digit. I would like to train with them.

There is a sample letter_recog.py that comes with OpenCV sample. But I still couldn't figure out on how to use it. I don't understand what are the samples, responses etc. Also, it loads a txt file at first, which I didn't understand first.

Later on searching a little bit, I could find a letter_recognition.data in cpp samples. I used it and made a code for cv2.KNearest in the model of letter_recog.py (just for testing):

import numpy as np
import cv2

fn = 'letter-recognition.data'
a = np.loadtxt(fn, np.float32, delimiter=',', converters={ 0 : lambda ch : ord(ch)-ord('A') })
samples, responses = a[:,1:], a[:,0]

model = cv2.KNearest()
retval = model.train(samples,responses)
retval, results, neigh_resp, dists = model.find_nearest(samples, k = 10)
print results.ravel()

It gave me an array of size 20000, I don't understand what it is.

Questions:

1) What is letter_recognition.data file? How to build that file from my own data set?

2) What does results.reval() denote?

3) How we can write a simple digit recognition tool using letter_recognition.data file (either KNearest or SVM)?

This question is related to python opencv numpy computer-vision ocr

The answer is


OCR which stands for Optical Character Recognition is a computer vision technique used to identify the different types of handwritten digits that are used in common mathematics. To perform OCR in OpenCV we will use the KNN algorithm which detects the nearest k neighbors of a particular data point and then classifies that data point based on the class type detected for n neighbors.

Data Used


This data contains 5000 handwritten digits where there are 500 digits for every type of digit. Each digit is of 20×20 pixel dimensions. We will split the data such that 250 digits are for training and 250 digits are for testing for every class.

Below is the implementation.




import numpy as np
import cv2
   
      
# Read the image
image = cv2.imread('digits.png')
  
# gray scale conversion
gray_img = cv2.cvtColor(image,
                        cv2.COLOR_BGR2GRAY)
  
# We will divide the image
# into 5000 small dimensions 
# of size 20x20
divisions = list(np.hsplit(i,100) for i in np.vsplit(gray_img,50))
  
# Convert into Numpy array
# of size (50,100,20,20)
NP_array = np.array(divisions)
   
# Preparing train_data
# and test_data.
# Size will be (2500,20x20)
train_data = NP_array[:,:50].reshape(-1,400).astype(np.float32)
  
# Size will be (2500,20x20)
test_data = NP_array[:,50:100].reshape(-1,400).astype(np.float32)
  
# Create 10 different labels 
# for each type of digit
k = np.arange(10)
train_labels = np.repeat(k,250)[:,np.newaxis]
test_labels = np.repeat(k,250)[:,np.newaxis]
   
# Initiate kNN classifier
knn = cv2.ml.KNearest_create()
  
# perform training of data
knn.train(train_data,
          cv2.ml.ROW_SAMPLE, 
          train_labels)
   
# obtain the output from the
# classifier by specifying the
# number of neighbors.
ret, output ,neighbours,
distance = knn.findNearest(test_data, k = 3)
   
# Check the performance and
# accuracy of the classifier.
# Compare the output with test_labels
# to find out how many are wrong.
matched = output==test_labels
correct_OP = np.count_nonzero(matched)
   
#Calculate the accuracy.
accuracy = (correct_OP*100.0)/(output.size)
   
# Display accuracy.
print(accuracy)


Output

91.64


Well, I decided to workout myself on my question to solve the above problem. What I wanted is to implement a simple OCR using KNearest or SVM features in OpenCV. And below is what I did and how. (it is just for learning how to use KNearest for simple OCR purposes).

1) My first question was about letter_recognition.data file that comes with OpenCV samples. I wanted to know what is inside that file.

It contains a letter, along with 16 features of that letter.

And this SOF helped me to find it. These 16 features are explained in the paper Letter Recognition Using Holland-Style Adaptive Classifiers. (Although I didn't understand some of the features at the end)

2) Since I knew, without understanding all those features, it is difficult to do that method. I tried some other papers, but all were a little difficult for a beginner.

So I just decided to take all the pixel values as my features. (I was not worried about accuracy or performance, I just wanted it to work, at least with the least accuracy)

I took the below image for my training data:

enter image description here

(I know the amount of training data is less. But, since all letters are of the same font and size, I decided to try on this).

To prepare the data for training, I made a small code in OpenCV. It does the following things:

  1. It loads the image.
  2. Selects the digits (obviously by contour finding and applying constraints on area and height of letters to avoid false detections).
  3. Draws the bounding rectangle around one letter and wait for key press manually. This time we press the digit key ourselves corresponding to the letter in the box.
  4. Once the corresponding digit key is pressed, it resizes this box to 10x10 and saves all 100 pixel values in an array (here, samples) and corresponding manually entered digit in another array(here, responses).
  5. Then save both the arrays in separate .txt files.

At the end of the manual classification of digits, all the digits in the training data (train.png) are labeled manually by ourselves, image will look like below:

enter image description here

Below is the code I used for the above purpose (of course, not so clean):

import sys

import numpy as np
import cv2

im = cv2.imread('pitrain.png')
im3 = im.copy()

gray = cv2.cvtColor(im,cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray,(5,5),0)
thresh = cv2.adaptiveThreshold(blur,255,1,1,11,2)

#################      Now finding Contours         ###################

contours,hierarchy = cv2.findContours(thresh,cv2.RETR_LIST,cv2.CHAIN_APPROX_SIMPLE)

samples =  np.empty((0,100))
responses = []
keys = [i for i in range(48,58)]

for cnt in contours:
    if cv2.contourArea(cnt)>50:
        [x,y,w,h] = cv2.boundingRect(cnt)
        
        if  h>28:
            cv2.rectangle(im,(x,y),(x+w,y+h),(0,0,255),2)
            roi = thresh[y:y+h,x:x+w]
            roismall = cv2.resize(roi,(10,10))
            cv2.imshow('norm',im)
            key = cv2.waitKey(0)

            if key == 27:  # (escape to quit)
                sys.exit()
            elif key in keys:
                responses.append(int(chr(key)))
                sample = roismall.reshape((1,100))
                samples = np.append(samples,sample,0)

responses = np.array(responses,np.float32)
responses = responses.reshape((responses.size,1))
print "training complete"

np.savetxt('generalsamples.data',samples)
np.savetxt('generalresponses.data',responses)

Now we enter in to training and testing part.

For the testing part, I used the below image, which has the same type of letters I used for the training phase.

enter image description here

For training we do as follows:

  1. Load the .txt files we already saved earlier
  2. create an instance of the classifier we are using (it is KNearest in this case)
  3. Then we use KNearest.train function to train the data

For testing purposes, we do as follows:

  1. We load the image used for testing
  2. process the image as earlier and extract each digit using contour methods
  3. Draw a bounding box for it, then resize it to 10x10, and store its pixel values in an array as done earlier.
  4. Then we use KNearest.find_nearest() function to find the nearest item to the one we gave. ( If lucky, it recognizes the correct digit.)

I included last two steps (training and testing) in single code below:

import cv2
import numpy as np

#######   training part    ############### 
samples = np.loadtxt('generalsamples.data',np.float32)
responses = np.loadtxt('generalresponses.data',np.float32)
responses = responses.reshape((responses.size,1))

model = cv2.KNearest()
model.train(samples,responses)

############################# testing part  #########################

im = cv2.imread('pi.png')
out = np.zeros(im.shape,np.uint8)
gray = cv2.cvtColor(im,cv2.COLOR_BGR2GRAY)
thresh = cv2.adaptiveThreshold(gray,255,1,1,11,2)

contours,hierarchy = cv2.findContours(thresh,cv2.RETR_LIST,cv2.CHAIN_APPROX_SIMPLE)

for cnt in contours:
    if cv2.contourArea(cnt)>50:
        [x,y,w,h] = cv2.boundingRect(cnt)
        if  h>28:
            cv2.rectangle(im,(x,y),(x+w,y+h),(0,255,0),2)
            roi = thresh[y:y+h,x:x+w]
            roismall = cv2.resize(roi,(10,10))
            roismall = roismall.reshape((1,100))
            roismall = np.float32(roismall)
            retval, results, neigh_resp, dists = model.find_nearest(roismall, k = 1)
            string = str(int((results[0][0])))
            cv2.putText(out,string,(x,y+h),0,1,(0,255,0))

cv2.imshow('im',im)
cv2.imshow('out',out)
cv2.waitKey(0)

And it worked, below is the result I got:

enter image description here


Here it worked with 100% accuracy. I assume this is because all the digits are of the same kind and the same size.

But anyway, this is a good start to go for beginners (I hope so).


Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to opencv

Real time face detection OpenCV, Python OpenCV TypeError: Expected cv::UMat for argument 'src' - What is this? OpenCV !_src.empty() in function 'cvtColor' error ConvergenceWarning: Liblinear failed to converge, increase the number of iterations How do I install opencv using pip? Access IP Camera in Python OpenCV ImportError: libSM.so.6: cannot open shared object file: No such file or directory Convert np.array of type float64 to type uint8 scaling values How to import cv2 in python3? cmake error 'the source does not appear to contain CMakeLists.txt'

Examples related to numpy

Unable to allocate array with shape and data type How to fix 'Object arrays cannot be loaded when allow_pickle=False' for imdb.load_data() function? Numpy, multiply array with scalar TypeError: only integer scalar arrays can be converted to a scalar index with 1D numpy indices array Could not install packages due to a "Environment error :[error 13]: permission denied : 'usr/local/bin/f2py'" Pytorch tensor to numpy array Numpy Resize/Rescale Image what does numpy ndarray shape do? How to round a numpy array? numpy array TypeError: only integer scalar arrays can be converted to a scalar index

Examples related to computer-vision

How to predict input image using trained model in Keras? How do I increase the contrast of an image in Python OpenCV How to verify CuDNN installation? OpenCV Error: (-215)size.width>0 && size.height>0 in function imshow How to draw a rectangle around a region of interest in python How can I extract a good quality JPEG image from a video file with ffmpeg? Simple Digit Recognition OCR in OpenCV-Python OpenCV C++/Obj-C: Detecting a sheet of paper / Square Detection Converting an OpenCV Image to Black and White Combining Two Images with OpenCV

Examples related to ocr

best OCR (Optical character recognition) example in android Tesseract OCR simple example Tesseract running error How to implement and do OCR in a C# project? image processing to improve tesseract OCR accuracy Simple Digit Recognition OCR in OpenCV-Python How to make tesseract to recognize only numbers, when they are mixed with letters? Java OCR implementation Is there any free OCR library for Android? How to recognize vehicle license / number plate (ANPR) from an image?