OCR which stands for Optical Character Recognition is a computer vision technique used to identify the different types of handwritten digits that are used in common mathematics. To perform OCR in OpenCV we will use the KNN algorithm which detects the nearest k neighbors of a particular data point and then classifies that data point based on the class type detected for n neighbors.
Data Used
This data contains 5000 handwritten digits where there are 500 digits for every type of digit. Each digit is of 20×20 pixel dimensions. We will split the data such that 250 digits are for training and 250 digits are for testing for every class.
Below is the implementation.
import numpy as np import cv2 # Read the image image = cv2.imread( 'digits.png' ) # gray scale conversion gray_img = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) # We will divide the image # into 5000 small dimensions # of size 20x20 divisions = list (np.hsplit(i, 100 ) for i in np.vsplit(gray_img, 50 )) # Convert into Numpy array # of size (50,100,20,20) NP_array = np.array(divisions) # Preparing train_data # and test_data. # Size will be (2500,20x20) train_data = NP_array[:,: 50 ].reshape( - 1 , 400 ).astype(np.float32) # Size will be (2500,20x20) test_data = NP_array[:, 50 : 100 ].reshape( - 1 , 400 ).astype(np.float32) # Create 10 different labels # for each type of digit k = np.arange( 10 ) train_labels = np.repeat(k, 250 )[:,np.newaxis] test_labels = np.repeat(k, 250 )[:,np.newaxis] # Initiate kNN classifier knn = cv2.ml.KNearest_create() # perform training of data knn.train(train_data, cv2.ml.ROW_SAMPLE, train_labels) # obtain the output from the # classifier by specifying the # number of neighbors. ret, output ,neighbours, distance = knn.findNearest(test_data, k = 3 ) # Check the performance and # accuracy of the classifier. # Compare the output with test_labels # to find out how many are wrong. matched = output = = test_labels correct_OP = np.count_nonzero(matched) #Calculate the accuracy. accuracy = (correct_OP * 100.0 ) / (output.size) # Display accuracy. print (accuracy) |
Output
91.64
Well, I decided to workout myself on my question to solve the above problem. What I wanted is to implement a simple OCR using KNearest or SVM features in OpenCV. And below is what I did and how. (it is just for learning how to use KNearest for simple OCR purposes).
1) My first question was about letter_recognition.data
file that comes with OpenCV samples. I wanted to know what is inside that file.
It contains a letter, along with 16 features of that letter.
And this SOF
helped me to find it. These 16 features are explained in the paper Letter Recognition Using Holland-Style Adaptive Classifiers
.
(Although I didn't understand some of the features at the end)
2) Since I knew, without understanding all those features, it is difficult to do that method. I tried some other papers, but all were a little difficult for a beginner.
So I just decided to take all the pixel values as my features. (I was not worried about accuracy or performance, I just wanted it to work, at least with the least accuracy)
I took the below image for my training data:
(I know the amount of training data is less. But, since all letters are of the same font and size, I decided to try on this).
To prepare the data for training, I made a small code in OpenCV. It does the following things:
key press manually
. This time we press the digit key ourselves corresponding to the letter in the box..txt
files.At the end of the manual classification of digits, all the digits in the training data (train.png
) are labeled manually by ourselves, image will look like below:
Below is the code I used for the above purpose (of course, not so clean):
import sys
import numpy as np
import cv2
im = cv2.imread('pitrain.png')
im3 = im.copy()
gray = cv2.cvtColor(im,cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray,(5,5),0)
thresh = cv2.adaptiveThreshold(blur,255,1,1,11,2)
################# Now finding Contours ###################
contours,hierarchy = cv2.findContours(thresh,cv2.RETR_LIST,cv2.CHAIN_APPROX_SIMPLE)
samples = np.empty((0,100))
responses = []
keys = [i for i in range(48,58)]
for cnt in contours:
if cv2.contourArea(cnt)>50:
[x,y,w,h] = cv2.boundingRect(cnt)
if h>28:
cv2.rectangle(im,(x,y),(x+w,y+h),(0,0,255),2)
roi = thresh[y:y+h,x:x+w]
roismall = cv2.resize(roi,(10,10))
cv2.imshow('norm',im)
key = cv2.waitKey(0)
if key == 27: # (escape to quit)
sys.exit()
elif key in keys:
responses.append(int(chr(key)))
sample = roismall.reshape((1,100))
samples = np.append(samples,sample,0)
responses = np.array(responses,np.float32)
responses = responses.reshape((responses.size,1))
print "training complete"
np.savetxt('generalsamples.data',samples)
np.savetxt('generalresponses.data',responses)
Now we enter in to training and testing part.
For the testing part, I used the below image, which has the same type of letters I used for the training phase.
For training we do as follows:
.txt
files we already saved earlierFor testing purposes, we do as follows:
I included last two steps (training and testing) in single code below:
import cv2
import numpy as np
####### training part ###############
samples = np.loadtxt('generalsamples.data',np.float32)
responses = np.loadtxt('generalresponses.data',np.float32)
responses = responses.reshape((responses.size,1))
model = cv2.KNearest()
model.train(samples,responses)
############################# testing part #########################
im = cv2.imread('pi.png')
out = np.zeros(im.shape,np.uint8)
gray = cv2.cvtColor(im,cv2.COLOR_BGR2GRAY)
thresh = cv2.adaptiveThreshold(gray,255,1,1,11,2)
contours,hierarchy = cv2.findContours(thresh,cv2.RETR_LIST,cv2.CHAIN_APPROX_SIMPLE)
for cnt in contours:
if cv2.contourArea(cnt)>50:
[x,y,w,h] = cv2.boundingRect(cnt)
if h>28:
cv2.rectangle(im,(x,y),(x+w,y+h),(0,255,0),2)
roi = thresh[y:y+h,x:x+w]
roismall = cv2.resize(roi,(10,10))
roismall = roismall.reshape((1,100))
roismall = np.float32(roismall)
retval, results, neigh_resp, dists = model.find_nearest(roismall, k = 1)
string = str(int((results[0][0])))
cv2.putText(out,string,(x,y+h),0,1,(0,255,0))
cv2.imshow('im',im)
cv2.imshow('out',out)
cv2.waitKey(0)
And it worked, below is the result I got:
Here it worked with 100% accuracy. I assume this is because all the digits are of the same kind and the same size.
But anyway, this is a good start to go for beginners (I hope so).