Extracting text OpenCV

Question

I am trying to find the bounding boxes of text in an image and am currently using this approach:

    // calculate the local variances of the grayscale image
    Mat t_mean, t_mean_2;
    Mat grayF;
    outImg_gray.convertTo(grayF, CV_32F);
    int winSize = 35;
    blur(grayF, t_mean, cv::Size(winSize, winSize));
    blur(grayF.mul(grayF), t_mean_2, cv::Size(winSize, winSize));
    Mat varMat = t_mean_2 - t_mean.mul(t_mean);
    varMat.convertTo(varMat, CV_8U);

    // threshold the high variance regions
    Mat varMatRegions = varMat > 100;

When given an image of a card and showing varMatRegions, the result somewhat combines the left block of text with the header of the card. For most cards this method works great, but on busier cards it can cause problems.

The reason it is bad for those contours to connect is that it makes the bounding box of the contour take up nearly the entire card.

Can anyone suggest a different way I can find the text to ensure proper detection of text?

200 points to whoever can find the text in the card above and in these two.
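To make the variance step above concrete, here is a minimal, dependency-free Python sketch of the same identity, Var[x] = E[x^2] - (E[x])^2, evaluated over a sliding window. The function name local_variance and the tiny test images are made up for illustration, and the window is simply clipped at the image border, which is an assumption (cv::blur extrapolates border pixels instead).

```python
def local_variance(img, win):
    """Per-pixel variance over a win x win window, clipped at the image borders."""
    h, w = len(img), len(img[0])
    r = win // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Collect the pixel values inside the (clipped) window.
            vals = [img[j][i]
                    for j in range(max(0, y - r), min(h, y + r + 1))
                    for i in range(max(0, x - r), min(w, x + r + 1))]
            mean = sum(vals) / len(vals)
            mean_sq = sum(v * v for v in vals) / len(vals)
            out[y][x] = mean_sq - mean * mean  # E[x^2] - (E[x])^2
    return out


# A flat image has no variance; an image with a vertical edge has high
# variance only in the columns whose window straddles the edge.
flat = [[10] * 6 for _ in range(4)]
edge = [[0, 0, 0, 255, 255, 255] for _ in range(4)]
var_flat = local_variance(flat, 3)
var_edge = local_variance(edge, 3)
```

With a window as large as 35, every pixel within about 17 pixels of an edge gets a high variance, which is one plausible reason why nearby text blocks fuse into a single region.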

User · Answer

This is a C# version of the answer from dhanushka using OpenCVSharp:

    Mat large = new Mat(INPUT_FILE);
    Mat rgb = new Mat(), small = new Mat(), grad = new Mat(), bw = new Mat(), connected = new Mat();

    // downsample and use it for processing
    Cv2.PyrDown(large, rgb);
    Cv2.CvtColor(rgb, small, ColorConversionCodes.BGR2GRAY);

    // morphological gradient
    var morphKernel = Cv2.GetStructuringElement(MorphShapes.Ellipse, new OpenCvSharp.Size(3, 3));
    Cv2.MorphologyEx(small, grad, MorphTypes.Gradient, morphKernel);

    // binarize
    Cv2.Threshold(grad, bw, 0, 255, ThresholdTypes.Binary | ThresholdTypes.Otsu);

    // connect horizontally oriented regions
    morphKernel = Cv2.GetStructuringElement(MorphShapes.Rect, new OpenCvSharp.Size(9, 1));
    Cv2.MorphologyEx(bw, connected, MorphTypes.Close, morphKernel);

    // find contours
    var mask = new Mat(Mat.Zeros(bw.Size(), MatType.CV_8UC1), Range.All);
    Cv2.FindContours(connected, out OpenCvSharp.Point[][] contours, out HierarchyIndex[] hierarchy,
        RetrievalModes.CComp, ContourApproximationModes.ApproxSimple, new OpenCvSharp.Point(0, 0));

    // filter contours
    var idx = 0;
    foreach (var hierarchyItem in hierarchy)
    {
        idx = hierarchyItem.Next;
        if (idx < 0)
            break;
        OpenCvSharp.Rect rect = Cv2.BoundingRect(contours[idx]);
        var maskROI = new Mat(mask, rect);
        maskROI.SetTo(new Scalar(0, 0, 0));

        // fill the contour
        Cv2.DrawContours(mask, contours, idx, Scalar.White, -1);

        // ratio of non-zero pixels in the filled region
        double r = (double)Cv2.CountNonZero(maskROI) / (rect.Width * rect.Height);
        if (r > .45 /* assume at least 45% of the area is filled if it contains text */
            &&
            (rect.Height > 8 && rect.Width > 8) /* constraints on region size */
            /* these two conditions alone are not very robust. better to use something
            like the number of significant peaks in a horizontal projection as a third condition */
        )
        {
            Cv2.Rectangle(rgb, rect, new Scalar(0, 255, 0), 2);
        }
    }

    rgb.SaveImage(Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "rgb.jpg"));
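The comment in the code above suggests a third condition based on "significant peaks in a horizontal projection" but does not spell it out. The following is only one possible reading, sketched in plain Python: the projection is the count of non-zero pixels per row, and a peak is a maximal run of rows that reach a fraction of the largest row count. The names horizontal_projection and count_peaks and the 0.5 threshold are assumptions, not part of the answer.

```python
def horizontal_projection(region):
    """Number of non-zero pixels in each row of a binary region."""
    return [sum(1 for v in row if v) for row in region]


def count_peaks(projection, rel_threshold=0.5):
    """Count maximal runs of rows whose value reaches rel_threshold * max."""
    peak = max(projection, default=0)
    if peak == 0:
        return 0
    cutoff = rel_threshold * peak
    count, inside = 0, False
    for value in projection:
        above = value >= cutoff
        if above and not inside:
            count += 1  # a new run of "busy" rows starts here
        inside = above
    return count


# Two lines of "text" separated by an empty row give two peaks.
two_lines = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 1],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 1],
    [1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
projection = horizontal_projection(two_lines)
n_peaks = count_peaks(projection)
```

A region whose peak count matches the expected number of text lines would then pass the extra check; how to choose that expected number is left open here.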

User · Answer

You can detect text by finding close edge elements (inspired from a LPD):

    #include "opencv2/opencv.hpp"

    std::vector<cv::Rect> detectLetters(cv::Mat img)
    {
        std::vector<cv::Rect> boundRect;
        cv::Mat img_gray, img_sobel, img_threshold, element;
        cvtColor(img, img_gray, CV_BGR2GRAY);
        cv::Sobel(img_gray, img_sobel, CV_8U, 1, 0, 3, 1, 0, cv::BORDER_DEFAULT);
        cv::threshold(img_sobel, img_threshold, 0, 255, CV_THRESH_OTSU + CV_THRESH_BINARY);
        element = getStructuringElement(cv::MORPH_RECT, cv::Size(17, 3));
        cv::morphologyEx(img_threshold, img_threshold, CV_MOP_CLOSE, element); // Does the trick
        std::vector< std::vector<cv::Point> > contours;
        cv::findContours(img_threshold, contours, 0, 1);
        std::vector< std::vector<cv::Point> > contours_poly(contours.size());
        for (int i = 0; i < contours.size(); i++)
            if (contours[i].size() > 100)
            {
                cv::approxPolyDP(cv::Mat(contours[i]), contours_poly[i], 3, true);
                cv::Rect appRect(boundingRect(cv::Mat(contours_poly[i])));
                if (appRect.width > appRect.height)
                    boundRect.push_back(appRect);
            }
        return boundRect;
    }

Usage:

    int main(int argc, char** argv)
    {
        // Read
        cv::Mat img1 = cv::imread("side_1.jpg");
        cv::Mat img2 = cv::imread("side_2.jpg");
        // Detect
        std::vector<cv::Rect> letterBBoxes1 = detectLetters(img1);
        std::vector<cv::Rect> letterBBoxes2 = detectLetters(img2);
        // Display
        for (int i = 0; i < letterBBoxes1.size(); i++)
            cv::rectangle(img1, letterBBoxes1[i], cv::Scalar(0, 255, 0), 3, 8, 0);
        cv::imwrite("imgOut1.jpg", img1);
        for (int i = 0; i < letterBBoxes2.size(); i++)
            cv::rectangle(img2, letterBBoxes2[i], cv::Scalar(0, 255, 0), 3, 8, 0);
        cv::imwrite("imgOut2.jpg", img2);
        return 0;
    }

Results:

a. element = getStructuringElement(cv::MORPH_RECT, cv::Size(17, 3));

b. element = getStructuringElement(cv::MORPH_RECT, cv::Size(30, 30));

Results are similar for the other image mentioned.
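The difference between the two kernel sizes above comes from how morphological closing (dilation followed by erosion) bridges gaps. As a rough, one-dimensional illustration in plain Python, the sketch below closes a single binary row; close_row and its helpers are made-up names, and the window is clipped at the row ends, which is an assumption chosen so that the border itself never erodes or dilates anything.

```python
def dilate_row(row, k):
    """Each output pixel is the maximum of the k-wide window around it."""
    r, n = k // 2, len(row)
    return [max(row[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]


def erode_row(row, k):
    """Each output pixel is the minimum of the k-wide window around it."""
    r, n = k // 2, len(row)
    return [min(row[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]


def close_row(row, k):
    """Morphological closing: dilation followed by erosion."""
    return erode_row(dilate_row(row, k), k)


# Two "letters" two pixels apart, then a six-pixel gap before the next one.
row = [1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1]
narrow = close_row(row, 3)  # bridges only the two-pixel gap
wide = close_row(row, 7)    # bridges the six-pixel gap as well
```

A wide, flat kernel such as 17x3 therefore joins letters along a line without joining separate lines, while a 30x30 kernel also bridges vertical gaps and yields larger blocks.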

User · Answer

This is a VB.NET version of the answer from dhanushka using EmguCV.

A few functions and structures in EmguCV need different consideration than the C# version with OpenCVSharp.

    Imports Emgu.CV
    Imports Emgu.CV.Structure
    Imports Emgu.CV.CvEnum
    Imports Emgu.CV.Util

            Dim input_file As String = "C:\your_input_image.png"
            Dim large As Mat = New Mat(input_file)
            Dim rgb As New Mat
            Dim small As New Mat
            Dim grad As New Mat
            Dim bw As New Mat
            Dim connected As New Mat
            Dim morphanchor As New Point(0, 0)

            '// downsample and use it for processing
            CvInvoke.PyrDown(large, rgb)
            CvInvoke.CvtColor(rgb, small, ColorConversion.Bgr2Gray)

            '// morphological gradient
            Dim morphKernel As Mat = CvInvoke.GetStructuringElement(ElementShape.Ellipse, New Size(3, 3), morphanchor)
            CvInvoke.MorphologyEx(small, grad, MorphOp.Gradient, morphKernel, New Point(0, 0), 1, BorderType.Isolated, New MCvScalar(0))

            '// binarize
            CvInvoke.Threshold(grad, bw, 0, 255, ThresholdType.Binary Or ThresholdType.Otsu)

            '// connect horizontally oriented regions
            morphKernel = CvInvoke.GetStructuringElement(ElementShape.Rectangle, New Size(9, 1), morphanchor)
            CvInvoke.MorphologyEx(bw, connected, MorphOp.Close, morphKernel, morphanchor, 1, BorderType.Isolated, New MCvScalar(0))

            '// find contours
            Dim mask As Mat = Mat.Zeros(bw.Size.Height, bw.Size.Width, DepthType.Cv8U, 1)  '' MatType.CV_8UC1
            Dim contours As New VectorOfVectorOfPoint
            Dim hierarchy As New Mat

            CvInvoke.FindContours(connected, contours, hierarchy, RetrType.Ccomp, ChainApproxMethod.ChainApproxSimple, Nothing)

            '// filter contours
            Dim idx As Integer
            Dim rect As Rectangle
            Dim maskROI As Mat
            Dim r As Double
            For Each hierarchyItem In hierarchy.GetData
                rect = CvInvoke.BoundingRectangle(contours(idx))
                maskROI = New Mat(mask, rect)
                maskROI.SetTo(New MCvScalar(0, 0, 0))

                '// fill the contour
                CvInvoke.DrawContours(mask, contours, idx, New MCvScalar(255), -1)

                '// ratio of non-zero pixels in the filled region
                r = CvInvoke.CountNonZero(maskROI) / (rect.Width * rect.Height)

                '// assume at least 45% of the area is filled if it contains text
                '// constraints on region size
                '// these two conditions alone are not very robust. better to use something
                '// like the number of significant peaks in a horizontal projection as a third condition
                If r > 0.45 AndAlso rect.Height > 8 AndAlso rect.Width > 8 Then
                    '// draw green rectangle
                    CvInvoke.Rectangle(rgb, rect, New MCvScalar(0, 255, 0), 2)
                End If
                idx += 1
            Next
            rgb.Save(IO.Path.Combine(Application.StartupPath, "rgb.jpg"))

User · Answer

Python implementation of @dhanushka's solution:

    import cv2
    import numpy as np

    def process_rgb(rgb):
        hasText = False
        gray = cv2.cvtColor(rgb, cv2.COLOR_BGR2GRAY)
        morphKernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
        grad = cv2.morphologyEx(gray, cv2.MORPH_GRADIENT, morphKernel)
        # binarize
        _, bw = cv2.threshold(grad, 0.0, 255.0, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
        # connect horizontally oriented regions
        morphKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (9, 1))
        connected = cv2.morphologyEx(bw, cv2.MORPH_CLOSE, morphKernel)
        # find contours (two return values: OpenCV 2.x and 4.x)
        mask = np.zeros(bw.shape[:2], dtype="uint8")
        contours, hierarchy = cv2.findContours(connected, cv2.RETR_CCOMP, cv2.CHAIN_APPROX_SIMPLE)
        # nothing to filter if no contour was found
        if hierarchy is None:
            return hasText, rgb
        # filter contours, following the "next" link of each top-level contour
        idx = 0
        while idx >= 0:
            x, y, w, h = cv2.boundingRect(contours[idx])
            # fill the contour
            cv2.drawContours(mask, contours, idx, (255, 255, 255), cv2.FILLED)
            # ratio of the contour area to its bounding rectangle
            r = cv2.contourArea(contours[idx]) / (w * h)
            if r > 0.45 and h > 5 and w > 5 and w > h:
                cv2.rectangle(rgb, (x, y), (x + w, y + h), (0, 255, 0), 2)
                hasText = True
            idx = hierarchy[0][idx][0]
        return hasText, rgb
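The loop above walks the contour hierarchy instead of iterating over every contour. Each hierarchy entry is [next, previous, first_child, parent], with -1 meaning "none", so following the next link from contour 0 visits only contours on the same level and skips holes. A small plain-Python sketch of that traversal follows; the function name top_level_indices and the sample hierarchy are made up, and it assumes (as the code above does) that contour 0 is a top-level contour. In cv2 the array has shape (1, N, 4), so the list used here corresponds to hierarchy[0].

```python
def top_level_indices(hierarchy):
    """Indices reached by following the 'next' links starting at contour 0."""
    indices = []
    idx = 0 if hierarchy else -1
    while idx >= 0:
        indices.append(idx)
        idx = hierarchy[idx][0]  # element 0 is the next contour on the same level
    return indices


# Entries are [next, previous, first_child, parent].
sample_hierarchy = [
    [2, -1, 1, -1],   # contour 0: outer boundary, next is 2, has hole 1
    [-1, -1, -1, 0],  # contour 1: hole inside contour 0
    [-1, 0, -1, -1],  # contour 2: second outer boundary
]
visited = top_level_indices(sample_hierarchy)
```

Here the hole (contour 1) is never visited, which is why the while loop can produce fewer rectangles than a plain loop over all contours.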

User · Answer

You can utilize a Python implementation, SWTloc. Full disclosure: I am the author of this library.

To do that:

First and Second Image

Notice that the text_mode here is 'lb_df', which stands for Light Background Dark Foreground, i.e. the text in this image is going to be in a darker color than the background.

    import cv2
    import numpy as np
    from swtloc import SWTLocalizer
    from swtloc.utils import imgshowN, imgshow

    swtl = SWTLocalizer()
    # Stroke Width Transform
    swtl.swttransform(imgpaths='img1.jpg', text_mode='lb_df',
                      save_results=True, save_rootpath='swtres/',
                      minrsw=3, maxrsw=20, max_angledev=np.pi/3)
    imgshow(swtl.swtlabelled_pruned13C)

    # Grouping
    respacket = swtl.get_grouped(lookup_radii_multiplier=0.9, ht_ratio=3.0)
    grouped_annot_bubble = respacket[2]
    maskviz = respacket[4]
    maskcomb = respacket[5]

    # Saving the results
    _ = cv2.imwrite('img1_processed.jpg', swtl.swtlabelled_pruned13C)
    imgshowN([maskcomb, grouped_annot_bubble], savepath='grouped_img1.jpg')

Third Image

Notice that the text_mode here is 'db_lf', which stands for Dark Background Light Foreground, i.e. the text in this image is going to be in a lighter color than the background.

    import cv2
    import numpy as np
    from swtloc import SWTLocalizer
    from swtloc.utils import imgshowN, imgshow

    swtl = SWTLocalizer()
    # Stroke Width Transform
    swtl.swttransform(imgpaths=imgpaths[1], text_mode='db_lf',
                      save_results=True, save_rootpath='swtres/',
                      minrsw=3, maxrsw=20, max_angledev=np.pi/3)
    imgshow(swtl.swtlabelled_pruned13C)

    # Grouping
    respacket = swtl.get_grouped(lookup_radii_multiplier=0.9, ht_ratio=3.0)
    grouped_annot_bubble = respacket[2]
    maskviz = respacket[4]
    maskcomb = respacket[5]

    # Saving the results
    _ = cv2.imwrite('img1_processed.jpg', swtl.swtlabelled_pruned13C)
    imgshowN([maskcomb, grouped_annot_bubble], savepath='grouped_img1.jpg')

You will also notice that the grouping is not very accurate. Since images vary, try tuning the grouping parameters of the swtl.get_grouped() function to get the desired results.

User · Answer

I used a gradient based method in the program below. Added the resulting images. Please note that I'm using a scaled down version of the image for processing.

c++ version

    The MIT License (MIT)

    Copyright (c) 2014 Dhanushka Dangampola

    Permission is hereby granted, free of charge, to any person obtaining a copy
    of this software and associated documentation files (the "Software"), to deal
    in the Software without restriction, including without limitation the rights
    to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
    copies of the Software, and to permit persons to whom the Software is
    furnished to do so, subject to the following conditions:

    The above copyright notice and this permission notice shall be included in
    all copies or substantial portions of the Software.

    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
    IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
    FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
    AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
    LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
    OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
    THE SOFTWARE.

    #include "stdafx.h"

    #include <opencv2/core/core.hpp>
    #include <opencv2/highgui/highgui.hpp>
    #include <opencv2/imgproc/imgproc.hpp>
    #include <iostream>

    using namespace cv;
    using namespace std;

    #define INPUT_FILE              "1.jpg"
    #define OUTPUT_FOLDER_PATH      string("")

    int _tmain(int argc, _TCHAR* argv[])
    {
        Mat large = imread(INPUT_FILE);
        Mat rgb;
        // downsample and use it for processing
        pyrDown(large, rgb);
        Mat small;
        cvtColor(rgb, small, CV_BGR2GRAY);
        // morphological gradient
        Mat grad;
        Mat morphKernel = getStructuringElement(MORPH_ELLIPSE, Size(3, 3));
        morphologyEx(small, grad, MORPH_GRADIENT, morphKernel);
        // binarize
        Mat bw;
        threshold(grad, bw, 0.0, 255.0, THRESH_BINARY | THRESH_OTSU);
        // connect horizontally oriented regions
        Mat connected;
        morphKernel = getStructuringElement(MORPH_RECT, Size(9, 1));
        morphologyEx(bw, connected, MORPH_CLOSE, morphKernel);
        // find contours
        Mat mask = Mat::zeros(bw.size(), CV_8UC1);
        vector<vector<Point> > contours;
        vector<Vec4i> hierarchy;
        findContours(connected, contours, hierarchy, CV_RETR_CCOMP, CV_CHAIN_APPROX_SIMPLE, Point(0, 0));
        // filter contours
        for (int idx = 0; idx >= 0; idx = hierarchy[idx][0])
        {
            Rect rect = boundingRect(contours[idx]);
            Mat maskROI(mask, rect);
            maskROI = Scalar(0, 0, 0);
            // fill the contour
            drawContours(mask, contours, idx, Scalar(255, 255, 255), CV_FILLED);
            // ratio of non-zero pixels in the filled region
            double r = (double)countNonZero(maskROI) / (rect.width * rect.height);

            if (r > .45 /* assume at least 45% of the area is filled if it contains text */
                &&
                (rect.height > 8 && rect.width > 8) /* constraints on region size */
                /* these two conditions alone are not very robust. better to use something
                like the number of significant peaks in a horizontal projection as a third condition */
                )
            {
                rectangle(rgb, rect, Scalar(0, 255, 0), 2);
            }
        }
        imwrite(OUTPUT_FOLDER_PATH + string("rgb.jpg"), rgb);

        return 0;
    }

python version

    The MIT License (MIT)

    Copyright (c) 2017 Dhanushka Dangampola

    Permission is hereby granted, free of charge, to any person obtaining a copy
    of this software and associated documentation files (the "Software"), to deal
    in the Software without restriction, including without limitation the rights
    to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
    copies of the Software, and to permit persons to whom the Software is
    furnished to do so, subject to the following conditions:

    The above copyright notice and this permission notice shall be included in
    all copies or substantial portions of the Software.

    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
    IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
    FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
    AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
    LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
    OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
    THE SOFTWARE.

    import cv2
    import numpy as np

    large = cv2.imread('1.jpg')
    rgb = cv2.pyrDown(large)
    small = cv2.cvtColor(rgb, cv2.COLOR_BGR2GRAY)

    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
    grad = cv2.morphologyEx(small, cv2.MORPH_GRADIENT, kernel)

    _, bw = cv2.threshold(grad, 0.0, 255.0, cv2.THRESH_BINARY | cv2.THRESH_OTSU)

    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (9, 1))
    connected = cv2.morphologyEx(bw, cv2.MORPH_CLOSE, kernel)
    # using RETR_EXTERNAL instead of RETR_CCOMP
    # OpenCV 2.x and 4.x return (contours, hierarchy)
    contours, hierarchy = cv2.findContours(connected.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    # OpenCV 3.x returns (image, contours, hierarchy): comment the previous line and uncomment the following line
    # _, contours, hierarchy = cv2.findContours(connected.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)

    mask = np.zeros(bw.shape, dtype=np.uint8)

    for idx in range(len(contours)):
        x, y, w, h = cv2.boundingRect(contours[idx])
        mask[y:y+h, x:x+w] = 0
        cv2.drawContours(mask, contours, idx, (255, 255, 255), -1)
        r = float(cv2.countNonZero(mask[y:y+h, x:x+w])) / (w * h)

        if r > 0.45 and w > 8 and h > 8:
            cv2.rectangle(rgb, (x, y), (x+w-1, y+h-1), (0, 255, 0), 2)

    cv2.imshow('rects', rgb)
    cv2.waitKey(0)  # keep the window open until a key is pressed
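The filtering rule used in both versions above (fill ratio above 0.45, width and height above 8) can be isolated from OpenCV for experimentation. Below is a minimal plain-Python sketch that works on a mask given as a list of rows; fill_ratio and is_text_candidate are hypothetical helper names, and only the two thresholds are taken from the answer.

```python
def fill_ratio(mask, x, y, w, h):
    """Fraction of non-zero pixels of mask inside the rectangle (x, y, w, h)."""
    filled = sum(1 for row in mask[y:y + h] for v in row[x:x + w] if v)
    return filled / (w * h)


def is_text_candidate(mask, rect, min_ratio=0.45, min_side=8):
    """Apply the two conditions from the answer: fill ratio and minimum size."""
    x, y, w, h = rect
    return w > min_side and h > min_side and fill_ratio(mask, x, y, w, h) > min_ratio


# A densely filled 10x10 block versus a thin diagonal stroke in a 12x12 mask.
dense = [[1 if 1 <= r <= 10 and 1 <= c <= 10 else 0 for c in range(12)]
         for r in range(12)]
diagonal = [[1 if r == c else 0 for c in range(12)] for r in range(12)]

dense_ok = is_text_candidate(dense, (1, 1, 10, 10))
diagonal_ok = is_text_candidate(diagonal, (1, 1, 10, 10))
small_ok = is_text_candidate(dense, (1, 1, 5, 5))
```

After the horizontal closing step, a line of text fills most of its bounding box, whereas thin strokes and borders fill only a small fraction of theirs, which is what the 0.45 threshold exploits.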

User · Answer

Java version of the above code (thanks, William):

    public static List<Rect> detectLetters(Mat img) {
        List<Rect> boundRect = new ArrayList<>();

        Mat img_gray = new Mat(), img_sobel = new Mat(), img_threshold = new Mat(), element = new Mat();
        Imgproc.cvtColor(img, img_gray, Imgproc.COLOR_RGB2GRAY);
        Imgproc.Sobel(img_gray, img_sobel, CvType.CV_8U, 1, 0, 3, 1, 0, Core.BORDER_DEFAULT);
        // Mat src, Mat dst, double thresh, double maxval, int type
        Imgproc.threshold(img_sobel, img_threshold, 0, 255, 8);
        element = Imgproc.getStructuringElement(Imgproc.MORPH_RECT, new Size(15, 5));
        Imgproc.morphologyEx(img_threshold, img_threshold, Imgproc.MORPH_CLOSE, element);
        List<MatOfPoint> contours = new ArrayList<MatOfPoint>();
        Mat hierarchy = new Mat();
        Imgproc.findContours(img_threshold, contours, hierarchy, 0, 1);

        List<MatOfPoint> contours_poly = new ArrayList<MatOfPoint>(contours.size());

        for (int i = 0; i < contours.size(); i++) {
            MatOfPoint2f mMOP2f1 = new MatOfPoint2f();
            MatOfPoint2f mMOP2f2 = new MatOfPoint2f();

            contours.get(i).convertTo(mMOP2f1, CvType.CV_32FC2);
            Imgproc.approxPolyDP(mMOP2f1, mMOP2f2, 2, true);
            mMOP2f2.convertTo(contours.get(i), CvType.CV_32S);

            Rect appRect = Imgproc.boundingRect(contours.get(i));
            if (appRect.width > appRect.height) {
                boundRect.add(appRect);
            }
        }

        return boundRect;
    }

And use this code in practice:

    System.loadLibrary(Core.NATIVE_LIBRARY_NAME);
    Mat img1 = Imgcodecs.imread("abc.png");
    List<Rect> letterBBoxes1 = Utils.detectLetters(img1);

    for (int i = 0; i < letterBBoxes1.size(); i++)
        Imgproc.rectangle(img1, letterBBoxes1.get(i).br(), letterBBoxes1.get(i).tl(), new Scalar(0, 255, 0), 3, 8, 0);
    Imgcodecs.imwrite("abc1.png", img1);

User · Answer

Here is an alternative approach that I used to detect the text blocks:

1. Converted the image to grayscale.
2. Applied threshold (simple binary threshold, with a handpicked value of 150 as the threshold value).
3. Applied dilation to thicken lines in the image, leading to more compact objects and fewer white space fragments. Used a high value for the number of iterations, so dilation is very heavy (13 iterations, also handpicked for optimal results).
4. Identified contours of objects in the resulting image using the OpenCV findContours function.
5. Drew a bounding box (rectangle) circumscribing each contoured object; each of them frames a block of text.
6. Optionally discarded areas that are unlikely to be the object you are searching for (e.g. text blocks) given their size, as the algorithm above can also find intersecting or nested objects (like the entire top area for the first card), some of which could be uninteresting for your purposes.

Below is the code written in Python with pyopencv; it should be easy to port to C++.

    import cv2

    image = cv2.imread("card.png")
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # grayscale
    _, thresh = cv2.threshold(gray, 150, 255, cv2.THRESH_BINARY_INV)  # threshold
    kernel = cv2.getStructuringElement(cv2.MORPH_CROSS, (3, 3))
    dilated = cv2.dilate(thresh, kernel, iterations=13)  # dilate
    # get contours; [-2:] keeps (contours, hierarchy) whether findContours
    # returns two values (OpenCV 2.x, 4.x) or three (OpenCV 3.x)
    contours, hierarchy = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[-2:]

    # for each contour found, draw a rectangle around it on original image
    for contour in contours:
        # get rectangle bounding contour
        [x, y, w, h] = cv2.boundingRect(contour)

        # discard areas that are too large
        if h > 300 and w > 300:
            continue

        # discard areas that are too small
        if h < 40 or w < 40:
            continue

        # draw rectangle around contour on original image
        cv2.rectangle(image, (x, y), (x+w, y+h), (255, 0, 255), 2)

    # write original image with added contours to disk
    cv2.imwrite("contoured.jpg", image)

The original image is the first image in your post.

After preprocessing (grayscale, threshold and dilate, so after step 3) the image looked like this:

Below is the resulting image ("contoured.jpg" in the last line); the final bounding boxes for the objects in the image look like this:

You can see the text block on the left is detected as a separate block, delimited from its surroundings.

Using the same script with the same parameters (except for the thresholding type, which was changed for the second image as described below), here are the results for the other 2 cards:

Tuning the parameters

The parameters (threshold value, dilation parameters) were optimized for this image and this task (finding text blocks) and can be adjusted, if needed, for other card images or other types of objects to be found.

For thresholding (step 2), I used a black threshold. For images where text is lighter than the background, such as the second image in your post, a white threshold should be used, so replace the thresholding type with cv2.THRESH_BINARY. For the second image I also used a slightly higher value for the threshold (180). Varying the threshold value and the number of iterations for dilation will result in different degrees of sensitivity in delimiting objects in the image.

Finding other object types

For example, decreasing the dilation to 5 iterations in the first image gives us a finer delimitation of objects in the image, roughly finding all words in the image (rather than text blocks):

Knowing the rough size of a word, here I discarded areas that were too small (below 20 pixels width or height) or too large (above 100 pixels width or height) to ignore objects that are unlikely to be words, to get the results in the above image.
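The size-based discarding described above can be pulled out into a small helper so that the block-level limits (40 and 300) and the word-level limits (20 and 100) are just parameters. This is a plain-Python sketch that mirrors the two conditions in the script; filter_boxes and the sample boxes are made up for illustration.

```python
def filter_boxes(boxes, min_side, max_side):
    """Keep (x, y, w, h) boxes that are neither too small nor too large."""
    kept = []
    for (x, y, w, h) in boxes:
        if h > max_side and w > max_side:  # too large in both dimensions
            continue
        if h < min_side or w < min_side:   # too small in either dimension
            continue
        kept.append((x, y, w, h))
    return kept


# Hypothetical bounding boxes, as cv2.boundingRect would return them.
boxes = [(0, 0, 400, 350), (10, 10, 120, 60), (5, 5, 30, 90), (50, 200, 60, 25)]
blocks = filter_boxes(boxes, min_side=40, max_side=300)  # text-block settings
words = filter_boxes(boxes, min_side=20, max_side=100)   # word-level settings
```

Note that, as in the script, a box counts as too large only when both its width and its height exceed the limit, so a long, thin line of text is still kept.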

User · Answer

You can try this method, developed by Chucai Yi and Yingli Tian.

They also share software (based on OpenCV 1.0; it should run on Windows) that you can use, though no source code is available. It generates all the text bounding boxes (shown as colored shadows) in the image. Applying it to your sample images gives the following results:

Note: to make the result more robust, you can further merge adjacent boxes together.

Update: If your ultimate goal is to recognize the text in the image, you can also check out gttext, which is free OCR software and a ground-truthing tool for color images with text. Source code is also available.

With this, you can get the recognized text.
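The note about merging adjacent boxes does not say how to do it. One simple possibility, sketched here in plain Python, is to repeatedly replace any two boxes that overlap or lie within a few pixels of each other by their union until nothing changes. The function names and the gap value are assumptions, and this is not the method used by the software mentioned above.

```python
def boxes_touch(a, b, gap):
    """True if boxes a and b, given as (x, y, w, h), overlap or lie within gap pixels."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return (ax - gap <= bx + bw and bx - gap <= ax + aw and
            ay - gap <= by + bh and by - gap <= ay + ah)


def union(a, b):
    """Smallest box containing both a and b."""
    x1, y1 = min(a[0], b[0]), min(a[1], b[1])
    x2 = max(a[0] + a[2], b[0] + b[2])
    y2 = max(a[1] + a[3], b[1] + b[3])
    return (x1, y1, x2 - x1, y2 - y1)


def merge_adjacent(boxes, gap=5):
    """Merge boxes pairwise until no two remaining boxes touch."""
    boxes = list(boxes)
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if boxes_touch(boxes[i], boxes[j], gap):
                    boxes[i] = union(boxes[i], boxes[j])
                    del boxes[j]
                    merged = True
                    break
            if merged:
                break  # restart the scan, the merged box may touch others
    return boxes


# Two neighbouring word boxes and one far-away box.
sample = [(0, 0, 10, 10), (12, 0, 10, 10), (100, 100, 10, 10)]
result = merge_adjacent(sample, gap=5)
```

The restart after every merge keeps the logic easy to follow at the cost of speed, which is usually acceptable for the few dozen boxes found on a card.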

User · Answer

@dhanushka's approach showed the most promise, but I wanted to play around in Python, so I went ahead and translated it for fun:

    import cv2
    import numpy as np
    from cv2 import boundingRect, countNonZero, cvtColor, drawContours, findContours, getStructuringElement, imread, morphologyEx, pyrDown, rectangle, threshold

    # image_path: path to the input image
    large = imread(image_path)
    # downsample and use it for processing
    rgb = pyrDown(large)
    # apply grayscale
    small = cvtColor(rgb, cv2.COLOR_BGR2GRAY)
    # morphological gradient
    morph_kernel = getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
    grad = morphologyEx(small, cv2.MORPH_GRADIENT, morph_kernel)
    # binarize
    _, bw = threshold(src=grad, thresh=0, maxval=255, type=cv2.THRESH_BINARY+cv2.THRESH_OTSU)
    morph_kernel = getStructuringElement(cv2.MORPH_RECT, (9, 1))
    # connect horizontally oriented regions
    connected = morphologyEx(bw, cv2.MORPH_CLOSE, morph_kernel)
    mask = np.zeros(bw.shape, np.uint8)
    # find contours; [-2:] keeps (contours, hierarchy) on OpenCV 2.x, 3.x and 4.x
    contours, hierarchy = findContours(connected, cv2.RETR_CCOMP, cv2.CHAIN_APPROX_SIMPLE)[-2:]
    # filter contours
    for idx in range(len(contours)):
        rect = x, y, rect_width, rect_height = boundingRect(contours[idx])
        # fill the contour
        mask = drawContours(mask, contours, idx, (255, 255, 255), cv2.FILLED)
        # ratio of non-zero pixels in the filled region, counted only inside
        # this contour's bounding rectangle, as in the C++ original
        roi = mask[y:y+rect_height, x:x+rect_width]
        r = float(countNonZero(roi)) / (rect_width * rect_height)
        if r > 0.45 and rect_height > 8 and rect_width > 8:
            rgb = rectangle(rgb, (x, y+rect_height), (x+rect_width, y), (0, 255, 0), 3)

Now to display the image:

    from PIL import Image
    Image.fromarray(rgb).show()

Not the most Pythonic of scripts, but I tried to resemble the original C++ code as closely as possible for readers to follow.

It works almost as well as the original. I'll be happy to read suggestions on how it could be improved or fixed to resemble the original results fully.
