How to implement and do OCR in a C# project?

Question

I've been searching for a while and all I've found are requests for OCR libraries. I would like to know how to add the simplest, easiest to install and use OCR library to a C# project, with detailed installation info.

If possible, I just want to use it like a usual DLL reference.

Example:

    using org.pdfbox.pdmodel;
    using org.pdfbox.util;

A small OCR code example would also be nice, such as:

    public string OCRFromBitmap(Bitmap Bmp)
    {
        Bmp.Save(temppath, System.Drawing.Imaging.ImageFormat.Tiff);
        string OcrResult = Analyze(temppath);
        File.Delete(temppath);
        return OcrResult;
    }

Please bear in mind that I'm not familiar with OCR projects, and give me an answer like you're talking to a dummy.

Edit: I guess people misunderstood my request. I wanted to know how to add those open source OCR libraries to a C# project and how to use them. The link given as a duplicate does not give the answers I asked for at all.

User · Answer

Some online APIs work pretty well: ocr.space and Google Cloud Vision. Both of these are free, as long as you do fewer than 1000 OCRs per month. You can drag & drop an image to do a quick manual test to see how they perform for your images.

I find OCR.space easier to use (no messing around with NuGet libraries), but, for my purpose, Google Cloud Vision provided slightly better results than OCR.space.

Google Cloud Vision example:

    GoogleCredential cred = GoogleCredential.FromJson(json);
    Channel channel = new Channel(ImageAnnotatorClient.DefaultEndpoint.Host, ImageAnnotatorClient.DefaultEndpoint.Port, cred.ToChannelCredentials());
    ImageAnnotatorClient client = ImageAnnotatorClient.Create(channel);
    Image image = Image.FromStream(stream);

    EntityAnnotation googleOcrText = client.DetectText(image).First();
    Console.Write(googleOcrText.Description);

OCR.space example:

    string uri = $"https://api.ocr.space/parse/imageurl?apikey=helloworld&url={imageUri}";
    string responseString = WebUtilities.DoGetRequest(uri);
    OcrSpaceResult result = JsonConvert.DeserializeObject<OcrSpaceResult>(responseString);
    if ((!result.IsErroredOnProcessing) && !String.IsNullOrEmpty(result.ParsedResults[0].ParsedText))
        return result.ParsedResults[0].ParsedText;
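The OCR.space snippet relies on an `OcrSpaceResult` type that the answer does not show. A minimal sketch of what it might look like follows, assuming the response JSON carries the `IsErroredOnProcessing` and `ParsedResults[].ParsedText` fields the answer reads. The class layout and the `OcrSpaceParser.ExtractText` helper are hypothetical, and the built-in `System.Text.Json` is swapped in for the `JsonConvert` (Json.NET) call so the block has no external dependency.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical DTOs: only the fields used by the answer are modelled.
public class OcrSpaceParsedResult
{
    public string ParsedText { get; set; }
}

public class OcrSpaceResult
{
    public bool IsErroredOnProcessing { get; set; }
    public List<OcrSpaceParsedResult> ParsedResults { get; set; }
}

public static class OcrSpaceParser
{
    // Returns the text of the first parsed result, or null when the
    // response reports an error or contains no text.
    public static string ExtractText(string responseJson)
    {
        // Property names must match the JSON exactly, because
        // System.Text.Json is case-sensitive by default.
        OcrSpaceResult result = JsonSerializer.Deserialize<OcrSpaceResult>(responseJson);

        if (result == null || result.IsErroredOnProcessing)
            return null;
        if (result.ParsedResults == null || result.ParsedResults.Count == 0)
            return null;

        string text = result.ParsedResults[0].ParsedText;
        return string.IsNullOrEmpty(text) ? null : text;
    }
}
```

Unlike the one-line check in the answer, this version also guards against an empty `ParsedResults` list, which would otherwise throw on `ParsedResults[0]`.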

User · Answer

If anyone is looking into this, I've been trying different options and the following approach yields very good results. These are the steps to get a working example:

1. Add the .NET wrapper for Tesseract to your project. It can be added via the NuGet package: Install-Package Tesseract (https://github.com/charlesw/tesseract).
2. Go to the Downloads section of the official Tesseract project (https://code.google.com/p/tesseract-ocr/). EDIT: It's now located here: https://github.com/tesseract-ocr/langdata
3. Download the preferred language data, for example tesseract-ocr-3.02.eng.tar.gz (English language data for Tesseract 3.02).
4. Create a tessdata directory in your project and place the language data files in it.
5. Go to Properties of the newly added files and set them to copy on build.
6. Add a reference to System.Drawing.
7. From the .NET wrapper repository, copy the sample phototest.tif file from the Samples directory into your project directory and set it to copy on build.
8. Create the following two files in your project (just to get started):

Program.cs

    using System;
    using Tesseract;
    using System.Diagnostics;

    namespace ConsoleApplication
    {
        class Program
        {
            public static void Main(string[] args)
            {
                // Default to the sample image; allow overriding it on the command line.
                var testImagePath = "./phototest.tif";
                if (args.Length > 0)
                {
                    testImagePath = args[0];
                }

                try
                {
                    var logger = new FormattedConsoleLogger();
                    var resultPrinter = new ResultPrinter(logger);
                    using (var engine = new TesseractEngine(@"./tessdata", "eng", EngineMode.Default))
                    {
                        using (var img = Pix.LoadFromFile(testImagePath))
                        {
                            using (logger.Begin("Process image"))
                            {
                                var i = 1;
                                using (var page = engine.Process(img))
                                {
                                    // Whole-page text and confidence.
                                    var text = page.GetText();
                                    logger.Log("Text: {0}", text);
                                    logger.Log("Mean confidence: {0}", page.GetMeanConfidence());

                                    // Walk the layout: paragraphs -> lines -> words.
                                    using (var iter = page.GetIterator())
                                    {
                                        iter.Begin();
                                        do
                                        {
                                            if (i % 2 == 0)
                                            {
                                                using (logger.Begin("Line {0}", i))
                                                {
                                                    do
                                                    {
                                                        using (logger.Begin("Word Iteration"))
                                                        {
                                                            if (iter.IsAtBeginningOf(PageIteratorLevel.Block))
                                                            {
                                                                logger.Log("New block");
                                                            }
                                                            if (iter.IsAtBeginningOf(PageIteratorLevel.Para))
                                                            {
                                                                logger.Log("New paragraph");
                                                            }
                                                            if (iter.IsAtBeginningOf(PageIteratorLevel.TextLine))
                                                            {
                                                                logger.Log("New line");
                                                            }
                                                            logger.Log("word: " + iter.GetText(PageIteratorLevel.Word));
                                                        }
                                                    } while (iter.Next(PageIteratorLevel.TextLine, PageIteratorLevel.Word));
                                                }
                                            }
                                            i++;
                                        } while (iter.Next(PageIteratorLevel.Para, PageIteratorLevel.TextLine));
                                    }
                                }
                            }
                        }
                    }
                }
                catch (Exception e)
                {
                    Trace.TraceError(e.ToString());
                    Console.WriteLine("Unexpected Error: " + e.Message);
                    Console.WriteLine("Details: ");
                    Console.WriteLine(e.ToString());
                }
                Console.Write("Press any key to continue . . . ");
                Console.ReadKey(true);
            }

            private class ResultPrinter
            {
                readonly FormattedConsoleLogger logger;

                public ResultPrinter(FormattedConsoleLogger logger)
                {
                    this.logger = logger;
                }

                public void Print(ResultIterator iter)
                {
                    logger.Log("Is beginning of block: {0}", iter.IsAtBeginningOf(PageIteratorLevel.Block));
                    logger.Log("Is beginning of para: {0}", iter.IsAtBeginningOf(PageIteratorLevel.Para));
                    logger.Log("Is beginning of text line: {0}", iter.IsAtBeginningOf(PageIteratorLevel.TextLine));
                    logger.Log("Is beginning of word: {0}", iter.IsAtBeginningOf(PageIteratorLevel.Word));
                    logger.Log("Is beginning of symbol: {0}", iter.IsAtBeginningOf(PageIteratorLevel.Symbol));

                    logger.Log("Block text: \"{0}\"", iter.GetText(PageIteratorLevel.Block));
                    logger.Log("Para text: \"{0}\"", iter.GetText(PageIteratorLevel.Para));
                    logger.Log("TextLine text: \"{0}\"", iter.GetText(PageIteratorLevel.TextLine));
                    logger.Log("Word text: \"{0}\"", iter.GetText(PageIteratorLevel.Word));
                    logger.Log("Symbol text: \"{0}\"", iter.GetText(PageIteratorLevel.Symbol));
                }
            }
        }
    }

FormattedConsoleLogger.cs

    using System;
    using System.Collections.Generic;
    using System.Text;
    using Tesseract;

    namespace ConsoleApplication
    {
        public class FormattedConsoleLogger
        {
            const string Tab = "    ";

            // One indentation level; disposing it pops the level off the logger's stack.
            private class Scope : DisposableBase
            {
                private int indentLevel;
                private string indent;
                private FormattedConsoleLogger container;

                public Scope(FormattedConsoleLogger container, int indentLevel)
                {
                    this.container = container;
                    this.indentLevel = indentLevel;
                    StringBuilder indent = new StringBuilder();
                    for (int i = 0; i < indentLevel; i++)
                    {
                        indent.Append(Tab);
                    }
                    this.indent = indent.ToString();
                }

                public void Log(string format, object[] args)
                {
                    var message = String.Format(format, args);
                    StringBuilder indentedMessage = new StringBuilder(message.Length + indent.Length * 10);
                    int i = 0;
                    bool isNewLine = true;
                    while (i < message.Length)
                    {
                        // Check i + 1 against the length so a trailing '\r' cannot index past the end.
                        if (i + 1 < message.Length && message[i] == '\r' && message[i + 1] == '\n')
                        {
                            indentedMessage.AppendLine();
                            isNewLine = true;
                            i += 2;
                        }
                        else if (message[i] == '\r' || message[i] == '\n')
                        {
                            indentedMessage.AppendLine();
                            isNewLine = true;
                            i++;
                        }
                        else
                        {
                            if (isNewLine)
                            {
                                indentedMessage.Append(indent);
                                isNewLine = false;
                            }
                            indentedMessage.Append(message[i]);
                            i++;
                        }
                    }

                    Console.WriteLine(indentedMessage.ToString());
                }

                public Scope Begin()
                {
                    return new Scope(container, indentLevel + 1);
                }

                protected override void Dispose(bool disposing)
                {
                    if (disposing)
                    {
                        var scope = container.scopes.Pop();
                        if (scope != this)
                        {
                            throw new InvalidOperationException("Format scope removed out of order.");
                        }
                    }
                }
            }

            private Stack<Scope> scopes = new Stack<Scope>();

            public IDisposable Begin(string title = "", params object[] args)
            {
                Log(title, args);
                Scope scope;
                if (scopes.Count == 0)
                {
                    scope = new Scope(this, 1);
                }
                else
                {
                    scope = ActiveScope.Begin();
                }
                scopes.Push(scope);
                return scope;
            }

            public void Log(string format, params object[] args)
            {
                if (scopes.Count > 0)
                {
                    ActiveScope.Log(format, args);
                }
                else
                {
                    Console.WriteLine(String.Format(format, args));
                }
            }

            private Scope ActiveScope
            {
                get
                {
                    var top = scopes.Peek();
                    if (top == null) throw new InvalidOperationException("No current scope");
                    return top;
                }
            }
        }
    }
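For the `OCRFromBitmap`-style helper requested in the question, the same wrapper calls can be condensed considerably. The following is a minimal sketch, not a definitive implementation: it assumes the Tesseract NuGet package is referenced and that a `./tessdata` folder with the English data is copied next to the executable, as in the steps above. The `OcrHelper` class and the `tessdataPath` parameter are hypothetical names introduced for illustration.

```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using Tesseract;

public static class OcrHelper
{
    // Assumption: tessdataPath points at a folder containing eng.traineddata.
    public static string OcrFromBitmap(Bitmap bmp, string tessdataPath = "./tessdata")
    {
        if (bmp == null) throw new ArgumentNullException(nameof(bmp));

        // Save the bitmap to a uniquely named temporary TIFF, as the question's example does.
        string tempPath = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N") + ".tif");
        bmp.Save(tempPath, ImageFormat.Tiff);
        try
        {
            using (var engine = new TesseractEngine(tessdataPath, "eng", EngineMode.Default))
            using (var img = Pix.LoadFromFile(tempPath))
            using (var page = engine.Process(img))
            {
                return page.GetText();
            }
        }
        finally
        {
            // Remove the temporary file even if OCR throws.
            File.Delete(tempPath);
        }
    }
}
```

Creating a `TesseractEngine` on every call keeps the sketch short but is relatively expensive; when processing many images it is probably better to create the engine once and reuse it.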

User · Answer

Here's one (check out http://hongouru.blogspot.ie/2011/09/c-ocr-optical-character-recognition.html or http://www.codeproject.com/Articles/41709/How-To-Use-Office-2007-OCR-Using-C for more info):

    using MODI;

    static void Main(string[] args)
    {
        DocumentClass myDoc = new DocumentClass();
        myDoc.Create("theDocumentName.tiff"); // we work with the .tiff extension
        myDoc.OCR(MiLANGUAGES.miLANG_ENGLISH, true, true);

        foreach (Image anImage in myDoc.Images)
        {
            Console.WriteLine(anImage.Layout.Text); // here we write to the console
        }
    }

User · Answer

I'm using the Tesseract OCR engine with TessNet2 (a C# wrapper: http://www.pixel-technology.com/freeware/tessnet2/).

Some basic code:

    using tessnet2;

    ...

    Bitmap image = new Bitmap(@"u:\user files\bwalker\2849257.tif");
    tessnet2.Tesseract ocr = new tessnet2.Tesseract();
    ocr.SetVariable("tessedit_char_whitelist", "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.,$-/#&=()\"':?"); // Accepted characters
    ocr.Init(@"C:\Users\bwalker\Documents\Visual Studio 2010\Projects\tessnetWinForms\tessnetWinForms\bin\Release\", "eng", false); // Directory of your tessdata folder
    List<tessnet2.Word> result = ocr.DoOCR(image, System.Drawing.Rectangle.Empty);
    string Results = "";
    foreach (tessnet2.Word word in result)
    {
        Results += word.Confidence + ", " + word.Text + ", " + word.Left + ", " + word.Top + ", " + word.Bottom + ", " + word.Right + "\n";
    }
