Tesseract running error

Question

I have a problem running the tesseract-ocr engine on Linux. I've downloaded the RUS language data and put it in the tessdata directory (/usr/local/share/tessdata). When I try to run tesseract with the command

    tesseract blob.jpg out -l rus

it displays an error:

    Error opening data file /usr/local/share/tessdata/eng.traineddata
    Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
    Failed loading language eng
    Tesseract couldn't load any languages!
    Could not initialize tesseract.

Following the compiling guide, I used

    export TESSDATA_PREFIX='/usr/local/share/'

to point to my tessdata directory. Maybe I should edit some config files? Tesseract tries to load the "eng" data files instead of "rus".

Screenshot: http://i.stack.imgur.com/I0Guc.png
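A quick way to narrow this down is to check which traineddata files are actually present where Tesseract looks. The following is a minimal sketch using only the Python standard library. The helper name find_traineddata is hypothetical, and the sketch assumes two layouts: the one described in the error message (TESSDATA_PREFIX is the parent of tessdata) and the case where the prefix is the tessdata directory itself.

```python
import os
from pathlib import Path


def find_traineddata(prefix, lang):
    """Return existing paths for <lang>.traineddata under a TESSDATA_PREFIX value.

    Two layouts are checked (both are assumptions of this sketch):
    prefix/tessdata/<lang>.traineddata and prefix/<lang>.traineddata.
    """
    base = Path(prefix)
    candidates = [
        base / "tessdata" / f"{lang}.traineddata",
        base / f"{lang}.traineddata",
    ]
    return [p for p in candidates if p.is_file()]


if __name__ == "__main__":
    # Fall back to the path from the question when the variable is unset.
    prefix = os.environ.get("TESSDATA_PREFIX", "/usr/local/share/")
    for lang in ("eng", "rus"):
        hits = find_traineddata(prefix, lang)
        print(lang, "->", [str(p) for p in hits] if hits else "missing")
```

If eng shows up as missing while rus is found, that matches the error text, which complains about eng.traineddata rather than rus.traineddata.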

User · Answer

Pass the tessdata directory explicitly through the pytesseract config string:

    tessdata_dir_config = r'--tessdata-dir "/usr/local/Cellar/tesseract/4.1.1/share/tessdata"'
    pytesseract.image_to_string(imgCrop, lang='eng', config=tessdata_dir_config)
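The same idea can be wrapped in a small helper that fails early when the language file is not where you think it is. This is a sketch, not part of pytesseract: tessdata_config and ocr_image are hypothetical names, and it assumes pytesseract and Pillow are installed. The imports are done inside the function so the path check runs even without them.

```python
from pathlib import Path


def tessdata_config(tessdata_dir):
    """Build the config string in the same form as the snippet above."""
    return f'--tessdata-dir "{tessdata_dir}"'


def ocr_image(image_path, lang, tessdata_dir):
    """Run OCR on one image, checking for <lang>.traineddata first."""
    trained = Path(tessdata_dir) / f"{lang}.traineddata"
    if not trained.is_file():
        raise FileNotFoundError(f"missing language data: {trained}")

    # Imported lazily: only needed once the data file is known to exist.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(
            img, lang=lang, config=tessdata_config(tessdata_dir)
        )
```

A call such as ocr_image("blob.jpg", "rus", "/usr/local/share/tessdata") then either returns the recognized text or raises a FileNotFoundError naming the missing file.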

User · Answer

Add this to your code:

    instance.setDatapath("C:\\somepath\\tessdata");
    instance.setLanguage("eng");

User · Answer

I'm using Visual Studio 2017 Community Edition. I solved this problem by creating a directory called tessdata in the Debug directory of my project and putting the eng.traineddata file into it.

User · Answer

You can grab eng.traineddata from GitHub:

    wget https://github.com/tesseract-ocr/tessdata/raw/master/eng.traineddata

Check https://github.com/tesseract-ocr/tessdata for a full list of trained language data.

Once you have the file(s), move them to the /usr/local/share/tessdata folder. Warning: some Linux distributions (such as openSUSE and Ubuntu) may expect them in /usr/share/tessdata instead.

    # If you got the data from Google, unzip it first
    gunzip eng.traineddata.gz
    # Move the data
    sudo mv -v eng.traineddata /usr/local/share/tessdata/
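If wget is not available, the download step can be sketched with the Python standard library. The function names here are hypothetical, and the URL pattern is copied from the wget command above, so the branch name is an assumption that may need adjusting for the current repository layout. Writing into a system directory such as /usr/local/share/tessdata usually needs elevated permissions, so the sketch takes the destination as a parameter.

```python
import urllib.request
from pathlib import Path

# URL pattern taken from the wget command in this answer.
BASE_URL = "https://github.com/tesseract-ocr/tessdata/raw/{branch}/{lang}.traineddata"


def traineddata_url(lang, branch="master"):
    """Build the download URL for one language file."""
    return BASE_URL.format(branch=branch, lang=lang)


def download_traineddata(lang, dest_dir, branch="master"):
    """Download <lang>.traineddata into dest_dir and return the file path."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / f"{lang}.traineddata"
    urllib.request.urlretrieve(traineddata_url(lang, branch), str(target))
    return target
```

For example, download_traineddata("eng", "./tessdata") fetches the English file into a local tessdata folder that can then be moved or passed to Tesseract directly.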

User · Answer

C# developer working on Windows here. What works for me is simply to download the file eng.traineddata from the following URL:

https://github.com/tesseract-ocr/tessdata/blob/master/eng.traineddata

and copy it to the following directory in my Console Application project:

    {Project Directory}\bin\Debug\tessdata

I created the tessdata folder above manually.

User · Answer

I'm using Windows. I tried all the solutions above and none of them worked. Finally, I installed Tesseract-OCR on the D: drive, where I run my Python script from, instead of the C: drive, and it works. So, if you are using Windows, run your Python script from the same drive as your Tesseract-OCR installation.

User · Answer

The simplest way is to install the needed package:

    sudo apt-get install tesseract-ocr-eng  # for English
    sudo apt-get install tesseract-ocr-tam  # for Tamil
    sudo apt-get install tesseract-ocr-deu  # for Deutsch (German)

As you can see, the same pattern works for other languages (e.g. tesseract-ocr-fra).

User · Answer

    tesseract --tessdata-dir <tessdata-folder> <image-path> stdout --oem 2 -l <lng>

In my case, these were the mistakes I made, or the attempts that were not a success:

1. I cloned the GitHub repo and copied files from there to /usr/local/share/tessdata/, /usr/share/tesseract-ocr/tessdata/ and /usr/share/tessdata/
2. Used TESSDATA_PREFIX with the above paths
3. sudo apt-get install tesseract-ocr-eng

The first two attempts did not work because the files from the git clone did not work, for reasons I do not know. I am not sure why the third attempt worked for me.

Finally:

1. I downloaded the eng.traineddata file using wget
2. Copied it to some directory
3. Used --tessdata-dir with that directory name

The takeaway for me is to learn the tool well and make use of it, rather than relying on package manager installations and default directories.
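When the command is launched from a script, it helps to build the argument list explicitly so that paths with spaces are not split. A minimal sketch, mirroring the argument order of the command at the top of this answer; the function names are hypothetical, the --oem default of 2 is simply the value used above, and the tesseract binary is assumed to be on PATH.

```python
import subprocess


def build_tesseract_cmd(image_path, lang, tessdata_dir, oem=2):
    """Return the argv list for the command shown at the top of this answer."""
    return [
        "tesseract",
        "--tessdata-dir", str(tessdata_dir),
        str(image_path),
        "stdout",
        "--oem", str(oem),
        "-l", lang,
    ]


def run_tesseract(image_path, lang, tessdata_dir, oem=2):
    """Run tesseract and return the recognized text from standard output."""
    cmd = build_tesseract_cmd(image_path, lang, tessdata_dir, oem)
    # check=True raises CalledProcessError if tesseract exits with an error.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

Passing a list instead of a single shell string avoids quoting problems with directories such as "Program Files".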

User · Answer

I had this error too on a Windows machine. My solution:

1. Download your language files from https://github.com/tesseract-ocr/tessdata/tree/3.04.00. For example, for eng, I downloaded all files with the eng prefix.
2. Put them into a tessdata directory inside some folder. Add this folder to the system environment variables as TESSDATA_PREFIX.

The result is the system environment variable TESSDATA_PREFIX=D:/Java/OCR, and the OCR folder contains tessdata with the language files.

User · Answer

No previous solution worked for me. I installed the data both through apt-get and by downloading the tessdata manually, moved it around /usr and so on, and nothing worked, even though I exported the variable a thousand times.

Finally, on a last try before starting to cry, I passed the path directly to the Tesseract instance. In Python:

    tr = Tesseract("/usr/local/share/tesseract-ocr/")

and now it works. To clarify, I'm using the tesserwrap module.

User · Answer

For Windows users: in Environment Variables, add a new system variable named TESSDATA_PREFIX with the value C:\Program Files (x86)\Tesseract-OCR\tessdata
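If you would rather not change system-wide settings, the variable can also be set only for the process that launches Tesseract. The sketch below uses the Python standard library; env_with_tessdata_prefix is a hypothetical helper name, and the path is the one from this answer, which may differ on your machine.

```python
import os


def env_with_tessdata_prefix(tessdata_dir, base_env=None):
    """Return a copy of the environment with TESSDATA_PREFIX set.

    The current process environment is left untouched.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["TESSDATA_PREFIX"] = str(tessdata_dir)
    return env


# Example use (assumes tesseract is on PATH):
#   import subprocess
#   env = env_with_tessdata_prefix(r"C:\Program Files (x86)\Tesseract-OCR\tessdata")
#   subprocess.run(["tesseract", "--list-langs"], env=env, check=True)
```

Running tesseract --list-langs this way is a quick check that the languages in that directory are visible before trying a real image.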

User · Answer

You can call the Tesseract API functions from C++ code:

    #include <cstdio>
    #include <tesseract/baseapi.h>
    #include <tesseract/ocrclass.h>  // ETEXT_DESC

    using namespace tesseract;

    class TessAPI : public TessBaseAPI {
    public:
        void PrintRects(int len);
    };

    // data, w0, h0, bpp, stride, x0 and y0 come from the loaded image.
    TessAPI *api = new TessAPI();
    int res = api->Init(NULL, "rus");  // NULL: use the default tessdata location
    api->SetAccuracyVSpeed(AVS_MOST_ACCURATE);
    api->SetImage(data, w0, h0, bpp, stride);
    api->SetRectangle(x0, y0, w0, h0);

    char *text;
    ETEXT_DESC monitor;
    api->RecognizeForChopTest(&monitor);
    text = api->GetUTF8Text();
    printf("text: %s\n", text);
    printf("m.count: %d\n", monitor.count);        // count is an integer, not a string
    printf("m.progress: %d\n", monitor.progress);  // progress is an integer as well
    delete[] text;  // GetUTF8Text returns a buffer the caller must free

    api->RecognizeForChopTest(&monitor);
    text = api->GetUTF8Text();
    printf("text: %s\n", text);
    delete[] text;

    api->End();

And build this code:

    g++ -g -I. -I/usr/local/include -o test test.cpp -ltesseract_api -lfreeimageplus

(I need FreeImage for picture loading.)
