What was EXTREMLY HELPFUL to me on this way are the source codes for Capture2Text project. http://sourceforge.net/projects/capture2text/files/Capture2Text/.
BTW: Kudos to it's author for sharing such a painstaking algorithm.
Pay special attention to the file Capture2Text\SourceCode\leptonica_util\leptonica_util.c - that's the essence of image preprocession for this utility.
If you will run the binaries, you can check the image transformation before/after the process in Capture2Text\Output\ folder.
P.S. mentioned solution uses Tesseract for OCR and Leptonica for preprocessing.