tesseract-ocr: an open-source OCR engine whose development has been sponsored by Google since 2006. Tesseract is a highly portable software library. It uses the Leptonica image-processing library to produce a binary image by applying adaptive thresholding to a grayscale or color image.
http://blog.cloudera.com/blog/2015/10/how-to-index-scanned-pdfs-at-scale-using-fewer-than-50-lines-of-code/
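Adaptive thresholding, the binarization step mentioned for Leptonica, chooses a separate cutoff for each pixel from its neighbourhood instead of one global cutoff for the whole page. This helps when lighting or paper tone varies across a scan. The sketch below is a generic local-mean variant in plain Python. It is not Leptonica's or Tesseract's actual code, since Leptonica ships its own binarization routines. The function name `adaptive_threshold` and its `window`/`offset` parameters are hypothetical, chosen only for illustration.

```python
def adaptive_threshold(gray, window=15, offset=10):
    """Hypothetical helper: binarize a grayscale image by local-mean thresholding.

    `gray` is a list of rows of 0-255 ints. A pixel becomes ink (1) when it is
    darker than the mean of its window x window neighbourhood minus `offset`;
    otherwise it becomes background (0). Windows are clipped at the borders.
    """
    h = len(gray)
    w = len(gray[0]) if h else 0

    # Integral image with a zero top row and left column:
    # ii[y][x] == sum of gray[0:y][0:x], so any rectangle sum costs O(1).
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += gray[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum

    half = window // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - half), min(h, y + half + 1)
        for x in range(w):
            x0, x1 = max(0, x - half), min(w, x + half + 1)
            area = (y1 - y0) * (x1 - x0)
            total = ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]
            local_mean = total / area
            # Per-pixel cutoff derived from the neighbourhood, not a global value.
            out[y][x] = 1 if gray[y][x] < local_mean - offset else 0
    return out


# Toy "scan": a bright left region and a darker right region, each holding one ink pixel.
gray = [
    [200, 200, 200, 120, 120],
    [200, 50, 200, 120, 20],
    [200, 200, 200, 120, 120],
]
result = adaptive_threshold(gray, window=3, offset=20)
for row in result:
    print(row)
```

With these made-up values, only the two genuinely dark pixels (50 and 20) are marked as ink. A single global cutoff of 128 would instead mark the entire 120-valued right-hand region as ink, which is the kind of failure adaptive thresholding is meant to avoid.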