Sunday, 25 December 2011

How to convert a scanned image of text to text

Use Tesseract, the open source Optical Character Recognition (OCR) engine. It uses the Leptonica image processing library to read a wide range of image formats.
See this blog for how to use it on Android or in the cloud.
Tesseract is hosted on Google Code and you can find it at http://code.google.com/p/tesseract-ocr/
It was among the top three engines in the UNLV accuracy test in 1995 and is supposed to still be accurate.
The challenge is to make a cloud version of this.
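
Running Tesseract on a scanned image from a script can be sketched roughly as below. This is a minimal sketch, assuming the `tesseract` command-line program is installed and on the PATH; the helper names `build_tesseract_command` and `ocr_image` and the `_ocr` output suffix are made up for illustration and are not part of Tesseract.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="eng"):
    """Build the argument list for: tesseract <image> <output_base> -l <lang>.

    Hypothetical helper; only the command-line shape comes from Tesseract.
    """
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_image(image_path, lang="eng"):
    """Run Tesseract on one image file and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")

    image_path = Path(image_path)
    # Tesseract appends ".txt" to the output base name by itself.
    output_base = image_path.parent / (image_path.stem + "_ocr")
    subprocess.run(
        build_tesseract_command(image_path, output_base, lang),
        check=True,  # raise CalledProcessError if Tesseract fails
    )
    text_file = Path(str(output_base) + ".txt")
    return text_file.read_text(encoding="utf-8")
```

Calling `ocr_image("scan.png")` would write `scan_ocr.txt` next to the image and return its contents. A cloud version could wrap a function like this behind a web request, but that part is left open here.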
