Thursday, March 28, 2013

Ubuntu OCR Solution.

Since I was transferred to the company's NOIDA office, my work is mostly office-based, apart from an occasional visit to a power station.

I was looking for an OCR solution to convert scanned PDF documents to text files. Initially I tried the pdfocr and tesseract command-line tools, but without much success.

Then I converted one page online with ABBYY FineReader. The site allowed me to convert 3 pages for free; after that I had to pay.

I discovered this page about a Linux OCR solution. I downloaded the .deb file and installed it on Ubuntu 12.04. It installed without any dependency problems, since I already had tesseract installed.

Lios is actually a GUI that uses the cuneiform or tesseract engine in the background. I had already tried pdfocr, which drives cuneiform and tesseract from the command line, so I was not hoping for good results, but Lios worked much better.

I used the cuneiform engine for normal scanned pages and the tesseract engine when there was a table on the page. Tesseract takes longer on pages with tables, but it extracts the text correctly.
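For anyone who wants to apply the same rule of thumb outside Lios, here is a minimal sketch in Python. It assumes each PDF page has already been saved as an image file and that you mark by hand which pages contain a table (the `has_table` flag and the function names `ocr_command` and `ocr_pages` are hypothetical, not part of Lios). It then calls cuneiform for ordinary pages and tesseract for table pages, using their standard command-line forms.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def ocr_command(image: Path, has_table: bool, lang: str = "eng") -> list[str]:
    """Build the OCR command for one scanned page image.

    Rule of thumb from the post: cuneiform for ordinary text pages,
    tesseract for pages containing a table. `has_table` is a
    hypothetical flag the user sets by hand.
    """
    if has_table:
        # tesseract writes <outputbase>.txt, so pass the path without a suffix.
        return ["tesseract", str(image), str(image.with_suffix("")), "-l", lang]
    # cuneiform takes the output file explicitly via -o, plain text via -f text.
    return ["cuneiform", "-l", lang, "-f", "text",
            "-o", str(image.with_suffix(".txt")), str(image)]


def ocr_pages(pages: list[tuple[Path, bool]]) -> None:
    """Run the chosen engine on each (image, has_table) pair."""
    for image, has_table in pages:
        cmd = ocr_command(image, has_table)
        # Fail early with a clear message if the engine is not installed.
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} is not installed")
        subprocess.run(cmd, check=True)
```

For example, `ocr_pages([(Path("page-1.png"), False), (Path("page-2.png"), True)])` would write `page-1.txt` with cuneiform and `page-2.txt` with tesseract. Keeping the command construction separate from execution makes it easy to inspect what will run before OCRing a whole document.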
