Since I am transferred to NOIDA office of the company there is mostly office work for me except occasional visit to a power station.
I was looking for OCR solution to convert scanned PDF documents to text files. Initially I tried pdfocr and tesseract command line tools but not much success.
Then I converted one page online at ABBYY FineReader The site allowed me to convert 3 pages for free and afterwards I had to pay.
I discovered this page about Linux OCR solution. I downloaded the .deb file and installed on Ubuntu 12.04. It installed without any dependency problem since I had tesseract already installed.
Actually Lios is a GUI using cuneiform/tesseract engine in the background. I had already tried pdfocr which uses cuneiform and tesseract through command line and not hoping to get good results but Lios worked much better.
I used cuneiform engine for normal scanned page and tesseract engine if there was a table on the page. It takes time if there is a table but tesseract extracts the text correctly.
I was looking for OCR solution to convert scanned PDF documents to text files. Initially I tried pdfocr and tesseract command line tools but not much success.
Then I converted one page online at ABBYY FineReader The site allowed me to convert 3 pages for free and afterwards I had to pay.
I discovered this page about Linux OCR solution. I downloaded the .deb file and installed on Ubuntu 12.04. It installed without any dependency problem since I had tesseract already installed.
Actually Lios is a GUI using cuneiform/tesseract engine in the background. I had already tried pdfocr which uses cuneiform and tesseract through command line and not hoping to get good results but Lios worked much better.
I used cuneiform engine for normal scanned page and tesseract engine if there was a table on the page. It takes time if there is a table but tesseract extracts the text correctly.
No comments:
Post a Comment