Tech Thursday: How to Get Great OCR Results (Even From Poor-quality Image-based Documents!)

Every organisation that has jumped on the ‘green’ bandwagon wants to go paperless.

Despite this worthy goal, true ‘paperlessness’ is almost impossible to achieve. In every company in every country around the world, millions of documents are printed, faxed, scanned, re-printed, re-scanned…and the story goes on and on. Most of us are drowning in paper and we seem to be OK with it.

Of course, this paper Armageddon is not the only problem.

Loss of readability is another.

Every time an image-based document is printed, faxed, scanned or photocopied, the copy picks up more noise, and the resulting image is often hard to read. When even humans struggle to decipher something in a document, it is unrealistic to expect accurate OCR output from it. As the old cliché goes, garbage in, garbage out!

Thankfully, it’s not all doom and gloom. Advances in computer vision, particularly the development of ‘adaptive noise removal algorithms’, now make it possible to get good OCR results, even from sub-par image-based documents.

Older ‘despeckling’ algorithms removed specks indiscriminately across the whole page, often leaving the output almost unreadable. Newer adaptive noise removal algorithms work differently. They automatically detect specks on a page and adjust on the fly, removing extreme noise from some parts of a page without destroying content in other parts. They can also remove background shading and clean up documents, giving more readable text and more successful OCR.
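To make the idea concrete, here is a minimal Python sketch of adaptive despeckling. It is an illustration only, not the algorithm used by any particular OCR product. It assumes the page is already a black-and-white grid (1 = ink), and the function names and the `tile`, `tiny` and `noisy_count` parameters are hypothetical. Tiny blobs are removed only in regions of the page that contain many of them, so a lone full stop in a clean region survives.

```python
from collections import deque

def find_components(img):
    """Return connected ink blobs (8-connectivity) as lists of (row, col)."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for r in range(h):
        for c in range(w):
            if img[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue = deque([(r, c)])
                comp = []
                while queue:
                    y, x = queue.popleft()
                    comp.append((y, x))
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and img[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                comps.append(comp)
    return comps

def adaptive_despeckle(img, tile=8, tiny=2, noisy_count=4):
    """Erase blobs of at most `tiny` pixels, but only inside tiles that
    contain at least `noisy_count` such blobs (i.e. visibly noisy areas)."""
    tiny_comps = [c for c in find_components(img) if len(c) <= tiny]
    counts = {}
    for comp in tiny_comps:
        key = (comp[0][0] // tile, comp[0][1] // tile)  # tile of first pixel
        counts[key] = counts.get(key, 0) + 1
    out = [row[:] for row in img]
    for comp in tiny_comps:
        key = (comp[0][0] // tile, comp[0][1] // tile)
        if counts[key] >= noisy_count:
            for y, x in comp:
                out[y][x] = 0
    return out

def demo_page():
    """8x16 test page: clean left tile, speckled right tile."""
    img = [[0] * 16 for _ in range(8)]
    for r in range(1, 7):
        img[r][2] = 1    # clean tile: vertical stroke of a letter
        img[r][10] = 1   # noisy tile: vertical stroke of a letter
    img[6][5] = 1        # clean tile: a one-pixel full stop
    for r, c in [(0, 8), (0, 13), (2, 15), (4, 13), (7, 15)]:
        img[r][c] = 1    # noisy tile: isolated specks
    return img

cleaned = adaptive_despeckle(demo_page())
```

On this toy page the five specks in the noisy half are erased, while the full stop in the clean half is kept even though it is just as small. A blind despeckler using the same size limit would have removed it.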

Mobile devices also create several document processing challenges. Documents photographed with smartphones are rarely flat. OCR engines look for properly oriented characters, not text that ‘bows’ across a page, so such images lead to sub-optimal results. Sometimes the camera is not in the right position to capture a good image. Occasionally, the phone or the photographer casts a shadow on the page. These shadows confuse the algorithms used to convert colour images to black-and-white for recognition. All these problems affect output quality.
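To see why shadows cause trouble, consider this small Python sketch. It assumes a greyscale page stored as a grid of values from 0 (black) to 255 (white), with a shadow that darkens the page gradually from left to right. It compares a single page-wide threshold with a local-mean threshold, which is one common adaptive approach, chosen here for illustration and not necessarily what any given OCR engine does. All function names and the `radius` and `offset` parameters are hypothetical.

```python
def global_binarise(gray, threshold=128):
    """Mark a pixel as ink (1) if it is darker than one fixed threshold."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]

def local_binarise(gray, radius=2, offset=20):
    """Mark a pixel as ink (1) if it is clearly darker than the mean of
    its own neighbourhood, so the threshold follows the local lighting."""
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            window = [gray[y][x]
                      for y in range(max(0, r - radius), min(h, r + radius + 1))
                      for x in range(max(0, c - radius), min(w, c + radius + 1))]
            local_mean = sum(window) / len(window)
            out[r][c] = 1 if gray[r][c] < local_mean - offset else 0
    return out

def shaded_page(rows=6, cols=12):
    """Toy page with two vertical strokes and a left-to-right shadow.
    Returns (greyscale image, true ink mask)."""
    ink_mask = [[1 if c in (2, 9) and 1 <= r <= 4 else 0 for c in range(cols)]
                for r in range(rows)]
    gray = []
    for r in range(rows):
        row = []
        for c in range(cols):
            paper = 230 - 12 * c  # paper gets darker towards the right
            row.append(paper // 5 if ink_mask[r][c] else paper)
        gray.append(row)
    return gray, ink_mask

gray, ink_mask = shaded_page()
global_result = global_binarise(gray)
local_result = local_binarise(gray)
```

On this toy page the global threshold turns the shadowed paper on the right-hand edge black, burying the second stroke, while the local threshold recovers exactly the two strokes. A hard-edged shadow would need a more careful method than this simple local mean.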

Computer vision solves these problems as well.

Software with 3D deskewing algorithms can straighten out the lines of text in an image. These algorithms can deal with curved content as well as simple rotations, and they can also correct the perspective distortion commonly found in camera images. When combined with advanced binarisation algorithms, 3D deskewing algorithms can resolve localised shading problems without degrading the rest of the page.
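Full 3D deskewing, with its curvature and perspective correction, is well beyond a short example. The simplest case, a plain rotation, can be sketched in a few lines of Python. The sketch below uses the classic projection-profile idea: try a range of candidate angles and keep the one that packs the ink into the fewest, sharpest rows. It assumes the ink pixels are given as (x, y) coordinates with y pointing down the page; the function names, angle range and step size are hypothetical choices for illustration.

```python
import math
from collections import Counter

def profile_score(points, angle_deg):
    """Rotate ink points by -angle_deg and score how sharply they fall
    into rows (sum of squared row counts; higher means straighter text)."""
    a = math.radians(angle_deg)
    sin_a, cos_a = math.sin(a), math.cos(a)
    rows = Counter(math.floor(y * cos_a - x * sin_a + 0.5) for x, y in points)
    return sum(n * n for n in rows.values())

def estimate_skew(points, max_angle=10.0, step=0.5):
    """Return the candidate angle (in degrees) with the best row profile."""
    steps = int(round(max_angle / step))
    candidates = [i * step for i in range(-steps, steps + 1)]
    return max(candidates, key=lambda ang: profile_score(points, ang))

def skewed_lines(angle_deg, baselines=(20, 50, 80), width=200):
    """Toy input: three one-pixel-thick text baselines tilted by angle_deg."""
    slope = math.tan(math.radians(angle_deg))
    return [(x, round(y0 + slope * x)) for y0 in baselines for x in range(width)]

estimated = estimate_skew(skewed_lines(3.0))
```

Once the angle is known, rotating the image back by that amount straightens the text lines. Curved pages and perspective distortion need richer models than a single angle, which is where the 3D algorithms described above come in.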

Exceptional OCR accuracy with Kofax OmniPage

One company provides a solution that can not only handle the types of document issues described above, but also deliver exceptional accuracy and quality.

The solution is OmniPage by Kofax.

Kofax OmniPage offers world-class OCR for fast, easy and, most importantly, accurate document conversion. With remarkable conversion accuracy and intelligent character recognition capability, OmniPage allows you to instantly turn paper and digital documents into editable, searchable and shareable files.

When you can convert paper, PDFs and images into valuable digital files without hassle (or pain!), you can increase your firm’s productivity and focus on what really matters.

Want to know more about OmniPage’s powerful document conversion capabilities? Contact us today!

Prime Infotech is a leading reseller of innovative technology solutions like Kofax OmniPage that are trusted by firms all over APAC. To learn more about these products, volume discounts and free trials, get in touch!

Phone: 022-2308-0666, +91-9833650378

Email: salesdesk@indiaprime.com
