The Next Wave in Intelligent Document Processing | Tech Mahindra

Digital document repositories and the push toward paperless business have increased the demands on IT systems to extract and store data fields, machine-generated and handwritten text, and similar content from scanned documents. These systems undoubtedly make us more efficient and organized, and they save time in data retrieval. Even so, the challenges around accuracy cannot be underestimated.

In parallel, RPA (Robotic Process Automation) is gathering critical mass and is being directed not only to automate low-value activities but also to drive the next wave of IPA (Intelligent Process Automation). This will be enabled through investments in existing RPA solutions or through integration with niche products; imagined as a package, the combination can come close to delivering near-human intelligence. With ML (Machine Learning), these capabilities are expected to match or even exceed human performance.

Customer requirements are changing and driving faster-than-usual changes to RPA product features, including more cognitive capabilities and a narrower gap between human and machine. One necessity that is prevalent across verticals is the ability to deal with semi-structured and unstructured data. We already do an excellent job with structured data sets; the conundrum lies in dealing with disparate data forms.

This calls for an efficient OCR/ICR (Optical Character Recognition/Intelligent Character Recognition) technology that solves the complete maze of document definition, classification, extraction, validation, and learning.

OCR reads images and recognizes patterns to identify the underlying text and convert it into machine-readable form, whereas ICR does the same for handwritten documents. The latter is far more complex because individual writing styles are unique while computerized fonts are standard. Hence, a mechanism to validate results and learn from user input is required.

However, it doesn’t stop here. As we look for more use cases, we find a plethora of businesses and functions demanding more, such as the capability to read an OMR (Optical Mark Recognition) sheet in the education sector or to use an OBR (Optical Barcode Reader) in retail and logistics. The unquenchable thirst to demand more from technology and make life easier is evident.

Let’s look at what businesses are expecting when it comes to digitizing and automating business processes dealing with document processing.

Ability to deal with complex semi-structured and unstructured documents

Semi-structured documents such as invoices and purchase orders, and unstructured documents such as contracts, are often considered best handled by humans, but newer advances are helping to classify them and extract the relevant data. Further, NLP (Natural Language Processing) is redefining this largely unexplored area. Functions like F&A (Finance and Accounting), Procurement, and SCM (Supply Chain Management) often deal with multiple formats of invoices, purchase orders, etc., and developing back-office automation for each format separately kills the efficiency that intelligent process automation brings.

Ability to deal with handwritten scanned documents

Machine-generated images and PDF files are easy to decode, but when it comes to converting handwritten documents into machine text, regular OCR engines often generate gibberish. This limits their use in industries such as banking and insurance, where manual form filling is still prevalent.

Ability to seek manual input

Situations often require business process automation to pause and seek user input when confidence is low. This is important to ensure that intelligent process automation, in the name of faster processing, doesn’t dent quality and accuracy. After all, automation is being used in production systems and for committing changes.

Machine Learning

Systems are expected to learn from the situations that prompted them to pause and seek user input. Seeking user intervention is fine; however, seeking the same information repeatedly, every time a similar situation is encountered, is not considered smart. Hence, the ability to learn and update itself is critical.

Ability to define rules

There will be situations where a business wants to manually sign off on an event before the bot processes the transaction. For instance, all invoices under a certain amount could be auto-extracted and processed based on a confidence score cutoff, while invoices exceeding the threshold amount would require explicit approval even if the confidence score is over 95%.
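The rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the amount limit, the `ExtractedInvoice` type, and the `route` function are hypothetical names, and the 0.95 cutoff simply mirrors the 95% figure mentioned in the example.

```python
from dataclasses import dataclass

# Hypothetical thresholds; real values would be set by the business.
AMOUNT_LIMIT = 10_000.00   # invoices above this always need sign-off
CONFIDENCE_CUTOFF = 0.95   # minimum extraction confidence for auto-processing


@dataclass
class ExtractedInvoice:
    invoice_id: str
    amount: float
    confidence: float  # 0.0 to 1.0, as reported by the extraction engine


def route(invoice: ExtractedInvoice) -> str:
    """Return 'auto', 'review', or 'approval' for an extracted invoice."""
    if invoice.amount > AMOUNT_LIMIT:
        # High-value invoices need explicit approval regardless of confidence.
        return "approval"
    if invoice.confidence < CONFIDENCE_CUTOFF:
        # Low confidence: pause and ask a person to check the extracted fields.
        return "review"
    return "auto"
```

Checking the amount first is a deliberate choice in this sketch: it guarantees that a high confidence score can never bypass the manual sign-off for large invoices.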

Having defined what businesses are looking for, let us also explore how solution providers are addressing this gap.

Many established players today offer solutions to address these demands. Some have developed their own IP and packaged it as part of a larger RPA solution, while others rely on niche partner solutions and provide OOB (out-of-the-box) integrations to pull this off. UiPath uses the ABBYY FlexiCapture framework to deal with such complexities, while Automation Anywhere has developed its own IQ Bot solution that works in tandem with its Task Bots to deliver intelligent process automation around unstructured documents. The approaches may differ, but these solutions all attempt to address the same issue.

Let us now take a look at how these solutions actually solve these problems in intelligent document processing.

Document Definitions

This is the process of teaching a machine how to deal with semi-structured or unstructured data: multiple formats are fed as input against a defined class, and similar data items are extracted across the document set. This helps the machine identify data objects based on coordinates, page region, labels, and relative position (next to, below, or above another item). A minimum of three samples per format, with 10 recommended, is enough to start with, and the engine continues to improve as it processes more documents day by day. Some solutions also ship with built-in samples categorized by the nature of the business and the country. For instance, ABBYY FlexiCapture has a range of invoice formats already defined across countries.
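To make the idea of locating a field relative to a label more concrete, here is a minimal Python sketch. It assumes the OCR step has already produced words with page coordinates; the `Word` and `FieldRule` types and the `locate` function are hypothetical and do not correspond to any vendor's API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    text: str
    x: float  # left edge of the word on the page
    y: float  # top edge of the word on the page


@dataclass
class FieldRule:
    label: str     # anchor text printed on the document
    relation: str  # "right_of" or "below"


def locate(words: List[Word], rule: FieldRule, tolerance: float = 5.0) -> Optional[str]:
    """Find the word nearest to the label in the given direction."""
    anchor = next((w for w in words if w.text == rule.label), None)
    if anchor is None:
        return None
    if rule.relation == "right_of":
        # Same line (similar y), further to the right.
        hits = [w for w in words if abs(w.y - anchor.y) <= tolerance and w.x > anchor.x]
        hits.sort(key=lambda w: w.x)
    elif rule.relation == "below":
        # Same column (similar x), further down the page.
        hits = [w for w in words if abs(w.x - anchor.x) <= tolerance and w.y > anchor.y]
        hits.sort(key=lambda w: w.y)
    else:
        raise ValueError(f"unknown relation: {rule.relation}")
    return hits[0].text if hits else None


# Made-up OCR output for one page.
page = [
    Word("Invoice#", 10, 10), Word("INV-1042", 120, 10),
    Word("Total", 10, 200), Word("1250.00", 12, 220),
]
invoice_number = locate(page, FieldRule("Invoice#", "right_of"))
total = locate(page, FieldRule("Total", "below"))
```

A real engine would learn such rules from the sample documents instead of having them written by hand, but the rule shape (label plus relative position) is the same idea described above.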


Classification

Classification is the process of reading a scanned document and mapping it to a pre-defined template that exists in the system. In simpler terms, if three document templates A, B, and C have been defined in the system, classification will read a new document and identify whether it is of type A, B, or C.
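A very simple way to picture classification is keyword overlap between the document text and each template. The Python sketch below uses that approach purely for illustration; the template names, keyword sets, and `classify` function are assumptions, and production engines typically rely on layout features and trained models instead of a hand-written keyword list.

```python
import re
from typing import Dict, Optional, Set

# Hypothetical templates, each described by a few characteristic keywords.
TEMPLATES: Dict[str, Set[str]] = {
    "invoice": {"invoice", "total", "due", "bill"},
    "purchase_order": {"purchase", "order", "ship", "quantity"},
    "contract": {"agreement", "party", "indemnity", "warranty"},
}


def classify(text: str, min_score: int = 2) -> Optional[str]:
    """Return the best-matching template name, or None if nothing fits."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    scores = {name: len(tokens & keywords) for name, keywords in TEMPLATES.items()}
    best = max(scores, key=scores.get)
    # Refuse to guess when the overlap is too small.
    return best if scores[best] >= min_score else None
```

Returning `None` for weak matches lets the surrounding process hand unfamiliar documents to a person instead of forcing them into the wrong template.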


Extraction

Extraction is the process of identifying and recognizing the individual data items of relevance in a document. Since the document type is already defined, the engine knows where to look for each piece of information. Extraction also covers reading objects and their states, such as checkboxes, radio buttons, signatures, barcodes, and tables.
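Once the document type is known, extraction can be pictured as applying that type's field patterns to the recognized text. The sketch below uses regular expressions for this; the patterns, the sample text, and the `extract_fields` function are hypothetical and stand in for whatever definition a real engine would hold.

```python
import re
from typing import Dict, Optional

# Hypothetical definition for one invoice format: each field is found
# by the label that precedes it on the same line.
INVOICE_DEFINITION = {
    "invoice_number": r"Invoice\s*(?:No\.?|Number)\s*[:#]?\s*(\S+)",
    "invoice_date": r"Date\s*:?\s*(\d{4}-\d{2}-\d{2})",
    "total": r"Total\s*(?:Due)?\s*:?\s*\$?([\d,]+\.\d{2})",
}


def extract_fields(text: str, definition: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Apply each field pattern and return the captured value or None."""
    result: Dict[str, Optional[str]] = {}
    for field_name, pattern in definition.items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        result[field_name] = match.group(1) if match else None
    return result


# Made-up text, as it might come out of the OCR step.
sample = """ACME Supplies
Invoice No: INV-1042
Date: 2021-03-15
Total Due: $1,250.00
"""
fields = extract_fields(sample, INVOICE_DEFINITION)
```

Fields that cannot be found come back as `None`, which gives a later validation step a clear signal that a person needs to fill them in.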


Validation

Validation is the process of getting explicit confirmation from a human for a data item that the system has extracted. It is typically applied to data items extracted from handwritten text or human signatures, to verify whether the user sees the same information as the bot or whether a change is required. For instance, the handwritten character “a” can be written, and captured as an image, in multiple ways, but only a human can confirm whether it is actually an “a” or a similar-looking character. This is a powerful feature that can enable machine learning for items with low confidence scores and thereby enrich the knowledge base over time.
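The interplay between validation and learning can be sketched as follows. This is a deliberately simplified illustration: a lookup table of human corrections stands in for real model retraining, and the `Validator` class, its method names, and the 0.90 cutoff are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Validator:
    cutoff: float = 0.90  # hypothetical confidence cutoff
    # Maps (field name, raw engine reading) to the human-confirmed value.
    corrections: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def check(self, field_name: str, raw_value: str, confidence: float) -> Tuple[str, bool]:
        """Return (value to use, whether a human must confirm it)."""
        key = (field_name, raw_value)
        if key in self.corrections:
            # Seen before and confirmed by a person: do not ask again.
            return self.corrections[key], False
        if confidence >= self.cutoff:
            return raw_value, False
        return raw_value, True

    def confirm(self, field_name: str, raw_value: str, confirmed_value: str) -> None:
        """Record the human answer so the same reading is resolved next time."""
        self.corrections[(field_name, raw_value)] = confirmed_value


validator = Validator()
first = validator.check("name", "J0hn", 0.60)   # low confidence, ask a human
validator.confirm("name", "J0hn", "John")       # human corrects the reading
second = validator.check("name", "J0hn", 0.60)  # now answered from memory
```

The second check no longer interrupts the user, which is the behavior described earlier: asking once is fine, asking repeatedly for the same thing is not.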


Natural Language Processing

While image processing of scanned documents can help extract the underlying text, the machine still struggles to understand intent. Once a document such as a contract has been completely converted to accurate text, the NLP/NLU engine can break it down into entities with specific intent, which amounts to making sense of a completely unstructured document. Typical clauses such as the warranty period, exceptions, terms of the contract, indemnity, and any one-sided clauses can be highlighted and given to a user to cross-verify before signing. The process is very similar to training a chatbot to identify a user query, understand it accurately, and then answer it.
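As a rough illustration of clause highlighting, the Python sketch below flags sentences that mention certain clause types. Plain keyword patterns are used here as a stand-in for a trained NLP/NLU engine, and the pattern list, sample text, and `flag_clauses` function are hypothetical.

```python
import re
from typing import List, Tuple

# Hypothetical clause types and the wording that signals them.
CLAUSE_PATTERNS = {
    "warranty": r"\bwarrant(?:y|ies)\b",
    "indemnity": r"\bindemnif(?:y|ies|ication)\b|\bindemnity\b",
    "exclusion": r"\bexclud(?:e|es|ed|ing)\b|\bexclusions?\b",
}


def flag_clauses(contract_text: str) -> List[Tuple[str, str]]:
    """Return (clause type, sentence) pairs for a reviewer to check."""
    # Naive sentence split on whitespace that follows a period or semicolon.
    sentences = re.split(r"(?<=[.;])\s+", contract_text.strip())
    flagged = []
    for sentence in sentences:
        for clause, pattern in CLAUSE_PATTERNS.items():
            if re.search(pattern, sentence, flags=re.IGNORECASE):
                flagged.append((clause, sentence))
    return flagged


# Made-up contract text.
contract = (
    "The Supplier provides a warranty of 12 months. "
    "The Buyer shall indemnify the Supplier against third-party claims. "
    "Payment is due in 45 days."
)
flagged = flag_clauses(contract)
```

An actual NLP engine would recognize clauses that are phrased without these exact words, but the output shape (clause type plus the text to review) reflects what the user would be asked to cross-verify.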

With RPA gradually becoming the norm for automating mundane, high-volume tasks, organizations will soon run out of business processes that are obvious automation candidates. The next trend in back-office automation is likely to center on semi-structured and unstructured documents. All of this is being made possible by AI, ML, and NLP, and the best of business process automation is yet to come.