How Intelligent Document Processing can improve your business
Automating your document processing can certainly save you time. A modern Intelligent Document Processing platform can revolutionize your business and build a capable foundation for future growth.
What is document data capture?
Document data capture is the process of extracting information from documents and placing that information into a structured system. From paper ledgers to digital spreadsheets and enterprise resource planning (ERP) platforms, document data capture is rapidly gaining popularity across industries. Although the basic concept of document data capture is simple, the tools used to carry it out are quite diverse.
To provide a deeper understanding of document data capture, let’s take a look at the approaches that are currently available. Along the way, we’ll also cover the differences between structured, unstructured, and semi-structured data — each of which plays a role in the document data capture process.
The document data capture process
Before we get into the nitty-gritty of document data capture, let’s clarify where this technology fits. Captured data can flow out of, and into, a wide range of record-keeping systems, such as:
- Paper ledgers
- ERP software
- Accounting software
- Customer relationship management (CRM) software
- …and plenty of others
While an analog or paper-based system might seem good enough for your average mom-and-pop outfit, manual recordkeeping is inefficient — and it becomes even more inefficient as your business grows. Whether you’re a multinational corporation or a small business, manual document processing can waste countless hours and resources.
Nearly every industry requires a “paper trail” for reporting and auditing — in other words, document processing is here to stay. But with document data capture, we no longer need to take the term “paper trail” literally. Business units can now use document data extraction technology that provides secure, reliable, and cost-effective digital paper trails.
As you’ll soon see, not all document data capture solutions are created equal. Several of these methods still require a considerable amount of manual work, even though certain aspects of the process are automated. With a variety of document data capture solutions on the market, it’s important to learn what works for you.
Document data capture methods
Manual data entry
It’s important to note that manual data entry doesn’t just refer to handwritten documents. From paper to PDFs, manual entry is any documentation process that requires a data clerk to enter information by hand. Line after line, spreadsheet after spreadsheet, day after day. Clearly, manual data entry is a tedious, spirit-draining task.
Manual documentation isn’t just bad for morale — it’s also expensive. Untold hours of manual copy-paste can lead to high employee turnover, slow recordkeeping, and lots of wasted resources. With long-term consequences like operational bottlenecks and supply chain disruption, manual data entry can drag your business down.
So, ready for automation? Let’s take a look at the basics of document data capture technology.
Optical character recognition (OCR)
Optical character recognition (OCR) is the pioneer of document data capture software. OCR aims to read printed and digital documents, then organize the text contained within them. With the arrival of OCR came the beginning of faster and more accurate document data extraction.
There are two basic types of OCR: template-based and cognitive. Template-based OCR requires manual template maintenance and error prevention, while cognitive OCR uses machine learning to automate more of the procedure.
So, how exactly does OCR work? Let’s take a look.
Template-based OCR
In this approach to data capture, OCR software reads a document and captures data according to predefined rules and templates. For decades, template-based OCR was at the cutting edge of document data capture technology. Today, there are still plenty of uses for template-based solutions.
Even though templates have been used in OCR technology since the 1980s, template-based OCR has come a long way since its early days. Today’s template-based OCR can extract data with a high degree of precision. Still, template-based OCR only works when the software is reading characters in layouts it has been trained to understand. This means you must set up templates and rules for every document format you want to process.
In the unlikely event that all the documents you work with have the exact same layout, this is a feasible solution. In practice, even the best template-based OCR solutions require you to manually reformat some documents.
Once you have set up all your rules and templates, template-based OCR requires you to verify the accuracy of each capture. After manual verification, you can initiate your desired business procedures — which may include supplier payment approval, employee onboarding, insurance claims and payouts, customs declarations, and CRM initiatives.
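To make the template idea concrete, here is a minimal sketch of region-based capture. The template name, field names, and coordinates are all hypothetical; real template-based OCR products work on similar principles but with far more sophisticated matching.

```python
# Sketch of template-based extraction (hypothetical fields and layout).
# A template maps each field to a fixed region on the page; OCR words
# that fall inside a region are joined to form the field value.

# Template for one known invoice layout: field -> (x0, y0, x1, y1)
ACME_TEMPLATE = {
    "invoice_number": (400, 40, 580, 70),
    "total_amount": (450, 700, 580, 730),
}

def in_region(word, region):
    """True if the word's top-left corner falls inside the region."""
    x0, y0, x1, y1 = region
    return x0 <= word["x"] <= x1 and y0 <= word["y"] <= y1

def extract(ocr_words, template):
    """Capture each field by collecting the OCR words inside its region."""
    result = {}
    for field, region in template.items():
        hits = [w["text"] for w in ocr_words if in_region(w, region)]
        result[field] = " ".join(hits) or None
    return result

# Simulated OCR output for a document in the expected layout:
words = [
    {"text": "INV-2041", "x": 420, "y": 55},
    {"text": "1,250.00", "x": 460, "y": 710},
    {"text": "Sub-Total", "x": 60, "y": 710},  # outside every template region
]
print(extract(words, ACME_TEMPLATE))
```

A document with even a slightly shifted layout would leave these fields empty, which is exactly why each new format needs its own template.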
Cognitive OCR
Also known as “cognitive document data capture software,” a cognitive OCR platform uses machine learning to understand the information it is extracting. Over time, cognitive OCR can learn to recognize and capture relevant data across a variety of document layouts. This eliminates the need to manually set up new templates.
Without the need for manual reformatting, cognitive OCR allows you to fully automate data entry. In theory, you can even go so far as to create completely touchless document processing — if your business is comfortable with having software handle operations like approvals. In practice, it’s always a good idea to keep a human in the loop to monitor accuracy.
While an automated data capture solution may seem like an obvious upgrade for most business functions, some people still don’t trust that level of automation. Decision-makers in your organization may be a bit wary about adopting technologies like artificial intelligence (AI) and machine learning, or document processing that operates from the cloud. Even though cognitive OCR has a mountain of benefits, you might need to put some extra effort into explaining those advantages before everyone’s on board.
Structured vs. semi-structured data
Intelligent document processing aims to create structured data from unstructured and semi-structured documents. Understanding these data types is central to understanding how intelligent document processing works.
Structured documents are identical in terms of size and appearance, with information that is categorized, labeled, and positioned clearly. The classic example is a multiple-choice answer sheet, where every copy has identical fields. A basic document processor can read structured documents with template-based OCR.
However, most documents aren’t as structured as we’d like them to be. Many business documents are semi-structured, meaning they carry the same basic information but vary in layout and content. These documents share certain constants — for example, all invoices include the date, vendor name, and total amount due. But invoices also include variables, like line items, discounts, or penalties. The location of each header field may also differ from invoice to invoice. In this case, a template-based OCR solution would require manual reformatting. A cognitive OCR platform, however, can process semi-structured documents and use machine learning to improve with continued use.
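Semi-structured variability is easy to demonstrate with a toy text rule. The rule and both invoice strings below are hypothetical: the same fact (the total) appears in both documents, but a rule written for one supplier’s wording misses the other’s entirely.

```python
import re

# Hypothetical text-based rule: capture the amount that follows the
# label "Total amount". It works on one supplier's phrasing only.
RULE = re.compile(r"Total amount:?\s*\$?([\d,]+\.\d{2})")

invoice_a = "Invoice 1001 ... Total amount: $310.50"
invoice_b = "Invoice 2002 ... Amount due .... 310.50"  # same fact, new phrasing

print(RULE.search(invoice_a).group(1))  # the rule captures 310.50
print(RULE.search(invoice_b))           # None: the rule must be rewritten
```

Every new phrasing means another rule, which is the manual-reformatting trap described above.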
Grow your business with better data capture
When you take a look at how your company extracts data, you’ll quickly find opportunities to boost productivity and cut costs. A cognitive OCR platform with automated data entry could be just what you need to help your company achieve its strategic objectives.
The future of data capture systems: Imitating human behaviour
“Someone should have perfected document data capture a long time ago.” According to many of our customers, traditional methods of document capture just aren’t as effective as they should be. At Rossum, we’re leading the way with a new kind of data capture.
For the past few years, the Rossum team has been working to find the ideal application of Rossum’s machine vision technology. We’ve been surprised to discover not only the complexity of the problems with traditional data extraction, but also how much of an advantage the human mind has over a fixed algorithm. These discoveries led to Rossum’s unique approach to data capture.
As the people who founded Rossum, we’re your standard nerds — with several major accomplishments in machine learning, computer vision, and AI between us. In 2016, we were working together on AlphaGo and image recognition technology. But we soon realized there was an even more important problem that impacts the lives of millions of people every day. That’s when we set out to fix inefficient document data capture technology.
Traditional data capture
It didn’t take us long to discover that while solutions for reading documents have existed for decades, machines don’t understand what they’re reading. Aside from being expensive and time-consuming, traditional data capture methods are so error-prone that they can never be fully automated.
Before Rossum, the standard approach to data capture was to apply an OCR stage to generate a text layer from a document. After that, the system recognized data fields, using either image-based templates or text-based rules.
This method works pretty well for clearly scanned documents that always have the same format with no variables, such as fixed forms or generated reports. The IRS and the postal service have been using this technology ever since it was invented. Although setting up the template-based and rule-based recognition stage can be a hassle, it’s a pretty reliable system for structured documents.
Variable documents make things complicated
Unfortunately, the traditional OCR approach breaks down badly once document variability becomes a factor. Even a slightly altered format can force you to redo all the work required to recognize a specific kind of document. Since documents are altered on a regular basis across industries, document variability is quite a common problem.
You could set up recognition for every invoice format — a time-consuming approach that many have tried. But as soon as you realize how variable documents can be, you’ll see how inefficient manual reformatting can be.
Let’s use invoices as an example. If you’re working with 60 suppliers in ten different countries, that means up to 60 distinct invoice formats — any of which can change at any time. Plus, ten different countries mean there are potentially ten different legal standards that each company’s accounting offices have to meet. On top of that, each of your 60 vendors has its own internal requirements for the types of data that invoices must include.
With a global clientele, your invoices may very well use more than one language. Traditional OCR requires a solution that recognizes each language with a set of text-based rules. Plus, if you are using image-based templates, you’ll need to ensure that every invoice a supplier submits isn’t rotated or blurred, and uses the exact same format as the one you designed your system to read.
But wait — there’s more. Traditional OCR will also give you headaches whenever there’s an invoice that contains unusual notations, or line items that your system isn’t prepared to handle. Bear in mind that this is happening with 60 different suppliers at the same time.
By now, you’re probably painfully aware of the costs and complications of a traditional OCR solution when it comes to variable documents. Like many companies, you might just decide to return to manual entry.
If you’re set on using a traditional OCR solution for variable documents, you’ll have to make compromises. In practice, you’re only going to implement document data recognition for the suppliers that send you the most invoices. That could range from 20% to 80% of your invoice volume. Clearly, this falls short of being a complete solution. Moreover, you need to keep updating your system as suppliers and invoices change.
You also have to contend with the persistent concern that your traditional OCR system could crash or get corrupted without prior warning. So you need to spend time, money, and resources on preparing and implementing a contingency plan.
Every document that you want to capture data from presents the same challenges — not just invoices. A template-based OCR solution requires new templates for every single document you want to process, from sales orders to packing slips, balance sheets, or bills of lading.
Our founders at Rossum come from an AI background. When we learned about these traditional data capture methods, we were truly shocked at the inefficiency of it all. We wondered: how do so many companies still use the inefficient and costly traditional OCR method to process documents?
There are a few answers to this question. First, new approaches can be risky. Large corporations often view process improvements as potential catastrophes for the company — the bigger the organization, the bigger the consequences if new technology turns out to be a failure. Second, no new solution has smashed through the wall that traditional OCR and unintelligent algorithms hit years ago. Without any major incentives to adopt new technology, enterprises see little sense in investing heavily for marginal improvements.
To discard their existing document capture approaches, businesses need assurance that changes are truly worth the time and cost. New data capture solutions must boost effectiveness and reliability from around 80% to at least 98%. They need to make a quantum leap.
Standard OCR is not good enough, and it never will be
In case you’re still not convinced, let’s take a closer look at the critical flaws in traditional OCR recognition.
With text-based rules, some OCR templates initially give you the illusion of flexibility — just bind a data field type to the obvious label phrasings. The popularity of text-based OCR has caused many businesses to believe this is the most advanced document capture they’ll ever get.
Once again using invoices as our example documents, look at the sample images below and take a couple of minutes to try and work out some good flexible rules for the data fields.
As you can see, in addition to covering all the different phrasings, the less obvious problem is the potential for false positives. These are rules that are too universal, which would lead you to capture the wrong data. Sub-Totals become Totals. Shipping becomes VAT. False positives are everywhere.
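The false-positive problem can be shown with a single overly loose rule. This is a toy example with made-up line text: a rule keyed on the word “Total” fires on the first “Total”-like line it encounters, so the Sub-Total is captured as the Total.

```python
import re

# A rule that is too universal (hypothetical): any amount near "Total".
LOOSE_RULE = re.compile(r"Total\s*:?\s*\$?([\d,]+\.\d{2})")

lines = [
    "Sub-Total: $900.00",
    "Shipping: $25.00",
    "Total: $925.00",
]

# Scanning top to bottom, the loose rule matches inside "Sub-Total"
# first, so the wrong amount is captured as the invoice total.
for line in lines:
    m = LOOSE_RULE.search(line)
    if m:
        print("captured total:", m.group(1))  # captures 900.00, not 925.00
        break
```

Tightening the rule (e.g. anchoring it so it rejects “Sub-Total”) fixes this one document, but each fix risks breaking a match on some other supplier’s layout.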
Ultimately, traditional OCR forces you to create text-based rules for each supplier, a process known as “fingerprinting”. But fingerprinting isn’t as high-tech as it may sound. Text-based rules are just as sensitive to the scanning process as image-based templates. OCR was originally developed to digitize books and newspapers — applying the same technology to business documents can quickly lead to errors. When reading documents with a complicated layout, the system may have difficulty detecting all the text strings on a page. The moment a slight problem appears, such as too small a font, a smear, or a stamp that covers some of the text, your template goes out the window.
Traditional OCR also makes letter-by-letter mistakes, especially in an unfamiliar setting. That means if an “S” in a poem looks a bit smudged, the OCR might correct it to a dollar sign. But there aren’t many dollar amounts in poetry. Traditional OCR solutions don’t know when they’re failing to capture the correct data — in other words, they don’t operate intelligently. The following image presents output from a traditional OCR provider:
Because text is noisy, you must precisely transcribe each field’s text label so your text rules have a chance to match. The painstaking implementation of more and more text rules becomes an enormous hassle. Based on conversations we’ve had with business owners, we know of companies that, to this day, are maintaining well over 10,000 lines of rules. The practical implications of this include destroyed budgets, stalled timelines, and incorrect data.
Why is traditional OCR the industry standard?
In the early days of mainstream computer usage, DOS could run no more than one program at a time. In much of the business world, traditional OCR has kept document capture technology similarly frozen in time, as if the industry had never upgraded past its first version of DOS.
Recent advances in AI have encouraged us to be more daring with document capture. Rather than fine-tuning today’s fixed algorithms further, we can take a step back and look at data extraction from a radically new perspective. We need to ask ourselves: why are computers so bad at this, when humans do it so well?
Due to developments over the last five years, teaching computers to find information the same way humans do is no longer a crazy notion. Instead, this approach has become a proven strategy that uses neural networks, deep learning, and big datasets to automate routine tasks.
Self-learning: Humans don’t need templates
Traditional OCR software takes a completely sequential approach to reading a page. For languages with left-to-right scripts, OCR capture starts at the top left corner, and goes line by line down to the bottom right corner. So, isn’t this how humans read an article?
Not exactly. We humans can also look at a page of text and instantly understand what it is saying simply by skimming it and picking up specific pieces of information.
At first glance, we know the difference between a string of random characters and the opening line of Moby Dick. OCR is incapable of making this distinction — it could very well understand that line as “Call me lshm4€I.”
When looking at business documents, the importance of self-learning becomes even clearer. This is because we typically do not read business documents from beginning to end. Instead, we look for specific bits of information, skimming over the page, darting our eyes back and forth looking for key points. We don’t need to read every single character in the document precisely. Take a moment to see what we mean for yourself:
Traditional OCR software does not skim and search through business documents the way we do. It reads them as if they’re books. When matching rules to capture data, OCR requires precise letter-by-letter reading. In traditional OCR, “TOta 1 amourt” will not match a rule that looks for the text “Total amount.” But humans don’t care about such precision. We just know whether or not we’ve found the value we’re looking for, without the need for everything to be perfectly shaped and formatted. We record that value, and move on without a second thought.
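The gap between exact matching and human-style tolerance can be sketched in a few lines. This uses Python’s standard difflib similarity scoring as a stand-in for “skimming”; the 0.7 threshold is an arbitrary illustration, not a value from any real product.

```python
import difflib

# A human matches the garbled "TOta 1 amourt" to "Total amount" without
# reading letter by letter. Exact rule matching fails on the noise, but
# a similarity score over normalized text still finds the right label.

noisy = "TOta 1 amourt"
target = "Total amount"

# Exact, letter-by-letter matching (what traditional text rules do):
print(noisy == target)  # False: one smudge and the rule misses

# Tolerant matching on normalized text (closer to how humans skim):
def normalize(s):
    return "".join(s.lower().split())  # drop case and stray spaces

score = difflib.SequenceMatcher(None, normalize(noisy), normalize(target)).ratio()
print(score > 0.7)  # True: the noisy label still scores as a match
```

Real cognitive capture goes far beyond string similarity, but the principle is the same: decide whether you have found the value you are looking for, rather than demanding a perfect character-for-character match.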
Because humans are capable of self-learning, we develop intuition — the unconscious knowledge we gain through experience. We can easily understand an entire document based on the fragments we skim through, then read it again to find any other relevant information it may contain to amend our initial understanding. We can also derive the meanings of certain words or phrases from context. No traditional OCR machine can do this, or anything like it.
When we see a new business document format for the first time, we can quickly analyze and digest the most important information without any additional input, even if the document is in a language we aren’t familiar with. We humans are good at teaching ourselves how to read new formats — it’s pretty impressive.
Data capture solutions: Manual vs. OCR
Let’s go back in time, to the origins of document capture. There was a time when all document processing was a completely manual task. Most, if not all, companies had data entry clerks entering and updating information from paper documents into various business systems. Clearly, this approach is slow, error-prone, and not scalable.
The arrival of OCR solutions held the promise of efficient, accurate, and scalable data capture with minimal human input. However, to deliver on that promise, the software requires templates and rules for every document it processes. It comes with high implementation and maintenance costs, making this technology potentially more expensive than manual data entry. So, lots of businesses still go the manual route.
Manual data capture
To this day, many organizations still use manual data capture, despite its high cost, low accuracy, and inefficiency. In fact, 90% of all invoices in the world are processed manually.
Traditional OCR data capture
It might seem hard to believe that in the 2020s, data entry staff still spend hours typing, checking, and retyping information from documents into various business systems. That just goes to show how much of a hassle traditional OCR technology can be. OCR technology delivers on its promises when processing documents with low variability, but less structured documents can quickly become a pain.
As your company probably processes a variety of documents, you’re going to have difficulty scaling your data capture operations with a traditional OCR solution. The creation of new templates and rules for each new document is an expensive and time-consuming process. Your developer will need to spend several hours setting up each new template, then your operator will spend another few minutes processing a one-page document with that template.
Cognitive data capture
Let’s talk about cognitive data capture. AI-based cognitive data capture solutions offer unprecedented speed, accuracy, and cost-effectiveness. Like a human, cognitive technology can process semi-structured and unstructured data, and build intuition over time. Unlike a human, AI can’t be distracted and doesn’t require breaks. Cognitive data extraction can achieve up to 98% accuracy, and it can process data six times faster than manual methods.
Here’s how it works.
The process is simple: first, you send documents to the system by email, RPA robot, or API. Then, the system’s AI processes the documents. Finally, a human operator validates the output, then exports it to the appropriate business systems (e.g. a spreadsheet, accounting software, or ERP platform).
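The three steps above can be sketched as a simple pipeline. Every function name here is a hypothetical placeholder, not a real product API; the point is the shape of the workflow: extract, validate with a human in the loop, then export.

```python
# Sketch of the ingest -> extract -> validate -> export loop described
# above. All names are hypothetical placeholders, not a real API.

def capture_pipeline(document, extract, validate, export):
    """Run one document through the cognitive capture workflow."""
    fields = extract(document)    # AI extracts candidate field values
    approved = validate(fields)   # human operator confirms or corrects
    export(approved)              # push to ERP, accounting, or spreadsheet
    return approved

# Toy stand-ins to show the flow end to end:
exported = []
result = capture_pipeline(
    document="scanned-invoice.pdf",
    extract=lambda doc: {"total": "925.00", "vendor": "ACME"},
    validate=lambda fields: fields,  # operator approves as-is
    export=exported.append,
)
print(result, exported)
```

Keeping the validate step as a distinct stage is what makes “human in the loop” practical: full automation is just the special case where the operator approves without changes.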
The future lies in AI-powered data capture
The only way to improve data capture is for us techies to take an active role. With an increasing volume of documents, the cost of human operators is on the rise — fewer people are willing to do manual document processing. In addition, document security has increasingly become a concern in the internet age.
To proactively address all of these issues, AI-powered data capture offers an accurate, fast, and cost-effective solution. Cognitive data capture includes features such as:
- Rapid deployment: you can deploy a cognitive data capture solution for any type of document, within a few days.
- Effort reduction: six times faster than manual data entry, a cognitive platform can deliver up to a 98% reduction in keystrokes when extracting data.
- Continuous improvement: a cognitive data capture solution automatically learns from each document it processes, eliminating the need for new templates.
- Extensibility: you can import documents via email, RPA, or API and export extracted data to your business systems. Cognitive data capture can adapt to any business environment.
Compared to traditional template-based OCR, cognitive data capture helps your bottom line by improving time management, lowering costs, and streamlining your business operations.
Better data capture for better businesses
Document data capture may seem like a minor process in the grand scheme of your enterprise’s operations. However, as we have seen, the accuracy and speed of data capture can have a huge impact on your business.
Fast and precise data extraction can streamline critical functions and simplify complex workflows. An AI-powered automated solution that features an intuitive UI can minimize human intervention, freeing your employees to handle higher-value tasks.
Before we go: A few words about table data capture
Tabular data capture has always posed a challenge for traditional OCR users. When determining which fields to capture from documents, you must first sort the fields you want to capture into two groups: fields that should be captured independently (header fields) and fields that should be captured as tabular data (line item fields).
A number of data extraction solutions nominally support tabular data. However, they all fall short of delivering accurate results. In practice, these solutions often have to settle for partial or simplified approaches, or, as is most often the case, disregard line items entirely.
How Rossum turned the tables on tables
In early 2019, we released a new version of Rossum that captures line item data when processing documents. This resulted in a platform that delivers peerless accuracy through its “Magic Grid” approach to human-computer collaboration.
At Rossum, we approach tabular data extraction from two angles — the integration available to implement a table data capture process, and the technology required for automatic table data capture. Our AI research team has made great progress in automating table data capture, which has given Rossum the ability to pre-capture a large portion of tables automatically.
It’s important to note that, while Rossum can read tables, a final pair of human eyes is always recommended to validate the captured data. Any AI platform should augment your role as a human operator, helping you stay in control of the data capture process.
Rossum’s Magic Grid feature lets you capture tabular data in our validation interface. You can see exactly how the platform has automatically drawn cell boundaries, then tweak those boundaries swiftly. Your corrections help Rossum learn how to extract information from tables accurately every time. Simply click to correct minor mistakes, such as split columns, missed rows, or column headers that don’t match their business meanings.
Rossum rarely misidentifies tables, but if it does, you can discard its attempt and instantly draw your own Magic Grid. Once you’ve finalized your Magic Grid, one click transcribes every cell it contains, saving you countless keystrokes. Then, you can match captured data against your item database to highlight unexpected items, validate the accuracy of calculations contained within table cells, and, if necessary, make manual adjustments to captured tabular data.
Reliable tabular data capture comes with unique challenges and complexity that previously hindered the development of an automated solution. So, what exactly makes extracting data from tables so difficult?
Previous research focused solely on the detection of tables in articles, where tables often have clear borders and column structures, and, most importantly, are surrounded by free-flowing text. By contrast, business documents structure data outside and inside of tables in similar ways. Therefore, tabular data extraction requires AI models that are capable of learning highly abstract knowledge.
Initially, we focused on table detection: the process of identifying whether and where tables occur on a page, even when those tables are borderless and vaguely structured. Reliable table detection is vital to tabular data capture.
Following this, we focused on cell detection. In essence, this is the ability to split a table into rows, columns, and, eventually, cells. This is where the real magic happens — we released a major update that made a huge difference in cell detection usability in 2019.
This step proved to be our biggest challenge. To surmount obstacles in cell detection, Rossum uses neural networks to establish a unique way of reading documents. It’s based on a skimming OCR approach, which leads to a highly effective solution.
At Rossum, we also developed the ability to identify column types. Once our platform extracts the data structure, it needs to assign meaning to each cell so you can use tabular data in business processes. Rossum needed to be able to understand whether a cell contained, for instance, a description, SKU code, or unit price.
This was the final piece of the puzzle, which we put in place in our March 2019 release. Instead of explicitly treating column content like text (and deciding what the text meant based on the column headers), Rossum looks at the table structure visually. Our models still take the header texts into account — but they also focus on the general column appearance and position in the table, allowing the platform to get the right “gut feeling” to resolve ambiguities.
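A crude way to see why header text alone is not enough is a toy classifier that also looks at the pattern of the cell values. This heuristic is purely illustrative and bears no resemblance to Rossum’s neural models; the column types and regexes are made up for the example.

```python
import re

# Toy heuristic (not Rossum's approach): decide a column's meaning by
# combining a header keyword with the pattern of its cell values.

def classify_column(header, cells):
    """Assign a hypothetical column type from header text and cell shape."""
    header = header.lower()
    # SKU-like: either the header says so, or every value looks like a code.
    if "sku" in header or all(re.fullmatch(r"[A-Z0-9-]{4,}", c) for c in cells):
        return "sku"
    # Price-like: header keyword, or every value is a decimal amount.
    if "price" in header or all(re.fullmatch(r"\d+\.\d{2}", c) for c in cells):
        return "unit_price"
    return "description"

print(classify_column("Unit Price", ["12.50", "3.00"]))    # header wins
print(classify_column("Item", ["AB-1234", "CD-5678"]))     # values win
print(classify_column("Details", ["Blue widget", "Nut"]))  # fallback
```

Even this toy version shows the idea: when the header is ambiguous (“Item”), the appearance of the values can still resolve the column’s meaning, which is the “gut feeling” described above.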
This synergy between AI and UI provides a complete end-to-end solution for tabular data capture for grid-format tables. We’re currently exploring new frontiers to discover and develop new Magic Grid capabilities, such as data extraction from more complex “non-grid” format tables.