Leading the Market in Table Data Capture
We released a new version of the Rossum app that dramatically overhauls how line items can be captured in all invoice processes. Rossum now provides a table data capture solution that is unique in its accuracy and its “magic grid” approach to human-computer collaboration on data capture.
When thinking about which fields to capture from documents, the first order of business is to sort the fields you want to capture into two groups: those that should be captured independently (“header fields”), and those that are captured as tabular data (“line item fields”). This is because, traditionally, capturing tabular data represents a much bigger technical challenge. Many data capture products nominally support tabular data, but achieving high accuracy in reality is an entirely different story. In practice, solutions often have to settle for partial or simplified approaches or, most often, give up on line items entirely.
The Big Deal Is:
Yesterday, we released a new version of the Rossum app that dramatically overhauls how line items can be captured in all invoice processes. Adding support for tabular data has been our major focus at Rossum for more than 6 months now. Our efforts finally culminated in a rapid sequence of releases during the last month.
The outcome? A solution that is unique in its accuracy and its “magic grid” approach to human-computer collaboration on data capture.
How We Turned the Table… on Tables:
We approached this problem by thinking about table data capture from two perspectives – the integration available to implement a table data capture process, and the technology required for automatic table data capture. Our AI research team made a series of breakthroughs in the required automation technology in the past month, which enabled a large portion of tables to be pre-captured automatically. But before diving into the deep tech, let’s talk about the impact of this revolutionary technology.
The way we think about reinventing the data capture process at Rossum is always about both technological disruption and process disruption. It’s great to deliver breakthrough technology, but we always care about how to make it work in a real data capture process first.
We have to keep in mind that even if Rossum can read tables, we still have to offer validation and error correction for them. That’s a critical part of the process, and you can’t go ahead without it. As with the header fields, we seek to augment human operators with the automation we provide: dramatically speed them up, but keep them in control, especially for the ambiguous cases.
This is why we invented the magic grid: the way to capture table data in Rossum within our verification interface. It shows the user exactly how the automatic engine has drawn the cell boundaries, and also allows them to quickly tweak these boundaries. The way the human inputs the corrections is critical to helping Rossum learn how to handle the tables correctly next time.
Is a column mistakenly split in two? Click. Did Rossum miss a row? Click. This is also the time to check that each column is matched with its business meaning. In a fraction of cases, Rossum will misidentify something – then the user can just scrap its attempt completely and redraw the grid themselves. But we have made sure this process is quick and easy, and the human effort that goes into it has lasting value. It doesn’t have to be repeated over and over.
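To make the kind of correction described above more concrete, here is a minimal sketch of a grid whose boundaries can be tweaked. It is an illustration only, not Rossum's implementation: the `Grid` class, its fields and its method names are all hypothetical, and the grid is assumed to be fully described by its outer bounds plus sorted lists of interior separators.

```python
from __future__ import annotations

from bisect import insort
from dataclasses import dataclass, field


@dataclass
class Grid:
    """Hypothetical grid: outer bounds plus sorted interior separators."""
    left: float
    right: float
    top: float
    bottom: float
    col_seps: list[float] = field(default_factory=list)  # x positions
    row_seps: list[float] = field(default_factory=list)  # y positions

    def merge_columns(self, index: int) -> None:
        """Remove the separator between column `index` and column `index + 1`."""
        del self.col_seps[index]

    def add_row_separator(self, y: float) -> None:
        """Insert a row boundary at height y, e.g. when a row was missed."""
        if not self.top < y < self.bottom:
            raise ValueError("separator must lie inside the grid")
        insort(self.row_seps, y)

    @property
    def shape(self) -> tuple[int, int]:
        """Number of (rows, columns) implied by the separators."""
        return len(self.row_seps) + 1, len(self.col_seps) + 1


# Assumed starting point: one column was wrongly split at x=30,
# and the boundary between the second and third row was missed.
grid = Grid(left=0, right=100, top=0, bottom=60,
            col_seps=[20, 30, 70], row_seps=[20])
grid.merge_columns(1)        # one click: drop the separator at x=30
grid.add_row_separator(40)   # one click: add the missing row boundary
```

Each correction here is a single small operation on the separators, which is roughly what "Click" stands for in the paragraph above.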
But the main time saver is what comes after – when the grid is final, a single click of a button completely transcribes the whole table, all cells in a single sweep. Can you imagine all the keystrokes saved in this single moment?
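As a rough illustration of what transcribing a finished grid involves, the sketch below assigns recognized words to cells by their position. Everything in it is an assumption made for the example: the `transcribe` function is hypothetical, words are given as `(text, x, y)` tuples with center coordinates, they arrive in reading order, and they have already been filtered to the table area.

```python
from bisect import bisect_right


def transcribe(words, col_seps, row_seps):
    """Place each word into the cell that contains its center point.

    `col_seps` and `row_seps` are sorted interior separator positions.
    Words are assumed to be in reading order and inside the table.
    """
    n_rows, n_cols = len(row_seps) + 1, len(col_seps) + 1
    cells = [[[] for _ in range(n_cols)] for _ in range(n_rows)]
    for text, x, y in words:
        row = bisect_right(row_seps, y)  # separators above the word
        col = bisect_right(col_seps, x)  # separators left of the word
        cells[row][col].append(text)
    return [[" ".join(cell) for cell in row] for row in cells]


# Made-up sample data: two line items with description, quantity, price.
words = [
    ("Blue", 5, 10), ("widget", 12, 10), ("2", 45, 10), ("9.50", 80, 10),
    ("Red", 5, 30), ("gadget", 12, 30), ("1", 45, 30), ("4.00", 80, 30),
]
table = transcribe(words, col_seps=[30, 70], row_seps=[20])
# table == [["Blue widget", "2", "9.50"], ["Red gadget", "1", "4.00"]]
```

Once the grid is right, every cell falls out of the same lookup, which is why a correct grid can replace typing each value by hand.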
Finally, all the power of Rossum extensions is still at your fingertips. You can immediately match the SKUs against your item database to highlight unexpected items, make sure the total price fits the rate and quantity of each item, or assign GL codes line by line.
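The checks mentioned above can be sketched in plain Python. This is not the Rossum extension API: the `check_line_items` function, the field names (`sku`, `quantity`, `unit_price`, `total`) and the tolerance are all assumptions chosen for the example.

```python
from decimal import Decimal


def check_line_items(items, known_skus, tolerance=Decimal("0.01")):
    """Return (row index, problem) pairs for suspicious line items."""
    issues = []
    for i, item in enumerate(items):
        # Flag items that are missing from the (hypothetical) item database.
        if item["sku"] not in known_skus:
            issues.append((i, "unknown SKU"))
        # Check that total = quantity x unit price, within a small tolerance.
        expected = Decimal(item["quantity"]) * Decimal(item["unit_price"])
        if abs(expected - Decimal(item["total"])) > tolerance:
            issues.append((i, "total mismatch"))
    return issues


# Made-up captured line items, kept as strings as they would come from a page.
items = [
    {"sku": "A-100", "quantity": "2", "unit_price": "9.50", "total": "19.00"},
    {"sku": "Z-999", "quantity": "1", "unit_price": "4.00", "total": "4.50"},
]
issues = check_line_items(items, known_skus={"A-100", "B-200"})
# issues == [(1, "unknown SKU"), (1, "total mismatch")]
```

`Decimal` is used instead of floats so that monetary comparisons are not thrown off by binary rounding.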
Reliably capturing tables in the wild comes with unique challenges and complexity that have prevented any widespread automatic solution before. So it’s worth a quick dive into the technology to understand what’s so hard about it, and how we solved this challenge.
Initially, we just focused on table detection – identifying whether and where tables (often border-less and somewhat fuzzy structures) occur on a page is far from trivial by itself.
Most scientific research so far has focused only on the detection of tables in articles, where tables often have explicit borders, well-behaved column structure and most importantly are surrounded by free-flowing text. In contrast, the structure of business documents outside and inside a table is very similar (see the example), requiring AI models capable of highly abstract knowledge to tackle this.
The next step for us was cell detection – the ability to split a table into rows, columns and eventually cells. This is where the real magic happens, and at the end of January we released a major update that really makes a difference in usability.
This step continues to be the biggest challenge, but we are really proud of the level we have already reached. What enabled us to crack this nut open is our unique way of looking at a document page in our neural networks, based on the skimming OCR approach – learn more in our founder blog post series.
Finally, column type identification is a completely new capability we released just last week.
Until now, it was all about extracting the structure of the data, but in order to use it in a business process, meaning needs to be assigned to each cell – is it a description, SKU code, unit price, total amount? That is what this final step does, completing the last piece of the puzzle.
To get there we tried multiple approaches, but the trick that did it was one we’ve used before to great effect. Instead of explicitly treating the column content as text and deciding, based on the column headers, what the text means, we look at the table structure visually. Our models still take the header texts into account, but also focus on the general column appearance and position in the table, allowing them to get the right “gut feeling” to resolve the ambiguous situations.