Update on Rossum’s line item extraction from invoices
At Rossum, we have been hard at work researching line item extraction from invoices. It is a daunting task, but we are not afraid. We know you have been waiting patiently to hear from us, so we have put together a brief update on what has been going on in research, as well as some conclusions we have drawn from the results thus far. There is still more to learn, but we now know we are on the right path!
Automatic invoice processing usually requires line item extraction. For starters, the extracted line items could feed a personal database of expenses, serve as an automatic final step for storage systems, or even simplify the hard work of auditors who need to pair orders with invoices.
In the scientific world, this problem translates to table detection and table extraction, or table understanding. Table extraction originated with HTML and plain-text documents, and a competition held at ICDAR 2013 compared research submissions with commercial products. The goal was to extract a table from a scientific article, or from any other type of article. The first algorithms usually relied only on heuristics, reaching an 80% success rate only on specific types of tables (e.g. tables with lines). Separately, there has recently been a big boom in object detection algorithms (for example, for AI-driven self-driving cars), one of the most successful being YOLO.
How did these algorithms fare on our data set, where an invoice is, in a sense, a schematic layout of tabular structures?
At first, we were not happy with the results of either the heuristics or YOLO. Even on simple tables, they did not achieve the success we had hoped for. So we dug into more experiments, trying to teach a neural network to paint where the table resides and to connect rows and columns (aiming to extract simple tables at the beginning). At first glance, it seems we are on the right track: the algorithm can detect whole tables and even identify individual rows, as shown below:
In parallel, we are also working on hybrid methods (a combination of the older methods mentioned above and artificial intelligence). Here are some examples of column search, based on custom image-processing features and learned 1D dilated segmentation.
To give a simple insight, these images show how we decide on the positions of columns from the image features shown in blue (some of them can be thought of as a position histogram of all the ink used in the image). The orange line is the learned decision; selecting its peaks gives the nice column splits shown on the images of tables.
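To make the idea concrete, here is a minimal sketch in plain Python. It is not our production code: the function names are hypothetical, and the learned decision is replaced by a simple stand-in (one minus the normalized ink histogram), just to show how picking the peaks of a 1D signal yields column splits.

```python
def ink_profile(image):
    """Column-wise ink histogram: number of ink pixels in each pixel column.

    `image` is a list of rows, each a list of 0/1 values (1 = ink).
    """
    width = len(image[0])
    return [sum(row[x] for row in image) for x in range(width)]


def split_scores(profile):
    """Stand-in for the learned decision (an assumption for illustration).

    The score is high where a pixel column contains little ink.
    """
    peak = max(profile) or 1  # avoid division by zero on a blank image
    return [1.0 - value / peak for value in profile]


def select_peaks(scores, threshold=0.5):
    """Return the center index of every run of scores above `threshold`."""
    peaks = []
    start = None
    # The trailing 0.0 is a sentinel that closes a run reaching the right edge.
    for x, score in enumerate(scores + [0.0]):
        if score > threshold and start is None:
            start = x
        elif score <= threshold and start is not None:
            peaks.append((start + x - 1) // 2)
            start = None
    return peaks


# A toy "table" with three text columns separated by two blank gaps.
toy_table = [
    [1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1],
    [1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1],
    [1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1],
]

profile = ink_profile(toy_table)
print(select_peaks(split_scores(profile)))  # prints [3, 9]
```

In the hybrid method described above, the hand-written `split_scores` would be replaced by the output of the learned model; the peak selection step stays the same in spirit.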
Universal table extraction from invoices without templates is still an unsolved problem across all data capture software. But since we made it the main research focus here at Rossum, we have already seen huge progress towards the goal, some of which we wanted to share above. Right now, we are fitting the last missing pieces of the end-to-end extraction pipeline, as well as experimenting with ways to make our neural networks even more accurate. While our current focus is on perfectly extracting information from simpler tables, in the longer term we have our sights set on complex cases like overlapping columns or nested tables.
To highlight what else has been going on at Rossum at the same time, we have released a guide on the integration of Rossum’s invoice data capture tool into UiPath’s automation flows. We will keep you updated on the progress of our line items research, so don’t you worry!