Last month, UiPath released a new capability into public beta: the Receipt and Invoice AI. We have seen a lot of demand for Cognitive Data Capture solutions from RPA providers specifically, as Accounts Payable is a prime target for process automation. While there are caveats, the new UiPath capability is going to be a welcome stepping stone for some RPA implementations. Let’s take a closer look!
UiPath Machine Learning Extractor
UiPath is working hard on enabling machine learning in its robots. An essential component in this effort is the new “Machine Learning Extractor” activity, and the Receipt and Invoice AI is UiPath’s showcase capability for this activity.
Unlike Rossum’s visual approach to data localization, UiPath’s AI follows the more traditional staged approach of first performing regular image-to-text OCR, then identifying data points based on the read text. Similar to Rossum, the Artificial Intelligence component is cloud-based, though there is no option for customization on user documents yet (the extraction quality is purely take-it-or-leave-it). Overall, the focus of the whole activity is the AI capability, with only a rudimentary manual validation interface - but we will come back to that below.
To get started, the basic setup guide, which includes a sample workflow, is available on the UiPath forum page. After setting up the basic taxonomy for the document type, a nice touch is the option to choose the OCR engine (either the free Tesseract OCR, or commercial engines from Microsoft, Abbyy, Google or Amazon). The processing workflow starts with the initial reading made by the OCR engine. Afterwards, the document is passed further to the Machine Learning Extractor for the data capture itself. The results may be stored in an output file.
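To make the staged flow concrete, here is a minimal Python sketch of the same three steps: OCR first, then field extraction on the recognized text, then an output file's worth of results. It is only an illustration of the staging, not UiPath's workflow or API. The names `run_pipeline`, `fake_ocr` and `fake_extract` are hypothetical, and the two stand-in functions are deliberately trivial so the example runs on its own.

```python
import json
from typing import Callable, Dict


def run_pipeline(
    image_bytes: bytes,
    ocr: Callable[[bytes], str],
    extract: Callable[[str], Dict[str, str]],
) -> str:
    """Run the two stages in order and serialize the result as JSON."""
    text = ocr(image_bytes)      # stage 1: image-to-text OCR
    fields = extract(text)       # stage 2: find data points in the text
    return json.dumps(fields, sort_keys=True)  # what would go to the output file


def fake_ocr(image_bytes: bytes) -> str:
    """Hypothetical stand-in for an OCR engine: treats the bytes as text."""
    return image_bytes.decode("utf-8")


def fake_extract(text: str) -> Dict[str, str]:
    """Hypothetical stand-in for the extractor: reads 'Key: value' lines."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip().lower()] = value.strip()
    return fields


if __name__ == "__main__":
    print(run_pipeline(b"Total: 12.50\nDate: 2019-07-01", fake_ocr, fake_extract))
```

Because the OCR engine is passed in as a parameter, swapping one engine for another does not touch the extraction stage, which mirrors the engine choice offered in the activity.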
The Extractor does a very solid job extracting the data automatically, and importantly, can also branch out from invoices to receipts - the user just needs to keep them sorted separately. Nevertheless, its practical usability is still limited by a couple of factors. First, the Extractor only comes with a rather basic set of fields to be extracted - for an Accounts Payable workflow, you may find some key fields still missing (like Vendor VAT ID or any fields on customer data) and others rather ambiguous (say, whether the unit price includes tax). Second, the engine currently has some rather drastic limitations: at most 4 MB and 2 pages per document, and at most 10 documents per minute for a single robot.
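If you plan to experiment with the beta, it may help to screen documents against these limits before sending them. The sketch below is our own illustration in Python, not part of UiPath. The `Document` class and the function names are hypothetical, and we assume that 4 MB means 4 * 1024 * 1024 bytes.

```python
from dataclasses import dataclass
from typing import List

# Limits as quoted in the text; the byte interpretation of "4 MB" is an assumption.
MAX_BYTES = 4 * 1024 * 1024
MAX_PAGES = 2
MAX_DOCS_PER_MINUTE = 10


@dataclass
class Document:
    """Hypothetical description of a file waiting to be processed."""
    name: str
    size_bytes: int
    page_count: int


def limit_violations(doc: Document) -> List[str]:
    """Return the names of the limits this document exceeds (empty if none)."""
    problems = []
    if doc.size_bytes > MAX_BYTES:
        problems.append("size")
    if doc.page_count > MAX_PAGES:
        problems.append("pages")
    return problems


def min_seconds_between_submissions() -> float:
    """Spacing that keeps a single robot at or under the per-minute limit."""
    return 60.0 / MAX_DOCS_PER_MINUTE


if __name__ == "__main__":
    scan = Document("long_invoice.pdf", size_bytes=5 * 1024 * 1024, page_count=3)
    print(limit_violations(scan))
    print(min_seconds_between_submissions())
```

Documents that fail the check can be routed to manual processing or split up front, instead of failing in the middle of the workflow.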
Automating Data Capture with UiPath
When we think about the best way to solve the data capture needs of a particular user at Rossum, the key points we consider are the automation challenges (that we can solve by using Artificial Intelligence) and the complete day-to-day workflow (where we look into the quality assurance of capture results, system self-improvement, and integrating business rules and data sources into the process).
We need to open with the admission that we have been quite honestly delighted by the extraction quality of the UiPath Extractor. Even though only English, Romanian and Spanish are officially supported, we have also seen good results on documents in central European languages. Only a narrow set of fields is supported and some of them (like the vendor fields) still have room for improvement. Also, the confidence scores are consistently so low that they just have to be ignored. But the extraction of line item descriptions and amounts is especially good and actually on par with Rossum’s out-of-the-box engine in some cases. And for many receipt types (especially US receipts), we would currently recommend the UiPath Extractor over a non-customized Rossum engine!
The trickiest issue with the UiPath Extractor that we see is the lack of a proper validation interface for making user corrections to the extracted data. Although a basic UI can be created using the UiPath Studio, it is more of a development tool and we cannot recommend it for day-to-day operation. It is impossible to quickly check the capture results, capture the data manually (by pointing-and-clicking rather than actually retyping it), or apply other operations such as rotating or postponing documents. This essentially puts the Extractor in the same category as AWS Textract, albeit easier to use - a tool that is most suited to processes that have a chance to be completely automated by just its out-of-the-box AI capabilities.
Since we started our pioneering work on cloud-based Cognitive Data Capture at Rossum, two things stand out as the most important learnings. First is our unique AI approach to automated data extraction. Second is the importance of the human validation step, which is the point of a great convergence in the capture process. The human validation follows the AI extraction, providing quality assurance for the few percent error rate, but also essential feedback that allows the AI extraction to improve and avoid these errors in the future without any expert intervention.
Equally important, the human validation precedes the data export. Our primary consideration when designing a data capture process for our users has been to truly make them many times faster than before, and an essential aspect is that a typical document is viewed by a human once and only once in the capture process. This is the moment where the human finalizes the data capture, but following this principle it also has to be the moment where all business rules that may need user input are applied. This is why we have built the Rossum Extension Platform, which ties into the validation view with all the user-specific functionality: to connect other data sources (like vendor and PO databases) and to apply business rules (like enforcing data format or logic validation).
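To give a flavor of what such business rules look like, here is a small, generic Python sketch with one data format rule and one logic rule. It does not use the Rossum Extension Platform API. The function `check_invoice` and the field names `date`, `line_amounts` and `total` are hypothetical, chosen only for this illustration.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation
from typing import Dict, List


def check_invoice(fields: Dict) -> List[str]:
    """Return the names of fields that a human should look at (empty if none)."""
    errors = []

    # Format rule: the date must be present and in ISO form (YYYY-MM-DD).
    try:
        datetime.strptime(fields["date"], "%Y-%m-%d")
    except (KeyError, ValueError):
        errors.append("date")

    # Logic rule: the line item amounts must add up to the total.
    # Decimal avoids the rounding surprises of binary floats.
    try:
        line_sum = sum((Decimal(a) for a in fields["line_amounts"]), Decimal("0"))
        if line_sum != Decimal(fields["total"]):
            errors.append("total")
    except (KeyError, InvalidOperation):
        errors.append("total")

    return errors


if __name__ == "__main__":
    captured = {
        "date": "01/07/2019",
        "line_amounts": ["10.00", "2.50"],
        "total": "12.50",
    }
    print(check_invoice(captured))
```

Running checks like these while the document is on screen means any flagged field can be corrected during the single human pass, rather than bouncing back after export.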
Based on our experience, data capture solutions without a comprehensive approach to the validation workflow are, unfortunately, still far from a practical fit for the typical data capture workflow. Tools such as the UiPath Receipt and Invoice AI or AWS Textract may nevertheless be feasible in cases where an extremely high rate of automation is the goal: uniform documents and external data sources so rich that they can serve to autovalidate essentially all of the data. In this category, we are giving UiPath the thumbs up as being more approachable, while AWS offers greater customization due to its generalist approach to extracted fields.
At the very least, the new UiPath capability can serve as an alternative pathway towards basic data capture experiments. It clearly demonstrates the power of cloud-based Cognitive Data Capture, the same model that Rossum champions: a lightweight solution that is powerful, yet quick to use and trivial to maintain.
Evaluating solutions that could compete with our own is always tricky. Yet, the core creed of Rossum is the search for truth, and hopefully that helped us to provide a level-headed explanation of the niches that the new UiPath capability currently fills perfectly. Whether you fall among the ideal users of the activity, or you would benefit from a full-fledged solution for high quality data capture like Rossum, a wider palette of tools certainly benefits everyone and we look forward to more of these.
Onwards to creating a world without manual data entry together!