Under the Hood: Rossum Improvement News
We have compiled a list of the most important updates to the Data Extraction API (DE) and the Document Management API (DM). The Data Extraction API is the core AI engine, taking care of the automated data capture process. The Document Management API is the workflow system maintaining document queues, callbacks, the verification user interface, web app etc., recommended as the main interface for all new users. We also list updates to the
You can see a comprehensive list of the
Try it out for yourself. Sign up for a free trial.
This month, we have brought two much-requested features to Rossum: receipt capture and statistics. Receipts can be tricky documents to capture because of their inherently complicated nature. Statistics will help in analyzing your processes in data capture and optimizing them. We plan on further developing these features in the coming months.
For invoice and receipt capture on the go, we have created an Android app.
Now, you can download an Excel spreadsheet summarizing basic Rossum usage statistics including document counts (imported, exported) and time spent for each user (and per queue). We have also documented how to pass document annotation data updates, e.g. for custom engine training purposes.
Do you also need to work with receipts? We have dramatically improved the accuracy of receipt capture with a more comfortable and flawless user experience. We have also reduced the processing latency of submitted documents at periods of peak traffic.
Receipt capture is on our list of improvements that we’ve been working on because we know you have been asking for it. We started with a new version of our OCR and continue to build a full-blown receipt capture tool. From April onwards, you can rotate documents, so upside-down invoices can now be processed quickly in Rossum.
We now offer basic support for receipt capture: a new version of “skimming OCR” that’s accurate on receipt fonts and environments. We also extensively trained our AI on receipt samples, significantly increasing localization accuracy.
Have you uploaded invoices upside down? We have introduced document rotation support, with a view control widget that lets you control the orientation of the document. Rotated documents are automatically reprocessed by the AI engine, saving you time.
Minor update to the connector.
We will automatically identify and extract a new piece of information: the payment state, such as paid or unpaid. Improved OCR speed and accuracy mean that clicking on a page to read data is now 2x faster. We also got rid of extra characters appearing in some table cells. Tax details with a 0% rate were cleaned up and discarded in a few cases where they shouldn’t have been extracted.
Minor updates to the Document Management API, such as improved Magic Grid behavior, better browser compatibility, and improvements to the verification view and user interface language.
Update to the Rossum Document Management API, including a significant speedup of on-demand text extraction.
Registering a trial account in the Rossum DM is now openly available, with three default schemas to choose from (US, UK & EU). The user interface should feel noticeably faster. We also implemented a range of visual and stability improvements, including a document page that fills a much larger part of the screen for an improved user experience.
We have released v2.0 of the elisctl tool with various usability improvements such as experimental support for editing schemas (sidebar description) using Excel (xlsx) files. Customizing Rossum features is much easier.
A Rossum Data Extraction API update fixed a table extraction bug that produced empty or half-empty cells. We also improved the reading of noisy images, including outside of tables.
The Rossum Document Management API features a new Download button in the Exported tab of the web dashboard. You can now download all captured data in CSV, XML, or JSON format. The Document Management API respects the filters selected in the Exported tab (particularly the search string).
Major update to the Rossum Document Management API: the Magic Grid tool for rapid line item data capture. It is now available via a new button within the line items multivalue section. See our blog post for more details and a video demo.
A feature update to the Rossum Data Extraction API introduces two new properties of table cells, value and value_type, which are direct analogues of the header fields’ properties of the same name. In the Rossum Document Management API, this improves automatic table data capture quality, especially in the amount columns. We now process digital PDF documents that do not require OCR slightly faster. We also improved the accuracy of the document property classifiers (document type, currency and language).
An update to the Rossum Document Management API user interface. We have differentiated the review behavior between clicking the “Start processing” button and opening a specific invoice.
In batch review, exporting an invoice brings up the next invoice in the queue that is available for review. When pulling up a specific invoice, an “annotation stack” is available where you can browse the invoices back and forth. Therefore, for regular operation, the “Start processing” button should be used, whereas opening a specific invoice is meant for inspecting the queue rather than processing it in its entirety. Individual documents may now be opened in new tabs again.
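To illustrate why a normalized value alongside a type label helps with amount columns, here is a minimal Python sketch. Only the property names value and value_type come from the note above; the cell layout, the `text` key, the "number" type label, and both helper functions are hypothetical and do not reproduce the actual API response format.

```python
from typing import Optional

# Hypothetical cell records. Only `value` and `value_type` are named in the
# release note; `text` and the "number"/"text" labels are assumptions.
cells = [
    {"text": "1 200,50", "value": "1200.50", "value_type": "number"},
    {"text": "Consulting", "value": "Consulting", "value_type": "text"},
    {"text": "300,00", "value": "300.00", "value_type": "number"},
]


def numeric_value(cell: dict) -> Optional[float]:
    """Return the normalized value as a float for numeric cells, else None."""
    if cell.get("value_type") != "number":
        return None
    try:
        return float(cell["value"])
    except (KeyError, TypeError, ValueError):
        return None


def column_total(column_cells: list) -> float:
    """Sum the numeric cells of a column, skipping everything else."""
    values = (numeric_value(cell) for cell in column_cells)
    return sum(v for v in values if v is not None)


total = column_total(cells)
```

With a normalized value available, the consumer does not need to re-parse locale-specific strings such as "1 200,50" before doing arithmetic.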
Major update to the Rossum Document Management API with line item automation support. Line items may now be pre-captured, which enables line item automation for the very first time! There is now an API endpoint for creating new organizations, and queue export supports the same full set of filters as the queue annotation list. This means exported captured data may be filtered by a wider variety of timestamps or even just an explicit list of document IDs.
Bugfix update to the Rossum Data Extraction API. We fixed the API behavior for some unprocessable documents that permanently hung in the “processing” state from the API perspective. This caused some documents to be stuck as “importing” in the Rossum Document Management API, which is also fixed by this update.
An update to the Rossum Data Extraction API introduces a new feature: determination of table column types. You now get information not just about how the table is split into cells (and which rows are header or data rows) but also about what each column means semantically: is it a quantity, is it a description? We also compiled brand new documentation of the table extraction format.
We improved accuracy on low-quality scans and the parsing of VAT details (an unusually written VAT rate such as “20,000” is now interpreted as 20%).
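As a rough illustration of what semantic column types make possible, the Python sketch below turns a table into line items keyed by column meaning. The table layout, the key names (`column_types`, `rows`, `type`, `cells`), the type labels, and the function `rows_to_line_items` are all hypothetical; the real structure is described in the table extraction format documentation.

```python
def rows_to_line_items(table: dict) -> list:
    """Build one dict per data row, keyed by the semantic column type.

    Header rows are skipped, and columns without a recognized type
    (None in this hypothetical layout) are left out.
    """
    column_types = table["column_types"]
    items = []
    for row in table["rows"]:
        if row["type"] != "data":
            continue
        item = {}
        for column_type, cell in zip(column_types, row["cells"]):
            if column_type is not None:
                item[column_type] = cell
        items.append(item)
    return items


# Hypothetical table: one header row, two data rows, one untyped column.
table = {
    "column_types": ["description", "quantity", None, "amount_total"],
    "rows": [
        {"type": "header", "cells": ["Item", "Qty", "Note", "Total"]},
        {"type": "data", "cells": ["Widget", "2", "blue", "19.98"]},
        {"type": "data", "cells": ["Gadget", "1", "", "5.00"]},
    ],
}

line_items = rows_to_line_items(table)
```

Keying by column type rather than by position means the same code keeps working when a supplier orders the columns differently.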
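The VAT rate case can be sketched in a few lines of Python. This is not Rossum's implementation, only a minimal illustration under the assumption that a comma in a rate string is a decimal separator and that a valid rate lies between 0 and 100; the function name `normalize_vat_rate` is hypothetical.

```python
from typing import Optional


def normalize_vat_rate(raw: str) -> Optional[float]:
    """Interpret a VAT rate string as a percentage.

    Assumes a comma is a decimal separator, so "20,000" reads as 20.0.
    Returns None when the text is not a number or is outside 0..100.
    """
    cleaned = raw.strip().rstrip("%").strip().replace(",", ".")
    try:
        rate = float(cleaned)
    except ValueError:
        return None
    # The range check also rejects "nan" and "inf", which float() accepts.
    if not 0.0 <= rate <= 100.0:
        return None
    return rate


rate = normalize_vat_rate("20,000")
```

The range check is what keeps a string like "20,000" from being read as twenty thousand: a rate above 100 is rejected rather than guessed at.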
An update to Rossum Data Extraction API and the Document Management API. Our core AI engine models have been updated to more accurate versions. We have improved attachment extraction of our email gateway to cover all kinds of forwarded, signed or deeply nested mail messages.
An update to the Rossum Document Management API covers the queue export API, fixes XML export, and improves annotation search with a new `tolerance` parameter.
The Rossum Data Extraction API is enhanced with table improvements: the table extraction engine is now more accurate in terms of both determining tabular areas on a page and splitting rows and cells. We also fixed OCR handling of multi-line table cells.
Data extraction is faster: we have sped up a portion of the field OCR process and expect the processing time per page to be about 2 seconds shorter on average.
About Rossum app
Rossum app is the cognitive invoice data capture tool, powered by Artificial Intelligence, enabling companies to capture data from financial documents efficiently and with human-level accuracy. Unlike existing text mining solutions, Rossum’s unique artificial intelligence technology reflects the way humans read documents. This eliminates the need for costly manual implementation, a game changer in the data capture business.
Rossum is an artificial intelligence company that extracts data from documents with human-level parity, helping companies automate their data entry tasks and thus creating significant savings. Our mission is to teach computers to support human creativity and unshackle the human mind from rows and spreadsheets.