What is cognitive data capture?

Cognitive data capture uses AI to mimic how a human mind reads structured documents. This has two major effects: the system learns to recognize information from examples rather than requiring expert configuration, and it can recognize much of the information even in document layouts it has never seen before.

Unlike manual data entry or traditional OCR, cognitive data capture requires neither a sizable workforce nor the setup of endless rules or templates (detailed comparisons of effort are covered in our TCO analysis series).

Rossum’s cognitive data capture AI uses deep neural networks to recognize patterns in documents. This is how the technology infers the underlying general structure of business documents such as invoices, much as a human mind does. Rossum’s unique neural network architecture is what delivers its high accuracy and allows it to adapt to all kinds of layouts.

Refer to our founders’ blog series on cognitive data capture for an in-depth look at the technical difference between legacy OCR and cognitive data capture, and at exactly how Rossum’s technology works.

Can Rossum be implemented on-premise?

We do not offer on-premise solutions. Delivering Rossum from the cloud is the most reliable way to provide a widely scalable service with a high level of security, up-to-date maintenance and regular updates. The security advantages of a cloud solution are comparable to those of an on-premise solution.

Where are your servers located?

We run on AWS out of Ireland. Enterprise customers can have Rossum deployed on AWS cloud data storage in a different country.

What languages does Rossum recognize?

We officially support English, French, German, Czech, and Slovak. Rossum extracts data from other languages as well, although in some cases the accuracy might be lower. Accuracy improves as the number of invoices extracted in a given language grows.

What is the price for an annual subscription?

We would need to understand a bit more about your needs before we can give a pricing estimate. Beyond the estimated annual document volume, we would need to know which data fields you need to extract, whether any customization or training is needed, and so on. If you are interested in a pricing estimate, you can fill out this form and one of our experts will get back to you.

How can I integrate Rossum into my ERP or document management system?

As our main focus is data extraction, we do not provide integration services directly. We have extensive API documentation that allows smooth integration with the majority of systems. You can read more on the API here. An example guide on integration, alongside the manual integration, can be found in this blog post. We also have an example of UiPath integration here. For smaller adjustments that your integration needs in order to work properly, such as changing the format of the output file, a custom connector can be developed for you.
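To illustrate the kind of small output-format adjustment a connector might perform, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the shape of the `extracted` record, the field names, the `COLUMN_MAP` table and the `to_erp_csv` helper are hypothetical and do not reflect Rossum's actual export schema or API.

```python
import csv
import io

# Hypothetical extraction result; the field names and shape are invented
# for illustration and are not Rossum's export schema.
extracted = {
    "invoice_id": "2021-0042",
    "date_issue": "2021-04-03",
    "amount_total": "1234.56",
    "currency": "EUR",
}

# Hypothetical mapping from extracted field names to the column names
# that the target ERP import expects.
COLUMN_MAP = {
    "invoice_id": "InvoiceNumber",
    "date_issue": "IssueDate",
    "amount_total": "TotalAmount",
    "currency": "Currency",
}


def to_erp_csv(documents):
    """Render extracted documents as semicolon-separated CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(COLUMN_MAP.values()),
        delimiter=";",
        lineterminator="\n",
    )
    writer.writeheader()
    for doc in documents:
        # Rename each field; a field missing from the document becomes empty.
        writer.writerow({erp: doc.get(src, "") for src, erp in COLUMN_MAP.items()})
    return buffer.getvalue()


print(to_erp_csv([extracted]))
```

In a real integration the input would come from the documented API, and the column names and delimiter would follow the import requirements of your ERP or document management system.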

How does Rossum handle fields whose formats typically differ among countries and languages, such as dates (DMY, YMD and MDY) or decimal and thousands separators?

The data capture engine can deal with any format and normalize it to the user-preferred standardized representation. To deal with the remaining genuinely ambiguous cases, we have a special “locale” setting, which can be set individually for each document queue and specifies the prevailing expected region of origin. Date formats are very flexible: we can customize your UI to the format in which you want the data to be exported. You can find some of the supported formats here, under the “Date format” section. The tokens mentioned there can be found at this link.
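The role of a locale setting in resolving ambiguous values can be sketched in a few lines of Python. This is only an illustration of the idea, not Rossum's implementation: the `LOCALES` table and the functions `normalize_date` and `normalize_amount` are hypothetical, and the sketch assumes all-numeric dates with four-digit years.

```python
import re
from datetime import date
from decimal import Decimal

# Hypothetical locale table: whether the day comes first in all-numeric
# dates, and which character is the decimal separator.
LOCALES = {
    "en_US": {"day_first": False, "decimal": "."},
    "en_GB": {"day_first": True, "decimal": "."},
    "de_DE": {"day_first": True, "decimal": ","},
}


def normalize_date(text, locale):
    """Return an ISO 8601 date (YYYY-MM-DD) for an all-numeric date string."""
    parts = [int(p) for p in re.split(r"[./-]", text.strip())]
    if len(parts) != 3:
        raise ValueError(f"unsupported date: {text!r}")
    a, b, c = parts
    if a > 31:                            # YMD: the year leads
        year, month, day = a, b, c
    elif a > 12:                          # first number can only be a day
        day, month, year = a, b, c
    elif b > 12:                          # second number can only be a day
        month, day, year = a, b, c
    elif LOCALES[locale]["day_first"]:    # genuinely ambiguous: use the locale
        day, month, year = a, b, c
    else:
        month, day, year = a, b, c
    return date(year, month, day).isoformat()


def normalize_amount(text, locale):
    """Return a Decimal for an amount with locale-dependent separators."""
    cleaned = re.sub(r"[^\d.,-]", "", text)   # drop currency symbols, spaces
    seps = [ch for ch in cleaned if ch in ".,"]
    decimal_sep = None
    if len(set(seps)) == 2:
        decimal_sep = seps[-1]            # both kinds occur: the last is decimal
    elif len(seps) == 1:
        tail = cleaned.rsplit(seps[0], 1)[1]
        if len(tail) != 3:
            decimal_sep = seps[0]         # not a group of three: must be decimal
        elif seps[0] == LOCALES[locale]["decimal"]:
            decimal_sep = seps[0]         # genuinely ambiguous: use the locale
    # Remove thousands separators, then write the decimal separator as a dot.
    digits = "".join(ch for ch in cleaned if ch not in ".," or ch == decimal_sep)
    if decimal_sep:
        digits = digits.replace(decimal_sep, ".")
    return Decimal(digits)
```

With these definitions, `normalize_date("03/04/2021", "en_US")` yields `2021-03-04` while the same string under `de_DE` yields `2021-04-03`, and `normalize_amount("1.234", "de_DE")` yields `1234` while under `en_US` it yields `1.234`. Values that can be read only one way, such as `25.12.2021` or `1.234,56`, are resolved without consulting the locale.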

Does Rossum follow GDPR?

We are fully committed to ensuring compliance with GDPR. We process customer-provided documents for the primary purpose of data capture, based on the instructions of our customers, and for the secondary purpose of further research and development of data extraction technology. Based on the nature of the data and the GDPR balance test, we have full reason to believe that this processing is fully compliant with GDPR, particularly when it does not concern third-party consumer invoices or invoices with sensitive personal data. You can read more about this in our terms.

How does Rossum secure code integrity?

We follow the OWASP Secure Coding Practices and otherwise rely on the extensive experience of our senior team members. In the event of a code change, we perform design reviews, code reviews, and security reviews. Every commit is inspected and reviewed by at least one other software engineer. We use thorough automated testing, including unit tests and integration tests, as well as manual testing to ensure code quality and security. We also use automated third-party tools for static source code checks and vulnerability scanning. Our platform also undergoes regular penetration testing by an independent third party.

How is the data encrypted?

We always use encryption for data transfer in and out of the cluster. We employ AES-256 keys managed in AWS Key Management Service for data at rest and strictly use TLS v1.2 for all data in transit using HTTPS (including HSTS). All outside communication is strictly encrypted in transit (typically via HTTPS for regular production operation, and via SSH for some service and maintenance purposes). Communication with the database is always encrypted, and we use an audit log for all operations that happen in the application.
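On the client side, an integration can hold itself to the same transport standard. The following is a minimal sketch using only Python's standard `ssl` module; the helper name `make_tls12_context` is hypothetical, and the sketch shows a client configuration, not Rossum's server setup.

```python
import ssl


def make_tls12_context():
    """Build a client TLS context that rejects anything older than TLS 1.2."""
    # create_default_context() turns on certificate and hostname verification.
    ctx = ssl.create_default_context()
    # Refuse to negotiate protocol versions below TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


context = make_tls12_context()
```

The resulting context can be passed to standard-library HTTPS clients, for example `urllib.request.urlopen(url, context=context)`, so that a connection fails rather than silently falling back to an older protocol version.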

What other document types does Rossum support?

Apart from invoices, we can extract data from other semi-structured documents, including receipts, purchase orders and shipping documents. It is also possible to train a custom AI model for a specific type of document. Our custom training add-on works well for capturing data from any document where layout plays a role. Prime examples of such documents are invoices, bills and receipts, but similar-looking documents like orders, delivery notes, confirmations, statements or even forms are also good fits. Two restrictions apply: the documents must use Latin characters, and if tables occur, they should follow a grid format and their columns must have uniform meaning.
