Optical Character Recognition software

For many years, organizations have struggled with the inefficiency and high cost of manual data capture. The data itself is essential to many business processes. Once it has been captured and converted into a structured format, it can feed other systems and applications that automate work and help teams achieve their goals. Many of these processes are document-based. Invoices, purchase orders, packing lists, and claims, for example, are all transactional documents that businesses must manage in order to operate.

One of the key elements of document management is efficient data capture, and manual data capture is inherently inefficient. Manual data entry relies on people reading each document and then typing its data into an application that stores it in a structured form.

Even the best and most experienced data entry employees get distracted and make mistakes. Data entry is a tedious task that can demotivate your team and leave them dissatisfied. At Rossum, we have worked with clients whose employees were on the verge of quitting because of the pain of manually processing documents like invoices.

Fortunately, optical character recognition (OCR) software was developed to solve this problem. There are generally two main kinds of OCR software: template-based and cognitive. Template-based OCR is the more traditional form of character recognition, but cognitive OCR has many advantages over it.

Consider a basic OCR use case. A paper invoice has just arrived in the accounts payable department and needs to be digitized so that the accounting system can receive the data and process the payment. Instead of a person reading the document, OCR software can read the scanned image, automatically extract the data, and export it in a structured format (e.g., an Excel table or a text file).
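Here is a minimal Python sketch of the export stage, the step that turns extracted data into a structured format. It assumes the OCR step has already produced the field values. The field names and values are hypothetical. Writing CSV, which Excel can open, stands in for whatever format your accounting system actually expects.

```python
import csv
import io

# Hypothetical field values that an OCR step might return for one invoice.
# Field names and values are illustrative assumptions, not a real schema.
extracted = {
    "invoice_number": "INV-1042",
    "issue_date": "2024-03-01",
    "vendor": "Acme Supplies Ltd.",
    "total_amount": "1250.00",
    "currency": "EUR",
}


def to_csv(records: list[dict[str, str]]) -> str:
    """Serialize a list of extracted-field dicts into CSV text with a header row."""
    buffer = io.StringIO()
    # Column order follows the keys of the first record.
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv([extracted]))
```

Each invoice becomes one row. The downstream system no longer needs to know anything about the layout of the original page.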

The best OCR software can process hundreds of documents in a short time, increasing efficiency and relieving pressure on your team. You and your employees then have more time for strategic initiatives that grow your business.

Optical Character Recognition algorithm

Template-based OCR is the traditional approach to character recognition that many organizations have implemented, and it was the first major step toward automated data capture. The system is given rules and templates that tell it where to find data and how to capture it when it scans a document. This can be very accurate and useful for certain business applications. However, a new template and set of rules must be created for every document variation you want to process. You may think that all invoices look alike, but even small changes in the position of crucial fields can completely baffle a template-based system. Even differences in fonts or colors can make the captured data inaccurate and inconsistent.
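This minimal Python sketch shows why templates are brittle. It assumes a hypothetical template that stores each field as a fixed rectangle of page coordinates, and an OCR step that returns words with their positions. It is a simplification for illustration, not how any particular product implements templates.

```python
from typing import NamedTuple


class Word(NamedTuple):
    text: str
    x: int  # left edge of the word's bounding box, in pixels
    y: int  # top edge of the word's bounding box, in pixels


# Hypothetical template for one vendor's invoice layout:
# each field is a fixed rectangle (x0, y0, x1, y1) on the page.
TEMPLATE = {
    "invoice_number": (400, 50, 600, 80),
    "total_amount": (400, 700, 600, 730),
}


def extract_with_template(words: list[Word], template: dict) -> dict[str, str]:
    """Return the text of all words whose top-left corner falls inside each field's zone."""
    result = {}
    for field, (x0, y0, x1, y1) in template.items():
        inside = [w.text for w in words if x0 <= w.x <= x1 and y0 <= w.y <= y1]
        result[field] = " ".join(inside)
    return result


# The layout the template was built for.
original_layout = [Word("INV-1042", 420, 60), Word("1250.00", 450, 710)]
# The same data from a vendor whose total sits 40 px lower.
shifted_layout = [Word("INV-1042", 420, 60), Word("1250.00", 450, 750)]

print(extract_with_template(original_layout, TEMPLATE))
# {'invoice_number': 'INV-1042', 'total_amount': '1250.00'}
print(extract_with_template(shifted_layout, TEMPLATE))
# {'invoice_number': 'INV-1042', 'total_amount': ''}
```

Once the total moves 40 pixels down, it falls outside its zone and the field comes back empty. Covering that layout means building and maintaining yet another template.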

All of this means that businesses using template-based OCR must spend many valuable hours creating templates for every possible variation. This caps how much automation and efficiency template-based OCR can actually provide. Cognitive OCR removes this cap. Instead of relying on templates, it uses machine learning to learn document formats automatically.

The more documents you process with the system, the more it learns about them and about how to extract their data quickly and accurately. In addition, the best cognitive OCR solutions work out of the box, already trained on thousands of formats and document types. Rossum, for example, is a comprehensive document processing solution built around a cognitive OCR engine. This engine comes pretrained on a large number of document variations, so you can start capturing accurate data on day one.

Optical Character Recognition machine learning

What does machine learning mean for optical character recognition? Machine learning is a subset of artificial intelligence in which computers learn patterns from examples rather than following hand-written rules. Many modern machine learning systems, including those used for OCR, are built on neural networks. These are computational models loosely inspired by the way neurons in the human brain are connected. In practice, this means that an OCR solution with machine learning can “read” documents more like a human would.
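The basic idea of learning from examples can be shown with a single artificial neuron, the building block of a neural network. The Python sketch below trains a classic perceptron on four hypothetical, hand-made examples. Real OCR networks have millions of parameters and are trained with gradient-based methods, so treat this as an illustration of the principle, not a model of Rossum’s engine.

```python
# Toy training data (hypothetical features, for illustration only):
# each example describes a text token on a page as
# (looks_like_number, appears_next_to_the_word_Total), and the label
# says whether the token is the invoice's total amount.
examples = [
    ((0, 0), 0),
    ((0, 1), 0),
    ((1, 0), 0),
    ((1, 1), 1),
]


def predict(weights, bias, features):
    """A single artificial neuron: weighted sum of inputs followed by a step function."""
    activation = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(data, epochs=20, learning_rate=1):
    """Learn weights from labeled examples with the classic perceptron update rule."""
    weights = [0, 0]
    bias = 0
    for _ in range(epochs):
        for features, label in data:
            error = label - predict(weights, bias, features)
            # Nudge each weight (and the bias) in the direction that reduces the error.
            weights = [w + learning_rate * error * x for w, x in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias


weights, bias = train_perceptron(examples)
for features, label in examples:
    print(features, "->", predict(weights, bias, features), "expected:", label)
```

No rule in the code says that a total is a number next to the word “Total”. The neuron finds weights that reproduce the labeled examples.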

When human beings read a document, we are able to deduce its purpose by skimming through it quickly. Once we’ve categorized a document (as “invoice,” for example), we then focus on the points where we expect important information to be stored. Rossum’s OCR solution was specifically designed to follow this pattern of behavior. By initially “skimming” the document, the system creates a map of where values are located. 

Rossum then uses its neural networks to analyze this spatial map and infer what kind of data each field is likely to contain. Once it has determined where to find each piece of data, it reads each field character by character to capture its content. The result is rapid, highly accurate, automated data capture.
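The skim, map, and read stages described above can be sketched in Python. This is a hypothetical, heavily simplified illustration. Rossum uses neural networks for these stages, but the sketch uses hand-written heuristics instead: a keyword check for the document type, and a rule that takes the nearest word on the same line to locate each value. The page data and label keywords are made up.

```python
from typing import NamedTuple, Optional


class Word(NamedTuple):
    text: str
    x: int  # left edge in pixels
    y: int  # top edge in pixels


# Hypothetical OCR output for one page: words with their positions.
page = [
    Word("INVOICE", 50, 20),
    Word("Invoice", 50, 60), Word("No:", 120, 60), Word("INV-1042", 400, 60),
    Word("Total:", 50, 700), Word("1250.00", 450, 700),
]

# Hypothetical label keywords for each field we want to capture.
FIELD_LABELS = {"invoice_number": "No:", "total_amount": "Total:"}


def skim(words: list[Word]) -> str:
    """Stage 1: guess the document type from a telltale word."""
    vocabulary = {w.text.lower() for w in words}
    return "invoice" if "invoice" in vocabulary else "unknown"


def locate_value(words: list[Word], label: str, line_tolerance: int = 10) -> Optional[Word]:
    """Stage 2: use the spatial layout to find the nearest word right of a label on the same line."""
    anchor = next((w for w in words if w.text == label), None)
    if anchor is None:
        return None
    candidates = [w for w in words if abs(w.y - anchor.y) <= line_tolerance and w.x > anchor.x]
    return min(candidates, key=lambda w: w.x - anchor.x, default=None)


def extract(words: list[Word]) -> dict:
    """Stage 3: read the value located for each field."""
    fields = {}
    for field, label in FIELD_LABELS.items():
        value = locate_value(words, label)
        fields[field] = value.text if value else None
    return {"document_type": skim(words), "fields": fields}


print(extract(page))
```

On this page, the sketch reports an invoice with number INV-1042 and a total of 1250.00. A learned model replaces the brittle keyword rules with patterns learned from many documents.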

The benefits of AI-enabled OCR place it well ahead of traditional OCR software. With template-based character recognition, only around 50% of document management tasks can be automated, and maintaining templates can be very expensive. With cognitive OCR powered by machine learning, up to 98% of tasks can be automated, and maintenance is largely handled automatically by the AI.
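This short Python calculation puts the automation rates above in concrete terms. It applies the article’s figures (around 50% for template-based OCR and up to 98% for cognitive OCR) to a hypothetical volume of 1,000 documents per month. Because 98% is a best case, real results will vary.

```python
def documents_needing_manual_work(total_documents: int, automation_rate: float) -> int:
    """Number of documents left for people to handle at a given automation rate."""
    return round(total_documents * (1 - automation_rate))


monthly_volume = 1_000  # hypothetical monthly document volume
for label, rate in [("template-based OCR", 0.50), ("cognitive OCR", 0.98)]:
    manual = documents_needing_manual_work(monthly_volume, rate)
    print(f"{label}: {manual} of {monthly_volume} documents still handled manually")
```

Under these assumptions, the manual workload drops from 500 documents to 20.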

Optical Character Recognition documentation

One way to take a deep dive into how optical character recognition works is to read OCR documentation. Various software platforms, including Rossum, publish documentation describing the practical steps required to set up and use them. If you are interested in the technical challenges of building an AI OCR solution, you might consider building a simple OCR application in Python.

A basic optical character recognition project in Python can be completed with just a few lines of code. This is most commonly done with pytesseract, a Python wrapper for Tesseract, an open-source OCR engine that extracts text from images. From a practical perspective, however, a simple program like this could not reliably process documents in a business environment.

Tesseract’s pretrained models, though powerful, are trained on a limited set of fonts and languages and work best on clean, printed text. Accuracy drops on unusual fonts, poor-quality scans, and complex layouts. Even so, a project like this can be a great way to learn more about automated text detection in images powered by artificial intelligence.
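A minimal version of such a project might look like the Python sketch below. It assumes the `pytesseract` and `Pillow` packages are installed, along with the Tesseract engine itself. The file name in the usage comment is hypothetical. The output is raw text only, with no notion of fields such as invoice numbers or totals, which shows the gap between a hobby project and business document processing.

```python
import sys


def extract_text(image_path: str, lang: str = "eng") -> str:
    """Run Tesseract on one image via the pytesseract wrapper and return the raw text."""
    # Imported inside the function so the rest of the script loads even where
    # these packages are missing. Requires: pip install pytesseract pillow,
    # plus a local installation of the Tesseract engine.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image, lang=lang)


def non_empty_lines(raw_text: str) -> list[str]:
    """Split OCR output into lines, dropping blank lines and surrounding whitespace."""
    return [line.strip() for line in raw_text.splitlines() if line.strip()]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python simple_ocr.py invoice.png
    for line in non_empty_lines(extract_text(sys.argv[1])):
        print(line)
```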

Text detection in images using deep learning

Text detection in images using deep learning is the essential function of cognitive OCR. When a business document first arrives at a department, it is usually either on paper or in a scanned PDF. PDF files are versatile and widely used by professionals to share documents across systems and applications. However, the data within PDF files is not structured. In a scanned PDF, each page is stored as an image, so from a computer’s perspective there is no text to read at all. That’s why it’s vital that an OCR solution can extract data from an image.
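Before any characters can be recognized, the system has to find where the text is. Deep learning detectors learn to do this from data. The Python sketch below swaps in a much simpler classical technique, a horizontal projection profile, purely to show what text detection produces. It assumes a clean, already binarized, unskewed page, and the tiny pixel grid is made up.

```python
# A tiny binarized "scan": 1 = dark (ink) pixel, 0 = light background.
# The two bands of ink stand in for two lines of text.
image = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]


def detect_text_lines(pixels, min_ink=1):
    """Return (first_row, last_row) spans of consecutive rows that contain ink."""
    lines, start = [], None
    for row_index, row in enumerate(pixels):
        has_ink = sum(row) >= min_ink  # the row's value in the projection profile
        if has_ink and start is None:
            start = row_index  # a new text line begins
        elif not has_ink and start is not None:
            lines.append((start, row_index - 1))  # the current text line ends
            start = None
    if start is not None:
        lines.append((start, len(pixels) - 1))  # a line touching the bottom edge
    return lines


print(detect_text_lines(image))  # [(1, 2), (5, 5)]
```

A recognizer would then read each detected band. Blur, skew, stamps, and handwriting quickly break a rule this simple, which is where learned detectors earn their keep.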

As we have already discussed, it is possible to write a fairly simple Python OCR image-to-text application. However, because of the limitations of the OCR engine, accuracy is a challenge. If a document has a font that the engine doesn’t understand, it will most likely make errors. If the image has blurred fields or fields where the data is handwritten, a system like this will also struggle. This is why the best OCR solutions are trained to recognize a wide variety of fonts and text stylings. Rossum, for example, is capable of extracting accurate data from blurry images and handwritten fields.

OCR accuracy

OCR accuracy is crucial. Without accurate data capture, OCR software is useless. Achieving higher accuracy has long been one of the main goals of developers in this area. Cognitive OCR is more accurate than template-based OCR because it can handle variations between documents. When a template-based solution scans a variation it doesn’t recognize, the results can be highly inaccurate. That inaccurate data can then filter into other systems and disrupt crucial processes. At the same time, accuracy shouldn’t come at the expense of efficiency and cost. That’s why a cognitive OCR solution like Rossum is a better option for processing documents.
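One common way to quantify OCR accuracy is the character error rate (CER). It is the edit distance between the OCR output and the correct text, divided by the length of the correct text. The Python sketch below computes it with a standard Levenshtein distance; the invoice number and misreadings are hypothetical. For business documents, field-level accuracy matters too, because a single wrong character can make a whole value, such as a total, wrong.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, ocr_output: str) -> float:
    """Edit distance to the ground truth divided by its length (reference must be non-empty)."""
    return levenshtein(reference, ocr_output) / len(reference)


# Hypothetical example: OCR misreads "1" as "l" and "0" as "O" in an invoice number.
print(character_error_rate("INV-1042", "INV-lO42"))  # 0.25
```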

It’s important to remember that OCR is merely part of the big picture when it comes to document management. Intelligent Document Processing (IDP) solutions, like Rossum, provide a comprehensive ecosystem of functions (including OCR) that streamline the document processing workflow and enable you to automate entire business processes such as Accounts Payable. With Rossum, you can keep your team motivated, achieve more of your strategic goals, and take control of your data.

Ready to see Rossum in action?

Create your free account and start extracting your document data today.