OCR deep learning
Thousands of lifetimes are spent by employees all over the world manually processing documents. Manual data entry is a soul-crushing job that nobody should have to spend so much of their time on. Furthermore, it is also one of the most inefficient, error-prone, and expensive ways to extract data from documents. Rather than let manual data entry continue to hold the world back, we took matters into our own hands and built an OCR deep learning solution that enables humans and machines to work together like never before. Our artificial intelligence technology was designed to be utilized through the most intuitive UI possible, making it easier than ever for average employees to deploy and use one of the most powerful data extraction systems on the market today.
The main objective of an OCR solution is text detection. Data is almost always stored in business documents in the form of text. Regardless of the various formats, fonts, and other differences, it’s fairly easy for any human to recognize an invoice when they see one. This, in turn, makes it easy for them to find the data they need. In order to automate document management processes, a system is needed with the ability to detect text. This has, historically, been a challenge, but text detection in images using deep learning is possible and we have proved that it works.
The way humans look at a business document is very different from the way most OCR systems do. Most of the time, a human looks at the document as a whole, decides what kind of document it is, and then focuses on the key locations where the data usually sits. Traditional OCR, by contrast, goes straight down to the letter level and scans character by character for anything that might be data.
Rossum is an advanced OCR solution that uses deep learning to look at documents the way humans do: holistically. What is deep learning? It is a subset of machine learning and artificial intelligence that uses neural networks to mimic the way a biological brain identifies and recognizes patterns. Once text has been detected, the next step is text localization. In deep learning, text localization means using neural networks to identify where data is most likely to be in a document.
What is OCR?
Some readers may be wondering, “What is OCR?” OCR is short for optical character recognition and refers to technology that converts images of text into machine-readable text, which streamlines manual data entry. Traditional OCR can read and capture the data within invoices and other documents. However, it also has certain built-in limitations. Almost every vendor has a slightly different-looking invoice, so invoices vary considerably from one to the next. The same can be said for many other kinds of documents. Here’s where traditional OCR starts to come up short. For every document with a different layout, it requires a custom-built template in order to process that information. This means that your manual data entry employees merely go from one tedious task to another. Instead of entering data, they spend hours building templates, looking for the right template for each invoice, and then correcting all the errors.
The best text detection algorithm would use AI to get all the benefits of OCR with none of these downsides. For example, OCR software built with deep learning could read invoices and other documents much as a human would. Plus, this kind of OCR wouldn’t require any complicated setup or hours of template building. Rossum is an example of an out-of-the-box OCR solution that requires less maintenance and is more efficient than traditional OCR. We like to say that there are four main benefits of using an AI OCR solution:
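To make the template limitation concrete, here is a minimal Python sketch of how a template-based extractor might behave. It is an illustration only, not how any particular product is implemented: the `Word` type, the `TEMPLATES` table, the region coordinates, and the `extract_with_template` function are all hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    x: float  # horizontal position, as a fraction of page width
    y: float  # vertical position, as a fraction of page height


# Hypothetical templates: one fixed region (x0, y0, x1, y1) per field, per layout.
TEMPLATES = {
    "vendor_a": {
        "invoice_number": (0.70, 0.05, 0.95, 0.10),
        "total": (0.70, 0.80, 0.95, 0.90),
    },
}


def extract_with_template(words, layout):
    """Return {field: text} by reading whatever falls inside each fixed region."""
    template = TEMPLATES.get(layout)
    if template is None:
        # A new layout means nothing can be read until someone builds a template.
        raise KeyError(f"no template for layout {layout!r}")
    result = {}
    for field, (x0, y0, x1, y1) in template.items():
        inside = [w.text for w in words if x0 <= w.x <= x1 and y0 <= w.y <= y1]
        result[field] = " ".join(inside)
    return result


# The layout the template was built for: both fields are found.
page = [Word("INV-001", 0.75, 0.07), Word("120.00", 0.80, 0.87)]
fields = extract_with_template(page, "vendor_a")

# Same data, but the invoice number has moved to the left margin: it is missed.
shifted = [Word("INV-001", 0.10, 0.07), Word("120.00", 0.80, 0.87)]
shifted_fields = extract_with_template(shifted, "vendor_a")
```

In this toy setup, a field that moves outside its fixed region comes back empty, and an unknown layout cannot be processed at all, which is the maintenance burden described above.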
- Saves time and cost
- Ensures fast ROI
- Frees up resources
- Streamlines processes
Ideally, OCR software should provide all these benefits and combine them with an extremely easy-to-use user interface as well as powerful reporting capabilities. This can dramatically reduce the workload of each employee in a variety of departments, allowing them to focus more time on higher-value tasks and initiatives.
How does OCR work?
A cognitive OCR solution may sound like a good idea, but how does OCR work with deep learning? We’ll look at Rossum, one of the few robust deep-learning-powered data extraction solutions on the market today. We have broken down our optical character recognition algorithm into three stages:
- Skim-Reading
- Data Localization
- Precise Reading
In the skim-reading phase, Rossum extracts all text from the document and identifies where each piece of information sits on the page. This generates a map of the document’s content. The goal of this phase is to identify the overall, rough layout of the information within the document. As humans, this is what we normally do when we see a document and instantly categorize it as an invoice or some other document before we’ve even read the data.
In the data localization phase, Rossum’s AI engine focuses on the specific areas within the map of the document where it expects there to be data. Over time, our system acquires, through deep learning, a general understanding of what an invoice looks like, and can use that information to find the correct areas of focus. Rossum is deliberately designed to err toward identifying candidate areas, and it is also trained to carefully reject false positives before capturing data.
Finally, in the precise reading phase, Rossum uses the focal points identified in the previous phase to carefully and precisely extract the data. Rossum can verify the data it is extracting in several ways. It compares the data to past invoices and then runs the calculations to make sure that the entire document is internally consistent. The end result is rapid, highly accurate data capture that can be applied to one document, or hundreds.
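The three phases can be pictured as a small pipeline. The Python sketch below is a toy illustration under loud assumptions, not Rossum’s implementation: a keyword lookup (`LABELS`) stands in for the trained neural network, the consistency check is limited to subtotal plus tax equalling the total, and every name here (`Token`, `skim_read`, `localize`, `read_precisely`) is hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation


@dataclass(frozen=True)
class Token:
    text: str
    line: int  # line index on the page, top to bottom
    x: int     # horizontal position in pixels


# Assumption: a keyword lookup stands in for a trained localization model.
LABELS = {"subtotal": "subtotal", "tax": "tax", "total": "total"}


def skim_read(tokens):
    """Stage 1: rough layout map, {line index: tokens ordered left to right}."""
    layout = {}
    for tok in tokens:
        layout.setdefault(tok.line, []).append(tok)
    return {line: sorted(row, key=lambda t: t.x) for line, row in sorted(layout.items())}


def localize(layout):
    """Stage 2: propose candidates generously, then reject obvious false positives."""
    candidates = []
    for row in layout.values():
        for label, value in zip(row, row[1:]):
            field = LABELS.get(label.text.lower().rstrip(":"))
            if field is not None:
                candidates.append((field, value))
    # Rejection step: a monetary value must contain at least one digit.
    return [(f, t) for f, t in candidates if any(ch.isdigit() for ch in t.text)]


def read_precisely(candidates):
    """Stage 3: parse each localized value and check the document's arithmetic."""
    fields = {}
    for field, tok in candidates:
        try:
            fields[field] = Decimal(tok.text.replace(",", ""))
        except InvalidOperation:
            continue  # not a number after all; skip it
    consistent = (
        {"subtotal", "tax", "total"} <= set(fields)
        and fields["subtotal"] + fields["tax"] == fields["total"]
    )
    return fields, consistent


tokens = [
    Token("Total:", 3, 400), Token("121.00", 3, 500),
    Token("Subtotal:", 1, 400), Token("100.00", 1, 500),
    Token("Tax:", 2, 400), Token("21.00", 2, 500),
    Token("Total", 0, 50), Token("satisfaction", 0, 120),  # decoy phrase
]
fields, consistent = read_precisely(localize(skim_read(tokens)))
```

The decoy line shows the idea behind the second phase: the word “Total” is flagged as a candidate first and then rejected because what follows it is not a number. A real system would replace the lookup and the digit check with learned models.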
As mentioned above, there are two main OCR techniques when it comes to data capture. The first is traditional, template-based OCR. This enables up to 50% of document processing tasks to be automated. On the other hand, implementation can be expensive, and costly experts must be on hand to maintain the system, because new rules and templates will always need to be created. Template-based OCR is also difficult and time-consuming to use, and it is challenging to train employees to use these systems effectively. Finally, hours of work per employee are lost to mere management and maintenance.
The other technique is to use deep learning, in the form of neural networks, for OCR data capture. This approach allows for accurate text extraction from images. These neural networks can recognize the underlying relationships in sets of data in a way that resembles how the human mind operates. With this form of OCR, up to 98% of tasks can be completely automated. Maintenance is handled automatically by the AI engine, which makes the system much faster and easier to use. Plus, a cognitive OCR solution requires virtually zero implementation time or effort. It just works, straight out of the box.
Rossum is not the only OCR option available for data capture. However, we have not been able to find a software solution anywhere that provides the same level of functionality and usability. As yet, we are the only ones who have built a completely setupless data capture solution that is also incredibly easy to use. There isn’t even an industry-wide benchmark against which comparable solutions, if any exist, could be measured. The simple fact is that none of the other OCR products we’ve tested could process documents at the scale that Rossum can. Businesses all over the world are constantly processing billions of invoices, so a truly practical OCR solution needs to capture data at high volumes. If you’re interested in learning more about our unique OCR software, try out the free trial today.
In the end, our primary objective is to come alongside human operators and help them as much as possible. We do not want to replace human employees; we want to give them the ability to do their tasks faster than ever before. Rossum can reduce monthly keystrokes per person from around 60,000 to only 4,500, freeing team members to focus on more fulfilling and impactful roles. Plus, our solution is cloud-based, which means that you can always access the data capture process from any device with a web browser. We have designed our user experience to be welcoming and intuitive, reflecting our focus on employees and on helping businesses serve their customers with more speed and flexibility than ever before.
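For readers who want the keystroke figures as a percentage, the arithmetic is short. This Python snippet uses only the two numbers quoted above; the variable names are ours.

```python
before = 60_000  # approximate monthly keystrokes per person, as quoted above
after = 4_500    # monthly keystrokes per person with Rossum, as quoted above

saved = before - after
reduction = saved / before

print(f"{saved} keystrokes saved per month ({reduction:.1%} fewer)")
# prints "55500 keystrokes saved per month (92.5% fewer)"
```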