From Spreadsheets to the Cloud (2): Alternatives to Manual Invoice Data Extraction

In our previous post in this series, we looked at manual invoice data extraction and the problems it can create for your Accounts Payable (AP) team and your company. Understanding your invoice processing system’s flaws and weaknesses will help you repair and strengthen it efficiently and cost-effectively.

With the exception of outsourcing, the main differentiator among data capture alternatives lies in the technology that powers them. Procure-to-pay (P2P) portals, electronic data interchange (EDI), and automated invoice data extraction solutions all have strengths and weaknesses that are conditioned by company requirements and limitations. 

For example, let’s take a quick look at what the two optical character recognition (OCR) variants of automated data extraction have to offer. A template-based OCR solution is a reliable, cost-effective option for a company that has long-term contracts with a group of vendors small enough to be trained to use the company’s invoice templates. On the other hand, a company that does business with a large set of vendors, each with its own invoice format, would be wise to use AI-powered OCR software.

In this article, you’ll learn about alternatives to manual invoice data extraction. This should help you choose the option that meets your company’s specific needs.



Outsourcing

Outsourcing the AP data entry process may sound like another alternative, and a reliable provider can work wonders for your bottom line. Typically, however, this option is fraught with perils. You’re allowing a third party to control your invoice processing procedure, and quality issues may take a long time to iron out during implementation.

Outsourcing also:

  • puts data security at risk
  • adds hidden costs to invoice processing
  • lacks capacity for learning and adapting to your business processes
  • is incapable of innovating AP workflows
  • requires AP resources for cross-checking work 

On top of all that, while the extraction process may cost less per invoice, it’s still manual. So outsourcing actually creates new liabilities that can replace or escalate existing ones.  

At best, a good outsourcing provider that involves you in their data capture processes will continuously require you to provide a lot of input concerning your accounting standards and organizational structure. In a worst-case scenario, a bad provider’s unskilled, poorly trained workforce will deliver unacceptable data extraction accuracy that creates extra work for your accounting experts.

P2P portal


In an AP context, P2P refers to the procurement lifecycle, which starts with requisitioning a vendor’s goods or services and ends with payment for them. The name is short for “from procurement to payment”, usually abbreviated to “procure-to-pay”.

A P2P portal handles the three key processes that make up the procurement lifecycle: requisitioning, purchasing, and payment, enabling you to bring each department’s activities together digitally into a more structured flow. These activities include:

  • Requisitioning products and services
  • Raising purchase orders (POs)
  • Receiving products and services
  • Invoice processing
  • Payment

This system is designed to centralize Purchasing and AP data, improving transparency while giving you more control over transactions. A P2P portal presents a twofold alternative to manual invoice data entry. First, all payable invoices must be associated with a purchase order already tracked in your P2P system. Second, the portal allows suppliers to enter invoice data directly into your electronic system, relieving your accounting team of the manual data transfer.
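The first half of that alternative, requiring every payable invoice to reference a purchase order the system already tracks, can be sketched as a simple check. The Python below is only an illustration under stated assumptions: the PurchaseOrder and Invoice records, the match_invoice function, the tolerance value, and the sample data are all hypothetical and do not describe any particular P2P product.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class PurchaseOrder:
    po_number: str
    vendor: str
    total: Decimal  # agreed order value


@dataclass(frozen=True)
class Invoice:
    invoice_number: str
    po_number: str
    vendor: str
    total: Decimal


def match_invoice(invoice, open_pos, tolerance=Decimal("0.01")):
    """Return (accepted, reason) for an invoice a supplier entered in the portal.

    open_pos maps PO numbers to the purchase orders already tracked
    (hypothetical in-memory stand-in for the P2P system's records).
    """
    po = open_pos.get(invoice.po_number)
    if po is None:
        return False, "no matching purchase order"
    if po.vendor != invoice.vendor:
        return False, "vendor does not match purchase order"
    if abs(po.total - invoice.total) > tolerance:
        return False, "invoice total differs from purchase order"
    return True, "matched"


open_pos = {"PO-1001": PurchaseOrder("PO-1001", "Acme Ltd", Decimal("250.00"))}

ok = match_invoice(
    Invoice("INV-1", "PO-1001", "Acme Ltd", Decimal("250.00")), open_pos
)
bad = match_invoice(
    Invoice("INV-2", "PO-9999", "Acme Ltd", Decimal("80.00")), open_pos
)
print(ok)
print(bad)
```

In this sketch, the second invoice is rejected because it references a PO the system does not know about, which is the kind of non-PO invoice a portal is meant to catch before it reaches your AP team.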

According to a 2017 global Tungsten Network study, the main causes of P2P friction are:

  • High proportion of paper invoices received
  • Too many non-PO based invoices
  • High volume of supplier inquiries regarding invoice or payment status
  • Lack of automated exceptions
  • Lack of automated approval

If left unchecked, P2P issues can waste a whopping 125 hours per week. And that’s just the minimum: the bigger the business, the greater the friction. Making matters worse, according to an HICX Solutions report, P2P systems often lack ownership and data governance and make little effort to engage and onboard suppliers. So if you’re considering a P2P solution, make sure you factor in the resources, time, and budget required to overcome these issues.


EDI

EDI enables a standardized paperless interchange of information between computers, all but eliminating the manual work required when processing data from documents received via email, post, or fax. Invoices are among the most common documents that businesses exchange using this solution. Because it is entirely digital and often automated, many companies use EDI to transfer data more efficiently and affordably.

The market’s general opinion is that universal EDI is critically flawed and incapable of reducing your invoice processing costs. In fact, moving all your suppliers to EDI can drive up these costs.

Even before you start using it, you’re looking at a time-consuming and costly setup that places demands on your IT department. The technological complexity of EDI requires you to invest in a network dedicated solely to transferring invoice data between you and your suppliers. Even if you decide to outsource the setup and network requirements, you still end up paying high fees for these services. You also need to invest in training so your AP team knows how to work with your EDI solution.

For the reasons outlined above, suppliers rarely use EDI software. With ongoing standardization and emerging government mandates across the world for EDI, the situation may improve, but this will not be a fast process given the extensive market inertia; therefore, relying on EDI to universally solve your AP issues will not work. Still, EDI is a continuum – it may mean universal exchange formats, but perhaps just extending a P2P supplier portal with an API will do the job as well. Considering EDI for specialized cases (e.g. tight and unique supplier relationships) may work.

What’s more, your expensive new EDI solution won’t be able to process paper invoices at all, increasing the potential for bottlenecks in AP processes because many suppliers will continue to use paper invoices for the foreseeable future.

Automated AP invoice data extraction solutions


There are two types of automated AP invoice data extraction solutions, both using OCR technology. The older option uses templates and is still worth considering if you only process a few invoice formats. AI-powered OCR alternatives, on the other hand, can deliver substantial savings if you need to handle a wide range of invoice formats on a regular basis. The advantages and disadvantages of both solutions are more complex when you start looking at cost and on-site versus cloud-based platforms. We will examine these more closely in an upcoming article.

Generally speaking, the benefits you gain from an automated AP data capture solution include:

  • Increased productivity: OCR-based invoice processing enables your AP team to focus on tasks other than entering and checking data.
  • A reduction in errors: On top of built-in error-checking functions, automated data capture gives your AP staff more time to verify that processed information is error-free.
  • Integration: You can get state-of-the-art automated AP data extraction solutions that integrate seamlessly into your existing business software suite without requiring expensive, lengthy, resource-intensive implementation projects.

Template-based OCR solutions

OCR technology converts characters in physical documents into digital text. For invoice data extraction with template-based OCR software, your AP staff needs to configure the program so it knows what information to look for in invoices and where it needs to look. This becomes problematic if you work with several vendors, each with their own invoice layout.
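To make the idea of a template concrete, here is a minimal Python sketch. It assumes the OCR step has already turned the page into a list of text lines, and it models a template as a mapping from field name to the line where the field is expected and the pattern to look for there. The ACME_TEMPLATE name, the labels, the extract_fields function, and the sample invoices are all hypothetical; real template-based products typically work with page coordinates and richer rules.

```python
import re

# Hypothetical template for one vendor's layout:
# field name -> (index of the expected line, pattern with one capture group)
ACME_TEMPLATE = {
    "invoice_number": (0, r"Invoice No:\s*(\S+)"),
    "po_number": (1, r"PO:\s*(\S+)"),
    "total": (3, r"Total:\s*([\d.,]+)"),
}


def extract_fields(ocr_lines, template):
    """Apply a template to OCR output; fields that are not found become None."""
    fields = {}
    for name, (line_index, pattern) in template.items():
        # A shorter document than the template expects yields an empty line.
        line = ocr_lines[line_index] if line_index < len(ocr_lines) else ""
        match = re.search(pattern, line)
        fields[name] = match.group(1) if match else None
    return fields


# An invoice that follows the layout the template was configured for.
acme_invoice = [
    "Invoice No: INV-2041",
    "PO: PO-1001",
    "Widgets x 10",
    "Total: 250.00",
]

# Another vendor's invoice with a different layout.
other_vendor_invoice = [
    "ACME SUPPLIES",
    "Invoice No: 77-A",
    "Amount due 250.00",
]

acme_fields = extract_fields(acme_invoice, ACME_TEMPLATE)
other_fields = extract_fields(other_vendor_invoice, ACME_TEMPLATE)
print(acme_fields)
print(other_fields)
```

The template finds all three fields in the first invoice and none in the second, even though the second contains an invoice number and an amount. Each new layout needs its own template, which is why this approach gets harder to maintain as the number of vendors grows.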

To a certain extent, you’re automating the creation of digital invoices from physical documents, which should recover the man-hours and cost that manual data processing demands. However, there’s a catch – template-based OCR platforms deliver 98 to 99 percent accuracy. That sounds great, but your AP processes need a solution that is 100 percent accurate. Your automated data capture software must place the right data in the right fields for every document. 

Let’s say you have a template-based OCR solution and it’s processing an invoice with 1,000 characters. Your software’s 99 percent accuracy gives you 990 correct characters. Those 10 incorrect characters could cause minor inconveniences if they’re in, for example, your supplier’s name or address. However, inaccuracies in key data such as the PO or invoice numbers, itemized or total prices, or quantities of purchased goods could snowball into costly complications.
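The arithmetic above can be written out in a few lines of Python. The sketch adds one simplifying assumption that is not in the example itself: each character is misread independently of the others, which lets us estimate how often a whole field, such as a 10-character PO number, comes through without a single error. The function names are illustrative only.

```python
def expected_errors(num_chars, char_accuracy):
    """Expected number of misread characters in a document."""
    return num_chars * (1 - char_accuracy)


def field_correct_probability(field_length, char_accuracy):
    """Chance that every character of a field is read correctly.

    Assumes character errors are independent, which is a simplification.
    """
    return char_accuracy ** field_length


errors = expected_errors(1000, 0.99)
p_field = field_correct_probability(10, 0.99)

print(f"Expected misread characters: {errors:.0f}")
print(f"Chance a 10-character field is fully correct: {p_field:.1%}")
```

Under that independence assumption, a 10-character field is read entirely correctly only about 90 percent of the time at 99 percent character accuracy, which is why a seemingly high accuracy figure can still leave a meaningful share of key fields needing correction.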

Even 100 percent accuracy can’t save your AP staff from manual processing when they receive invoices that don’t match any of your OCR software’s templates. So outliers and exceptions end up creating chores that an AI-powered solution could handle quickly and affordably.

AI-powered OCR solutions

Advances in artificial intelligence (AI) and machine learning (ML) enable automated invoice processing solutions to not only extract data, but also understand data. These cognitive data capture solutions come preconfigured with the ability to extract information accurately from millions of invoice templates, and they continuously learn how to interpret different invoice formats over time.

Fast, cost-effective setup and minimal training also make AI-powered OCR data extraction a viable option. Because it gets to know templates and extraction rules with usage, this variant of automated invoice processing requires less manual work from your AP team over time.

AI and ML have taken invoice data extraction software to new levels of efficiency and accuracy. The combination of these smart technologies with OCR substantially reduces the need for AP staff to check and recheck captured data. The software learns to read various invoice structures so it can distinguish, for instance, a price from a postal code. While you’ll still need human validation of some of the data, the cost of an automated cognitive data capture solution is considerably lower than that of template-based OCR.


The first three alternatives to manual invoice data processing – outsourcing, a P2P portal, and EDI – all have their own success stories. However, their drawbacks can make them expensive and difficult to manage, making them unsuitable for many situations.

As we mentioned in the introduction, a template-based form of automated data capture could serve you well if you work with a relatively small number of vendors with a limited variety of invoice formats. If you can get your vendors to use your preferred invoice format, EDI might be right for you.

The more likely scenario is that your AP team is handling many different invoice formats and layouts. In this case, you need an automated data extraction solution that uses smart technologies to learn how to read new invoices and accurately capture key information in any format, such as key-value pairs and tables.

In the next, and final, chapter of our exclusive series, you’ll get insights into how to choose the best invoice data capture solution for your business.
