Globally, businesses produce a staggering number of invoices: a 2019 Billentis report on the state of e-invoicing puts the current total at 550 billion invoices per year. In a 2018 survey, Levvel found that 36% of invoices are submitted in paper format, which means the remaining 64%, or 352 billion invoices, are received electronically [1], primarily in Portable Document Format (PDF). And we can safely assume that most accounts payable (AP) teams scan paper invoices so that they can process them.
The versatility and flexibility of PDF have made it the de facto standard file format for document sharing and collaboration. However, getting data out of PDF invoices and into accounting tools can present a challenge. In this blog post, we’ll focus on how you can export data from PDFs into spreadsheet programs like MS Excel.
A brief history of the PDF
Adobe invented the PDF over 25 years ago. Its purpose is to facilitate the cross-platform exchange and presentation of documents, thereby enabling users to create and share documents with the devices and software of their choice. For example, a vendor can create an invoice in MS Word on a Windows PC, save the invoice as a PDF, and send it to a customer who can read and sign the PDF on a Mac. The customer can even edit the invoice and send it back to the vendor for additional changes if necessary.
“What industries badly need is a universal way to communicate documents across a wide variety of machine configurations, operating systems and communication networks. These documents should be viewable on any display and should be printable on any modern printers. If this problem can be solved, then the fundamental way people work will change.”
— John E. Warnock, Adobe Cofounder
PDF has become so ingrained in our lives that we now take the format for granted. However, it was not an overnight sensation, as there were some significant barriers to mainstream adoption, including:
- Adobe’s PDF creation and reading software, Adobe Acrobat, cost $50 per user (equivalent in purchasing power to $89 in 2019)
- Early versions did not support external hyperlinks
- Because they were much larger than plain text files, PDFs took an excruciatingly long time to download through the slow modems of that era
- The puny processing power of 1990s computers rendered PDFs at a snail’s pace
Advances in technology and free distribution of Adobe Reader (now Acrobat Reader) helped make PDF the current standard for electronic documents.
You can read more about the surprisingly fascinating history of PDF elsewhere. Now we’re going to take a look at a common use case for this document format: invoices. You’ll also see that extracting data from PDF invoices is not that difficult once you have the right software for the job.
The problem with PDF invoices
Without the right tools and processes, PDF invoices can create accounts payable (AP) inefficiencies and increase the total cost of ownership (TCO) of invoice data capture. This is especially true if you have to handle a variety of invoice formats without any means of exporting the information they contain. In this case, your only options are manual data entry or copy-pasting invoice details from PDF to Excel files. Both methods will end up being more expensive, time-consuming, and error-prone than a smart tech-enabled solution.
How to convert a PDF invoice to an Excel spreadsheet
Whether you choose a manual data entry method or optical character recognition (OCR) software, you’re going to end up making an investment of time, money, and resources. Consider the short-term and long-term costs of each when you’re evaluating your options.
Manual invoice data capture methods
While we’ve explained why manual data extraction is not the best choice for most AP functions, your business may be at a stage where it will suffice for the short term. Just be aware that in-house or outsourced data entry is not a sustainable option and will become more costly and time-consuming, and potentially less accurate, as you scale your company.
The ‘how-to’ here is obvious: data entry clerks refer to PDF invoices and either type or copy-paste relevant information into Excel spreadsheets used for AP purposes. For a company that processes a lot of invoices, this is a costly option that is more error-prone than automated data capture.
PDF to Excel conversion software
You can choose from a wide range of PDF to Excel conversion programs. Because they take information directly from invoices, they can capture data accurately. However, efficiency is highly dependent on the structure of the invoices you’re processing. This software tends to create spreadsheets that need a fair amount of manual tweaking, which uses up the time you’re supposed to be saving on data entry.
Some offer monthly subscription-based pricing, while others charge a one-off single-user license fee. You’re only going to see a return on your investment in one of these solutions if your vendors are sending you PDF invoices with the same structure. Otherwise, members of your AP team will have to spend an excessive amount of time adjusting Excel spreadsheets.
It’s worth noting that the article we linked to above implies that PDF to Excel converters may not be all that reliable, stating in one review that “if you have embedded tables in a PDF document, these will be converted into an Excel spreadsheet (hopefully) without issues.”
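To make the dependence on invoice structure concrete, here is a minimal Python sketch of the kind of row-splitting a converter has to do. It assumes the text has already been pulled out of the PDF. The sample lines, the `rows_from_text` and `to_csv` helpers, and the rule that columns are separated by two or more spaces are all hypothetical, not the behavior of any particular product. The output is CSV, which Excel can open.

```python
import csv
import io
import re

# Hypothetical text as it might come out of a PDF table: columns are
# separated by runs of two or more spaces (an assumption for this sketch).
SAMPLE_TEXT = """\
Description        Qty   Unit price   Total
Widget, large      2     10.00        20.00
Consulting hours   3     50.00        150.00
"""

def rows_from_text(text):
    """Split each non-empty line into cells on runs of 2+ whitespace chars."""
    return [re.split(r"\s{2,}", line.strip())
            for line in text.splitlines() if line.strip()]

def to_csv(rows):
    """Serialize rows as CSV text that a spreadsheet program can open."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()

rows = rows_from_text(SAMPLE_TEXT)
print(to_csv(rows))
```

An invoice that separates columns with single spaces, or wraps a long description onto a second line, would break the splitting rule here. That is the sort of case that leads to the manual spreadsheet tweaking described above.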
PDF to OCR, template-based
Similar to PDF to Excel conversion software, template-based OCR software can extract invoice data and export it to an Excel spreadsheet accurately and quickly. As with PDF to Excel converters, the time and cost savings depend on how many invoice formats you’re working with.
This solution spares your AP team the trouble of amending spreadsheets after every conversion. Instead, they have to set up templates and rules for every vendor in your supply chain. Setting up a new template alone can take several hours; therefore, if you’re working with a continuously changing roster of suppliers, a template-based OCR solution may not be the best option for converting PDF invoices to Excel.
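The per-vendor setup can be pictured with a small Python sketch. It assumes OCR has already turned the invoice into plain text. The vendor names, field labels, regular expressions, and the `extract_fields` helper are hypothetical and only illustrate the idea of one rule set per supplier, not how any specific OCR product stores its templates.

```python
import re

# Hypothetical per-vendor templates: each field needs its own rule because
# every vendor labels and lays out its invoices differently.
TEMPLATES = {
    "Acme Ltd": {
        "invoice_number": r"Invoice No\.\s*(\S+)",
        "total": r"Total due:\s*([\d.,]+)",
    },
    "Globex": {
        "invoice_number": r"Ref:\s*(\S+)",
        "total": r"Amount payable\s+([\d.,]+)",
    },
}

def extract_fields(vendor, text):
    """Apply the vendor's template; a field is None when its rule misses."""
    template = TEMPLATES.get(vendor)
    if template is None:
        # A new supplier cannot be processed until someone writes a template.
        raise KeyError(f"no template set up for vendor {vendor!r}")
    fields = {}
    for name, pattern in template.items():
        match = re.search(pattern, text)
        fields[name] = match.group(1) if match else None
    return fields

acme_text = "Acme Ltd\nInvoice No. A-1001\nTotal due: 1,250.00"
print(extract_fields("Acme Ltd", acme_text))
```

Every new supplier means another entry in `TEMPLATES`, and a supplier who changes its invoice layout means revisiting the rules, which is why this approach suits a stable roster of vendors better than a changing one.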
PDF to OCR, cognitive invoice data capture
Automated invoice data capture solutions offer the most efficient and cost-effective way to extract invoice data from PDFs to Excel spreadsheets. You can batch convert invoices in a matter of minutes, as you can see in this example:
The second step initially takes some manual effort; fortunately, a cognitive data capture solution learns to recognize data fields and grows more accurate and intuitive with use. You’ll also need to dedicate a little time and resources to the fourth step, as some column widths may require adjustment. Even so, the cost, time, and resources required for these tasks are marginal compared to the other PDF invoice to Excel conversion options we’ve looked at.
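If you would rather not adjust column widths by hand, the adjustment can be approximated in code. The Python sketch below is only an illustration: the `suggested_widths` helper, its `padding` and `max_width` parameters, and the sample rows are hypothetical. It computes one width per column from the longest cell, and the result would still have to be passed to whichever spreadsheet library or tool you use.

```python
def suggested_widths(rows, padding=2, max_width=50):
    """Return one width per column: longest cell plus padding, capped."""
    if not rows:
        return []
    column_count = max(len(row) for row in rows)
    widths = [0] * column_count
    for row in rows:
        for index, cell in enumerate(row):
            # Measure the cell as text, since that is what gets displayed.
            widths[index] = max(widths[index], len(str(cell)))
    return [min(width + padding, max_width) for width in widths]

# Hypothetical exported rows: a header followed by two invoices.
rows = [
    ["Vendor", "Invoice number", "Total"],
    ["Acme Ltd", "A-1001", 1250.0],
    ["Globex", "G-77", 89.5],
]
print(suggested_widths(rows))
```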
You can start converting PDF invoices to Excel today
The current state of your AP process should help determine how you can export invoice data from a PDF to Excel. If you’re a small business owner with just a few suppliers, manual methods may be the most cost-effective way to extract invoice data.
You could also consider using a free trial version of automated document data capture software. This gives you the opportunity to familiarize yourself with the program so that you’re ready to work with it on a larger scale as your business grows.
[1] The oft-quoted Billentis report puts the percentage of completely paperless invoices at 10%. However, its methodology states that supplier and buyer "exchange [invoices] directly via service providers and/or via the platform provided by tax authorities". In other words, that 10% consists purely of invoices received via EDI, and does not include electronic invoices received in PDF format via email or supplier portal.