From this article you will learn:
- What the Data Matching extension is
- How to set up the Data Matching extension
- How to create another data matching with a different dataset
- How to define custom matching logic
What is the Data Matching extension
What exactly is the Data Matching extension, and when can it be useful for you and your workflow?
The most important data fields in any document are the vendor name, customer name, VAT IDs, addresses, and payment information. Unlike dates, total amounts, and similar values, these fields usually stay the same across documents from the same vendor, so you can use them to match each document uniquely to a record in your master data. As a result, when you export data from Rossum to an ERP system, it is already matched to your internal vendor ID or internal PO number.
So how does it work? First, you upload a master data file to the Data Matching extension. It can contain information about your vendors, customers, purchase orders, or even line items you want to match. When capturing data from a new document, Rossum searches the master data file for the data from the processed document. Then it finds the best match and accurately identifies the record. This powerful tool helps speed up and even automate document validation without consuming development resources.

How to set up the extension
Although the Data Matching extension can help you with many things, setting it up takes only a few steps.
How to access the setup page
You can access the setup page via the Queue configuration:
- Navigate to a specific queue.
- Open the queue’s Configuration.
- Click on the “Fields to capture” tab.
- Scroll to the bottom and click on the “Open data matching extension” button.

You can also find this extension’s setup page via the Rossum Store. To find it there:
- Click on Extensions in the main menu.
- Find and click on Data Matching.
- Click the “Try extension” button.

In both cases, you will be taken to the setup view of Rossum’s Data Matching extension (https://data-matching.elis.rossum.ai/), which is suitable for vendor matching and various other use cases.
Step 1: Upload master data
Dataset
When you upload data to your first dataset, you can give the dataset any name you like. Usually, you will use “vendors” for a table of vendors and “purchase_orders” for a table of POs. Allowed characters are uppercase and lowercase letters, numbers, and underscores (_).
If you want to update data in a previously created dataset, use the same name in the Dataset field as before. Then upload a new file containing all the data: a complete dataset that includes both the old and the updated or new records (for example, a table with all vendors, not just the new ones).
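If you prepare datasets with a script, the two rules above can be checked before uploading. The sketch below is an illustration under stated assumptions: the regular expression restates the naming rule from this section (the extension's own validation may differ), and the hypothetical helper `build_full_dataset` merges updated and new records into the existing ones, so the file you re-upload is complete rather than containing only the changes.

```python
import re

# Assumption: the naming rule described above (letters, digits, underscores)
# written as a regular expression; the extension's own check may differ.
DATASET_NAME = re.compile(r"[A-Za-z0-9_]+")


def is_valid_dataset_name(name: str) -> bool:
    """Return True if `name` consists only of letters, digits and underscores."""
    return DATASET_NAME.fullmatch(name) is not None


def build_full_dataset(existing: list[dict], updates: list[dict], key: str) -> list[dict]:
    """Merge updated and new records into the existing ones, keyed by `key`.

    The result holds every record (unchanged, updated and new), i.e. the
    complete dataset you would re-upload under the same dataset name.
    """
    merged = {record[key]: record for record in existing}
    for record in updates:
        merged[record[key]] = record  # same key: replace; new key: append
    return list(merged.values())


print(is_valid_dataset_name("purchase_orders"))  # True
print(is_valid_dataset_name("purchase-orders"))  # False: hyphen not allowed
```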

Upload a file
Create a master data file according to your needs. See the sample master data file with companies or purchase orders (PO data). You can also define custom columns, not only those in the sample file. There is a list of supported columns with defined pre-processing rules and more complex data types.
The supported master data file formats are .json, .xml, .csv, .xls, and .xlsx.
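If you generate the master data file from a script instead of a spreadsheet, a sketch like the following writes a small vendor table to CSV with Python's standard `csv` module. The column names (`vendor_id`, `name`, `vat_id`, `iban`) and the values are hypothetical; use the columns from the sample file or your own custom columns.

```python
import csv

# Hypothetical vendor records; column names and values are illustrative only.
vendors = [
    {"vendor_id": "V001", "name": "Acme s.r.o.", "vat_id": "CZ12345678",
     "iban": "CZ6508000000192000145399"},
    {"vendor_id": "V002", "name": "Globex GmbH", "vat_id": "DE123456789",
     "iban": "DE89370400440532013000"},
]

# newline="" is what the csv module expects when writing files;
# UTF-8 is an explicit choice so you know which encoding to select on upload.
with open("vendors.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["vendor_id", "name", "vat_id", "iban"])
    writer.writeheader()
    writer.writerows(vendors)
```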

After choosing the master data file, select the encoding of the file.
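If you are not sure which encoding a file uses, you can check locally whether it decodes cleanly before you upload it. A minimal sketch, assuming a hypothetical file saved in Windows-1252, a legacy encoding that some Windows tools still produce:

```python
def decodes_as(path: str, encoding: str) -> bool:
    """Return True if the whole file can be decoded with `encoding`."""
    try:
        with open(path, encoding=encoding) as f:
            f.read()
    except UnicodeDecodeError:
        return False
    return True


# Demo: write a small CSV in Windows-1252 ("ü" becomes a byte that is invalid in UTF-8).
with open("vendors_cp1252.csv", "wb") as f:
    f.write("vendor_id,name\nV003,Müller AG\n".encode("cp1252"))

print(decodes_as("vendors_cp1252.csv", "utf-8"))   # False
print(decodes_as("vendors_cp1252.csv", "cp1252"))  # True
```

A clean decode is not proof that the encoding is right (for example, `latin-1` accepts any byte sequence), so it is still worth spot-checking a few names with accented characters.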

The last step is to set the unique identifier in your master data. It is the column’s name in the uploaded spreadsheet (or element/key in the XML/JSON file) that holds the value to be exported from Rossum when a record is matched.
It is usually a unique identifier of a record in your system, such as an SAP Vendor ID, Code, Supplier ID, or something similar.
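Because the unique identifier is the value exported when a record is matched, it helps to confirm that the column exists and that its values are non-empty and unique. The sketch below does this for a CSV file; the function name and the returned messages are hypothetical and not part of the extension.

```python
import csv
from collections import Counter


def check_unique_identifier(path: str, id_column: str, encoding: str = "utf-8") -> list[str]:
    """Return a list of problems found in the identifier column (empty list = OK)."""
    with open(path, newline="", encoding=encoding) as f:
        reader = csv.DictReader(f)
        if reader.fieldnames is None or id_column not in reader.fieldnames:
            return [f"column {id_column!r} not found"]
        # Short rows yield None for missing cells, so treat them as empty.
        values = [(row[id_column] or "").strip() for row in reader]

    problems = []
    empty = sum(1 for v in values if not v)
    if empty:
        problems.append(f"{empty} record(s) with an empty identifier")
    dupes = sorted(v for v, n in Counter(values).items() if v and n > 1)
    if dupes:
        problems.append(f"duplicate identifiers: {', '.join(dupes)}")
    return problems
```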

Click “Upload” and wait for your data to be uploaded. For small files, this is immediate; for larger datasets, it may take minutes or even hours in extreme cases. In any case, you will get an email when your data is uploaded and processed. Your developer may check the upload status via API if you feel it’s taking too long.

Once you have completed the setup, “Upload master data” (the first tab) is the only one you need when you simply want to update data in existing datasets.
Continue reading if you want to create a new data matching.
Step 2: Set up a matching field
What exactly is a matching field? It’s a field in your schema that you can see on the validation screen. It shows the identified match (or a message that the match was not found). Setting up a matching field is a one-time step per queue and dataset.
First, select the queue where Rossum should add the matching field, and the dataset you want to use.

Next, move to the “Field definition” section:
- Specify the field’s name (label). If you want to match the vendor, you could name it, for example, “Vendor Match”.
- The Data Matching extension automatically generates the field ID from the label, and the ID must be unique. If your label would translate to a field ID that already exists, choose a different label until it produces a unique field ID.
Then assign the field’s position in the Extraction schema (= among other fields on your validation screen). Choose a field from the schema and specify whether the new matching field should come before or after it.
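This article does not describe the exact rule the extension uses to turn a label into a field ID. The sketch below assumes a simple one (lowercase, with runs of other characters replaced by underscores) and models the schema as a flat list of field IDs, which is a simplification. It illustrates why a label can be rejected and how “before” or “after” a chosen field determines the new field’s position. The function names and field IDs are hypothetical.

```python
import re


def label_to_field_id(label: str) -> str:
    """Hypothetical rule: lowercase, runs of non-alphanumerics become '_'."""
    return re.sub(r"[^a-z0-9]+", "_", label.lower()).strip("_")


def insert_field(field_ids: list[str], new_label: str, anchor_id: str,
                 after: bool = True) -> list[str]:
    """Return a new list with the generated ID placed before or after `anchor_id`.

    Raises ValueError if the generated ID already exists (pick another label)
    or if `anchor_id` is not in the schema.
    """
    new_id = label_to_field_id(new_label)
    if new_id in field_ids:
        raise ValueError(f"field ID {new_id!r} already exists; choose another label")
    index = field_ids.index(anchor_id) + (1 if after else 0)
    return field_ids[:index] + [new_id] + field_ids[index:]


schema = ["sender_name", "sender_vat_id", "amount_total"]
print(insert_field(schema, "Vendor Match", "sender_vat_id"))
# ['sender_name', 'sender_vat_id', 'vendor_match', 'amount_total']
```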

Then scroll down to the “Additional information to be shown” part. You can specify what information, other than the unique identifier, should be displayed in the matching field if the match is successful. For example, you might want to include the vendor’s address as well as its name.
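To picture what the additional information adds, the sketch below builds a dropdown label from the matched record: the unique identifier first, followed by the extra columns you chose. The separator, the column names, and the function name are assumptions for illustration; the extension’s actual display format may differ.

```python
def format_match_label(record: dict, id_column: str, extra_columns: list[str]) -> str:
    """Join the identifier with any non-empty extra columns (e.g. name, address)."""
    parts = [str(record[id_column])]
    parts += [str(record[c]) for c in extra_columns if record.get(c)]
    return " | ".join(parts)


vendor = {"vendor_id": "V001", "name": "Acme s.r.o.", "address": "1 Main Street, Prague"}
print(format_match_label(vendor, "vendor_id", ["name", "address"]))
# V001 | Acme s.r.o. | 1 Main Street, Prague
```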

Do not forget to save your progress by clicking on the “Save” button at the bottom of the screen.
Step 3: Create matching logic
In the “Create matching logic” tab, select the queue and dataset where the extension should apply the logic.

In the Rules section:
- Map a specific master data column to a Rossum captured field defined in the Extraction schema. You can select both header and line item fields as the Rossum captured field.
- Choose whether you want Rossum to look for an exact match or a fuzzy (close enough) match.
- Click on the Tick icon to confirm the rule.
If you’ve chosen the fuzzy matching technique, you can set the desired similarity threshold. The default value is 0.3, which means 30% similarity. Setting it to 1.0 means 100% similarity, which is an exact match. Experiment with this threshold to find the ideal matching for your data.
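This article does not say which similarity measure Rossum uses. To illustrate how a threshold works, the sketch below swaps in Python’s `difflib.SequenceMatcher.ratio()`, which returns a score between 0.0 and 1.0, and accepts the best-scoring record only if it reaches the threshold. Scores from Rossum’s own measure will differ, so treat the numbers as illustrative.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0.0, 1.0]; 1.0 means identical strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_match(value: str, records: list[dict], column: str, threshold: float = 0.3):
    """Return the record whose `column` is most similar to `value`,
    or None if even the best score is below `threshold`."""
    best, best_score = None, 0.0
    for record in records:
        score = similarity(value, record[column])
        if score > best_score:
            best, best_score = record, score
    return best if best_score >= threshold else None


records = [
    {"vendor_id": "V001", "name": "Acme s.r.o."},
    {"vendor_id": "V002", "name": "Globex GmbH"},
]
print(fuzzy_match("ACME sro", records, "name")["vendor_id"])       # V001
print(fuzzy_match("Acme s.r", records, "name", threshold=1.0))     # None
```

With the threshold at 1.0, only an identical value (here, ignoring case) is accepted, which matches the description of 1.0 as an exact match.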
The matching process applies the rules in the order you set them up. With multiple rules, even if Rossum can’t find the name or the VAT number on the document, it can still extract and match the IBAN and identify the company from your file.
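One way to read the ordered rules is as a fallback cascade: try the first rule, and if its field was not captured or nothing matches, move on to the next. The sketch below implements that reading with exact matching after removing whitespace and uppercasing. The rule format, field IDs, and column names are hypothetical, and the extension may combine rules differently.

```python
def normalize(value) -> str:
    """Remove whitespace and uppercase, so 'de89 3704 ...' equals 'DE893704...'."""
    return "".join(str(value).split()).upper()


def match_with_rules(captured: dict, records: list[dict], rules: list[dict]):
    """Apply rules in order and return (record, rule) for the first match.

    Each rule is a hypothetical {"field": <captured field ID>,
    "column": <master data column>} pair. Rules whose field was not
    captured on the document are skipped. Returns (None, None) if nothing matches.
    """
    for rule in rules:
        value = captured.get(rule["field"])
        if not value:
            continue  # e.g. no VAT ID on this document: fall through to the next rule
        for record in records:
            master = record.get(rule["column"])
            if master and normalize(master) == normalize(value):
                return record, rule
    return None, None


vendors = [
    {"vendor_id": "V001", "name": "Acme s.r.o.", "iban": "CZ6508000000192000145399"},
    {"vendor_id": "V002", "name": "Globex GmbH", "iban": "DE89370400440532013000"},
]
rules = [
    {"field": "sender_name", "column": "name"},
    {"field": "sender_vat_id", "column": "vat_id"},
    {"field": "iban", "column": "iban"},
]
# The name and VAT ID were not captured, but the IBAN was.
captured = {"sender_name": "", "iban": "DE89 3704 0044 0532 0130 00"}
record, rule = match_with_rules(captured, vendors, rules)
print(record["vendor_id"], rule["column"])  # V002 iban
```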

Choose UX behavior
In the “Confirm limitations” section, choose what happens if no match is found. Rossum can show an error and block the Confirm button until the data is corrected, show a warning or just an info message while still letting the user confirm the document, or do nothing and let the user act at their discretion.

Step 4: Enable the extension
Click the “Save” button to confirm all the rules and enable the extension on the selected queue.
Step 5: See the extension in action
When you open Rossum’s validation screen, you’ll notice the new field you just created (the matching field, e.g., “Vendor Match”). It is a dropdown that retrieves its values from the master data file.
For example, to pull the vendor match value, the Data Matching extension compares the vendor details captured from the document to the master data, following the rules you just set. If the captured data matches the data in the master file, the dropdown list displays the matched vendor identifier from the master data.
Suppose Rossum didn’t capture some vendor detail fields correctly and no match was found. In that case, you can select the field on the validation screen and place the bounding box over the appropriate value on the document. Filling in the correct value immediately triggers the extension to search for the matching vendor again.
Voilà! The vendor’s identity is now verified, so you can confirm the document and the correct vendor ID will be passed to the downstream system.

Create another matching with a new dataset
If you are satisfied with the matching, consider adding another one! For example, you might want to match to your PO database as well as your vendor database.
This is quite simple. Just define a new dataset (e.g., purchase_orders or POs), upload your master data for POs, set up a matching field, and create matching logic. Then watch your documents get matched against both the vendor and the PO databases!
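Each matching has its own dataset, matching field, and logic, so the matchings run independently and each fills its own field. A minimal sketch of that idea, assuming exact matching and hypothetical field IDs, columns, and records:

```python
def match_all(captured: dict, matchings: dict) -> dict:
    """Run each configured matching independently.

    `matchings` maps a matching field ID (hypothetical, e.g. "vendor_match")
    to a tuple (records, captured_field, master_column, id_column).
    Returns {matching field ID: matched identifier, or None if no match}.
    """
    results = {}
    for field_id, (records, field, column, id_column) in matchings.items():
        value = captured.get(field)
        hit = next((r for r in records if value and r.get(column) == value), None)
        results[field_id] = hit[id_column] if hit else None
    return results


vendors = [{"vendor_id": "V001", "vat_id": "CZ12345678"}]
purchase_orders = [{"po_id": "PO-7", "po_number": "4500012345"}]
matchings = {
    "vendor_match": (vendors, "sender_vat_id", "vat_id", "vendor_id"),
    "po_match": (purchase_orders, "order_id", "po_number", "po_id"),
}
captured = {"sender_vat_id": "CZ12345678", "order_id": "4500012345"}
print(match_all(captured, matchings))  # {'vendor_match': 'V001', 'po_match': 'PO-7'}
```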

How to define custom matching logic
You do not have to rely on Rossum’s matching logic. You can implement your own matching logic when working with the master data. See our Simple Purchase Order matching extension and get inspired for your matching use cases.
You can use this feature for free. However, it may be limited in the future, and there will be a paid version in the Rossum Store as well.
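As an example of what custom matching logic might look like, the sketch below normalizes purchase order numbers before comparing them, so that “PO-000123” on a document matches “123” in the master data. It is not the Simple Purchase Order matching extension’s implementation; the normalization rule, the `po_number` column, and the function names are hypothetical.

```python
import re


def normalize_po(value: str) -> str:
    """Hypothetical rule: drop a leading 'PO' prefix and separators,
    then strip leading zeros, so 'PO-000123' and '123' compare equal."""
    digits = re.sub(r"^\s*PO[\s\-#:]*", "", value, flags=re.IGNORECASE)
    digits = re.sub(r"[\s\-]", "", digits)
    return digits.lstrip("0") or "0"


def match_po(captured_po: str, po_records: list[dict], column: str = "po_number") -> list[dict]:
    """Return all master data records whose normalized PO number equals the captured one."""
    key = normalize_po(captured_po)
    return [r for r in po_records if normalize_po(r[column]) == key]


po_records = [
    {"po_number": "PO-000123", "vendor_id": "V001"},
    {"po_number": "124", "vendor_id": "V002"},
]
print(match_po("123", po_records))  # [{'po_number': 'PO-000123', 'vendor_id': 'V001'}]
```

Returning a list rather than a single record leaves room to decide what to do when several records share a PO number, for example showing a warning instead of picking one.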
Note: You can upload master data and perform other operations related to data matching via a public API endpoint. Contact us at support@rossum.ai to learn more.