Some of the most important data fields in any document are the vendor name, customer name, VAT IDs, addresses, and payment information. Unlike dates or total amounts, these fields usually stay the same across documents from the same vendor. They can also be used to match a document uniquely against your master data, so that data exported from Rossum to an ERP is already linked to your internal vendor ID or internal PO number.
See how Rossum performs master data matching in action.
How Rossum can help
Rossum’s data matching extension takes in a master data file containing information about the vendors, customers, purchase orders, or even line items you need to match against. When capturing data from a new document, Rossum looks up the captured values in the master data file to find the best match and identify the record precisely. This helps speed up, and even automate, document validation without taking up any development resources.

Let’s set this up in Rossum
To access the setup page, find the data matching extension in your trial or production account and open it:
- Open settings
- Navigate to a specific queue
- Click on the “Fields to capture” tab
- Scroll to the bottom and click on the “Open data matching extension” button

The extension can also be found in the Rossum Store. To find it there:
- Click on Extensions in the main menu
- Find and click on Vendor Matching
- Click Try extension

You will be taken to https://data-matching.elis.rossum.ai/, the home of Rossum’s Data Matching extension, which is suitable not only for vendor matching but for various other use cases as well.
Step 1: Upload master data
Dataset
First, choose the dataset you want to upload your file into. If this is your very first dataset, you can enter any name here, typically “vendors” for a table of vendors or “purchase_orders” for a table of POs. Allowed characters are upper and lower case letters, numbers, and underscores (_).
If you want to update data in a previously created dataset, enter exactly the same name in the Dataset field as before, then upload a fresh file containing all the data: a complete dataset with both the old and the updated or new records (for example, a table of all vendors, not just the new ones).
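The naming rule can be checked before you type the name into the Dataset field. The following is a minimal sketch, not part of Rossum: the function names are hypothetical, and it assumes that "letters" means ASCII letters only.

```python
import re

# Assumption: "letters" in the naming rule means ASCII letters only.
DATASET_NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def is_valid_dataset_name(name: str) -> bool:
    """Return True if the name uses only letters, digits, and underscores."""
    return DATASET_NAME_PATTERN.fullmatch(name) is not None


def targets_existing_dataset(name: str, existing_names: set[str]) -> bool:
    """An update only reaches an existing dataset when the name is identical."""
    return name in existing_names
```

For example, `is_valid_dataset_name("purchase orders")` is rejected because of the space, and `targets_existing_dataset("Vendors", {"vendors"})` is false because the comparison is case-sensitive.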

Upload a file
Create a master data file according to your needs. See the sample master data file with companies or purchase orders (PO data). You are not limited to the columns in the sample file and can define any custom columns. There is also a list of supported columns that come with defined pre-processing rules and more complex data types.
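Because an update must contain the complete dataset, it can help to build the upload file by merging the previous records with the changes. The following is a rough sketch under stated assumptions: the column names (`vendor_id`, `name`, `vat_id`) and the sample rows are made up for illustration and are not taken from Rossum's sample file.

```python
import csv
import io


def merge_records(existing, updates, key):
    """Return the full record list: existing rows, with updates applied by key."""
    merged = {row[key]: row for row in existing}
    for row in updates:
        merged[row[key]] = row  # a known key is replaced, a new key is appended
    return list(merged.values())


def to_csv(records, columns):
    """Serialize the records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical data: the previous upload and the rows that changed since then.
existing = [
    {"vendor_id": "V001", "name": "Acme Ltd", "vat_id": "GB123456789"},
    {"vendor_id": "V002", "name": "Globex s.r.o.", "vat_id": "CZ12345678"},
]
updates = [
    {"vendor_id": "V002", "name": "Globex a.s.", "vat_id": "CZ12345678"},
    {"vendor_id": "V003", "name": "Initech GmbH", "vat_id": "DE123456789"},
]

full = merge_records(existing, updates, key="vendor_id")
csv_text = to_csv(full, ["vendor_id", "name", "vat_id"])
```

The resulting `csv_text` holds all three vendors, with the renamed one updated in place, which is the shape of file the update step expects.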

Encoding
After choosing the master data file, select the encoding of the file.
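If you are unsure which encoding your file uses, you can test a few candidates locally before uploading. This is a minimal sketch: the candidate list is an assumption and is not the list offered in the extension.

```python
def find_working_encodings(raw: bytes, candidates=("utf-8", "cp1250", "latin-1")):
    """Return the candidate encodings that decode the bytes without an error."""
    working = []
    for encoding in candidates:
        try:
            raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # these bytes are not valid in this encoding
        working.append(encoding)
    return working
```

A successful decode does not prove the encoding is the right one, since single-byte encodings such as latin-1 accept any input. Treat the result as a shortlist and check that names with accented characters look correct.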

Unique identifier
The last step is to specify the unique identifier in your master data. This is the name of the column in the uploaded spreadsheet (or the element/key in an XML/JSON file) that holds the value to be exported from Rossum when a record is matched. It is typically a unique identifier of a record in your system, e.g. SAP Vendor ID, Code, Supplier ID or similar.
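Before uploading, it is worth confirming that the chosen column really identifies each record. The sketch below assumes that identifier values should be non-empty and distinct. The text above does not state how the extension treats duplicates, so this is a local sanity check only, and the function name is hypothetical.

```python
from collections import Counter


def check_unique_identifier(records, id_column):
    """Return a list of problems found in the identifier column (empty if none)."""
    problems = []
    values = []
    for index, row in enumerate(records, start=1):
        value = (row.get(id_column) or "").strip()
        if not value:
            problems.append(f"row {index}: missing {id_column}")
        else:
            values.append(value)
    # Report every identifier that occurs more than once.
    for value, count in Counter(values).items():
        if count > 1:
            problems.append(f"{id_column} {value!r} appears {count} times")
    return problems
```

An empty result means every row has an identifier and no identifier is repeated.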

Upload
Click “Upload” and wait for your data to be uploaded. For small files this is nearly immediate; for large datasets it may take minutes or, in extreme cases, hours. Either way, you will get an email when the upload and processing of your data have finished. If it seems to be taking too long, your developer can check the status of the upload via the API.

Once the following steps have been set up, the first tab, “Upload master data”, is the only one you need when you are just updating data in existing datasets. Continue reading only if you are setting up a new data matching.
Step 2: Set up a matching field
A matching field is a field in your schema that appears on the validation screen and displays the identified match (or a message that no match was found). You only need to set up a matching field once per queue and dataset.

First, select the queue where the matching field should be added and the dataset that will be matched.

In the Matching “Field definition”:
- Specify the name (label) of the field (if you are matching the vendor, it can be e.g. “Vendor Match”).
- The Field ID is generated automatically from the label and needs to be unique. If the label you choose translates into a Field ID that already exists, pick a different label until the resulting Field ID is unique.
Next, assign the field a position in the Extraction schema, that is, among the other fields on your validation screen. Choose a field from the schema and whether the new matching field should be placed before or after it.
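To see why two different labels can collide on the same Field ID, consider the sketch below. The derivation rule (lowercase, with other characters collapsed into underscores) is purely an assumption for illustration, since the exact rule the extension uses is not documented here. The schema field names are hypothetical too.

```python
import re


def label_to_field_id(label: str) -> str:
    """Hypothetical rule: lowercase the label and join word runs with '_'."""
    return re.sub(r"[^a-z0-9]+", "_", label.lower()).strip("_")


def is_label_usable(label: str, existing_ids: list[str]) -> bool:
    """A label is usable when its derived Field ID is non-empty and not taken."""
    field_id = label_to_field_id(label)
    return bool(field_id) and field_id not in existing_ids


def place_field(schema_ids: list[str], new_id: str, anchor_id: str, before: bool) -> list[str]:
    """Return a new field order with new_id placed before or after anchor_id."""
    position = schema_ids.index(anchor_id) + (0 if before else 1)
    return schema_ids[:position] + [new_id] + schema_ids[position:]
```

Under this assumed rule, “Vendor Match” and “Vendor-Match” both become `vendor_match`, so the second label would be rejected, while “Vendor Match 2” would be accepted.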

In “Additional information to be shown”, choose what other information should appear in the matching field alongside the unique identifier when the matching is successful (e.g. in addition to the vendor name, you might also want to show its address).

Step 3: Create matching logic
In the Create matching logic tab, first select the queue and dataset where the logic should be applied.

Rules
In the Rules section:
- Select which master data column should be matched to which Rossum captured field defined in the Extraction schema. Both header and line item fields can be selected as the Rossum captured field.

- Choose if you want Rossum to look for an exact match or for a fuzzy (close enough) match.
- Click on the Tick icon to confirm the rule.
- If you’ve chosen the fuzzy matching technique, you can select the similarity threshold to be used. The default is 0.3, which corresponds to 30% similarity; 1.0 means 100% similarity, which is effectively an exact match. Experiment with this threshold to find the value that works best for your data.
- The matching process applies the rules in the order you set them up. With multiple rules, even if Rossum can’t find the name or the VAT number on the document, it can still extract and match the IBAN and identify the company from your file.
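The interplay of exact rules, fuzzy rules, and rule order can be sketched as follows. This is an illustration under stated assumptions, not Rossum's implementation: the similarity score comes from Python's `difflib.SequenceMatcher` as a stand-in for whatever metric the extension uses, the first rule that finds a match is assumed to win, and all field names, column names, and sample rows are hypothetical.

```python
from difflib import SequenceMatcher

DEFAULT_THRESHOLD = 0.3  # default similarity threshold quoted in the text


def similarity(a: str, b: str) -> float:
    """Stand-in similarity score between 0.0 and 1.0 (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def apply_rule(rule, captured, master_rows):
    """Return the master row matched by one rule, or None."""
    value = (captured.get(rule["field"]) or "").strip()
    if not value:
        return None  # the field was not captured, so this rule cannot match
    column = rule["column"]
    if rule["mode"] == "exact":
        return next((row for row in master_rows if row[column] == value), None)
    # Fuzzy mode: keep the most similar row if it reaches the threshold.
    best = max(master_rows, key=lambda row: similarity(value, row[column]), default=None)
    if best is None:
        return None
    threshold = rule.get("threshold", DEFAULT_THRESHOLD)
    return best if similarity(value, best[column]) >= threshold else None


def match(rules, captured, master_rows):
    """Try the rules in order and return the first match found."""
    for rule in rules:
        row = apply_rule(rule, captured, master_rows)
        if row is not None:
            return row
    return None


# Hypothetical master data and rules; all names are illustrative only.
MASTER = [
    {"vendor_id": "V001", "name": "Acme Limited", "vat_id": "GB123456789", "iban": "GB33BUKB20201555555555"},
    {"vendor_id": "V002", "name": "Globex GmbH", "vat_id": "DE123456789", "iban": "DE89370400440532013000"},
]
RULES = [
    {"column": "name", "field": "sender_name", "mode": "fuzzy", "threshold": 0.8},
    {"column": "vat_id", "field": "sender_vat_id", "mode": "exact"},
    {"column": "iban", "field": "iban", "mode": "exact"},
]
```

Calling `match(RULES, {"iban": "DE89370400440532013000"}, MASTER)` falls through the name and VAT rules, because those fields were not captured, and identifies the second vendor by IBAN. A misspelled name such as “ACME Limted” still matches the first vendor under the 0.8 threshold. With this stand-in metric, a low threshold like 0.3 accepts fairly loose matches, which is one reason to experiment with the value on your own data.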

Choose UX behavior
In “Confirm limitations”, choose what happens if no match is found. Rossum can show an error and block the Confirm button until the data is corrected, show a warning or just an info message while still letting the user confirm the document, or do nothing and let the user act at their discretion.
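The four options differ only in which message is shown and whether confirmation is blocked. The sketch below summarizes this; the enum and function names are hypothetical and do not correspond to the extension's internal settings.

```python
from enum import Enum


class NoMatchBehavior(Enum):
    ERROR = "error"      # show an error and block the Confirm button
    WARNING = "warning"  # show a warning, confirmation still allowed
    INFO = "info"        # show an info message, confirmation still allowed
    NONE = "none"        # show nothing, the user decides


def no_match_outcome(behavior: NoMatchBehavior):
    """Return (message_level, confirm_blocked) for a failed match."""
    if behavior is NoMatchBehavior.NONE:
        return None, False
    return behavior.value, behavior is NoMatchBehavior.ERROR
```

Only the error option prevents the user from confirming the document. The other three leave the decision to the user.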

Step 4: Enable the data matching extension
Click the “Done” button to confirm all the rules and enable the extension on the selected queue.
Step 5: See Data Matching in action
- Once you open Rossum’s validation screen, you’ll notice a new field which you just created (the matching field, for instance “Vendor Match”). This field is a dropdown which will pull the value from the master data file.
- To pull in e.g. the vendor match value, the data matching extension will compare vendor details captured from the document to the master data, following the rules you’ve just set. If the captured data matches the data in the master file, the dropdown will show the matched vendor identifier from master data.
- If some of the vendor details fields weren’t captured correctly and, as a result, the match wasn’t found, you can select the field on the validation screen and place the bounding box over the appropriate value on the document. Filling in a proper value will immediately trigger the extension to try and look up the matching vendor name again.
- Voilà! The vendor identity is now verified. You can confirm the document, and the correct vendor ID will be passed to the downstream system.

What’s next?
If you are satisfied with the matching, you might like to add another! For example, if you are now matching against your vendor database and would also like to match against your PO database, just define a new dataset (e.g. purchase_orders or POs), upload your master data for POs, set up a matching field, and create the matching logic. Your documents will then be matched against both the vendor and the PO database.

Defining custom matching logic
You do not have to rely on Rossum’s matching logic. You can implement your own matching logic when working with the master data. See our Simple Purchase Order matching extension and get inspired for your own matching use cases.
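As an idea of what custom logic might look like, here is a small sketch that matches purchase order numbers after normalizing their formatting. It is not the Simple Purchase Order matching extension and does not use any Rossum API: the normalization rule, the function names, and the `po_number` column are all assumptions made for illustration.

```python
import re


def normalize_po(value: str) -> str:
    """Hypothetical normalization: keep letters and digits only, uppercased."""
    return re.sub(r"[^A-Za-z0-9]", "", value or "").upper()


def match_purchase_order(captured_po, purchase_orders, column="po_number"):
    """Return the first PO row whose normalized number equals the captured one."""
    wanted = normalize_po(captured_po)
    if not wanted:
        return None  # nothing usable was captured
    for row in purchase_orders:
        if normalize_po(row[column]) == wanted:
            return row
    return None
```

With this rule, a captured value such as “po 2021/0042” matches a master data record stored as “PO-2021-0042”, which a plain exact comparison would miss.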
You can currently use this feature for free. In the future, however, it may be limited, with a paid version available in the Rossum Store as well.
Note: Uploading master data and other data matching operations can also be done through a public API endpoint. Contact us at support@rossum.ai to learn more.