What is the Master Data Hub extension?
The Rossum Master Data Hub extension, an evolution of Data Matching v2 [BETA], enriches data extracted from documents by matching it against datasets stored in the Master Data Hub. Aligning extracted values with the organization’s authoritative data records improves data quality and supports better decision-making.
Getting Started
Starting with the Master Data Hub involves activating it in the Rossum Store, and then diving into the Dataset Management and Matching Configuration Editor interfaces. In Dataset Management, users can upload and manipulate their datasets with tools for adding, updating, and testing data. In the Matching Configuration Editor, developers can craft exact match rules using a simple JSON configuration or script more complex fuzzy matching with MongoDB queries for a fine-tuned approach to data alignment.

Dataset Management
Master Data Hub’s Dataset Management enables you to upload your master data in supported formats such as JSON, XML, CSV, and XLSX. The UI simplifies dataset management, allowing you to add, update, or delete datasets as needed. You can perform Find and Aggregate operations to test and refine your queries directly within the application, ensuring your datasets are primed for use.

- Adding a Dataset
  - Click on ‘Add Dataset’ to import new data.
  - Choose the file from your computer and upload it in one of the supported formats.
- Managing Uploaded Data
  - The list of datasets will appear on the left panel, where you can select and manage them.
  - Options to ‘Replace’, ‘Update’, or ‘Delete’ are available, enabling you to maintain current and accurate data easily.
- Testing Your Queries
  - The Query section allows you to write and test queries against your datasets.
  - Utilize the Find and Aggregate pipelines to execute different operations and view results in real time.
  - The ‘Try’ button lets you test queries immediately, ensuring they work as expected before applying them to the matching process.
Matching Configuration Editor
The Matching Configuration Editor in the Master Data Hub is where you not only define how to match data but also determine the subsequent actions based on the outcome of the match. It caters to both simple exact matching and complex fuzzy matching needs, allowing for extensive data accuracy improvements. This multi-faceted tool allows you to set precise behaviors for various matching scenarios, ensuring that data is handled appropriately in your Rossum environment.

Matching Queries
When configuring data matches in the Master Data Hub, you have the flexibility to define precise conditions tailored to your data. For exact matching scenarios, simple JSON structures articulate the rules, linking extracted document fields directly to corresponding dataset entries. Here’s where pure accuracy is key, and the match must be spot on. On the other hand, fuzzy matching embraces the complexity of real-world data variance, utilizing MongoDB queries to forge connections where exact matches falter. This dual approach ensures that whether your data is perfectly aligned or slightly askew, the Master Data Hub has the capability to find and link the right information.

Exact Matching
For exact matches, specify your conditions using simple JSON configuration. This involves defining the “find” key followed by the criteria in a JSON object, as shown in the example:
[
  {
    "find": {
      "Vendor Name": "{sender_name}"
    }
  }
]
This example demonstrates how to match a vendor name (from the Vendor Name column) exactly as it appears in the dataset with the sender_name value extracted from the document.
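To make the placeholder mechanics concrete, here is a minimal Python sketch of what an exact match does conceptually. It is an illustration under stated assumptions, not Rossum's implementation: the in-memory `DATASET`, the helper names `resolve` and `find_exact`, and the sample values are all hypothetical.

```python
import re

# Hypothetical in-memory stand-in for an uploaded dataset.
DATASET = [
    {"Vendor Name": "Acme Corp", "Vendor Address": "1 Main St"},
    {"Vendor Name": "Globex", "Vendor Address": "2 Side Rd"},
]

# A value such as "{sender_name}" refers to an extracted document field.
PLACEHOLDER = re.compile(r"^\{(\w+)\}$")


def resolve(find: dict, fields: dict) -> dict:
    """Replace "{schema_id}" placeholders with extracted document values."""
    resolved = {}
    for column, value in find.items():
        m = PLACEHOLDER.match(value) if isinstance(value, str) else None
        resolved[column] = fields[m.group(1)] if m else value
    return resolved


def find_exact(find: dict, fields: dict, dataset: list) -> list:
    """Return rows whose columns equal every resolved criterion exactly."""
    criteria = resolve(find, fields)
    return [
        row for row in dataset
        if all(row.get(col) == val for col, val in criteria.items())
    ]


matches = find_exact(
    {"Vendor Name": "{sender_name}"},
    {"sender_name": "Acme Corp"},
    DATASET,
)
print(matches)
```

In this sketch the comparison is strict equality, so a value such as "acme corp" would not match "Acme Corp"; that kind of variance is what fuzzy matching is meant to absorb.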
Fuzzy Matching
When exact values are unreliable, fuzzy matching becomes essential. For this, MongoDB queries are incorporated into your JSON configurations:
[
  {
    "aggregate": [
      {
        "$search": {
          "text": {
            "path": "Vendor Name",
            "query": "{sender_name}",
            "fuzzy": {
              "maxEdits": 1
            }
          }
        }
      }
    ]
  }
]
The above MongoDB query accommodates discrepancies in the vendor name by allowing up to one single-character edit per search term, which covers minor typos or variations in the name.
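As a rough intuition for what "one edit" means, the following Python sketch computes a plain Levenshtein distance and checks it against a limit. This is a simplified illustration only: the real search engine works on analyzed terms rather than whole strings and may count edits differently, and the helper names `edit_distance` and `within_max_edits` are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete a character from a
                curr[j - 1] + 1,     # insert a character into a
                prev[j - 1] + cost,  # substitute (or keep) a character
            ))
        prev = curr
    return prev[-1]


def within_max_edits(query: str, candidate: str, max_edits: int = 1) -> bool:
    """Case-insensitive check that two strings differ by at most max_edits."""
    return edit_distance(query.lower(), candidate.lower()) <= max_edits


print(within_max_edits("Acme", "Acne"))  # one substitution apart
print(within_max_edits("Acme", "Axne"))  # two substitutions apart
```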
Displaying the Result
Here, you specify where the match result is stored and how it is displayed.

- Target Schema ID: Designate where the match result will be stored.
- Dataset Key: Identify which attribute from the dataset is returned as the enum value upon a successful match.
- Label: Create a user-friendly display label for the matched result, combining keys and additional text if necessary. Make sure to wrap dataset column names in quotes. Example:
{"Vendor Name"} - {"Vendor Address"}
Important Note: Make sure the target field exists in your schema and it has the Enum data type. You can add the field by navigating to Queue settings > Fields to capture and adding the field to the desired section of your schema.
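The label template above can be read as a simple substitution: each quoted column reference is replaced by that column's value from the matched record. The Python sketch below illustrates the idea; the `render_label` helper, the regular expression, and the sample record are assumptions made for illustration, not the extension's actual code.

```python
import re

# Matches references of the form {"Column Name"}.
COLUMN_REF = re.compile(r'\{"([^"]+)"\}')


def render_label(template: str, record: dict) -> str:
    """Replace each {"Column"} reference with the record's value.

    Missing columns render as an empty string in this sketch.
    """
    return COLUMN_REF.sub(lambda m: str(record.get(m.group(1), "")), template)


# Hypothetical matched dataset record.
record = {"Vendor Name": "Acme Corp", "Vendor Address": "1 Main St"}
label = render_label('{"Vendor Name"} - {"Vendor Address"}', record)
print(label)
```

The quotes matter because column names such as Vendor Name contain spaces; without them there would be no clear boundary for the column reference.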

Post-Match Workflow Configurations
Whether or not a match is found, you can control what happens next.
The Default Value – When no matches are found
If no match is found, you can specify a default value and label to be displayed in the target annotation field. For example, setting the label to ‘No match found’ provides clear feedback for users when there’s no corresponding data in the master dataset.

Result Actions – Customizing Feedback Messages
Tailor the response for three distinct outcomes: no match, one match, or multiple matches. For each case, you can choose to display the default value, show the best matching result, or trigger a specific action like displaying a custom message. Messages can vary in type from simple information to warnings or even errors that block further processing until resolved.

- No Match Found
  - Choose to display the default value.
  - Accompany it with an error message to block processing, alerting users that no match was found and action is required.
- One Match Found
  - Opt to display the best matching result.
  - You can decide to have no message, indicating a smooth process, or provide additional context or confirmation.
- Multiple Matches Found
  - You might display the default value or ask the user to select from the matched results.
  - Implement a warning message to highlight the need for user intervention in selecting the most accurate match from multiple possibilities.
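The three outcomes can be pictured as a small decision function keyed on the number of matches. The Python sketch below mirrors one possible configuration (error on no match, silent on one match, warning on multiple matches). Everything in it is hypothetical: the `Outcome` structure, the `decide` function, the message texts, and the Vendor ID column are illustrative names, not part of the extension's configuration format.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_VALUE = ""  # assumed default value shown when nothing is selected


@dataclass
class Outcome:
    value: Optional[str]         # value placed in the target field
    options: list                # values offered to the user
    message_type: Optional[str]  # "info", "warning", "error", or None
    message: Optional[str]


def decide(matches: list, key: str = "Vendor ID") -> Outcome:
    """Map the number of matches to a value and an optional message."""
    if not matches:
        # No match: show the default and block processing with an error.
        return Outcome(DEFAULT_VALUE, [], "error",
                       "No match found; action required.")
    if len(matches) == 1:
        # One match: use it directly, no message needed.
        best = matches[0][key]
        return Outcome(best, [best], None, None)
    # Multiple matches: keep the default and ask the user to choose.
    return Outcome(DEFAULT_VALUE, [m[key] for m in matches], "warning",
                   "Multiple matches found; please select one.")


print(decide([]))
print(decide([{"Vendor ID": "V1"}]))
print(decide([{"Vendor ID": "V1"}, {"Vendor ID": "V2"}]))
```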
Complex Matching Example
The “Comprehensive Matching Across Fields” example leverages both exact and fuzzy matching to scrutinize multiple data points for accurate vendor identification. It combines two queries within one pipeline. The queries are executed in the defined order and the results of the first successful query are returned.
The “find” query attempts to identify the correct vendor by IBAN. If a vendor with matching IBAN is found, the query returns the result(s) and no further matching queries are executed.
The second query is an “aggregate” query that uses the “$search” operator with a “compound” clause. The “must” clause is the backbone: it requires the Vendor Name column to closely match the extracted sender_name value, with a one-edit leeway for small errors. The “should” clause is the enhancer: it raises the ranking of records whose Vendor Address column also aligns with the extracted sender_address value. The “$limit” stage returns only the 3 best results, giving the user the freedom to select the correct record when there are multiple matching results.
This dual-structured approach ensures precise vendor matches even when data entries have minor inconsistencies, streamlining the matching process in Rossum.
[
  {
    "find": {
      "IBAN": "{iban_normalized}"
    }
  },
  {
    "aggregate": [
      {
        "$search": {
          "compound": {
            "must": [
              {
                "text": {
                  "path": "Vendor Name",
                  "query": "{sender_name}",
                  "fuzzy": {
                    "maxEdits": 1
                  }
                }
              }
            ],
            "should": [
              {
                "text": {
                  "path": "Vendor Address",
                  "query": "{sender_address}",
                  "fuzzy": {
                    "maxEdits": 1
                  }
                }
              }
            ]
          }
        }
      },
      {
        "$limit": 3
      }
    ]
  }
]
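The ordering rule described for this configuration (queries run in sequence and the first one that returns results wins) can be sketched in a few lines of Python. This is a conceptual illustration, not the extension's code: the dataset, the IBAN values, and the function names are hypothetical, and the fuzzy name lookup uses `difflib.SequenceMatcher` from the standard library as a crude stand-in for the `$search` stage.

```python
from difflib import SequenceMatcher

# Hypothetical dataset with obviously fake IBAN values.
DATASET = [
    {"Vendor Name": "Acme Corp", "IBAN": "DE00TEST0001"},
    {"Vendor Name": "Acme Corp.", "IBAN": "DE00TEST0002"},
    {"Vendor Name": "Globex", "IBAN": "DE00TEST0003"},
]


def by_iban(fields: dict) -> list:
    """Stand-in for the exact "find" query on IBAN."""
    return [r for r in DATASET if r["IBAN"] == fields.get("iban_normalized")]


def by_name(fields: dict, limit: int = 3) -> list:
    """Stand-in for the fuzzy "aggregate" query: rank by similarity, keep top N."""
    query = fields.get("sender_name", "").lower()
    scored = [
        (SequenceMatcher(None, query, r["Vendor Name"].lower()).ratio(), r)
        for r in DATASET
    ]
    ranked = sorted(
        (s for s in scored if s[0] >= 0.8),  # arbitrary similarity cutoff
        key=lambda s: s[0],
        reverse=True,
    )
    return [r for _, r in ranked[:limit]]


def run_queries(queries: list, fields: dict) -> list:
    """Run queries in order; return the results of the first one that matches."""
    for query in queries:
        results = query(fields)
        if results:
            return results
    return []


# IBAN is known: the first query answers and the fuzzy query never runs.
print(run_queries([by_iban, by_name],
                  {"iban_normalized": "DE00TEST0003", "sender_name": "Acme Corp"}))

# IBAN is missing: fall through to the fuzzy name lookup.
print(run_queries([by_iban, by_name],
                  {"iban_normalized": "", "sender_name": "Acme Corp"}))
```

Placing the strictest query first means a reliable identifier such as the IBAN takes precedence, and the looser name-based query acts only as a fallback.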
Automate Your Master Data Import
Further enhance your data matching with “Import Master Data From SFTP” and “Import Master Data From S3” extensions. Automatically import and synchronize your master data from external storage solutions directly into the Master Data Hub, maintaining up-to-date records with little manual effort.
Conclusion
Whether your data matching needs are straightforward or require the nuanced approach of fuzzy logic, the Master Data Hub extension provides the tools necessary for precise data validation and enrichment. For developers and administrators seeking to leverage these advanced data matching techniques, Rossum’s support is available to help you navigate and maximize the potential of the Master Data Hub extension.