In this article, you will learn:
- What is the Datasets Import from File Storage to Master Data Hub extension
- What are the common use case configurations
- How to set up the extension
- What are the available configuration parameters
What is the Datasets Import from File Storage to Master Data Hub extension
Make sure your matching data always stays up-to-date. This service enables the automatic import of datasets from your file storage, such as SFTP or Amazon S3, to the Master Data Hub extension. It regularly connects to your file storage, downloads dataset files from a specified location, and imports them into Master Data Hub. The following sections will guide you through setting up and using this service.
How to set up the Datasets Import from File Storage to Master Data Hub extension
Setting up the extension takes just a few simple steps.
Step 1: Prepare your file storage and the source folder (where dataset files will be located)
Make sure the source folder(s) exist and that the necessary access permissions are in place.
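For illustration, the storage layout used by the examples later in this article might look like this (folder and file names are examples only):
/datasets
  suppliers-2024-01.csv
  suppliers-2024-02.csv
/datasets_failed
Here, “/datasets” is the source folder the extension reads dataset files from, and “/datasets_failed” is an optional folder where files can be moved after a failed import (see Step 5).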
Step 2: Activate your Datasets Import from File Storage to Master Data Hub extension
- Click on the Extensions tab at the top of the app.
- Choose the Rossum store option to display all the available extensions.
- Select one of the following extensions based on your file storage system:
- Import Master Data From SFTP
- Import Master Data From S3
- Click “Try extension.”
Step 3: Set up the schedule
Define the frequency of your dataset import job. The schedule definition uses crontab expressions. You can evaluate your expression here.
Here are some examples of commonly used schedules:
- Once per day: “0 0 * * *”
- Twice per day (every 12 hours): “0 */12 * * *”
- Every hour: “0 * * * *”
See this page for more examples.
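For reference, a crontab expression consists of five fields, read from left to right:
minute (0-59) | hour (0-23) | day of month (1-31) | month (1-12) | day of week (0-6, Sunday = 0)
For example, “0 */12 * * *” fires at minute 0 of every hour divisible by 12, i.e., at 00:00 and 12:00 each day.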
Step 4: Configure credentials and secrets for your file storage
Credentials and secrets define the type of your file storage (SFTP or Amazon S3) and provide the extension with the data necessary to access the storage. Different types of file storage require different credentials and secrets; refer to the examples below.
Credentials are defined as part of the hook configuration.
Secrets are defined and safely stored in the extension configuration screen in the Secrets section.
You can add your secrets in the Secrets section edit component. When you click the Save Changes button on the configuration screen, your secrets are stored with the extension but are no longer visible, for security reasons. Instead, the “__change_me__” placeholder string is shown.
SFTP Credentials
{
"credentials": {
"type": "sftp",
"host": "",
"port":,
"username": ""
}
}
SFTP Secrets
{
"type": "sftp",
"password": ""
}
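For illustration, a filled-in SFTP configuration might look like the following (the host, username, and password values are hypothetical examples; 22 is the standard SFTP port):
{
  "credentials": {
    "type": "sftp",
    "host": "sftp.example.com",
    "port": 22,
    "username": "rossum-import"
  }
}
with the matching secret stored separately in the Secrets section:
{
  "type": "sftp",
  "password": "my-sftp-password"
}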
S3 Credentials
{
"credentials": {
"type": "s3",
"bucket_name": ""
}
}
S3 Secrets
{
"type": "s3",
"access_key_id": "",
"secret_access_key": ""
}
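Similarly, a filled-in S3 configuration might look like this (the bucket name and both keys are hypothetical placeholders):
{
  "credentials": {
    "type": "s3",
    "bucket_name": "my-company-datasets"
  }
}
{
  "type": "s3",
  "access_key_id": "AKIAXXXXXXXXXXXXXXXX",
  "secret_access_key": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
}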
Step 5: Define rules for dataset import
You can import datasets in two ways: either replace the entire dataset with the content of the imported file, or update the existing dataset by adding new records and updating existing ones. This is configured in the import_rules section of the JSON configuration, subsection import_methods. Additionally, an action can be specified to define what should be done with the imported file upon successful or unsuccessful import; this is configured in the import_rules section, subsection result_actions.
Update method example
{
"update_method": {
"path": "",
"file_match_regex": "",
"file_format": "",
"id_keys": []
}
}
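For example (hypothetical data): with "id_keys": ["supplier_id"], an imported row whose supplier_id matches an existing record updates that record, an imported row with a previously unseen supplier_id is added as a new record, and existing records not present in the imported file are left unchanged.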
Replace method example
{
"replace_method": {
"path": "",
"file_format": ""
}
}
Common use case configurations
Regularly replace your datasets
This configuration replaces the suppliers dataset by importing data from the “/datasets” folder of your file storage. This basic example uses a regular expression to import only files with specific names.
{
"credentials": { # Credentials are set in a format appropriate for the target system
},
"import_rules": [
"dataset_name": "suppliers",
"import_methods": {
"replace_method": {
"path": "/datasets",
"file_match_regex": "suppliers-.*\\.csv",
"file_format": "csv",
}
}
]
}
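A note on file_match_regex: the pattern is a regular expression, so “suppliers-.*\\.csv” matches names such as suppliers-2024-01.csv but not, say, customers-2024.csv. The backslash escaping the dot is doubled (“\\.”) because the pattern is embedded in a JSON string.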
Regularly update your datasets
This configuration updates the suppliers dataset by importing data from the “/datasets” folder of your file storage. The supplier_id column is used for matching and updating existing records. This example also defines actions to be performed after the import: if the import succeeds, the original file is deleted; if it fails, the original file is moved to the “/datasets_failed” folder.
{
"credentials": { # Credentials are set in a format appropriate for the target system
},
"import_rules": [
"dataset_name": "suppliers",
"import_methods": {
"update_method": {
"path": "/datasets",
"file_match_regex": "suppliers-.*\\.csv",
"file_format": "csv",
"id_keys": [
"supplier_id"
]
}
},
"result_actions": {
"success": [
{
"type": "delete"
}
],
"failure": [
{
"type": "move",
"path": "/datasets_failed"
}
]
}
]
}
API Documentation
Please refer to the detailed documentation here. Under the settings subsection, you will find the complete description of the available parameters and the configuration specifics for each currently supported file storage system.