How to set up the Value Transformations extension from the Rossum Store

Very often, values printed on documents are not in the desired format or contain unwanted characters such as spaces, commas, or colons. In these cases, it can be useful to transform the values extracted by the AI rather than use them exactly as they appear in the document. The most common case is removing non-alphanumeric characters from values such as VAT numbers, IBANs, and account numbers.

Value Transformations is a configurable extension that automates this process. It uses regular expressions to find chosen string patterns and replace them automatically.
Advanced users can also chain transformations to cover more complex cases where one regular expression is not enough. However, you do not need advanced technical expertise to use the extension.

Configuration examples for common use cases

Defining transformation rules only from the description of the available parameters (found later in this article) can be difficult. The configuration examples below can be copied and adapted to make setting up the extension easier.


Please note that backslashes in regular expressions must be escaped (doubled) in the extension configuration in the Rossum UI. The example configurations below already contain escaped regular expressions.
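The doubling comes from JSON itself: inside a JSON string, a backslash starts an escape sequence, so the regular expression \d has to be written as \\d. The short Python sketch below (standard library only) shows the round trip between the regular expression and its escaped JSON form.

```python
import json

# The regular expression as Python's re module sees it: one backslash before "d".
regex = r"[^a-zA-Z\d]"

# Serializing to JSON doubles the backslash, which is the form used in the
# extension configuration.
as_json = json.dumps({"pattern_to_replace": regex})
print(as_json)  # {"pattern_to_replace": "[^a-zA-Z\\d]"}

# Parsing the JSON again restores the single-backslash regular expression.
assert json.loads(as_json)["pattern_to_replace"] == regex
```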

Removal of non-alphanumeric characters

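The Value Transformations extension with the configuration below removes all non-alphanumeric characters from the Vendor VAT Number (sender_vat_id) and IBAN (iban) fields and stores the results in the sender_vat_id_normalized and iban_normalized fields.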

Example:

  • Input: DE 12345-6789
  • Output: DE123456789
{
  "actions": [
    {
      "transformations": [
        {
          "pattern_to_replace": "[^a-zA-Z\\d]",
          "value_to_replace_with": "",
          "replace_if_this_pattern_matches": "[^a-zA-Z\\d]"
        }
      ],
      "source_target_mappings": [
        [
          "sender_vat_id",
          "sender_vat_id_normalized"
        ],
        [
          "iban",
          "iban_normalized"
        ]
      ]
    }
  ]
}
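To preview what this configuration does before activating it, you can approximate it locally. The Python sketch below assumes (this is not confirmed by the extension's documentation) that each transformation is applied with re.sub, replacing every occurrence of pattern_to_replace, and only when re.search finds replace_if_this_pattern_matches somewhere in the value. The function name apply_transformations is hypothetical.

```python
import json
import re

# The configuration from the example above, exactly as pasted into the UI.
# A raw string keeps the doubled backslashes that JSON requires.
CONFIG = r"""
{
  "actions": [
    {
      "transformations": [
        {
          "pattern_to_replace": "[^a-zA-Z\\d]",
          "value_to_replace_with": "",
          "replace_if_this_pattern_matches": "[^a-zA-Z\\d]"
        }
      ],
      "source_target_mappings": [
        ["sender_vat_id", "sender_vat_id_normalized"],
        ["iban", "iban_normalized"]
      ]
    }
  ]
}
"""


def apply_transformations(value, transformations):
    """Approximate the extension (assumption): a re.search guard, then re.sub."""
    for t in transformations:
        if re.search(t["replace_if_this_pattern_matches"], value):
            value = re.sub(t["pattern_to_replace"], t["value_to_replace_with"], value)
    return value


action = json.loads(CONFIG)["actions"][0]
print(apply_transformations("DE 12345-6789", action["transformations"]))  # DE123456789
```

A value that is already clean, such as DE123456789, does not match the condition pattern, so it is returned unchanged.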

Extracting and normalizing part of the line item description

The configuration below uses two chained transformations to extract and normalize the item code from the item description (item_description) and store it in the item_code field.
The first transformation removes the first space character and everything after it. The second one removes all hyphens from the result of the first transformation.
Notice also that this configuration defines an action condition: the action will only be performed when the Vendor Name (sender_name) is “Lacte”. The condition is optional.

Example:

  • Input: 1234-567-89 This is a line item description with the code at the beginning.
  • Output: 123456789
{
  "actions": [
    {
      "transformations": [
        {
          "pattern_to_replace": " ([\\s\\S]*)$",
          "value_to_replace_with": "",
          "replace_if_this_pattern_matches": " ([\\s\\S]*)$"
        },
        {
          "pattern_to_replace": "-",
          "value_to_replace_with": "",
          "replace_if_this_pattern_matches": "-"
        }
      ],
      "action_condition": {
        "value": "Lacte",
        "schema_id": "sender_name"
      },
      "source_target_mappings": [
        [
          "item_description",
          "item_code"
        ]
      ]
    }
  ]
}
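The chaining and the action condition can be sketched in Python as follows. This is an illustration, not the extension's code: it assumes the condition is an exact string comparison and that each transformation is a re.sub guarded by re.search. The document is represented by a hypothetical dictionary that maps schema IDs to values, and run_action is a hypothetical name.

```python
import re

# The two chained transformations from the example above, written as Python
# raw strings (that is, already un-escaped from JSON).
TRANSFORMATIONS = [
    {"pattern_to_replace": r" ([\s\S]*)$", "value_to_replace_with": "",
     "replace_if_this_pattern_matches": r" ([\s\S]*)$"},
    {"pattern_to_replace": r"-", "value_to_replace_with": "",
     "replace_if_this_pattern_matches": r"-"},
]
CONDITION = {"value": "Lacte", "schema_id": "sender_name"}


def run_action(fields):
    """Return the item code, or None when the action condition is not met."""
    # Assumption: the condition is a plain equality check on the field value.
    if fields.get(CONDITION["schema_id"]) != CONDITION["value"]:
        return None
    value = fields["item_description"]
    # Each transformation works on the output of the previous one.
    for t in TRANSFORMATIONS:
        if re.search(t["replace_if_this_pattern_matches"], value):
            value = re.sub(t["pattern_to_replace"], t["value_to_replace_with"], value)
    return value


doc = {
    "sender_name": "Lacte",
    "item_description": "1234-567-89 This is a line item description with the code at the beginning.",
}
print(run_action(doc))  # 123456789
print(run_action({**doc, "sender_name": "Other vendor"}))  # None
```

The first pattern starts at the first space and, because [\s\S]* is greedy, runs to the end of the string, which is why only the leading code survives.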

Setting up the extension

Setting up the extension itself takes a few simple steps:

  1. Prepare your queues and schemas
  2. Activate Value Transformations in the Rossum Store
  3. Specify the queue(s) the extension is going to be used for
  4. Set up the actions and transformations

Step 1: Prepare your queues and schemas

The first step is to identify the queue(s) containing the documents that require Value Transformations. Then identify the schema IDs of the fields whose extracted values the extension should transform.

If you use the Dedicated Engine, create new schema fields to store the results of the transformations (see the note below). If you use the Generic Engine, configure the extension to modify the value of the field “in-place” (the same field as both source and target).

Please note that if you use the Dedicated Engine and configure the extension to modify the value of an existing field, the accuracy calculated for that field will be significantly lower than its real accuracy. To avoid this, modifying the values extracted by the AI and OCR, either manually or programmatically, is not recommended when using the Dedicated Engine.

Step 2: Activate Value Transformations in the Rossum Store

To activate Value Transformations, go to the Rossum application and:

  1. Click the Extensions button in the main menu to open the Rossum Store.
  2. Once in the Rossum Store section, you will see the “Value Transformations” extension tile.
  3. Click on it.
  4. Click “Try extension.”
Figure: Value Transformations extension in the Rossum Store

Step 3: Specify the queue(s) the extension is going to be used for

In the Rossum Store extension settings, scroll down to “Queues” and select the queue(s) to which you want to add the extension.

Step 4: Set up the actions and transformations

The extension is configured through the configuration field in the UI or by using the settings attribute of the hook API object. The configuration is in JSON format (see the description of the available parameters below).

This configuration consists of a list of actions, each of which can work with values from different fields in the schema. Each action defines a set of transformations, the source/target field mappings, and, optionally, the condition under which the action is performed.
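Because each action combines several nested parameters, a quick structural check can catch mistakes before you paste a configuration into the UI. The Python function check_config below is a hypothetical helper (not part of the extension or any Rossum API). It checks only the structure described in this article and compiles each regular expression with Python's re module; the extension may validate more, or differently.

```python
import json
import re

REGEX_KEYS = ("pattern_to_replace", "replace_if_this_pattern_matches")


def check_config(config):
    """Return a list of problems found in a parsed configuration (hypothetical)."""
    actions = config.get("actions")
    if not isinstance(actions, list):
        return ["'actions' must be a list"]
    problems = []
    for i, action in enumerate(actions):
        for key in ("transformations", "source_target_mappings"):
            if key not in action:
                problems.append(f"action {i}: missing '{key}'")
        # Each mapping is a [source_schema_id, target_schema_id] pair.
        for pair in action.get("source_target_mappings", []):
            if not (isinstance(pair, list) and len(pair) == 2
                    and all(isinstance(s, str) for s in pair)):
                problems.append(f"action {i}: mapping {pair!r} must be two schema IDs")
        for t in action.get("transformations", []):
            if "value_to_replace_with" not in t:
                problems.append(f"action {i}: missing 'value_to_replace_with'")
            # Both regular expressions must compile with Python's re module.
            for key in REGEX_KEYS:
                if key not in t:
                    problems.append(f"action {i}: missing '{key}'")
                    continue
                try:
                    re.compile(t[key])
                except re.error as exc:
                    problems.append(f"action {i}: invalid {key}: {exc}")
        # action_condition is optional, but needs both keys when present.
        condition = action.get("action_condition")
        if condition is not None and not (
            isinstance(condition, dict)
            and all(k in condition for k in ("schema_id", "value"))
        ):
            problems.append(f"action {i}: action_condition needs 'schema_id' and 'value'")
    return problems


broken = json.loads(
    '{"actions": [{"transformations": [], "source_target_mappings": [["iban"]]}]}'
)
print(check_config(broken))
```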

If one of the examples above matches your use case, you can paste its configuration and then simply replace the schema IDs with those of the fields whose values you want to transform.
The complete list of available parameters is below.

Root | Parameter name | Description
--- | --- | ---
(top level) | actions | List of actions to be performed by the extension. The action parameters are described below.
actions | source_target_mappings | List of source and target field schema IDs. Each pair of fields is a list containing two strings (see the examples above).
actions | transformations | List of transformations to be performed on the value of the source field. The transformation parameters are described below.
actions | queue_id | ID of the queue where the action should be performed. The extension can be assigned to multiple queues, and one instance can define different actions for different queues. This parameter is optional: if it is not present in the configuration, the action is performed on all the queues the extension is assigned to.
actions | action_condition | Condition for the action. If defined, the action is only performed if the value of the field specified in schema_id equals the value parameter of the condition. This parameter is optional.
action_condition | schema_id | Schema ID of the field used in the condition.
action_condition | value | Value compared to the value of the field specified in the schema_id parameter of the condition. The action is only performed if the two values match.
transformations | pattern_to_replace | Regular expression defining the pattern to be found and replaced in the value. See the Python regular expression documentation for details.
transformations | value_to_replace_with | Value that replaces all occurrences of the pattern matched by the regular expression in pattern_to_replace.
transformations | replace_if_this_pattern_matches | Regular expression defining the condition for applying the transformation. The transformation is only applied if the value matches this expression. See the Python regular expression documentation for details.
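To show how these parameters could fit together, the Python sketch below runs every action of a configuration against one document. It is an illustration under stated assumptions, not the extension's implementation: actions without queue_id run on every queue, action_condition is an exact string match, each transformation is a re.sub guarded by re.search, and all reads use the original field values. The function run_actions, the dictionary of fields, and the queue ID 123 are hypothetical.

```python
import re


def run_actions(config, queue_id, fields):
    """Apply the configured actions to a hypothetical dict of schema_id -> value.

    Returns a new dict; the input dict is left unchanged.
    """
    result = dict(fields)
    for action in config["actions"]:
        if "queue_id" in action and action["queue_id"] != queue_id:
            continue  # this action is limited to a different queue
        condition = action.get("action_condition")
        if condition and fields.get(condition["schema_id"]) != condition["value"]:
            continue  # the action condition is not met
        for source, target in action["source_target_mappings"]:
            if source not in fields:
                continue  # nothing to transform
            value = fields[source]
            for t in action["transformations"]:
                if re.search(t["replace_if_this_pattern_matches"], value):
                    value = re.sub(t["pattern_to_replace"], t["value_to_replace_with"], value)
            result[target] = value
    return result


config = {
    "actions": [
        {
            "queue_id": 123,  # hypothetical queue ID
            "transformations": [
                {"pattern_to_replace": r"\s", "value_to_replace_with": "",
                 "replace_if_this_pattern_matches": r"\s"}
            ],
            # Same source and target: the in-place setup suggested for the Generic Engine.
            "source_target_mappings": [["iban", "iban"]],
        }
    ]
}
print(run_actions(config, 123, {"iban": "DE89 3704 0044"})["iban"])  # DE8937040044
print(run_actions(config, 456, {"iban": "DE89 3704 0044"})["iban"])  # DE89 3704 0044
```

Because the action is limited to queue 123, a document from any other queue keeps its original value.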