How to set up the Value Transformations extension from the Rossum Store

Very often values printed on documents are not in the desired format or contain unwanted characters such as spaces, commas, colons, etc. Given this, it may be beneficial to transform the values extracted by the AI instead of using the original ones contained in the document.

The most common case is removing non-alphanumeric characters from values such as VAT numbers, IBANs, and account numbers.

Value Transformations is a configurable extension that can help with this process by offering powerful configurable string manipulation features that use regular expressions to replace chosen string patterns automatically.

Advanced users can also chain defined transformations to cover more complex cases when one regular expression is insufficient. However, advanced technical expertise isn’t required to use the extension.

Configuration examples for common use cases

Defining transformation rules from the parameter descriptions alone (found later in this article) can be complicated, so here are some configuration examples that can be copied and modified to make setting up the extension easier.

Please note that backslashes in regular expressions must be escaped (doubled) in the extension configuration of the Rossum UI. The example configurations below already contain escaped regular expressions.
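To illustrate why the doubling is needed, here is a minimal Python sketch that parses a JSON string the way a standard JSON parser would and shows the regular expression that results. It only demonstrates JSON escaping and is not the extension's own code; the variable names are chosen for illustration.

```python
import json
import re

# The pattern exactly as it is typed into the JSON configuration (backslash doubled).
config_text = r'{"pattern_to_replace": "[^a-zA-Z\\d]"}'

# After JSON parsing, the doubled backslash becomes a single one.
pattern = json.loads(config_text)["pattern_to_replace"]
print(pattern)  # [^a-zA-Z\d]

# The parsed pattern is a valid Python regular expression.
print(re.sub(pattern, "", "DE 12345-6789"))  # DE123456789

# A single backslash before "d" is not a valid JSON escape, so parsing fails.
try:
    json.loads(r'{"pattern_to_replace": "[^a-zA-Z\d]"}')
    single_backslash_ok = True
except json.JSONDecodeError:
    single_backslash_ok = False
print(single_backslash_ok)  # False
```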

Removal of non-alphanumeric characters

The Value Transformations extension with the configuration below removes all non-alphanumeric characters from the Vendor VAT Number and IBAN fields.

Example:

  • Input: DE 12345-6789
  • Output: DE123456789
{
  "actions": [
    {
      "transformations": [
        {
          "pattern_to_replace": "[^a-zA-Z\\d]",
          "value_to_replace_with": "",
          "replace_if_this_pattern_matches": "[^a-zA-Z\\d]"
        }
      ],
      "source_target_mappings": [
        [
          "sender_vat_id",
          "sender_vat_id_normalized"
        ],
        [
          "iban",
          "iban_normalized"
        ]
      ]
    }
  ]
}
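
The effect of this configuration can be tried out locally before it is pasted into the extension. The sketch below is a rough approximation in Python, not the extension's actual implementation: `apply_transformations` is a hypothetical helper, and it assumes that the condition pattern is searched for anywhere in the value.

```python
import re

# Same transformation as in the configuration above. In Python raw strings
# the backslash is written once; only the JSON configuration needs it doubled.
transformations = [
    {
        "pattern_to_replace": r"[^a-zA-Z\d]",
        "value_to_replace_with": "",
        "replace_if_this_pattern_matches": r"[^a-zA-Z\d]",
    }
]


def apply_transformations(value, transformations):
    """Apply each transformation in order (hypothetical helper, assumed semantics)."""
    for t in transformations:
        # Assumption: the condition is checked with a search anywhere in the value.
        if re.search(t["replace_if_this_pattern_matches"], value):
            value = re.sub(t["pattern_to_replace"], t["value_to_replace_with"], value)
    return value


print(apply_transformations("DE 12345-6789", transformations))  # DE123456789
print(apply_transformations("CZ12345678", transformations))     # CZ12345678 (unchanged)
```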

Extracting and normalizing part of the line item description

Value Transformations with the configuration below uses two chained transformations to extract and normalize the item code from the item description.

The first transformation removes the first space character in the string and everything after it. The second one removes all hyphens from the result of the first transformation.

Notice also that there is an action condition defined in this configuration. This action will only be performed when the Vendor Name is “Lacte”. The condition is optional.

Example:

  • Input: 1234-567-89 This is a line item description with the code at the beginning.
  • Output: 123456789
{
  "actions": [
    {
      "transformations": [
        {
          "pattern_to_replace": " ([\\s\\S]*)$",
          "value_to_replace_with": "",
          "replace_if_this_pattern_matches": " ([\\s\\S]*)$"
        },
        {
          "pattern_to_replace": "-",
          "value_to_replace_with": "",
          "replace_if_this_pattern_matches": "-"
        }
      ],
      "action_condition": {
        "value": "Lacte",
        "schema_id": "sender_name"
      },
      "source_target_mappings": [
        [
          "item_description",
          "item_code"
        ]
      ]
    }
  ]
}
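
The chaining and the action condition can be approximated in a few lines of Python. This is a sketch under stated assumptions, not the extension's code: `run_action` is a hypothetical helper, the document is modeled as a flat dictionary of schema IDs to values (real line items are rows of a table), and the condition is assumed to be an exact string comparison.

```python
import re

action = {
    "transformations": [
        {"pattern_to_replace": r" ([\s\S]*)$", "value_to_replace_with": "",
         "replace_if_this_pattern_matches": r" ([\s\S]*)$"},
        {"pattern_to_replace": "-", "value_to_replace_with": "",
         "replace_if_this_pattern_matches": "-"},
    ],
    "action_condition": {"value": "Lacte", "schema_id": "sender_name"},
    "source_target_mappings": [["item_description", "item_code"]],
}


def run_action(action, fields):
    """Return a copy of fields with target fields filled in (hypothetical helper)."""
    result = dict(fields)
    condition = action.get("action_condition")
    # Assumption: the action is skipped unless the field equals the configured value.
    if condition and fields.get(condition["schema_id"]) != condition["value"]:
        return result
    for source, target in action["source_target_mappings"]:
        value = fields[source]
        # Each transformation works on the output of the previous one.
        for t in action["transformations"]:
            if re.search(t["replace_if_this_pattern_matches"], value):
                value = re.sub(t["pattern_to_replace"], t["value_to_replace_with"], value)
        result[target] = value
    return result


description = "1234-567-89 This is a line item description with the code at the beginning."
print(run_action(action, {"sender_name": "Lacte", "item_description": description})["item_code"])
# 123456789
print("item_code" in run_action(action, {"sender_name": "Other", "item_description": description}))
# False
```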

Setting up the extension

Setting up the extension itself takes a few simple steps:

  1. Prepare your queues and schemas
  2. Activate Value Transformations in the Rossum Store
  3. Specify the queue(s) the extension is going to be used for
  4. Set up the actions and transformations

Step 1: Prepare your queues and schemas

The first step is identifying the queue(s) with the documents that require Value Transformations. Once that’s done, identify the schema IDs of the fields that will contain the extracted values set to be transformed by the extension.

If using the Dedicated Engine, make sure to create new schema fields that will store the results of the transformations (see the info panel below). If the Generic Engine is being used, configure the extension to modify the value of the field “in-place” (same source and target field).

Please note that when the Dedicated Engine is used and the extension is configured to modify the value of a particular field in place, the accuracy calculated for that field will be significantly lower than the real accuracy. For this reason, modifying the values extracted by the AI and OCR, whether manually or programmatically, is not recommended when using the Dedicated Engine.
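
The two mapping styles described in this step differ only in the target schema ID. The following Python sketch builds both variants and prints the in-place one as JSON, which also shows the doubled backslash that the configuration field expects. The field names `iban` and `iban_normalized` are taken from the earlier example; the variable names are illustrative.

```python
import json

# One transformation that strips non-alphanumeric characters.
strip_non_alnum = {
    "pattern_to_replace": r"[^a-zA-Z\d]",
    "value_to_replace_with": "",
    "replace_if_this_pattern_matches": r"[^a-zA-Z\d]",
}

# Dedicated Engine: write the result to a separate field and leave the source untouched.
dedicated_action = {
    "transformations": [strip_non_alnum],
    "source_target_mappings": [["iban", "iban_normalized"]],
}

# Generic Engine: source and target are the same field ("in-place").
generic_action = {
    "transformations": [strip_non_alnum],
    "source_target_mappings": [["iban", "iban"]],
}

# json.dumps escapes the backslash, producing text ready for the configuration field.
config_json = json.dumps({"actions": [generic_action]}, indent=2)
print(config_json)
```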

Step 2: Activate Value Transformations in the Rossum Store

In order to activate Value Transformations, go to the Rossum application and:

  1. Click the Extensions button in the main menu and you will be taken to the Rossum Store.
  2. Once in the Rossum Store section, Value Transformations should be visible. If not, click “See all”.
  3. Click the “Value Transformations” extension tile.
  4. Click “Try extension”.

Step 3: Specify the queue(s) the extension is going to be used for

Once in the “Rossum Store extension settings”, scroll down to “Queues” and select the queue(s) that the extension should be used for.

Step 4: Set up the actions and transformations

The extension is configured through the configuration field in the UI or by using the settings attribute of the hook API object. The configuration is in JSON format (see the description of the available parameters below).

This configuration consists of a list of actions that can work with values from different fields in the schema. Each action has a set of transformations, source/target field definitions, and the condition under which the action will be performed.

The code from the examples above can be pasted here if the use case matches. Then the only step needed is to replace the schema IDs of the fields whose values are to be transformed.

The full list of available parameters is shown below.

| Root | Param name | Description |
| --- | --- | --- |
| | actions | List of actions to be performed by the extension. Description of the action parameters is shown below. |
| actions | source_target_mappings | List of source and target field schema IDs. Each pair of fields is a small list containing two strings (see the examples above). |
| actions | transformations | List of transformations to be performed on the value of the source field. See the description of the transformation parameters below. |
| actions | queue_id | ID of the queue where the particular action should be performed. It is possible to assign the extension to multiple queues and specify multiple actions for different queues in one instance. This parameter is optional. If it is not present in the configuration, the action will be performed on all the queues that the extension is assigned to. |
| actions | action_condition | Definition of a condition for a particular action. If defined, the action will only be performed if the value of the field defined in schema_id equals the value configured in the value parameter of the condition. |
| action_condition | schema_id | Schema ID of the field used in the condition. |
| action_condition | value | Value that will be compared to the value of the field defined in the schema_id parameter of the condition. The action will only be performed if the defined value and the field value match. |
| transformations | pattern_to_replace | Regular expression that defines the pattern in the value to be found and replaced. See Python regular expressions for details. |
| transformations | value_to_replace_with | The value that will replace all occurrences of the pattern matching the regular expression defined in the pattern_to_replace parameter. |
| transformations | replace_if_this_pattern_matches | Regular expression that defines the condition for a transformation to be applied. The transformation will only be applied if the value matches the expression. See Python regular expressions for details. |
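
Before saving a configuration, it can help to check it against the parameter descriptions. The sketch below is a hypothetical local checker written for this article, not part of the extension or the Rossum API. It assumes that every mapping must be a pair of strings and that both pattern parameters are present in each transformation, as they are in the examples above.

```python
import re


def validate_config(config):
    """Return a list of problems found in a configuration dict (hypothetical checker)."""
    actions = config.get("actions")
    if not isinstance(actions, list):
        return ["'actions' must be a list"]
    problems = []
    for i, action in enumerate(actions):
        # Each mapping should be a two-item list: [source schema ID, target schema ID].
        for pair in action.get("source_target_mappings", []):
            is_pair = (isinstance(pair, list) and len(pair) == 2
                       and all(isinstance(s, str) for s in pair))
            if not is_pair:
                problems.append(f"action {i}: mapping {pair!r} is not a pair of schema IDs")
        for j, t in enumerate(action.get("transformations", [])):
            # Assumption: both patterns are expected, as in the article's examples.
            for key in ("pattern_to_replace", "replace_if_this_pattern_matches"):
                if key not in t:
                    problems.append(f"action {i}, transformation {j}: missing {key}")
                    continue
                try:
                    re.compile(t[key])
                except re.error as exc:
                    problems.append(f"action {i}, transformation {j}: bad {key}: {exc}")
    return problems


good = {"actions": [{
    "transformations": [{"pattern_to_replace": "-", "value_to_replace_with": "",
                         "replace_if_this_pattern_matches": "-"}],
    "source_target_mappings": [["item_description", "item_code"]],
}]}
bad = {"actions": [{
    "transformations": [{"pattern_to_replace": "[a-z", "value_to_replace_with": "",
                         "replace_if_this_pattern_matches": "-"}],
    "source_target_mappings": [["iban"]],
}]}

print(validate_config(good))      # []
print(len(validate_config(bad)))  # 2
```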