Rossum’s AI Engine can capture a predefined set of data fields from the very beginning. You will see those fields when you create a new Rossum account or a new queue from a given regional extraction schema template.
However, you will often need to create a custom data field to extract from your documents. For the purpose of this article, let’s assume you want to add a new field called “Container Number”.
Where can I add a new data field?
To add a new data field, open the Rossum schema editor and navigate to the section where you would like to add the field. The easiest way is to click the label of the section in the left sidebar of the schema editor. Your window will then scroll to the configuration of that section automatically.
Once the section is located, you can start adding new data fields to the "children" attribute of the section, which represents the list of data fields available in the section.

Adding a new data field
You can either copy and update an existing field from the JSON file or create the new field from scratch. In both cases, every data field has a set of required attributes which have to be defined:
- category – every object in the extraction schema has its own category. For a new data field, the category would be defined as “datapoint”.
- label – the name of the field users will see on the validation screen.
- id – a unique identifier of the data field. The ID is used to map the field in Rossum to the internal ID of the field in the target system (ERP/AP). For example, if you export data to SAP, where the ID for “Container Number” is “sap_container_number”, you could set the same ID for this field in Rossum. You can then download the data and upload it to SAP easily, since the ID of the data field is the same in both systems. This is much more convenient than pairing data fields by label. Even if you later change the label of the data field to “Container ID” to make it clearer for your operators, your integration won’t break, and Rossum can still use the previously extracted values for teaching the AI, showing corrections to the data field in the Usage Reporting Dashboard, etc.
- type – a data field can be of one of the following types:
  - String (“Rossum”)
  - Number (123.0)
  - Date (1/1/2020)
  - Enum ([“A”, “B”, “C”])
- rir_field_names – a list of mapped AI Engine outputs. If a field cannot be mapped to any output that Rossum can extract by default or that has been trained for you during a Dedicated AI Engine training, leave the list empty ("rir_field_names": []).
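The attributes listed above can be checked with a few lines of code before a field is pasted into the schema editor. The following is a minimal sketch, not part of any Rossum library: the helper name find_field_problems is hypothetical, and the lowercase type spellings are an assumption based on the "number" value used in this article (other types may exist).

```python
# Hypothetical helper; attribute names follow the list in this article.
REQUIRED_ATTRIBUTES = ("category", "label", "id", "type", "rir_field_names")
# Assumed lowercase spellings of the four types named in this article.
KNOWN_TYPES = ("string", "number", "date", "enum")


def find_field_problems(field):
    """Return a list of problems found in one data field definition."""
    problems = []
    for name in REQUIRED_ATTRIBUTES:
        if name not in field:
            problems.append("missing attribute: " + name)
    if "category" in field and field["category"] != "datapoint":
        problems.append('category should be "datapoint"')
    if "type" in field and field["type"] not in KNOWN_TYPES:
        problems.append("unknown type: " + str(field["type"]))
    if "rir_field_names" in field and not isinstance(field["rir_field_names"], list):
        problems.append("rir_field_names should be a list")
    return problems


container_number = {
    "category": "datapoint",
    "type": "number",
    "label": "Container Number",
    "id": "sap_container_number",
    "rir_field_names": [],
}

print(find_field_problems(container_number))  # prints []
print(find_field_problems({"label": "Container Number"}))
```

The schema editor remains the authoritative check. This sketch only catches the most common omissions early.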

Creating a sample field
The data necessary for creating our “Container Number” field would be as follows:
{
  "category": "datapoint",
  "type": "number",
  "label": "Container Number",
  "id": "sap_container_number",
  "rir_field_names": []
}
Depending on the type of the data field, you might be required to define other necessary attributes. The “Datapoint” section in our API documentation describes the individual data field types and their attributes.
Once all the required attributes are defined and the schema editor is not showing any errors, you should be able to see the new field in the left sidebar which mirrors what users will see in the validation screen.
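If you prefer to prepare the schema JSON outside the editor, adding the field to a section's "children" list can be sketched as follows. This is an illustration under assumptions: the schema is taken to be a list of section objects, each with an "id" and a "children" list. The function name add_field_to_section, the section ID "basic_info_section" and the existing "document_id" field are placeholders.

```python
def add_field_to_section(schema, section_id, field):
    """Append `field` to the "children" list of the section with `section_id`."""
    # Field IDs must be unique, so check the direct children of every section.
    used_ids = {
        child.get("id")
        for section in schema
        for child in section.get("children", [])
    }
    if field["id"] in used_ids:
        raise ValueError("duplicate field id: " + field["id"])
    for section in schema:
        if section.get("id") == section_id:
            section.setdefault("children", []).append(field)
            return schema
    raise KeyError("section not found: " + section_id)


# Placeholder schema with a single section and one existing field.
schema = [
    {
        "category": "section",
        "id": "basic_info_section",
        "label": "Basic information",
        "children": [
            {
                "category": "datapoint",
                "type": "string",
                "label": "Document ID",
                "id": "document_id",
                "rir_field_names": [],
            }
        ],
    }
]

new_field = {
    "category": "datapoint",
    "type": "number",
    "label": "Container Number",
    "id": "sap_container_number",
    "rir_field_names": [],
}

add_field_to_section(schema, "basic_info_section", new_field)
print([child["id"] for child in schema[0]["children"]])
# prints ['document_id', 'sap_container_number']
```

The duplicate check only looks at direct children of sections. Fields nested deeper, for example inside tables, are not inspected in this sketch.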
Populating the default data field value by the AI Engine
As already mentioned, the value of the data field can be pre-filled by the AI Engine. In that case, you have to map your data field to one of the outputs of the AI Engine. The field you are trying to set up may already be predicted by our Generic AI Engine, so check the list of the out-of-the-box captured fields before mapping a data field to an output. Only map a field to an output that Rossum can extract by default or that has been trained for you in a Dedicated AI Engine.
If you find a relevant output that could be used for pre-filling your data field then you should do the following:
- Create a new attribute in your field: "rir_field_names": []
- Fill the list with the AI Engine output ID: "rir_field_names": ["sender_name"]
- If the value can be filled from multiple AI Engine outputs, list all of their IDs: "rir_field_names": ["sender_name", "sender_ic"]. Any captured values will be used according to the order in which the AI Engine output IDs are listed.
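One plausible reading of how the order of multiple output IDs matters is "the first listed output that has a captured value wins". The sketch below illustrates that reading only. It is not Rossum's actual implementation, and the function name pick_prefill_value and the sample values are hypothetical.

```python
def pick_prefill_value(rir_field_names, captured):
    """Return the value of the first listed output that was captured, else None."""
    for output_id in rir_field_names:
        value = captured.get(output_id)
        if value:  # skip outputs that are missing or empty
            return value
    return None


# Only "sender_ic" was captured on this hypothetical document.
captured = {"sender_ic": "12345678"}

print(pick_prefill_value(["sender_name", "sender_ic"], captured))  # prints 12345678
print(pick_prefill_value([], captured))  # prints None
```

With an empty list, nothing is pre-filled, which matches leaving "rir_field_names" empty for fields the AI Engine cannot capture.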
However, in the case of the “Container Number” data field, Rossum’s Generic AI Engine does not offer the capture of such a field by default.
Custom field not captured by Rossum’s AI Engine
If Rossum’s Generic AI Engine does not capture the field you need by default, you should read about the Dedicated AI Engine, which can be fine-tuned for better extraction accuracy on your documents and fields.
If Rossum cannot capture the field you need and you have not signed up for the Dedicated AI Engine, the attribute for pre-filling the data field should stay empty: "rir_field_names": []
Populating a data field value from email data
If you are sending documents to Rossum via email then you can reuse some of the values and use them for initialization of the data fields. A typical example would be if your vendors are sending documents directly to your Rossum email inbox.
You might want to pre-fill the “Container Number” field from the subject of the email received in the Rossum email inbox attached to the queue.
In that case, you could set up the data field as follows: "rir_field_names": ["email_header:subject"]
Read more about the available email data in the API documentation.
Populating a data field value from an API upload
If you are uploading data via the API and you already know the container number from your system based on how you received the document, you can send this information via the API during upload to make it easier to track the “Container Number” together with its related documents. The field would be set up like this: "rir_field_names": ["upload:id"]
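On the uploading side, the value has to be sent under a key that matches the name used in "rir_field_names". A minimal sketch of preparing that key-value pair as JSON text is shown below. The function name build_upload_values and the sample number are hypothetical. How the text is attached to the upload request is not shown here and should be taken from the API documentation.

```python
import json


def build_upload_values(container_number):
    """Return JSON text carrying the value for the "upload:id" output."""
    # The key must match the entry in the field's "rir_field_names" list.
    return json.dumps({"upload:id": str(container_number)})


print(build_upload_values(4512))  # prints {"upload:id": "4512"}
```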