In this article, you will learn:
- Where to add a new data field
- How to add a new data field
- How to create a pre-defined data field value captured by the AI Engine
- How to create a custom data field not captured by the AI Engine
- How to populate a data field value from email data
- How to populate a data field value from an API upload
Rossum’s AI Engine can capture a predefined set of data fields from the very beginning. You will see those fields when you create a new queue from a given regional extraction schema template.
However, you will often need to create a new custom data field to extract from your documents. For this article, let’s assume you want to add a new field called “Container Number.”
Where to add a new data field
If you prefer the Rossum schema editor, read this article. You will learn how to add a new field and set up the extraction schema in Rossum. Here we will show how to add a new field in the JSON code editor.
To open the JSON code editor, go to a selected queue and open its Settings. There you need to click on “Fields to capture” and then click on the “Edit JSON” button.
Once in the code editor, navigate to the section where you want to add the new field. The easiest way is to click on the section’s label in the left sidebar of the JSON code editor. Your window will scroll to the configuration of the data field automatically.
When you locate the section, you can add new data fields to the section’s “children” attribute.
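As a rough sketch of where that is, a section in the schema JSON might look like the fragment below. The section id and label and the existing “Invoice number” field are illustrative assumptions, not values taken from your queue. The new field is the last object in the section’s "children" list.

```json
{
  "category": "section",
  "id": "basic_info_section",
  "label": "Basic information",
  "children": [
    {
      "category": "datapoint",
      "id": "document_id",
      "label": "Invoice number",
      "type": "string",
      "rir_field_names": ["document_id"]
    },
    {
      "category": "datapoint",
      "id": "sap_container_number",
      "label": "Container Number",
      "type": "number",
      "rir_field_names": []
    }
  ]
}
```

Objects in "children" are separated by commas, with no comma after the last one.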

How to add a new data field
You can either copy and update an existing field from the JSON file or create a new field from scratch. In both cases, every data field has a set of attributes that have to be defined:
- label: the field name visible for annotators on the validation screen and in the export results.
- id: a unique field identifier used in exports and integrations.
- type: data type. You can have string, number, date, enum, multivalue field, or table. More information here.
- format: available for two types of data, date and number.
- required: if the field is required, you can’t confirm the document without it.
- visible: you can decide whether the field is visible or hidden on the validation screen. If you hide it, it will not be removed from the schema and Rossum can still extract the value.
- rir_field_names: this determines which value should be presented in a field. For example, our Accounts Payable and Receivable AI engine can recognize specific fields (full list here). If you use this engine and want to capture bank account numbers, you should create an “Account number” field and set “account_num” as the rir_field_names to let the engine know what value you expect to get in this field.
- default_value: the value that will be shown if no prediction is made for a field. It acts as a backup in case predictions are missing or if certain information cannot be extracted from the document. The default value is used for fields that don’t use hints from AI engine predictions (when rir_field_names are not specified) or when the AI engine doesn’t provide any data for the field.
- category: category of an object. Possible options are section, multivalue, tuple and datapoint.
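To illustrate how default_value and an empty rir_field_names work together, here is a minimal sketch of a hypothetical “Container Type” field. This field, its id, and the value "40HC" are assumptions made up for illustration and are not part of this article’s running example. Because no AI Engine output is mapped, the field would start out with the default value.

```json
{
  "category": "datapoint",
  "id": "container_type",
  "label": "Container Type",
  "type": "string",
  "rir_field_names": [],
  "default_value": "40HC"
}
```

Annotators can still overwrite the pre-filled value on the validation screen.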

The data necessary for creating our “Container Number” field would be as follows:
{
  "rir_field_names": [],
  "category": "datapoint",
  "id": "sap_container_number",
  "label": "Container Number",
  "type": "number"
}
Depending on the type of data field, you might be required to define other necessary attributes. The “Datapoint” section in our API documentation describes the individual data field types and their attributes.
Once you have defined all the required attributes and the JSON code editor shows no errors, you should see the new field in the left sidebar, which mirrors what users will see on the validation screen.
Pre-defined data field value captured by the AI Engine
As mentioned, the AI Engine can pre-fill a data field’s value. To enable this, map your data field to one of the AI Engine’s outputs so that our Generic AI Engine can predict the value. Before mapping, check the list of the out-of-the-box captured fields, and always map a field to an output that Rossum can extract.
If you find a relevant output that you could use for pre-filling your data field, then you should do the following:
- Create a new attribute in your field: "rir_field_names": [].
- Fill the list with the AI Engine ID, for example: "rir_field_names": ["sender_name"].
- If multiple AI Engine IDs can fill the value, separate them with commas: "rir_field_names": ["sender_name", "sender_ic"]. Captured values will be used in the order in which the AI Engine output IDs are listed.
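Put together, the steps above could produce a field like the following sketch. The id, label, and type are illustrative assumptions. Only the rir_field_names list comes from the example in the steps.

```json
{
  "category": "datapoint",
  "id": "sender_name",
  "label": "Vendor name",
  "type": "string",
  "rir_field_names": ["sender_name", "sender_ic"]
}
```

Here "sender_name" is listed first, so it takes precedence in the sequence, with "sender_ic" after it.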
However, Rossum’s Generic AI Engine does not capture a field such as “Container Number.”
Custom data field not captured by the AI Engine
If Rossum’s Generic AI Engine does not capture the field you need by default, then you should read about the Dedicated AI Engine. We can fine-tune it for better extraction accuracy for your documents and fields.
If Rossum cannot capture the field you need and you have not signed up for the Dedicated AI Engine, then the attribute for pre-filling the data field should stay empty: "rir_field_names": [].
How to populate a data field value from email data
If you send documents to Rossum via email, you can reuse some of the email’s values to initialize the data fields.
A typical example is when your vendors send documents directly to your Rossum email inbox. In that case, you might want to pre-fill the “Container Number” field from the subject of the email received in the Rossum inbox attached to the queue. You could then set up the data field as follows: "rir_field_names": ["email_header:subject"].
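As a sketch, the complete “Container Number” field with this mapping might look as follows. The type is set to "string" here on the assumption that a subject line is free text rather than a bare number. Adjust it if your subjects contain only the numeric container number.

```json
{
  "category": "datapoint",
  "id": "sap_container_number",
  "label": "Container Number",
  "type": "string",
  "rir_field_names": ["email_header:subject"]
}
```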
Read more about the available email data in the API documentation.
How to populate a data field value from an API upload
Suppose you upload documents via the API and already know the container number from your own system, based on how you received the document. You can send this information with the upload to make it easier to link the container number to its related documents. You would have to set up the field like this: "rir_field_names": ["upload:id"].
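For reference, a minimal sketch of the full field with this mapping is shown below. The other attributes are the ones used for the “Container Number” field earlier in this article.

```json
{
  "category": "datapoint",
  "id": "sap_container_number",
  "label": "Container Number",
  "type": "number",
  "rir_field_names": ["upload:id"]
}
```

This assumes the upload request carries a value under the matching "upload:id" name. Check the API documentation for the exact request parameter and format before relying on it.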