Under the Hood: Rossum Product Improvement News
July 2019

You can find the full change log for June, along with all other updates from 2019.



Confidence score

Each field now includes a confidence score determined by the AI core engine. The score enables more automation, because fields with a high score no longer require human review.

The `rir_confidence` field contains the score (in the range of 0 to 1), which corresponds to the estimated probability of correctness. For instance, 0.99 means that just 1 in 100 extractions of this field would be incorrect. 

Automation is controlled by setting the `default_score_threshold` of a queue, by setting `score_threshold` in a schema, or by custom business rules in an extension.
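As a rough illustration of how such thresholds could be applied, the sketch below decides which fields still need human review. The helper name `needs_review`, the sample field names, the rule that a schema-level `score_threshold` overrides the queue's `default_score_threshold`, and the handling of the boundary case are all assumptions made for this example, not a description of the platform's actual logic.

```python
from typing import Optional


def needs_review(rir_confidence: Optional[float],
                 default_score_threshold: float,
                 score_threshold: Optional[float] = None) -> bool:
    """Return True when a field should still be checked by a human."""
    # Assumption: a per-field schema threshold, when set, overrides the queue default.
    threshold = default_score_threshold if score_threshold is None else score_threshold
    # Assumption: a field without a confidence score is never automated.
    if rir_confidence is None:
        return True
    # Assumption: a score equal to the threshold counts as good enough.
    return rir_confidence < threshold


# Hypothetical fields with their rir_confidence values.
fields = {"invoice_id": 0.99, "amount_total": 0.91, "date_due": None}
to_review = [name for name, score in fields.items()
             if needs_review(score, default_score_threshold=0.975)]
```

With a queue threshold of 0.975, only `invoice_id` would skip review in this sketch; lowering the threshold for a single field is done by passing `score_threshold`.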

Custom model automation

We released a new confidence score calibration model that enables better automation rates for custom model training users. 

Integration updates


API upload with values

Initial datapoint values can now be set during document upload: “upload values” can be passed to the API along with the uploaded file. These values can be referenced from the schema `rir_field_names` attribute through a new `upload:` field namespace. For example, the `upload:organization_unit` field can be referenced in a schema like this:

{
  "category": "datapoint",
  "id": "organization_unit",
  "label": "Org unit",
  "type": "string",
  "rir_field_names": ["upload:organization_unit"]
}


This is useful when passing document metadata to the connector or export script, such as file origin. In addition, these values can be made visible in the schema and used, for example, to specify the default recipients during the upload process.
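To make the `upload:` namespace more concrete, here is a minimal sketch of how a schema datapoint could be matched against the values supplied at upload time. The function `initial_value`, the shape of the `upload_values` dictionary, and the sample value "Sales" are hypothetical; the actual lookup happens inside the platform and may differ.

```python
from typing import Dict, Optional

UPLOAD_PREFIX = "upload:"


def initial_value(datapoint: dict, upload_values: Dict[str, str]) -> Optional[str]:
    """Return the first upload value referenced by the datapoint, if any."""
    for field_name in datapoint.get("rir_field_names", []):
        # Only names in the "upload:" namespace refer to upload values.
        if field_name.startswith(UPLOAD_PREFIX):
            # Assumption: keys in upload_values carry the full "upload:" name.
            if field_name in upload_values:
                return upload_values[field_name]
    return None


datapoint = {
    "category": "datapoint",
    "id": "organization_unit",
    "label": "Org unit",
    "type": "string",
    "rir_field_names": ["upload:organization_unit"],
}
# Hypothetical values sent together with the uploaded file.
upload_values = {"upload:organization_unit": "Sales"}
value = initial_value(datapoint, upload_values)
```

A datapoint whose `rir_field_names` contains no `upload:` entry simply gets no initial value from the upload in this sketch.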

Elisctl upgrade

The new v2.4 release of the elisctl tool adds an `elisctl document extract` command, which allows you to upload a document with pre-extracted data. This allows users to download only automatically captured data.

The Excel schema editor now supports the `can_export` attribute, allowing users to control which specific data points are included in the exported files.

Line items headers

Table rows can now be marked as either “header” or “data” (body) rows. This is useful for identifying columns that the AI engine does not support by default.

This can be set up within the newly introduced magic grid data structure (“grid”) schema configuration. Only “data” rows are extracted to the validation view footer as payload data when the magic grid is finalized. However, headers, if configured, are available as part of the grid structure, which is accessible within the annotation content API.
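The distinction between row types can be sketched as a simple filter. The row layout used here (dictionaries with `top_position` and `type` keys, and `None` for a skipped row) is an assumption for illustration; consult the annotation content API for the real grid structure.

```python
from typing import List, Tuple


def split_rows(rows: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Separate grid rows into header rows and data rows."""
    headers = [row for row in rows if row.get("type") == "header"]
    data = [row for row in rows if row.get("type") == "data"]
    return headers, data


# Hypothetical grid rows; a row without a type is treated as skipped.
rows = [
    {"top_position": 120, "type": "header"},
    {"top_position": 150, "type": "data"},
    {"top_position": 180, "type": None},
    {"top_position": 210, "type": "data"},
]
headers, data = split_rows(rows)
```

In this sketch only the rows in `data` would end up in the extracted table, while `headers` stays available for anything that needs to know what the columns were called.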

Messages for tables

You can now add messages (warnings and errors) even to data points with children, for example a whole table (line items or VAT details).

User Experience Updates


Less scrolling

Some field values occur on multiple document pages (e.g. invoice number). Previously, the engine would pick an occurrence based on minor differences in confidence score. Now, it prioritises the occurrence that appears first in the document.
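One way to picture this behaviour is the sketch below, which treats occurrences with nearly equal confidence as ties and then prefers the earliest page. The `Occurrence` type, the `tolerance` parameter, and its default of 0.02 are assumptions made up for this example; the engine's actual selection rule is not documented here.

```python
from typing import List, NamedTuple


class Occurrence(NamedTuple):
    page: int
    value: str
    confidence: float


def pick_occurrence(occurrences: List[Occurrence], tolerance: float = 0.02) -> Occurrence:
    """Pick the earliest occurrence among those scoring close to the best one."""
    best = max(o.confidence for o in occurrences)
    # Keep occurrences whose score is within `tolerance` of the best score.
    close = [o for o in occurrences if best - o.confidence <= tolerance]
    # Among those, prefer the one on the earliest page.
    return min(close, key=lambda o: o.page)


# Hypothetical occurrences of an invoice number across three pages.
candidates = [
    Occurrence(page=3, value="INV-001", confidence=0.991),
    Occurrence(page=1, value="INV-001", confidence=0.987),
    Occurrence(page=2, value="INV-007", confidence=0.40),
]
chosen = pick_occurrence(candidates)
```

Here the page 1 occurrence is chosen even though page 3 scores marginally higher, so the reviewer does not have to scroll to the end of the document.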

Line items

Magic grid is now easier to use:

  • When drawing a new magic grid, the app automatically guesses where to place the rows.
  • Any row in the magic grid can be immediately marked to be skipped during extraction, e.g. section headers.
  • You can easily jump between different magic grids in the document, which is especially useful in long documents.

The table view has also been improved:

  • If you press the tick button while editing a datapoint in the footer, the selection moves to the next cell in the row.
  • Ctrl-A/Cmd-A no longer adds a row (use Ctrl-Shift-A/Cmd-Shift-A for that) – instead, the standard action of selecting all cell text is performed.

Field settings

An admin-level user can now enable (or disable) fields in the Settings screen, accessible from the main dashboard. This functionality is fully available only for newly created trial accounts.

Field settings in Rossum App

