Under the Hood: Rossum Product Improvement News
Every month, we share the newest and coolest features of Rossum’s document processing application, the underlying platform and the AI engine. Scroll down to see all the key features introduced in July 2019!
Each field now includes a confidence score determined by the AI core engine. This score enables more automation, because fields with high scores no longer require human review.
The `rir_confidence` field contains the score (in the range of 0 to 1), which corresponds to the estimated probability of correctness. For instance, 0.99 means that just 1 in 100 extractions of this field would be incorrect.
Automation is controlled by setting the `default_score_threshold` of a queue or the `score_threshold` in a schema, or by custom business rules in an extension.
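To make the threshold logic concrete, here is a minimal sketch of how such a decision could look in your own code. It is an illustration, not Rossum's implementation: the function name `needs_review` is hypothetical, and both the precedence (a field-level `score_threshold` overriding the queue's `default_score_threshold`) and the "at or above the threshold" comparison are assumptions.

```python
from typing import Optional


def needs_review(
    rir_confidence: Optional[float],
    default_score_threshold: float,
    score_threshold: Optional[float] = None,
) -> bool:
    """Return True if a human should review the field (hypothetical helper).

    Assumption: a field-level score_threshold, when present, takes
    precedence over the queue-level default_score_threshold.
    """
    threshold = (
        score_threshold if score_threshold is not None else default_score_threshold
    )
    # A field without a confidence score is never automated in this sketch.
    if rir_confidence is None:
        return True
    # Assumption: scores at or above the threshold are considered safe.
    return rir_confidence < threshold


if __name__ == "__main__":
    print(needs_review(0.99, default_score_threshold=0.975))
    print(needs_review(0.90, default_score_threshold=0.975))
    print(needs_review(0.90, default_score_threshold=0.975, score_threshold=0.8))
```

The same check could live in an extension when the decision depends on business rules rather than a single number.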
Custom model automation
We released a new confidence score calibration model that enables better automation rates for custom model training users.
API upload with values
Initial datapoint values can now be set during document upload: “upload values” can be passed to the API along with the uploaded file. These values can be referenced from the schema `rir_field_names` attribute through a new `upload:` field namespace. For example, an `upload:organization_unit` field can be referenced in a schema like this:
"label": "Org unit",
This is useful when passing document metadata to the connector or export script, such as file origin. In addition, these values can be made visible in the schema and used, for example, to specify the default recipients during the upload process.
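As a rough illustration of the `upload:` namespace, the sketch below builds a schema datapoint as a Python dictionary and lists the upload values it refers to. Only the `label` and the `upload:organization_unit` reference come from the text above; the `id`, `category` and `type` values and the helper `upload_value_names` are assumptions made for this example.

```python
import json

# Hypothetical schema datapoint that reads its initial value from an upload value.
org_unit_datapoint = {
    "category": "datapoint",       # assumed attribute value
    "id": "organization_unit",     # hypothetical field id
    "label": "Org unit",
    "type": "string",              # assumed attribute value
    "rir_field_names": ["upload:organization_unit"],
}


def upload_value_names(datapoint: dict) -> list:
    """Return the upload value names a datapoint refers to, prefix stripped."""
    prefix = "upload:"
    return [
        name[len(prefix):]
        for name in datapoint.get("rir_field_names", [])
        if name.startswith(prefix)
    ]


if __name__ == "__main__":
    print(json.dumps(org_unit_datapoint, indent=2))
    print(upload_value_names(org_unit_datapoint))
```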
The new v2.4 release of the elisctl tool contains an `elisctl document extract` command, which allows you to upload a document with pre-extracted data. This lets users download only the automatically captured data.
The Excel schema editor now supports the `can_export` attribute, allowing users to control which specific data points are included in the exported files.
Line items headers
Table rows can now be marked as either “header” or “data” (body) rows. This is useful for identifying columns that the AI engine does not support by default.
This can be set up within the newly introduced magic grid data structure (“grid”) schema configuration. Only “data” rows are extracted to the validation view footer as payload data when the magic grid is finalized. However, headers, if configured, are available as part of the grid structure, which is accessible within the annotation content API.
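If you read the grid structure yourself, separating the two row types could look roughly like the sketch below. The row shape used here (a dictionary with `type` and `top_position` keys) is an assumption for illustration; check the annotation content API for the actual structure.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def split_rows(grid_rows: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Split grid rows into (headers, data) by their "type" value.

    Rows with any other type are ignored in this sketch.
    """
    headers = [row for row in grid_rows if row.get("type") == "header"]
    data = [row for row in grid_rows if row.get("type") == "data"]
    return headers, data


if __name__ == "__main__":
    # Hypothetical rows; the keys and values are made up for the example.
    rows = [
        {"top_position": 120, "type": "header"},
        {"top_position": 150, "type": "data"},
        {"top_position": 180, "type": "data"},
    ]
    headers, data = split_rows(rows)
    print(len(headers), "header row(s),", len(data), "data row(s)")
```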
Messages for tables
You can now add messages (warnings and errors) to data points that have children, for example a whole table (line items or VAT details).
Some field values occur on multiple document pages (e.g. invoice number). Previously, in this case the engine would pick an occurrence based on minor differences in confidence score. Now, it will prioritise the values that appeared first in the document.
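The difference between the two selection strategies can be sketched as follows. This is only an illustration of the idea, not the engine's actual logic: the `Candidate` class, its attributes and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """One occurrence of a field value in the document (hypothetical shape)."""
    value: str
    page: int          # 1-based page number
    top: float         # vertical position on the page, smaller is higher up
    confidence: float


def pick_by_confidence(candidates: List[Candidate]) -> Optional[Candidate]:
    """Earlier behaviour as described: the highest score wins, however small the gap."""
    return max(candidates, key=lambda c: c.confidence, default=None)


def pick_first_occurrence(candidates: List[Candidate]) -> Optional[Candidate]:
    """New behaviour as described: prefer the occurrence that appears first."""
    return min(candidates, key=lambda c: (c.page, c.top), default=None)


if __name__ == "__main__":
    occurrences = [
        Candidate("INV-001", page=1, top=80.0, confidence=0.981),
        Candidate("INV-001", page=3, top=40.0, confidence=0.983),
    ]
    print(pick_by_confidence(occurrences).page)
    print(pick_first_occurrence(occurrences).page)
```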
Magic grid is now easier to use:
- When drawing a new magic grid, the app automatically guesses where to place the rows.
- Any row in the magic grid, such as a section header, can be immediately marked to be skipped during extraction.
- You can easily jump between different magic grids in the document, which is especially useful in long documents.
The table view has also been improved:
- If you press the tick button while editing a datapoint in the footer, it moves to the next cell in the row.
- Ctrl-A/Cmd-A no longer adds a row (use Ctrl-shift-A/Cmd-shift-A for that) – instead, the standard action of selecting all cell text is performed.
An admin-level user can now enable (or disable) fields in the Settings screen, accessible from the main dashboard. This functionality is fully available only for newly created trial accounts.
We would love to hear feedback on the updates introduced in July. Send your comments to firstname.lastname@example.org.
Try it for yourself. Sign up for a trial!