Customizing Rossum: The Schema Configuration Guide
The main challenge we all run into when processing invoices is diversity. Welcome as it usually is in our lives, it is quite a hassle in data capture: layouts vary, documents reach your company through multiple channels, and every company has different requirements on what data must end up in its accounting system. All such needs should be accommodated. At Rossum, we are fully aware of this challenge and support the configuration of your Rossum account to fit your needs. What’s more, you can be in full control of it yourself.
Change process in three steps
When you create a Rossum trial account, it comes with a default set of settings. You will start with one admin user, one workspace and one queue. In case you didn’t go through our configuration tutorial, let us briefly explain the basic terms before we dive deeper into the topic:
Organization – a basic unit that contains all the objects described below.
Schema – a set of data points that are extracted from the documents. (These are the fields you can see in your Rossum sidebar.) One queue always has one schema.
Queue – a basic organizational unit for documents. When you send a document to the Rossum app, you have to send it to a specific queue. For each queue, you can set up a list of data fields that you want to extract. The default setup comes with one queue for received invoices. For Rossum users, one queue often represents a container for one type of document.
User – an individual user of the app. Every user is assigned to an organization. The user who created your Rossum account is set as the admin user by default and can access document annotations from all queues, assign users to queues, and perform many other administrative actions.
Workspace – queues may be grouped into workspaces. A workspace usually represents one accounting unit.
To configure these aspects of your Rossum account, you can look at the web interface (we are busy adding all the remaining bits of settings there), but for any of these items, you can also use the rossum command line tool. Specific tutorials may be found in our collection of developer guides. (And of course, true hackers can also use our API directly.)
In this article, we will focus on one of the most frequent adjustments: changing the configuration of the fields to be captured from documents, that is, the Rossum schema.
Changing the schema via UI
Edit (July 2019): We added a Settings button to new Rossum accounts. The button can be found in the sidebar under the Report a problem button. There you will find a list of all available fields, with the ones already included in your schema pre-checked. Set up your schema by choosing the right options and save your choice. The change will affect all the invoices in the queue, including the exported ones.
If you would like to add some extra fields, you can do so using the rossum command line tool.
Changing the schema using rossum
If you would like to follow this manual using sample data instead of your own, feel free to use any of the following samples:
- Sample schema in JSON
- Sample schema in XLSX
- File with account codes for salaries
- Schema with account codes for salaries in JSON
- Schema with account codes for salaries in XLSX
rossum can be used either in command line mode, where each command is passed as an argument and executed individually, or in interactive shell mode, where you run rossum without parameters and type the commands at the prompt. In this manual, we will be using rossum in the interactive shell mode.
The tool is available for Mac OS, Linux and Windows. For the Windows version, download the latest release, install it and run it either from the Start menu or within the Command Prompt application. For Linux and Mac OS, install the package from PyPI by typing the following in the Terminal:
pip install rossum
After the installation, run rossum from the menu panel, type “configure”, log in using the credentials to your Rossum account (the password will not be visible on screen while you type it in) and hit the Enter key when you are done:
You can have a look at the available commands easily by typing “help”:
rossum> help

Documented commands (type help <topic>):
========================================
configure  csv  queue  schema  tools  user  workspace

Undocumented commands:
======================
exit  help  quit

In the later stages of the process, you can type --help or -h to get the list of available commands.
Basic Schema Adjustments
(For a quick overview on the schema adjustments process, you can skip to the video.)
To adjust your schema, first list the available queues by typing queue list to find the ID of your schema, then download the schema. The default output format is JSON, so you can download the schema without explicitly stating the file format by typing schema get [schema id] -O [path to the output file]:
rossum> queue list
id     name               workspace  inbox                           schema  users
-----  -----------------  ---------  ------------------------------  ------  -----
10376  Received invoices  9510       firstname.lastname@example.org  51333   3564
rossum> schema get 51333 -O demo_schema.json
Alternatively, you can download the schema as an XLSX file which comes in handy while doing more complicated adjustments in the schemas. In this case, you must explicitly state the output format:
rossum> schema get 51333 --format xlsx -O demo_schema.xlsx
Such an XLSX file could look like this:
Open the file and configure it to your liking, following the structure of the existing fields. When adding new fields, you can either base them on automatically extracted field types or create custom fields that you will label manually in the Rossum interface. The value that determines what is extracted into a field can be found in the column “rir_field_names (json)”. If you want to use any of our pre-trained models for extraction, pick one from these lists and add its name to this column for your new field. If not, the value should be set to “”. You can also specify multiple field types as a fallback option:
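For readers editing the JSON version instead of the XLSX one, the following Python sketch illustrates the same idea: one field backed by extraction with a fallback, and one purely manual field. It is only an illustration under stated assumptions. The trimmed-down schema, the section id, the helper add_datapoint and the field names order_id and var_sym are placeholders chosen for the example, and an empty list is assumed to be the JSON counterpart of the empty “” value in the XLSX column. Compare against your own downloaded schema before relying on any of it.

```python
import json

# Hypothetical, trimmed-down schema: one section holding one datapoint.
# The keys are assumed to mirror a downloaded JSON schema; check your own file.
schema = [
    {
        "category": "section",
        "id": "basic_info_section",
        "label": "Basic information",
        "children": [
            {
                "category": "datapoint",
                "id": "invoice_id",
                "label": "Invoice number",
                "type": "string",
                "rir_field_names": ["invoice_id"],
            }
        ],
    }
]


def add_datapoint(schema, section_id, datapoint):
    """Append a datapoint to the section with the given id (illustrative helper)."""
    for section in schema:
        if section.get("id") == section_id:
            section.setdefault("children", []).append(datapoint)
            return schema
    raise KeyError(f"section {section_id!r} not found")


# A field based on automatic extraction, with a second name listed as a fallback.
# The names "order_id" and "var_sym" are placeholders for entries from the lists.
add_datapoint(schema, "basic_info_section", {
    "category": "datapoint",
    "id": "order_number",
    "label": "Order number",
    "type": "string",
    "rir_field_names": ["order_id", "var_sym"],
})

# A custom field that is labeled manually: no extraction names (assumed to be []).
add_datapoint(schema, "basic_info_section", {
    "category": "datapoint",
    "id": "cost_center",
    "label": "Cost center",
    "type": "string",
    "rir_field_names": [],
})

print(json.dumps(schema, indent=2))
```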
You can change the value in the “Data” column (or under the key “id” in JSON) as you wish, but these names must be unique throughout the schema. Note that if you change an existing “Data” (“id”) value, this will lead to deletion of the data you entered manually in the UI during the review process on invoices that haven’t been exported yet. Once you have finished your adjustments, upload the modified file version back to Rossum, in JSON or XLSX format, by typing the command as schema update [schema id] [path to the file to be uploaded]:
rossum> schema update 51333 demo_schema.xlsx
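Because the ids must be unique throughout the schema, it can be worth checking a JSON file before uploading it. The Python sketch below is one possible way to do that. It is not part of the rossum tool. The helper names iter_ids and duplicate_ids and the inline sample data are made up for the illustration, and the sketch simply assumes that every object carrying an "id" key counts.

```python
from collections import Counter


def iter_ids(node):
    """Yield every "id" value found in a nested structure of dicts and lists."""
    if isinstance(node, dict):
        if "id" in node:
            yield node["id"]
        for value in node.values():
            yield from iter_ids(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_ids(item)


def duplicate_ids(schema):
    """Return a sorted list of ids that occur more than once in the schema."""
    counts = Counter(iter_ids(schema))
    return sorted(name for name, count in counts.items() if count > 1)


# Hypothetical sample: "invoice_id" is accidentally used twice.
sample_schema = [
    {
        "category": "section",
        "id": "basic_info_section",
        "children": [
            {"category": "datapoint", "id": "invoice_id", "type": "string"},
            {"category": "datapoint", "id": "invoice_id", "type": "string"},
            {"category": "datapoint", "id": "date_issue", "type": "date"},
        ],
    }
]

duplicates = duplicate_ids(sample_schema)
print(duplicates)
```

To run the check on a real file, load it with json.load and pass the result to duplicate_ids. An empty list means that no id is repeated.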
After updating a schema, its ID number will change, so if you want to make further changes, start with the “queue list” command to get the ID number of your new schema.
rossum> queue list
id     name               workspace  inbox              schema  users
-----  -----------------  ---------  -----------------  ------  -----
10376  Received invoices  9510       email@example.com  62426   3564
To see the whole process of updating a schema in practice, watch the video below. For more information on schemas, their content, how to update them and similar topics, please refer to our API documentation.
Updating the schema
(For a quick overview on adding options to the schema, you can skip to the video.)
Besides changes to the regular data points, it is possible to add drop-down select boxes containing a list of options to Rossum. For such purposes, there is an “enum” data type. It becomes particularly handy when adding, for example, GL codes or document types:
It is fairly easy to add account codes using the schema output in an XLSX file. You add a new “enum” data type field to the main sheet called Schema and create a new sheet called “Options of ” + the id of the respective enum field:
After you are finished, update the schema in your Rossum account.
You can also add options using JSON files, especially if you want to automate such processes. First, convert the file with the options you want to add from XLSX to CSV format. Keep in mind that the output CSV file should not contain a header. After that, convert the CSV file into options stored in a TXT or JSON file.
rossum> tools xls_to_csv ~/Downloads/staff_salariesAC.xlsx --header 0 --sheet 0 -O ~/Downloads/staff_salariesAC.csv
rossum> tools csv_to_options ~/Downloads/staff_salariesAC.csv -O ~/Downloads/options_to_be_uploaded.json
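To make the CSV-to-options step more concrete, here is a rough Python sketch of such a conversion. It is not the tool’s actual implementation. It assumes a headerless CSV whose first column is the option value and whose second column is the label shown in the drop-down, and it assumes that each option is an object with the keys "value" and "label". The account codes in the sample are invented.

```python
import csv
import io
import json


def csv_to_options(text):
    """Turn headerless CSV rows of (value, label) into a list of option dicts."""
    options = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        value = row[0].strip()
        # Fall back to the value itself when no label column is present.
        label = row[1].strip() if len(row) > 1 else value
        options.append({"value": value, "label": label})
    return options


# Invented sample data standing in for the exported account codes.
sample_csv = "521000,Wages and salaries\n524000,Social security\n"

options = csv_to_options(sample_csv)
print(json.dumps(options, indent=2))
```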
Next, add the new options to the original JSON schema file. The original schema file must have an existing enum data type, so that you can add new options to it. In our example, let’s add new options to the enum data point called staff_salaries:
rossum> schema transform -O ~/Downloads/final_schema_with_options.json substitute-options ~/Downloads/my_schema.json staff_salaries ~/Downloads/options_to_be_uploaded.json
To see the output, check the final_schema_with_options.json file. After that, you can upload the new schema to your Rossum account.
rossum> schema update 51333 ~/Downloads/final_schema_with_options.json
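Conceptually, substituting options means finding the enum data point with the given id and replacing its list of options. The Python sketch below shows that idea on a tiny invented schema. It is a sketch under assumptions, not a description of how the rossum tool works internally: the helper substitute_options, the "options" key and the sample data are all illustrative.

```python
def substitute_options(node, enum_id, options):
    """Replace the options of every enum datapoint with the given id.

    Walks nested dicts and lists in place and returns the number of
    datapoints that were updated.
    """
    replaced = 0
    if isinstance(node, dict):
        if node.get("id") == enum_id and node.get("type") == "enum":
            node["options"] = list(options)
            replaced += 1
        for value in node.values():
            replaced += substitute_options(value, enum_id, options)
    elif isinstance(node, list):
        for item in node:
            replaced += substitute_options(item, enum_id, options)
    return replaced


# Invented schema fragment with an enum datapoint called staff_salaries.
schema = [
    {
        "category": "section",
        "id": "accounting_section",
        "children": [
            {
                "category": "datapoint",
                "id": "staff_salaries",
                "type": "enum",
                "options": [{"value": "old", "label": "Old option"}],
            }
        ],
    }
]

new_options = [
    {"value": "521000", "label": "Wages and salaries"},
    {"value": "524000", "label": "Social security"},
]

count = substitute_options(schema, "staff_salaries", new_options)
print(count, schema[0]["children"][0]["options"])
```

A return value of 0 signals that no matching enum data point exists, which corresponds to the requirement above that the original schema must already contain the enum field.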
To see the whole process of adding options using JSON files, watch the video below.
Add account codes to the schema
To make things even easier, we added scripting support for many common schema operations, which can be easily used for schema management automation. If you run the commands in the command line interface instead of in the rossum shell directly, you can add the whole list of options to your schema in one go using bash:
rossum schema transform substitute-options ~/Downloads/my_schema.json staff_salaries \
  <( rossum tools xls_to_csv ~/Downloads/staff_salariesAC.xlsx --header 0 --sheet 1 | rossum tools csv_to_options - ) \
  > staff_salaries_schema.json
(Note that Windows users may need to use the caret symbol (^) instead of the backslash to continue the command on the next line.)