After capturing data within the Rossum app, the final step is getting the data out of the system for further processing. Rossum supports several export formats: CSV, XML, and JSON. XML and JSON matter mostly to programmers and integrators who feed the data into other workflows automatically, while anyone can work with CSV files, which are essentially a plain-text spreadsheet format.
Export files can be downloaded from the “Exported” tab in the app. Below, we will also learn a trick for downloading custom export files from a special URL.
Opening Rossum CSV files
CSV stands for “Comma-Separated Values”, and CSV files can be opened in your favorite spreadsheet tool, such as Microsoft Excel or Google Sheets. Unfortunately, each tool interprets CSV files a little differently, so the most reliable way to open the file is to:
- Explicitly “Import” the file rather than just using the Open file dialog.
- Check the detailed import settings and make sure that “,” (comma) is selected as the column separator rather than for example “;” (semicolon) or TAB.
- Confirm that character encoding “UTF-8” is selected (rather than e.g. ISO-8859-1).
The spreadsheet contains one row per line item (or a single row per document if no line items were captured). Header fields are repeated on each row, so you can easily group the rows that belong to the same document.
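The same import settings and the grouping idea can be sketched in Python with the standard csv module. This is only an illustration: the column names (invoice_id, vendor, item_description, item_amount) and the sample rows are made up, and the columns in a real export depend on the fields configured for your queue.

```python
import csv
import io
from collections import defaultdict


def read_export(path):
    """Read an export file with an explicit comma separator and UTF-8 encoding."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter=","))


def group_by_document(rows, key_column):
    """Group line-item rows by a header field that repeats on every row."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_column]].append(row)
    return dict(groups)


# Hypothetical sample data; real column names come from your own export.
SAMPLE = (
    "invoice_id,vendor,item_description,item_amount\n"
    "INV-001,Acme,Widget,10.00\n"
    "INV-001,Acme,Gadget,5.50\n"
    "INV-002,Żabka,Service,99.00\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter=","))
groups = group_by_document(rows, "invoice_id")
for invoice_id, items in groups.items():
    print(invoice_id, len(items))
```

For a downloaded file, read_export("export.csv") would replace the in-memory sample (the file name here is just a placeholder).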
Exporting only new data from Rossum
When you click the Download all button, data for all documents exported so far are downloaded. This works well for experiments and proofs of concept. However, if the CSV export is part of a long-term workflow, you will want each file to contain only new, non-overlapping data.
The easiest solution is to export data only for documents exported within a given date range. You can then, for example, download data only for documents exported during the last day. To achieve this, open a special URL instead of clicking the download button:
Copy-paste this URL to your browser’s address bar, and edit the pieces in red:
- Replace 1234 with the ID of your queue. To find it, open the document list of the queue in the web app and look at the address, which will look like https://elis.rossum.ai/annotations/5678?… The number after /annotations/ (5678 in this example) is the queue ID; replace 1234 with that ID.
- Replace 2019-06-19 with the desired start date and 2019-06-20 with the end date. In this example, data are downloaded for documents exported during Jun 19, 2019 (from midnight GMT on Jun 19 until midnight GMT on Jun 20). You can also include a time of day (in GMT), e.g. “exported_at_before=2019-06-19 18:00”.
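If you look up queue IDs often, the first step above can be sketched as a small helper. This is a minimal example that assumes the app address always has the /annotations/<number> shape shown above; the function name and the query string in the sample call are made up for illustration.

```python
import re
from urllib.parse import urlparse


def queue_id_from_app_url(url):
    """Return the number that follows /annotations/ in a web app address."""
    path = urlparse(url).path
    match = re.search(r"/annotations/(\d+)", path)
    if match is None:
        raise ValueError(f"No queue ID found in: {url}")
    return int(match.group(1))


# The query string below is a placeholder; only the path matters here.
print(queue_id_from_app_url("https://elis.rossum.ai/annotations/5678?example=1"))
# prints 5678
```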
A login prompt (from the server api.elis.rossum.ai) will appear. After you submit your credentials, the export file should start downloading.
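For a daily routine, the date part of the address can be computed rather than typed by hand. The sketch below builds only the query-string portion for a one-day window. The parameter name exported_at_before is taken from the example above, while exported_at_after is an assumption about the matching start-date parameter, so check it against the URL you actually use.

```python
from datetime import date, timedelta
from urllib.parse import urlencode


def export_window_params(day):
    """Build query parameters covering one full day, midnight to midnight (GMT)."""
    params = {
        # Assumed name for the start-date parameter; verify it in your URL.
        "exported_at_after": day.isoformat(),
        # Parameter name taken from the example in the text.
        "exported_at_before": (day + timedelta(days=1)).isoformat(),
    }
    return urlencode(params)


print(export_window_params(date(2019, 6, 19)))
# prints exported_at_after=2019-06-19&exported_at_before=2019-06-20
```

Because each window ends exactly where the next one starts, running this once per day for the previous day, e.g. export_window_params(date.today() - timedelta(days=1)), yields consecutive date ranges with no gaps between them.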