Note: If you want to use the Rossum dedicated AI engine, make sure to purchase the feature before you start the training process.
Once you have purchased the dedicated AI engine, the training process consists of the following steps:
- Set up your extraction schema. Dedicated AI engine training requires that a special schema be set up. Rossum Solution Engineers will assist you with this.
- Teach the engine how to extract specific data. Use Rossum during this pre-training stage to teach it what to extract.
- Have the engine trained. After you are finished with the pre-training stage, reach out to your technical contact at Rossum; they will manage the training of your dedicated AI engine.
- Keep using Rossum. During the engine training process, you can use Rossum to process documents as you would normally.
- Enjoy the improvements in data capture accuracy. After training is complete, Rossum will activate your dedicated AI engine for your account.
During your subscription period, the dedicated AI engine can be periodically retrained on your annotations so that it keeps improving over time. Read more about continuous engine updates.
How your data is used to train the dedicated AI engine
The documents in a dataset are split into a training set (~80%) and a test set (~20%). The test set serves as a control for the trained engine: it contains documents that the newly trained dedicated engine has not seen before.
The dataset is separated into two subsets because, in machine learning, it is generally bad practice to use the same documents for both training and evaluation. Doing so tends to produce overly optimistic numbers, since you would be measuring the engine’s performance on the same data that was used to train it.
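The effect described above can be shown with a deliberately simple sketch. Everything in it is hypothetical and unrelated to how the Rossum engine works: the document IDs and labels are randomly generated, and `MemorizingModel` is a toy model that only remembers what it saw during training. It simply shows why a score measured on training data can look better than one measured on unseen data.

```python
import random


def make_documents(n, seed=0):
    """Generate hypothetical (doc_id, label) pairs with random labels."""
    rng = random.Random(seed)
    return [(f"doc-{i}", rng.choice(["invoice", "receipt"])) for i in range(n)]


class MemorizingModel:
    """Toy model that only memorizes the labels it saw during training."""

    def fit(self, docs):
        self.seen = dict(docs)

    def predict(self, doc_id):
        # Documents not seen during training get a fixed fallback guess.
        return self.seen.get(doc_id, "invoice")


def accuracy(model, docs):
    """Share of documents whose predicted label matches the true label."""
    return sum(model.predict(doc_id) == label for doc_id, label in docs) / len(docs)


docs = make_documents(1000)
train, test = docs[:800], docs[800:]  # roughly an 80/20 split

model = MemorizingModel()
model.fit(train)

train_accuracy = accuracy(model, train)  # evaluated on data the model has seen
test_accuracy = accuracy(model, test)    # evaluated on data the model has not seen
print(f"training set: {train_accuracy:.2f}, test set: {test_accuracy:.2f}")
```

The toy model scores perfectly on the training set because it has memorized it, while the test set gives a more realistic picture of how it handles new documents.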
The training/test division is approximate: an algorithm assigns documents to the two sets at random. If documents are added to the dataset later, the initial training and test sets remain subsets of the new training and test sets. This is important for getting consistent results between two consecutive training runs of the dedicated engine.
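One common way to get a split that is roughly 80/20 and stays stable as a dataset grows is to derive each document's assignment from a hash of its ID. The sketch below illustrates that general technique only; it is an assumption for illustration, not a description of the algorithm Rossum actually uses. The names `assign_split`, `split_dataset`, and the `doc-N` identifiers are hypothetical.

```python
import hashlib

TEST_FRACTION = 0.2  # roughly 20% of documents go to the test set


def assign_split(doc_id: str) -> str:
    """Deterministically map a document ID to 'training' or 'test'."""
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    # Turn the first 8 bytes of the hash into a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "test" if bucket < TEST_FRACTION else "training"


def split_dataset(doc_ids):
    """Group document IDs by the split each one is assigned to."""
    splits = {"training": set(), "test": set()}
    for doc_id in doc_ids:
        splits[assign_split(doc_id)].add(doc_id)
    return splits


# Hypothetical dataset before and after 500 more documents are added.
initial = split_dataset(f"doc-{i}" for i in range(1000))
extended = split_dataset(f"doc-{i}" for i in range(1500))

# Earlier assignments never change, so the initial sets are subsets of the new ones.
print(initial["training"] <= extended["training"])
print(initial["test"] <= extended["test"])
print(len(initial["test"]) / 1000)  # close to, but not exactly, 0.2
```

Because the assignment depends only on the document ID, a document that was in the test set for one training run stays there for the next, which keeps evaluation results comparable between runs.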