Note: If you want to use a dedicated AI engine, make sure you have purchased the feature before you start the training process.
Throughout the entire training period, you must follow the rules of the annotation process. Below are some basic rules that will help you increase the accuracy of your dedicated AI engine:
- Provide at least 500 documents. This is usually the minimum needed to achieve satisfactory accuracy. You may need more documents if you want higher accuracy, if you only have a single training session, or if your documents have complicated layouts and several custom data fields. If the majority of your documents are in languages other than those that Rossum officially supports, provide at least 1,000 annotations.
- Annotate at least 15 documents with the same layout. If there are layouts for which you are experiencing particularly low accuracy, provide more samples of these.
- Annotate a representative ratio of documents. If 50% of your production documents have layout A and 5% have layout B, keep approximately the same ratio in the documents you annotate.
- Only annotate data values, not the labels. For example, when annotating a PO number written as “PO. no.: AB1234”, only annotate “AB1234”.
- Always use Magic Grid when annotating line items. Do not add values to the line items by typing or by clicking on the blue bounding boxes.
- Only annotate data that is present in the document. A dedicated AI engine is always trained on the document image. If a value is not present in a document, the engine cannot learn to extract it. For this reason, the engine cannot be trained to extract fields such as account codes, which typically do not appear on the document itself. If you add values to some fields by typing them in manually without specifying a location, or by adding them programmatically, that data will also not be used to train the dedicated AI engine.
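
The volume rules above (at least 500 documents, at least 15 per layout, and a ratio that mirrors production) can be turned into a rough annotation plan before you start. The following is a minimal sketch, not part of Rossum's tooling: the function name `annotation_plan`, its parameters, and the layout names are hypothetical, and it assumes you can count how many production documents use each layout.

```python
MIN_TOTAL = 500        # minimum number of documents, from the rules above
MIN_PER_LAYOUT = 15    # minimum annotated documents per layout


def annotation_plan(production_counts, total=MIN_TOTAL):
    """Suggest how many documents to annotate for each layout.

    production_counts maps a layout name to the number of production
    documents with that layout (hypothetical input, gathered by you).
    Each layout gets a share of `total` proportional to its production
    volume, rounded up, and never fewer than MIN_PER_LAYOUT documents.
    """
    if total < MIN_TOTAL:
        raise ValueError(f"total must be at least {MIN_TOTAL}")
    overall = sum(production_counts.values())
    if overall <= 0:
        raise ValueError("production_counts must contain at least one document")

    plan = {}
    for layout, count in production_counts.items():
        # Integer ceiling division avoids floating-point rounding surprises.
        proportional = -(-total * count // overall)
        plan[layout] = max(proportional, MIN_PER_LAYOUT)
    return plan
```

For example, `annotation_plan({"A": 5000, "B": 500, "C": 4500})` suggests 250, 25, and 225 documents respectively, which keeps the 50% / 5% / 45% production ratio. A rare layout is raised to the 15-document floor, so the plan can add up to slightly more than `total`. If most of your documents are in languages that Rossum does not officially support, pass `total=1000` instead.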