Note: If you want to use a Dedicated AI Engine, make sure you have purchased the feature before you start the training process.
Throughout the entire training period, you must follow the rules of the annotation process. Below are some basic rules that will help you increase the accuracy of your dedicated AI engine:
- Provide at least 500 documents. This is usually the minimum needed to achieve satisfactory accuracy. However, you may need more documents if you want to achieve higher accuracy, or if you have documents with complicated layouts and many custom data fields. If the majority of your documents are in languages other than those that Rossum officially supports, provide at least 1,000 annotations.
- Annotate at least 15 documents with the same layout for the initial training. If you are experiencing particularly low accuracy for some layouts, provide more samples of those layouts. There is no need to annotate more than 30 documents with the same layout.
- Annotate a representative ratio of documents. If 50% of your production documents have layout A and 5% have layout B, keep approximately the same ratio among the documents you annotate.
- Only annotate data values, not the labels. For example, when annotating a PO number written as “PO. no.: AB1234”, only annotate “AB1234”.
- Use Magic Grid as the primary tool when annotating line items. If some columns in a document can be extracted using Magic Grid while other data is located in free text, first use Magic Grid to extract the data from the well-formatted columns. Then extract the rest of the data with the point-and-click approach (i.e. by clicking on the blue bounding boxes surrounding the data).
- Only annotate data that is present in the document. A dedicated AI engine is always trained on the document image. If a value is not present in a document, the engine cannot learn to extract it. For this reason, the engine cannot be trained to extract fields such as account codes. If you add values to some fields by typing them in manually without specifying a location, or by adding them programmatically, that data will also not be used to train the dedicated AI engine.
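The numeric rules above (minimum document count, minimum samples per layout, and a representative layout ratio) can be checked before training starts. The following is a minimal sketch in Python, not part of Rossum's product or API: the function name `check_annotation_set`, its parameters, and the 10-percentage-point `tolerance` for "approximately the same" ratio are all assumptions made for illustration. It expects one layout label per annotated document, which you would have to assign yourself.

```python
from collections import Counter

# Thresholds taken from the rules above.
MIN_DOCS = 500                    # usual minimum number of documents
MIN_DOCS_UNSUPPORTED_LANG = 1000  # minimum when most documents are in unsupported languages
MIN_PER_LAYOUT = 15               # minimum documents per layout for initial training


def check_annotation_set(layout_labels, production_share,
                         unsupported_language=False, tolerance=0.10):
    """Return a list of warnings for an annotated set (hypothetical helper).

    layout_labels: one layout name per annotated document, e.g. ["A", "A", "B"].
    production_share: expected share of each layout in production, e.g. {"A": 0.5}.
    tolerance: assumed allowed gap between annotated and production share.
    """
    counts = Counter(layout_labels)
    total = sum(counts.values())
    warnings = []

    # Rule: provide at least 500 documents (1,000 for unsupported languages).
    required = MIN_DOCS_UNSUPPORTED_LANG if unsupported_language else MIN_DOCS
    if total < required:
        warnings.append(f"only {total} documents annotated, at least {required} recommended")

    for layout, share in production_share.items():
        n = counts.get(layout, 0)

        # Rule: annotate at least 15 documents with the same layout.
        if n < MIN_PER_LAYOUT:
            warnings.append(f"layout {layout}: {n} annotated, at least {MIN_PER_LAYOUT} recommended")

        # Rule: keep the annotated ratio close to the production ratio.
        annotated_share = n / total if total else 0.0
        if abs(annotated_share - share) > tolerance:
            warnings.append(
                f"layout {layout}: annotated share {annotated_share:.0%} "
                f"vs production share {share:.0%}"
            )

    return warnings


if __name__ == "__main__":
    labels = ["A"] * 100 + ["B"] * 10
    for warning in check_annotation_set(labels, {"A": 0.5, "B": 0.5}):
        print(warning)
```

The sketch only reports problems and leaves the decision to you, since the thresholds above are described as usual minimums rather than hard limits.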