Classification
Classifying a document means determining the classes it belongs to.
- The `category` class is always required. It determines the structure used in the extraction phase.
- Additional classes can be configured in this step to fine-tune the extraction configuration. For example:
    - The language of the document
    - The document subcategory. In FR insurance documents, the category `invoice` has different subcategories: `dentist`, `pharmacy`, `hospital`, etc.
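As an illustration, a classification result could be modeled as one required category plus optional additional classes. This is only a minimal sketch: the `Classification` type and its field names are hypothetical and not taken from the product's API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Classification:
    """Hypothetical container for the classes assigned to one document."""

    # Always required: it drives the structure used in the extraction phase.
    category: str
    # Optional additional classes, e.g. language or subcategory.
    extra: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject results that lack the mandatory category class.
        if not self.category:
            raise ValueError("the category class is required")


result = Classification(
    category="invoice",
    extra={"language": "fr", "subcategory": "pharmacy"},
)
```

Keeping the additional classes in a free-form mapping reflects that they are configurable, while the category stays a mandatory field.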
How it works
- Classification can be done using either a `Sagemaker` inference endpoint or an `LLM`.
- When the classifier fails, it uses fallback values and sends the document to manual parsing.
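The fallback behavior can be sketched roughly as follows. The function name, the `FALLBACK_CLASSES` values, and the "needs manual parsing" flag are assumptions made for illustration; the actual fallback values come from the configuration.

```python
from typing import Callable

# Hypothetical fallback values; the real ones are defined in the configuration.
FALLBACK_CLASSES = {"category": "unknown"}


def classify_with_fallback(
    document_text: str,
    classifier: Callable[[str], dict[str, str]],
) -> tuple[dict[str, str], bool]:
    """Return (classes, needs_manual_parsing) for one document."""
    try:
        classes = classifier(document_text)
        # A result without the required category counts as a failure.
        if "category" not in classes:
            raise ValueError("classifier returned no category")
        return classes, False
    except Exception:
        # Any classifier failure: use the fallback values and
        # flag the document for manual parsing.
        return dict(FALLBACK_CLASSES), True
```

Passing the classifier as a callable lets the same wrapper serve both the SageMaker and the LLM classifier.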
SageMaker classifier
- The SageMaker classifier classifies a document using a model deployed on a SageMaker inference endpoint.
- The model can be trained using a Turing predictor.
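Calling such an endpoint might look like the sketch below. The JSON request and response schema (`{"text": ...}` in, a mapping of classes out) and the function name are assumptions; the real payload depends on how the model was packaged.

```python
import json
from typing import Any


def classify_with_sagemaker(runtime_client: Any, endpoint_name: str, text: str) -> dict:
    """Send the document text to a SageMaker endpoint and decode the JSON reply."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps({"text": text}),  # assumed request schema
    )
    # invoke_endpoint returns the payload as a readable stream under "Body".
    return json.loads(response["Body"].read())
```

With boto3, `runtime_client` would typically be `boto3.client("sagemaker-runtime")`. Injecting the client keeps the function easy to exercise with a stub.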
LLM classifier
The LLM classifier classifies a document using an LLM. The LLM is given the classification instruction together with the document transcription.
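A minimal sketch of this prompt assembly is shown below. The instruction text, the JSON answer format, and the `complete` callable (any function that sends a prompt to an LLM and returns its text) are hypothetical and stand in for whatever the configured classifier uses.

```python
import json
from typing import Callable

# Hypothetical instruction; the real one is part of the classifier configuration.
INSTRUCTION = (
    "Classify the document below. "
    'Answer with JSON only, for example {"category": "invoice"}.'
)


def build_prompt(transcription: str) -> str:
    """Combine the classification instruction with the document transcription."""
    return f"{INSTRUCTION}\n\nDocument transcription:\n{transcription}"


def classify_with_llm(transcription: str, complete: Callable[[str], str]) -> dict:
    """Ask the LLM to classify the transcription and parse its JSON answer."""
    answer = complete(build_prompt(transcription))
    # json.loads raises ValueError if the answer is not valid JSON.
    return json.loads(answer)
```

Asking for a JSON-only answer makes a malformed response easy to detect, so it can be treated as a classifier failure.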