Classification

Classifying a document means determining the classes it belongs to.

The category class is always required. It is used to determine the structure for the extraction phase.
Additional classes can be configured in this step to fine-tune the extraction configuration, for example:
- The language of the document
- The document subcategory. In FR insurance documents, for example, the invoice category has subcategories such as dentist, pharmacy, and hospital.
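
As an illustration, the outcome of this step could be represented as one required category plus a set of optional classes. This is a minimal sketch: the `DocumentClasses` name and its fields are hypothetical and do not reflect the product's actual data model.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DocumentClasses:
    """Hypothetical container for the classes assigned to one document."""

    # The category is always required: it selects the extraction structure.
    category: str
    # Optional additional classes, e.g. language or subcategory.
    additional: Dict[str, str] = field(default_factory=dict)


# Example: an FR pharmacy invoice.
example_classes = DocumentClasses(
    category="invoice",
    additional={"language": "fr", "subcategory": "pharmacy"},
)
```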

How it works

Classification can be performed either by a SageMaker inference endpoint or by an LLM.
When the classifier fails, it uses fallback values and sends the document to manual parsing.
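
The fallback behaviour can be sketched as follows. This is an illustrative example under assumptions, not the product's implementation: `classify_with_fallback`, `ClassificationOutcome`, and the idea that a failure is either an exception or a result without a category are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ClassificationOutcome:
    """Hypothetical result: the classes plus a manual-parsing flag."""

    classes: Dict[str, str]
    needs_manual_parsing: bool


def classify_with_fallback(
    document_text: str,
    classifier: Callable[[str], Dict[str, str]],
    fallback_classes: Dict[str, str],
) -> ClassificationOutcome:
    """Run the classifier; on failure, use fallback values and flag the document."""
    try:
        classes = classifier(document_text)
    except Exception:
        # Assumption: any classifier error counts as a failure.
        return ClassificationOutcome(dict(fallback_classes), True)
    if "category" not in classes:
        # Assumption: a result without the required category is also a failure.
        return ClassificationOutcome(dict(fallback_classes), True)
    return ClassificationOutcome(classes, False)
```

Either classifier (SageMaker or LLM) can be passed in as the `classifier` callable, which keeps the fallback logic in one place.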

SageMaker classifier

The SageMaker classifier classifies a document using a model deployed on a SageMaker inference endpoint.
The model can be trained using a Turing predictor.
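
A call to such an endpoint might look like the sketch below. It relies on the `invoke_endpoint` method of the boto3 `sagemaker-runtime` client. The JSON request and response shapes, the `classify_with_sagemaker` helper, and the endpoint name are assumptions, since the actual payload depends on how the model was deployed.

```python
import json
from typing import Any, Dict


def classify_with_sagemaker(
    runtime_client: Any, endpoint_name: str, document_text: str
) -> Dict[str, str]:
    """Send the document text to a SageMaker endpoint and return its classes.

    `runtime_client` is expected to be a boto3 client created with
    boto3.client("sagemaker-runtime").
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        # Assumed request shape: {"text": "..."}.
        Body=json.dumps({"text": document_text}),
    )
    # The response body is a stream; the assumed response shape is a JSON
    # object such as {"category": "invoice"}.
    return json.loads(response["Body"].read())
```

Passing the client in as a parameter makes it straightforward to substitute a stub when testing without AWS access.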

LLM classifier

The LLM classifier classifies a document using an LLM. It provides the LLM with the classification instructions and the document transcription.
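
The prompt assembly can be sketched as follows. The helper names, the prompt wording, and the strict answer parsing are hypothetical; the actual instructions and the LLM call itself are not shown here and depend on the configured provider.

```python
from typing import Optional, Sequence


def build_classification_prompt(
    instruction: str, transcription: str, allowed_categories: Sequence[str]
) -> str:
    """Combine the classification instruction with the document transcription."""
    categories = ", ".join(allowed_categories)
    return (
        f"{instruction}\n"
        f"Answer with exactly one of: {categories}.\n\n"
        f"Document transcription:\n{transcription}"
    )


def parse_category(answer: str, allowed_categories: Sequence[str]) -> Optional[str]:
    """Map the raw LLM answer to an allowed category, or None if it matches none."""
    normalized = answer.strip().lower()
    for category in allowed_categories:
        if normalized == category.lower():
            return category
    # An unrecognized answer is treated as a classification failure by the caller.
    return None
```

Restricting the answer to a closed list of categories makes an invalid response easy to detect.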