Classification

Classifying a document means determining the classes it belongs to.

  • The category class is always required. It is used to determine the structure for the extraction phase.
  • Additional classes can be configured in this step to fine-tune the extraction configuration. For example:
    • The language of the document.
    • The document subcategory. In FR insurance documents, for instance, the invoice category has subcategories such as dentist, pharmacy, and hospital.

How it works

  • Classification can be done either with a SageMaker inference endpoint or with an LLM.
  • When the classifier fails, fallback values are used and the document is sent to manual parsing.
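
The fallback behaviour described above can be sketched as follows. This is a minimal illustration, not the product's implementation: the names `ClassificationResult`, `classify_with_fallback`, and `needs_manual_parsing` are hypothetical, and treating a raised exception or a missing category as a "failure" is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ClassificationResult:
    classes: Dict[str, str]      # e.g. {"category": "invoice", "language": "fr"}
    needs_manual_parsing: bool   # True when fallback values were used


def classify_with_fallback(
    transcription: str,
    classifier: Callable[[str], Dict[str, str]],
    fallback_classes: Dict[str, str],
) -> ClassificationResult:
    """Run the classifier; on failure, use fallback values and flag manual parsing."""
    try:
        classes = classifier(transcription)
        # The category class is always required, so its absence is treated
        # as a failure here (an assumption of this sketch).
        if not classes.get("category"):
            raise ValueError("classifier returned no category")
        return ClassificationResult(classes=classes, needs_manual_parsing=False)
    except Exception:
        # Copy the fallback values so callers cannot mutate the shared defaults.
        return ClassificationResult(
            classes=dict(fallback_classes), needs_manual_parsing=True
        )
```

The `classifier` argument can be either kind of classifier, as long as it maps the document text to a dictionary of classes.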

SageMaker classifier

  • The SageMaker classifier classifies a document using a model deployed on a SageMaker inference endpoint.
  • The model can be trained using a Turing predictor.
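
As a rough sketch of what calling such an endpoint can look like, the function below sends the document text to an endpoint and decodes the reply. The function name, the `{"text": ...}` request body, and the JSON response format are assumptions; the actual payload depends on how the model was packaged. The client is passed in as a parameter so the function can be exercised without AWS access.

```python
import json
from typing import Any, Dict


def classify_with_sagemaker(
    runtime_client: Any, endpoint_name: str, transcription: str
) -> Dict[str, Any]:
    """Send the document text to a SageMaker endpoint and return the decoded JSON reply."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps({"text": transcription}),  # assumed request schema
    )
    # invoke_endpoint returns the model output as a readable stream under "Body".
    return json.loads(response["Body"].read())
```

With boto3, `runtime_client` would typically be created with `boto3.client("sagemaker-runtime")`.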

LLM classifier

The LLM classifier classifies a document using an LLM. It provides the LLM with the classification instruction and the document transcription.
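
A minimal sketch of this idea, assuming the LLM is exposed as a plain function that takes a prompt and returns text. The prompt wording, the function names, and the rule of rejecting any answer outside the allowed categories are illustrative choices, not the product's actual prompt or API.

```python
from typing import Callable, List


def build_classification_prompt(instruction: str, transcription: str) -> str:
    """Combine the classification instruction with the document transcription."""
    return f"{instruction}\n\nDocument transcription:\n{transcription}"


def classify_with_llm(
    llm: Callable[[str], str], transcription: str, categories: List[str]
) -> str:
    """Ask the LLM for one category and validate its answer against the allowed list."""
    instruction = (
        "Classify the document into exactly one of these categories: "
        + ", ".join(categories)
        + ". Answer with the category name only."
    )
    answer = llm(build_classification_prompt(instruction, transcription))
    # Compare case-insensitively, but return the category as configured.
    allowed = {c.lower(): c for c in categories}
    normalized = answer.strip().lower()
    if normalized not in allowed:
        raise ValueError(f"unexpected category: {answer!r}")
    return allowed[normalized]
```

Raising on an unexpected answer lets the caller treat it as a classifier failure rather than passing an unknown class to the extraction phase.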