
Parsing

The document parsing service extracts structured output from documents.

  • The documents can be parsed automatically or manually.
    • Automatically: Using a configurable automatic parsing flow
    • Manually: Using a configurable manual parsing tool
  • The document parsing configuration is defined per document type (for example FrInsuranceDocument, BeInsuranceDocument, etc.)
  • This configuration provides:
    • The document categories (e.g. invoice, payslip, etc.) and their corresponding structured output (e.g. InvoiceContent, PayslipContent, etc.)
    • The automatic parsing flow configuration, which defines how a document is transcribed and classified, and how structured information is extracted from it.
    • The manual parsing tool and how manual tasks are assigned to operators.
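
The three parts of the configuration listed above can be pictured as a small data structure. The sketch below is illustrative only: every class and field name in it (ParsingConfigSketch, AutoParsingFlowSketch, ManualParsingSketch, operator_group, the content fields, and so on) is an assumption made for this example and is not the actual DocumentParsingConfiguration API. The auto-parsing flow is modelled as optional because a document type may have no auto-parsing configuration.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical structured outputs, one per document category.
@dataclass
class InvoiceContent:
    total_amount: float
    currency: str


@dataclass
class PayslipContent:
    employer: str
    net_pay: float


# Hypothetical: names the component used for each auto-parsing step.
@dataclass
class AutoParsingFlowSketch:
    transcriber: str
    classifier: str
    extractor: str


# Hypothetical: which manual tool to use and who receives the tasks.
@dataclass
class ManualParsingSketch:
    tool: str
    operator_group: str


# Hypothetical stand-in for a per-document-type parsing configuration.
@dataclass
class ParsingConfigSketch:
    document_type: str
    categories: dict[str, type]  # category name -> structured output class
    manual_parsing: ManualParsingSketch
    auto_parsing_flow: Optional[AutoParsingFlowSketch] = None


fr_config = ParsingConfigSketch(
    document_type="FrInsuranceDocument",
    categories={"invoice": InvoiceContent, "payslip": PayslipContent},
    manual_parsing=ManualParsingSketch(
        tool="manual_parsing_tool", operator_group="fr_operators"
    ),
    auto_parsing_flow=AutoParsingFlowSketch(
        transcriber="ocr", classifier="keyword", extractor="template"
    ),
)
```

Keeping the category-to-output mapping in the configuration means a new category only requires a new entry and a new content class, under the assumptions of this sketch.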

How it works

flowchart LR
    A(Start) --> B(Upload document)
    B --> M{Has parsing configuration?}
    M -- Yes --> C{Has auto-parsing configuration?}
    C -- Yes --> D(Transcription)
    D --> E{Valid Transcription?}
    E -- Yes --> F(Classification)
    F --> G{Valid Classification?}
    G -- Yes --> H(Extraction)
    H --> I{Valid Extraction?}
    I -- Yes --> J(Publish validated parsing event)
    I -- No --> K(Open manual parsing task)
    J --> L(End)
    K --> L
    G -- No --> K
    E -- No --> K
    C -- No --> K
    M -- No --> L
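
The decision logic of the flowchart can also be read as a single routing function. This is a minimal sketch under stated assumptions: Outcome, StepResult, AutoFlow and route_document are hypothetical names introduced here, and each step is reduced to a callable that returns a value plus a validity flag. It mirrors only the branches drawn in the flowchart, not the service's real implementation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Optional


class Outcome(Enum):
    END_WITHOUT_PARSING = "end"          # M -- No --> End
    MANUAL_TASK = "manual_task"          # any "No" branch leading to K
    VALIDATED_EVENT = "validated_event"  # I -- Yes --> J


@dataclass
class StepResult:
    value: Any
    is_valid: bool


# Hypothetical bundle of the three auto-parsing steps.
@dataclass
class AutoFlow:
    transcribe: Callable[[bytes], StepResult]
    classify: Callable[[str], StepResult]
    extract: Callable[[str, Any], StepResult]


def route_document(
    document: bytes, has_parsing_config: bool, auto_flow: Optional[AutoFlow]
) -> Outcome:
    if not has_parsing_config:
        return Outcome.END_WITHOUT_PARSING
    if auto_flow is None:  # no auto-parsing configuration
        return Outcome.MANUAL_TASK
    transcription = auto_flow.transcribe(document)
    if not transcription.is_valid:
        return Outcome.MANUAL_TASK
    classification = auto_flow.classify(transcription.value)
    if not classification.is_valid:
        return Outcome.MANUAL_TASK
    extraction = auto_flow.extract(transcription.value, classification.value)
    if not extraction.is_valid:
        return Outcome.MANUAL_TASK
    return Outcome.VALIDATED_EVENT


# Usage: a flow whose classification step fails falls back to a manual task.
failing_flow = AutoFlow(
    transcribe=lambda doc: StepResult("some text", True),
    classify=lambda text: StepResult(None, False),
    extract=lambda text, category: StepResult({}, True),
)
result = route_document(b"raw bytes", True, failing_flow)
```

Every failed validation returns early with the same outcome, which matches the flowchart: all "No" branches after the configuration check converge on the manual parsing task.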

Document auto-parsing steps

The document auto-parsing flow consists of three steps:

  • Transcription: Convert the document into transcribed text.
  • Classification: Classify the document by category and additional classes needed for the extraction.
  • Extraction: Extract structured information from the document.
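
To make the data passed between the three steps concrete, here is a toy sketch. It is not how the service performs transcription, classification or extraction: UTF-8 decoding stands in for real transcription, a keyword check stands in for the classifier, and a regular expression stands in for the extractor. The names Classification, extra_classes, transcribe, classify and extract are assumptions made for this example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Classification:
    category: str
    # Additional classes that the extraction step relies on.
    extra_classes: list[str] = field(default_factory=list)


def transcribe(raw: bytes) -> str:
    # Stand-in for real transcription (e.g. OCR): decode the bytes as text.
    return raw.decode("utf-8")


def classify(text: str) -> Classification:
    # Stand-in classifier: a keyword picks the category and extra classes.
    lowered = text.lower()
    category = "invoice" if "invoice" in lowered else "unknown"
    extra = ["has_total"] if "total" in lowered else []
    return Classification(category, extra)


def extract(text: str, classification: Classification) -> dict:
    # Stand-in extractor: the classification decides which fields to look for.
    content: dict = {"category": classification.category}
    if "has_total" in classification.extra_classes:
        match = re.search(r"total:\s*([0-9]+(?:\.[0-9]+)?)", text, re.IGNORECASE)
        if match:
            content["total"] = float(match.group(1))
    return content


text = transcribe(b"Invoice 2024-001\nTotal: 42.50 EUR")
classification = classify(text)
content = extract(text, classification)
# content == {"category": "invoice", "total": 42.5}
```

The point of the sketch is the dependency order: extraction receives both the transcribed text and the classification, which is why classification produces the additional classes that extraction needs.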

Document parsing configuration

The document parsing configuration for a given document type is defined with DocumentParsingConfiguration. See the configuration page for more details.