Parsing
The document parsing service extracts structured output from documents.
- Documents can be parsed automatically or manually:
  - Automatically: Using a configurable automatic parsing flow.
  - Manually: Using a configurable manual parsing tool.
- The document parsing configuration is defined per document type (e.g. FrInsuranceDocument, BeInsuranceDocument).
- This configuration provides:
  - The document categories (e.g. invoice, payslip) and their corresponding structured outputs (e.g. InvoiceContent, PayslipContent).
  - The automatic parsing flow configuration, which defines how a document is transcribed and classified, and how structured information is extracted from it.
  - The manual parsing tool, and how manual tasks are assigned to operators.
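As a rough illustration of what a per-document-type configuration carries, the Python sketch below models it with dataclasses. Only DocumentParsingConfiguration, FrInsuranceDocument, InvoiceContent and PayslipContent come from this page; every field name, the AutoParsingConfiguration and ManualParsingConfiguration classes, and the content_class_for helper are hypothetical and do not reflect the actual configuration API.

```python
from dataclasses import dataclass
from typing import Dict, Optional


# Hypothetical structured outputs; the real field names are not given on this page.
@dataclass
class InvoiceContent:
    total_amount: Optional[float] = None


@dataclass
class PayslipContent:
    net_salary: Optional[float] = None


@dataclass
class AutoParsingConfiguration:
    # Hypothetical identifiers of the engine used at each auto-parsing step
    transcriber: str
    classifier: str
    extractor: str


@dataclass
class ManualParsingConfiguration:
    tool: str            # hypothetical: which manual parsing tool to open
    operator_group: str  # hypothetical: which operators receive the manual tasks


@dataclass
class DocumentParsingConfiguration:
    document_type: str
    categories: Dict[str, type]  # document category -> structured output class
    manual_parsing: ManualParsingConfiguration
    auto_parsing: Optional[AutoParsingConfiguration] = None  # None: manual parsing only

    def content_class_for(self, category: str) -> type:
        # Look up the structured output class for a classified category
        return self.categories[category]


# Example configuration for one document type (all values are illustrative)
fr_config = DocumentParsingConfiguration(
    document_type="FrInsuranceDocument",
    categories={"invoice": InvoiceContent, "payslip": PayslipContent},
    manual_parsing=ManualParsingConfiguration(tool="manual-parsing-tool", operator_group="fr-operators"),
    auto_parsing=AutoParsingConfiguration(transcriber="ocr", classifier="classifier-v1", extractor="extractor-v1"),
)
```

Making auto_parsing optional while the manual part is always present mirrors the flowchart: a document type with a parsing configuration but no auto-parsing configuration goes straight to a manual parsing task.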
How it works
```mermaid
flowchart LR
A(Start) --> B(Upload document)
B --> M{Has parsing configuration?}
M -- Yes --> C{Has auto-parsing configuration?}
C -- Yes --> D(Transcription)
D --> E{Valid Transcription?}
E -- Yes --> F(Classification)
F --> G{Valid Classification?}
G -- Yes --> H(Extraction)
H --> I{Valid Extraction?}
I -- Yes --> J(Publish validated parsing event)
I -- No --> K(Open manual parsing task)
J --> L(End)
K --> L
G -- No --> K
E -- No --> K
C -- No --> K
M -- No --> L
```
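The routing in the flowchart can be sketched in Python as a single function. This is a minimal sketch, assuming each auto-parsing step returns a value together with a validity flag; StepResult, AutoParsingSteps, Outcome and route_document are hypothetical names, and the step implementations are supplied by the caller. The function returns the outcome instead of actually publishing an event or opening a task.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Generic, Optional, Tuple, TypeVar

T = TypeVar("T")


@dataclass
class StepResult(Generic[T]):
    value: Optional[T]
    valid: bool  # hypothetical: whether the step's output passed validation


@dataclass
class AutoParsingSteps:
    # Hypothetical step signatures: document -> text -> category -> structured content
    transcribe: Callable[[bytes], StepResult[str]]
    classify: Callable[[str], StepResult[str]]
    extract: Callable[[str, str], StepResult[dict]]


class Outcome(Enum):
    END = "end"  # no parsing configuration for this document type
    MANUAL_TASK = "open manual parsing task"
    VALIDATED = "publish validated parsing event"


def route_document(
    document: bytes,
    has_parsing_configuration: bool,
    auto_steps: Optional[AutoParsingSteps],
) -> Tuple[Outcome, Optional[dict]]:
    if not has_parsing_configuration:
        return Outcome.END, None
    if auto_steps is None:  # parsing configured, but no auto-parsing configuration
        return Outcome.MANUAL_TASK, None

    transcription = auto_steps.transcribe(document)
    if not transcription.valid:
        return Outcome.MANUAL_TASK, None

    classification = auto_steps.classify(transcription.value)
    if not classification.valid:
        return Outcome.MANUAL_TASK, None

    extraction = auto_steps.extract(transcription.value, classification.value)
    if not extraction.valid:
        return Outcome.MANUAL_TASK, None

    # All steps valid: the caller would publish the validated parsing event
    return Outcome.VALIDATED, extraction.value


# Example with stub steps (illustrative only)
stub_steps = AutoParsingSteps(
    transcribe=lambda doc: StepResult(doc.decode("utf-8"), valid=True),
    classify=lambda text: StepResult("invoice", valid="invoice" in text.lower()),
    extract=lambda text, category: StepResult({"category": category, "text": text}, valid=True),
)
outcome, content = route_document(b"Invoice #42", True, stub_steps)
```

Every "No" branch except the missing-configuration one short-circuits to a manual parsing task, so a later step never runs on the output of a step that failed validation.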
Document auto-parsing steps
The document auto-parsing flow is composed of 3 steps:
- Transcription: Transform the document into text.
- Classification: Classify the document by category, plus any additional classes needed for extraction.
- Extraction: Extract structured information from the document.
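A toy, rule-based version of the three steps helps show how the output of each step feeds the next. This Python sketch assumes plain-text documents and keyword/regex rules, which are stand-ins for whatever transcription, classification and extraction engines the real flow configures; the field names and helper functions are hypothetical, and the additional classes mentioned above are left out for brevity.

```python
import re
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class InvoiceContent:
    total_amount: Optional[float]  # hypothetical field


@dataclass
class PayslipContent:
    net_salary: Optional[float]  # hypothetical field


def transcribe(document: bytes) -> str:
    """Step 1 (toy): the document is already text, so just decode it.
    A real transcription step would run OCR or a similar engine."""
    return document.decode("utf-8")


def classify(text: str) -> Optional[str]:
    """Step 2 (toy): pick a category from keywords; None means unknown."""
    lowered = text.lower()
    if "invoice" in lowered:
        return "invoice"
    if "payslip" in lowered:
        return "payslip"
    return None


# Category -> (structured output class, pattern locating the amount to extract)
_EXTRACTORS = {
    "invoice": (InvoiceContent, re.compile(r"total:\s*(\d+(?:\.\d+)?)", re.IGNORECASE)),
    "payslip": (PayslipContent, re.compile(r"net:\s*(\d+(?:\.\d+)?)", re.IGNORECASE)),
}


def extract(text: str, category: str) -> Union[InvoiceContent, PayslipContent]:
    """Step 3 (toy): fill the category's structured output from the text."""
    content_cls, pattern = _EXTRACTORS[category]
    match = pattern.search(text)
    return content_cls(float(match.group(1)) if match else None)


def auto_parse(document: bytes) -> Optional[Union[InvoiceContent, PayslipContent]]:
    text = transcribe(document)
    category = classify(text)
    if category is None:
        return None  # in the real flow, this would fall back to a manual parsing task
    return extract(text, category)
```

For example, `auto_parse(b"Invoice #42\nTotal: 120.50")` returns `InvoiceContent(total_amount=120.5)`: the category chosen in step 2 decides which structured output step 3 fills.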
Document parsing configuration
The parsing configuration for a given document type is defined with DocumentParsingConfiguration. See the configuration page for more details.