Inbound Email Processing

This document describes how incoming emails are received, validated, and routed to application-specific processors.

End-to-end architecture

flowchart TD
    Sender["External sender"] -->|sends email| SES["AWS SES<br/>(Receipt Rule)"]
    SES -->|store raw email| S3["S3 Bucket"]
    SES -->|publish notification| SNS["SNS Topic"]
    SNS -->|deliver| SQS["SQS Queue"]
    SQS -->|poll| Worker["SQS Worker"]
    Worker -->|can_handle?| SesEmailProcessor
    SesEmailProcessor -->|enqueue| RQ["RQ Queue"]
    RQ -->|execute| ProcessSES["process_ses_email()"]
    ProcessSES -->|validate| Validate{"Authenticity<br/>Safety<br/>Credibility"}
    Validate -->|fail| Drop["Drop email"]
    Validate -->|pass| Registry["InboundEmailProcessorsRegistry"]
    Registry -->|lookup recipient| AppProcessor["App-specific<br/>InboundEmailProcessor"]
    AppProcessor -->|fetch from| S3

Step by step

| Step | Component | What happens |
| --- | --- | --- |
| 1 | AWS SES | Receives the email via a configured Receipt Rule |
| 2 | SES Receipt Rule | Stores the raw email (RFC 822) in an S3 bucket and publishes a notification to an SNS topic |
| 3 | SNS | Delivers the notification as a JSON message to an SQS queue |
| 4 | SQS Worker | Long-polls the queue, parses the message, and dispatches to SesEmailProcessor |
| 5 | SesEmailProcessor | Unwraps the SNS envelope and enqueues process_ses_email() into RQ |
| 6 | process_ses_email() | Validates the email, fetches it from S3, extracts attachments, and routes to the registered processor |
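
As a rough illustration of step 4, the sketch below long-polls a queue once and dispatches each message to the first processor whose can_handle() accepts it. It assumes a boto3-style SQS client exposing receive_message and delete_message. The poll_once function, the handle() method name, and the delete-after-dispatch policy are hypothetical and not taken from the actual worker.

```python
import json


def poll_once(sqs_client, queue_url: str, processors: list) -> int:
    """Long-poll the queue once and dispatch every received message.

    Returns the number of messages that a processor accepted.
    """
    response = sqs_client.receive_message(
        QueueUrl=queue_url,
        WaitTimeSeconds=20,      # long polling: wait up to 20 s for messages
        MaxNumberOfMessages=10,  # SQS maximum per call
    )
    handled = 0
    for message in response.get("Messages", []):
        body = json.loads(message["Body"])
        for processor in processors:
            if processor.can_handle(body):
                processor.handle(body)  # hypothetical method name
                handled += 1
                break
        # Assumption: delete after dispatch, whether or not a processor matched.
        sqs_client.delete_message(
            QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"]
        )
    return handled
```

Passing the client in as an argument keeps the loop testable with a fake client. A real worker would call this in a loop and decide separately how to treat failures (retry, dead-letter queue).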

SNS message format

SES wraps the email metadata inside an SNS notification. The SQS message body looks like:

{
  "Type": "Notification",
  "Subject": "Amazon SES Email Receipt Notification",
  "Message": "{\"mail\": {...}, \"receipt\": {...}}"
}

SesEmailProcessor.can_handle() matches on Type and Subject. The inner Message is JSON-decoded and passed to process_ses_email().

Key fields in the inner message

  • receipt.recipients — list of recipient addresses
  • receipt.action.bucketName / receipt.action.objectKey — S3 location of the raw email
  • mail.source — envelope sender
  • mail.headers — full email headers (used to extract X-Original-Sender)
  • mail.commonHeaders.subject — email subject
  • receipt.*Verdict — SES validation results (DKIM, SPF, DMARC, spam, virus)
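
The envelope handling and field access described above could look roughly like the following. This is a minimal sketch and not the actual SesEmailProcessor code: the free functions can_handle, unwrap, and extract_sender, as well as the example payload values, are illustrative assumptions.

```python
import json

SES_SUBJECT = "Amazon SES Email Receipt Notification"


def can_handle(sqs_body: dict) -> bool:
    """Return True when the SQS body is an SNS-wrapped SES receipt notification."""
    return (
        sqs_body.get("Type") == "Notification"
        and sqs_body.get("Subject") == SES_SUBJECT
    )


def unwrap(sqs_body: dict) -> dict:
    """JSON-decode the inner SES message carried in the SNS envelope."""
    return json.loads(sqs_body["Message"])


def extract_sender(ses_message: dict) -> str:
    """Prefer the X-Original-Sender header, falling back to mail.source."""
    for header in ses_message["mail"].get("headers", []):
        if header["name"].lower() == "x-original-sender":
            return header["value"]
    return ses_message["mail"]["source"]


# Illustrative payload; all values are made up.
example_body = {
    "Type": "Notification",
    "Subject": SES_SUBJECT,
    "Message": json.dumps({
        "mail": {
            "source": "bounce@example.com",
            "headers": [{"name": "X-Original-Sender", "value": "alice@example.com"}],
            "commonHeaders": {"subject": "Invoice"},
        },
        "receipt": {
            "recipients": ["accounting@processor.alan.eu"],
            "action": {"bucketName": "example-bucket", "objectKey": "example-key"},
        },
    }),
}

if can_handle(example_body):
    inner = unwrap(example_body)
    print(extract_sender(inner), inner["receipt"]["recipients"])
```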

Email validation

Emails go through three validation steps before processing. Authenticity and safety are hard blocks: the email is dropped on failure. Credibility is a soft check: a failure is logged as a warning and processing continues.

flowchart LR
    Email["Incoming email"] --> Auth{"Authenticity<br/>DKIM or DMARC pass?"}
    Auth -->|no| Drop1["Drop + log error"]
    Auth -->|yes| Safety{"Safety<br/>virus check pass?"}
    Safety -->|no| Drop2["Drop + log error"]
    Safety -->|yes| Cred{"Credibility<br/>spam check pass?"}
    Cred -->|no| Warn["Log warning<br/>(continue processing)"]
    Cred -->|yes| Process["Route to processor"]
    Warn --> Process

Authenticity

Checks dkimVerdict and dmarcVerdict from the SES receipt. Either DKIM or DMARC must pass.

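Note: SPF is intentionally not checked. Forwarding breaks SPF because the forwarding server is not authorized by the original sender's domain, and emails rerouted from Gmail carry the original authentication results in ARC (Authenticated Received Chain) headers instead. A future improvement could validate ARC headers.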

Safety

Checks virusVerdict. Blocks the email if the virus scan did not pass.

Credibility

Checks spamVerdict. Logs a warning if the email is classified as spam but does not block processing.
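
The three checks can be summarized in a short sketch. It relies on the SES receipt format, where each verdict is an object with a status field such as "PASS" or "FAIL". The names validate_receipt and _passed are hypothetical, and treating every status other than "PASS" as a failure is an assumption.

```python
import logging

logger = logging.getLogger(__name__)


def _passed(receipt: dict, verdict_name: str) -> bool:
    """SES reports each verdict as {"status": "PASS" | "FAIL" | "GRAY" | ...}."""
    return receipt.get(verdict_name, {}).get("status") == "PASS"


def validate_receipt(receipt: dict) -> bool:
    """Return True when the email may be processed, False when it must be dropped."""
    # Authenticity (hard block): DKIM or DMARC must pass.
    if not (_passed(receipt, "dkimVerdict") or _passed(receipt, "dmarcVerdict")):
        logger.error("Dropping email: neither DKIM nor DMARC passed")
        return False
    # Safety (hard block): the virus scan must pass.
    if not _passed(receipt, "virusVerdict"):
        logger.error("Dropping email: virus scan did not pass")
        return False
    # Credibility (soft check): warn on spam but keep processing.
    if not _passed(receipt, "spamVerdict"):
        logger.warning("Email classified as spam, processing anyway")
    return True
```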

InboundEmailProcessor interface

Application-specific processors implement this ABC:

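from abc import ABC, abstractmethod
from io import BytesIO
from typing import Optional


class InboundEmailProcessor(ABC):
    @staticmethod
    @abstractmethod
    def process(
        sender: str,
        subject: str,
        body: Optional[str],  # plain-text body, None if absent
        html: Optional[str],  # HTML body, None if absent
        attachments: list[BytesIO],  # each object has .name set
        headers: list[dict[str, str]],
    ): ...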

| Argument | Description |
| --- | --- |
| sender | X-Original-Sender header, falling back to mail.source |
| subject | Email subject from commonHeaders |
| body | Plain text body (or None if absent) |
| html | HTML body (or None if absent) |
| attachments | List of BytesIO objects with .name attribute set |
| headers | Raw email headers as [{"name": ..., "value": ...}] dicts |
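
To illustrate the interface, here is a hypothetical processor that only summarizes what it received. AttachmentSummaryProcessor is not part of the codebase, the interface is repeated in simplified form so the sketch runs on its own, and returning a string is a choice made for the demo (the interface above does not specify a return value).

```python
from abc import ABC, abstractmethod
from io import BytesIO


class InboundEmailProcessor(ABC):
    """Simplified copy of the interface, repeated so the sketch is self-contained."""

    @staticmethod
    @abstractmethod
    def process(sender, subject, body, html, attachments, headers): ...


class AttachmentSummaryProcessor(InboundEmailProcessor):
    """Hypothetical processor: reports who sent what, and which files came along."""

    @staticmethod
    def process(sender, subject, body, html, attachments, headers):
        names = [attachment.name for attachment in attachments]
        return (
            f"{sender} sent {subject!r} with "
            f"{len(names)} attachment(s): {', '.join(names)}"
        )


# Attachments arrive as BytesIO objects with .name set.
invoice = BytesIO(b"%PDF-1.4")
invoice.name = "invoice.pdf"

summary = AttachmentSummaryProcessor.process(
    sender="alice@example.com",
    subject="Invoice",
    body="See attached.",
    html=None,
    attachments=[invoice],
    headers=[{"name": "Subject", "value": "Invoice"}],
)
print(summary)
```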

InboundEmailProcessorsRegistry

Flask extension that maps recipient addresses to processors.

Registration

Adding a new recipient address requires changes in two places:

  1. Terraform — add the address to the email_subscriptions.emails list of the relevant alan-application-backend module instance (in infra/src/stacks/). This configures the SES receipt rule to accept emails for that address.

    # Example: infra/src/stacks/main/qovery-env-backend-fr--prod/environment.tf
    email_subscriptions = {
      emails                    = ["noreply@processor.alan.eu"]
      ses_receipt_rule_set_name = var.ses_receipt_rule_set_name
      bucket_name               = var.emails_bucket_name
    }
    
  2. Python — register a processor for the address in the relevant app's __init__.py:

    registry = InboundEmailProcessorsRegistry(app)
    registry.register_email_processor(
        "noreply@processor.alan.eu",
        NoreplyInboundEmailProcessor,
    )
    

Note: If the address is not in email_subscriptions.emails, SES will never route the email to the SQS queue, so the Python processor will never be called.

Lookup

When process_ses_email() runs, it calls registry.get_first_processor_for(recipients), which returns the first processor matching any of the email's recipients. If no processor matches, the email is skipped.
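
The lookup semantics can be sketched with a plain dictionary. SimpleRegistry is a hypothetical stand-in for InboundEmailProcessorsRegistry, without the Flask integration. Exact string matching on addresses and returning None when nothing matches are assumptions, since the real matching rules are not described here.

```python
class SimpleRegistry:
    """Hypothetical, Flask-free stand-in for InboundEmailProcessorsRegistry."""

    def __init__(self):
        self._processors = {}

    def register_email_processor(self, address, processor):
        self._processors[address] = processor

    def get_first_processor_for(self, recipients):
        """Return the processor of the first recipient that has one, else None."""
        for recipient in recipients:
            processor = self._processors.get(recipient)
            if processor is not None:
                return processor
        return None


class NoreplyProcessorStub:
    """Placeholder class standing in for a real processor."""


registry = SimpleRegistry()
registry.register_email_processor("noreply@processor.alan.eu", NoreplyProcessorStub)

matched = registry.get_first_processor_for(
    ["someone@example.com", "noreply@processor.alan.eu"]
)
unmatched = registry.get_first_processor_for(["someone@example.com"])
```

Here matched is NoreplyProcessorStub, and unmatched is None, which corresponds to the case where the email is skipped.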

Known registrations

| Recipient | Processor | App | Purpose |
| --- | --- | --- | --- |
| noreply@processor.alan.eu | NoreplyInboundEmailProcessor | fr_api | Sends auto-reply to sender |
| accounting@processor.alan.eu | BillingInboundEmailProcessor | eu_tools | Parses billing emails, uploads bills to S3, creates OCR jobs |

Attachment extraction

_get_attachments() recursively walks the MIME tree and extracts all parts with Content-Disposition: attachment:

  • Multipart containers — recursed into
  • message/rfc822 (nested emails) — the .eml file itself is added, then the nested message is recursed for its own attachments
  • Text attachments — encoded to UTF-8 bytes
  • Binary attachments (PDF, images, audio, video) — decoded from base64/quoted-printable
  • Unnamed attachments — given a default filename based on MIME type (e.g. attachment.pdf, attachment.txt)
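
The rules above can be approximated with the standard library's email package. This is a sketch under assumptions, not the actual _get_attachments() implementation: the function names are hypothetical, and the default filename is derived with mimetypes.guess_extension, which may differ from the real naming logic.

```python
import mimetypes
from email import message_from_bytes, policy
from email.message import EmailMessage
from io import BytesIO


def _named(data: bytes, name: str) -> BytesIO:
    """Wrap bytes in a BytesIO carrying a .name attribute."""
    buffer = BytesIO(data)
    buffer.name = name
    return buffer


def _default_name(part: EmailMessage) -> str:
    """Fallback filename based on the MIME type, e.g. attachment.pdf."""
    extension = mimetypes.guess_extension(part.get_content_type()) or ".bin"
    return f"attachment{extension}"


def get_attachments(part: EmailMessage) -> list[BytesIO]:
    """Recursively collect parts marked Content-Disposition: attachment."""
    found = []
    if part.get_content_type() == "message/rfc822":
        nested = part.get_payload(0)
        if part.get_content_disposition() == "attachment":
            # Keep the nested email itself as an .eml file.
            name = part.get_filename() or "attachment.eml"
            found.append(_named(nested.as_bytes(), name))
        # Then look for attachments inside the nested email.
        found.extend(get_attachments(nested))
    elif part.is_multipart():
        for child in part.iter_parts():
            found.extend(get_attachments(child))
    elif part.get_content_disposition() == "attachment":
        # get_content() undoes base64/quoted-printable; text comes back as str.
        content = part.get_content()
        data = content.encode("utf-8") if isinstance(content, str) else content
        found.append(_named(data, part.get_filename() or _default_name(part)))
    return found


def attachments_from_raw(raw_email: bytes) -> list[BytesIO]:
    """Parse a raw RFC 822 email and extract its attachments."""
    return get_attachments(message_from_bytes(raw_email, policy=policy.default))
```

Parsing with policy.default matters here: it yields EmailMessage objects, which provide iter_parts() and get_content().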

CLI: reprocessing emails

The registry exposes a Flask CLI command for manually reprocessing an email from S3:

direnv exec backend env APP=fr_api flask email reprocess <object_key> \
    --recipient noreply@processor.alan.eu \
    --bucket <bucket-name>

This bypasses SQS/SNS entirely: it fetches the raw email from S3, re-validates it, and runs the matched processor directly (synchronously, not through RQ).

The reprocessing path reconstructs SES receipt verdicts from the email's own headers (Authentication-Results, X-SES-Spam-Verdict, X-SES-Virus-Verdict).
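
One way such a reconstruction could work is sketched below. The reconstruct_receipt function, the regular expression, and the choice to treat a missing or non-pass result as "FAIL" are assumptions made for illustration. The real command may parse these headers differently.

```python
import re
from email import message_from_bytes, policy


def _auth_status(auth_results: str, method: str) -> str:
    """Map e.g. 'dkim=pass' inside Authentication-Results to PASS, else FAIL."""
    match = re.search(rf"\b{method}=(\w+)", auth_results, flags=re.IGNORECASE)
    return "PASS" if match and match.group(1).lower() == "pass" else "FAIL"


def reconstruct_receipt(raw_email: bytes) -> dict:
    """Rebuild SES-style verdicts from the headers SES adds to the stored email."""
    message = message_from_bytes(raw_email, policy=policy.default)
    auth_results = " ".join(
        str(value) for value in message.get_all("Authentication-Results", [])
    )

    def header_status(name: str) -> str:
        # Assumption: a missing verdict header counts as FAIL.
        return str(message.get(name, "FAIL")).strip().upper()

    return {
        "dkimVerdict": {"status": _auth_status(auth_results, "dkim")},
        "dmarcVerdict": {"status": _auth_status(auth_results, "dmarc")},
        "spamVerdict": {"status": header_status("X-SES-Spam-Verdict")},
        "virusVerdict": {"status": header_status("X-SES-Virus-Verdict")},
    }
```

The returned dictionary has the same shape as the receipt verdicts in the SNS notification, so the same validation logic can run on both paths.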