Inbound Email Processing¶
This document describes how incoming emails are received, validated, and routed to application-specific processors.
End-to-end architecture¶
flowchart TD
Sender["External sender"] -->|sends email| SES["AWS SES<br/>(Receipt Rule)"]
SES -->|store raw email| S3["S3 Bucket"]
SES -->|publish notification| SNS["SNS Topic"]
SNS -->|deliver| SQS["SQS Queue"]
SQS -->|poll| Worker["SQS Worker"]
Worker -->|can_handle?| SesEmailProcessor
SesEmailProcessor -->|enqueue| RQ["RQ Queue"]
RQ -->|execute| ProcessSES["process_ses_email()"]
ProcessSES -->|validate| Validate{"Authenticity<br/>Safety<br/>Credibility"}
Validate -->|fail| Drop["Drop email"]
Validate -->|pass| Registry["InboundEmailProcessorsRegistry"]
Registry -->|lookup recipient| AppProcessor["App-specific<br/>InboundEmailProcessor"]
AppProcessor -->|fetch from| S3
Step by step¶
| Step | Component | What happens |
|---|---|---|
| 1 | AWS SES | Receives the email via a configured Receipt Rule |
| 2 | SES Receipt Rule | Stores the raw email (RFC 822) in an S3 bucket and publishes a notification to an SNS topic |
| 3 | SNS | Delivers the notification as a JSON message to an SQS queue |
| 4 | SQS Worker | Long-polls the queue, parses the message, and dispatches to SesEmailProcessor |
| 5 | SesEmailProcessor | Unwraps the SNS envelope and enqueues process_ses_email() into RQ |
| 6 | process_ses_email() | Validates the email, fetches it from S3, extracts attachments, and routes to the registered processor |
SNS message format¶
SES wraps the email metadata inside an SNS notification. The SQS message body looks like:
{
  "Type": "Notification",
  "Subject": "Amazon SES Email Receipt Notification",
  "Message": "{\"mail\": {...}, \"receipt\": {...}}"
}
SesEmailProcessor.can_handle() matches on Type and Subject. The inner Message is JSON-decoded and passed to
process_ses_email().
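The matching and unwrapping described above can be sketched as follows. This is a minimal illustration, not the actual SesEmailProcessor code: the function names `can_handle` and `unwrap` and the sample payload are assumptions made for the example.

```python
import json

SES_SUBJECT = "Amazon SES Email Receipt Notification"


def can_handle(sqs_body: dict) -> bool:
    """Return True when the SQS body is an SNS-wrapped SES receipt notification."""
    return (
        sqs_body.get("Type") == "Notification"
        and sqs_body.get("Subject") == SES_SUBJECT
    )


def unwrap(sqs_body: dict) -> dict:
    """Decode the inner SES message, which SNS delivers as a JSON string."""
    return json.loads(sqs_body["Message"])


# Hypothetical SQS message body, shaped like the example above.
raw_body = json.dumps({
    "Type": "Notification",
    "Subject": SES_SUBJECT,
    "Message": json.dumps({"mail": {"source": "a@example.com"}, "receipt": {}}),
})

sqs_body = json.loads(raw_body)
if can_handle(sqs_body):
    inner = unwrap(sqs_body)
    print(inner["mail"]["source"])
```

Note the double decoding: the SQS body is JSON, and its Message field is itself a JSON string.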
Key fields in the inner message¶
- receipt.recipients: list of recipient addresses
- receipt.action.bucketName / receipt.action.objectKey: S3 location of the raw email
- mail.source: envelope sender
- mail.headers: full email headers (used to extract X-Original-Sender)
- mail.commonHeaders.subject: email subject
- receipt.*Verdict: SES validation results (DKIM, SPF, DMARC, spam, virus)
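As an illustration of where these fields sit in the decoded message, here is a small sketch. The helper `summarize`, the bucket name, and the object key are hypothetical; the `{"status": ...}` shape of each verdict follows the SES notification format.

```python
from typing import Any


def summarize(message: dict[str, Any]) -> dict[str, Any]:
    """Pull the key fields out of a decoded SES receipt notification."""
    receipt = message["receipt"]
    mail = message["mail"]
    action = receipt["action"]
    # Collect every receipt.*Verdict entry as {name: status}.
    verdicts = {
        key: value.get("status")
        for key, value in receipt.items()
        if key.endswith("Verdict") and isinstance(value, dict)
    }
    return {
        "recipients": receipt["recipients"],
        "s3_location": (action["bucketName"], action["objectKey"]),
        "envelope_sender": mail["source"],
        "subject": mail["commonHeaders"].get("subject"),
        "verdicts": verdicts,
    }


# Hypothetical decoded message with placeholder values.
message = {
    "mail": {
        "source": "bounce@example.com",
        "headers": [{"name": "X-Original-Sender", "value": "alice@example.com"}],
        "commonHeaders": {"subject": "Invoice 42"},
    },
    "receipt": {
        "recipients": ["accounting@processor.alan.eu"],
        "action": {"bucketName": "example-bucket", "objectKey": "emails/abc123"},
        "dkimVerdict": {"status": "PASS"},
        "spamVerdict": {"status": "FAIL"},
    },
}

summary = summarize(message)
print(summary["s3_location"], summary["verdicts"])
```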
Email validation¶
Emails go through three validation steps before processing. Authenticity and safety are hard blocks (email is dropped on failure). Credibility is a soft warning.
flowchart LR
Email["Incoming email"] --> Auth{"Authenticity<br/>DKIM or DMARC pass?"}
Auth -->|no| Drop1["Drop + log error"]
Auth -->|yes| Safety{"Safety<br/>virus check pass?"}
Safety -->|no| Drop2["Drop + log error"]
Safety -->|yes| Cred{"Credibility<br/>spam check pass?"}
Cred -->|no| Warn["Log warning<br/>(continue processing)"]
Cred -->|yes| Process["Route to processor"]
Warn --> Process
Authenticity¶
Checks dkimVerdict and dmarcVerdict from the SES receipt. Either DKIM or DMARC must pass.
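Note: SPF is intentionally not checked. Emails rerouted from Gmail typically fail SPF because the forwarding server is not authorized by the original sender's domain; Gmail adds ARC (Authenticated Received Chain) headers to carry the original authentication results instead. A future improvement could validate ARC headers.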
Safety¶
Checks virusVerdict. Blocks the email if the virus scan did not pass.
Credibility¶
Checks spamVerdict. Logs a warning if the email is classified as spam but does not block processing.
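The three checks can be sketched as a single function. This is an assumption-laden illustration rather than the real process_ses_email() logic: the names `is_valid` and `_passed` are hypothetical, and treating every status other than "PASS" (for example "GRAY" or "PROCESSING_FAILED") as a failure is a simplifying choice made for the example.

```python
import logging

logger = logging.getLogger(__name__)


def _passed(receipt: dict, verdict_name: str) -> bool:
    """True only when the named SES verdict has status PASS."""
    return receipt.get(verdict_name, {}).get("status") == "PASS"


def is_valid(receipt: dict) -> bool:
    """Apply authenticity and safety as hard blocks, credibility as a warning."""
    # Authenticity: either DKIM or DMARC must pass. SPF is deliberately ignored.
    if not (_passed(receipt, "dkimVerdict") or _passed(receipt, "dmarcVerdict")):
        logger.error("Dropping email: neither DKIM nor DMARC passed")
        return False
    # Safety: the virus scan must pass.
    if not _passed(receipt, "virusVerdict"):
        logger.error("Dropping email: virus check did not pass")
        return False
    # Credibility: spam only triggers a warning, processing continues.
    if not _passed(receipt, "spamVerdict"):
        logger.warning("Email flagged by spam check, processing anyway")
    return True


spammy_but_authentic = {
    "dkimVerdict": {"status": "FAIL"},
    "dmarcVerdict": {"status": "PASS"},
    "virusVerdict": {"status": "PASS"},
    "spamVerdict": {"status": "FAIL"},
}
print(is_valid(spammy_but_authentic))
```

In this sketch a spam-flagged email with a passing DMARC verdict is still accepted, while any email failing the virus check is rejected regardless of the other verdicts.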
InboundEmailProcessor interface¶
Application-specific processors implement this ABC:
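from abc import ABC, abstractmethod
from io import BytesIO


class InboundEmailProcessor(ABC):
    """Base class for application-specific inbound email processors."""

    @staticmethod
    @abstractmethod
    def process(
        sender: str,  # original sender address
        subject: str,  # email subject
        body: str,  # plain-text body, None if absent
        html: str,  # HTML body, None if absent
        attachments: list[BytesIO],  # each buffer has .name set
        headers: list[dict[str, str]],  # raw headers as name/value dicts
    ): ...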
| Argument | Description |
|---|---|
| sender | X-Original-Sender header, falling back to mail.source |
| subject | Email subject from commonHeaders |
| body | Plain text body (or None if absent) |
| html | HTML body (or None if absent) |
| attachments | List of BytesIO objects with .name attribute set |
| headers | Raw email headers as [{"name": ..., "value": ...}] dicts |
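The sender fallback in the table can be sketched like this. The helper name `resolve_sender` is hypothetical, and matching the header name case-insensitively is an assumption (header names are case-insensitive in email, but the real code may compare differently).

```python
def resolve_sender(mail: dict) -> str:
    """Prefer the X-Original-Sender header, fall back to the envelope sender."""
    for header in mail.get("headers", []):
        # Headers arrive as {"name": ..., "value": ...} dicts.
        if header.get("name", "").lower() == "x-original-sender":
            return header["value"]
    return mail["source"]


forwarded = {
    "source": "forwarder@example.org",
    "headers": [
        {"name": "Subject", "value": "Hello"},
        {"name": "X-Original-Sender", "value": "alice@example.com"},
    ],
}
direct = {"source": "bob@example.com", "headers": [{"name": "Subject", "value": "Hi"}]}

print(resolve_sender(forwarded))
print(resolve_sender(direct))
```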
InboundEmailProcessorsRegistry¶
Flask extension that maps recipient addresses to processors.
Registration¶
Adding a new recipient address requires changes in two places:
- Terraform: add the address to the email_subscriptions.emails list of the relevant alan-application-backend module instance (in infra/src/stacks/). This configures the SES receipt rule to accept emails for that address.
- Python: register a processor for the address in the relevant app's __init__.py.

Note: If the address is not in email_subscriptions.emails, SES will never route the email to the SQS queue, so the Python processor will never be called.
Lookup¶
When process_ses_email() runs, it calls registry.get_first_processor_for(recipients) which returns the first
processor matching any of the email's recipients. If no processor matches, the email is skipped.
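To make the registration and lookup behaviour concrete, here is a toy registry. It is a sketch under stated assumptions, not the real Flask extension: the class `ToyRegistry`, its `register` method, the `EchoProcessor` stand-in, and the lowercasing of addresses are all hypothetical. Only the method name get_first_processor_for comes from the text above.

```python
from typing import Optional


class EchoProcessor:
    """Hypothetical stand-in for an InboundEmailProcessor subclass."""

    @staticmethod
    def process(sender, subject, body, html, attachments, headers):
        return f"{sender}: {subject}"


class ToyRegistry:
    """Maps recipient addresses to processor classes (illustration only)."""

    def __init__(self) -> None:
        self._processors: dict[str, type] = {}

    def register(self, recipient: str, processor: type) -> None:
        # Assumption: addresses are compared case-insensitively.
        self._processors[recipient.lower()] = processor

    def get_first_processor_for(self, recipients: list[str]) -> Optional[type]:
        # Walk the recipients in order and return the first registered match.
        for recipient in recipients:
            processor = self._processors.get(recipient.lower())
            if processor is not None:
                return processor
        return None


registry = ToyRegistry()
registry.register("noreply@processor.alan.eu", EchoProcessor)

match = registry.get_first_processor_for(
    ["someone@example.com", "noreply@processor.alan.eu"]
)
miss = registry.get_first_processor_for(["unknown@example.com"])
print(match, miss)
```

When the lookup returns None, the caller skips the email, which mirrors the behaviour described above.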
Known registrations¶
| Recipient | Processor | App | Purpose |
|---|---|---|---|
| noreply@processor.alan.eu | NoreplyInboundEmailProcessor | fr_api | Sends auto-reply to sender |
| accounting@processor.alan.eu | BillingInboundEmailProcessor | eu_tools | Parses billing emails, uploads bills to S3, creates OCR jobs |
Attachment extraction¶
_get_attachments() recursively walks the MIME tree and extracts all parts with Content-Disposition: attachment:
- Multipart containers: recursed into
- message/rfc822 (nested emails): the .eml file itself is added, then the nested message is recursed for its own attachments
- Text attachments: encoded to UTF-8 bytes
- Binary attachments (PDF, images, audio, video): decoded from base64/quoted-printable
- Unnamed attachments: given a default filename based on MIME type (e.g. attachment.pdf, attachment.txt)
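The rules above can be approximated with the standard library's email package. This is a sketch, not the actual _get_attachments() implementation: the function names, the use of mimetypes to derive the default extension, and the ".bin" fallback are assumptions made for the example.

```python
import mimetypes
from email import message_from_bytes, policy
from email.message import EmailMessage
from io import BytesIO


def _named(data: bytes, part: EmailMessage) -> BytesIO:
    """Wrap bytes in a BytesIO whose .name is the filename or a MIME-based default."""
    name = part.get_filename()
    if not name:
        ext = mimetypes.guess_extension(part.get_content_type()) or ".bin"
        name = f"attachment{ext}"
    buffer = BytesIO(data)
    buffer.name = name
    return buffer


def get_attachments(part: EmailMessage) -> list[BytesIO]:
    """Recursively collect every part marked Content-Disposition: attachment."""
    found: list[BytesIO] = []
    is_attachment = part.get_content_disposition() == "attachment"
    if part.get_content_type() == "message/rfc822":
        nested = part.get_payload(0)
        if is_attachment:
            # Keep the nested email itself, then look inside it.
            found.append(_named(nested.as_bytes(), part))
        found.extend(get_attachments(nested))
    elif part.is_multipart():
        for child in part.iter_parts():
            found.extend(get_attachments(child))
    elif is_attachment:
        if part.get_content_maintype() == "text":
            data = part.get_content().encode("utf-8")
        else:
            # decode=True undoes base64 / quoted-printable transfer encoding.
            data = part.get_payload(decode=True)
        found.append(_named(data, part))
    return found


# Build a small test email: a body, an unnamed PDF, and a named text file.
msg = EmailMessage()
msg["Subject"] = "Invoice"
msg.set_content("See attached.")
msg.add_attachment(b"%PDF-1.4 fake", maintype="application", subtype="pdf")
msg.add_attachment("hello", filename="note.txt")

parsed = message_from_bytes(msg.as_bytes(), policy=policy.default)
attachments = get_attachments(parsed)
print([a.name for a in attachments])
```

The plain-text body is skipped because it carries no attachment disposition; only the two attached parts are returned.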
CLI: reprocessing emails¶
The registry exposes a Flask CLI command for manually reprocessing an email from S3:
direnv exec backend env APP=fr_api flask email reprocess <object_key> \
--recipient noreply@processor.alan.eu \
--bucket <bucket-name>
This bypasses SQS/SNS entirely: it fetches the raw email from S3, re-validates it, and runs the matched processor directly (synchronously, not through RQ).
The reprocessing path reconstructs SES receipt verdicts from the email's own headers (Authentication-Results,
X-SES-Spam-Verdict, X-SES-Virus-Verdict).
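One way such a reconstruction could look is sketched below. The function names, the regular expression, and the choice to map any missing or non-"pass" result to "FAIL" are assumptions for illustration; the real reprocessing code may parse these headers differently.

```python
import re
from email import message_from_string
from email.message import Message


def _auth_result(auth_results: str, method: str) -> str:
    """Map e.g. 'dkim=pass' in Authentication-Results to PASS, anything else to FAIL."""
    match = re.search(rf"\b{method}=(\w+)", auth_results, flags=re.IGNORECASE)
    return "PASS" if match and match.group(1).lower() == "pass" else "FAIL"


def _ses_header(msg: Message, name: str) -> str:
    """Read an X-SES-*-Verdict header, defaulting to FAIL when it is missing."""
    return str(msg.get(name, "FAIL")).strip().upper()


def verdicts_from_headers(msg: Message) -> dict[str, dict[str, str]]:
    """Rebuild a receipt-like verdict mapping from the stored email's headers."""
    auth_results = " ".join(str(v) for v in msg.get_all("Authentication-Results", []))
    return {
        "dkimVerdict": {"status": _auth_result(auth_results, "dkim")},
        "dmarcVerdict": {"status": _auth_result(auth_results, "dmarc")},
        "spamVerdict": {"status": _ses_header(msg, "X-SES-Spam-Verdict")},
        "virusVerdict": {"status": _ses_header(msg, "X-SES-Virus-Verdict")},
    }


# Hypothetical raw email as it might be stored in S3.
raw = (
    "Authentication-Results: amazonses.com; spf=fail; "
    "dkim=pass header.i=@example.com; dmarc=none\n"
    "X-SES-Spam-Verdict: PASS\n"
    "X-SES-Virus-Verdict: PASS\n"
    "Subject: hi\n"
    "\n"
    "body\n"
)

verdicts = verdicts_from_headers(message_from_string(raw))
print(verdicts)
```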