Skip to content

Unified-timeline refactor: replacing diff-revert for overrides

Status

Not started. Picked up from PR #97431 review (Antoine's architectural pitch). This file captures the plan in enough detail to start implementation in a follow-up PR.

Context

ConstraintOverride application currently uses a diff-revert model (internal/engine/override_constraint.py):

  1. Handler runs on the full input as if no override exists.
  2. The loop post-processes: for periods inside the override's protected window, revert handler-introduced modifications back to the pre-handler state.

This has a few real downsides:

  • Override knowledge is implicit. The handler doesn't know overrides exist; the loop layer cleans up after it. Works, but the two-step is conceptually opaque.
  • Convergence bug for removal-style constraints. When a constraint would remove the overridden member, the handler removes β†’ diff-revert restores in the window β†’ next iteration the handler removes again. The constraint loop never converges (caps out at MAX_CONSTRAINT_ITERATIONS). Not currently exercised by tests but real.
  • minimum_duration extension leaks. Chain extension walks subsequent contract versions without knowing about overrides. The handler creates extension periods inside override windows; diff-revert removes them post-hoc. Correct outcome, but only by chance β€” relies on the handler's internal _extend_into_subsequent_versions consulting _find_constraint_for_version which itself doesn't see overrides.

Target architecture

Build a per-(member, constraint) effective applicability timeline upstream of the constraint loop. Hand that timeline to handlers as the authoritative answer to "is this constraint enforced for this member at this date." Handlers stop having anything to do with overrides.

The new abstraction

def compute_applicability_per_member(
    *,
    constraint: ModuleConstraint,
    constraint_base_vp: ValidityPeriod,
    override_timelines: OverrideTimelines | None,
    enrollment_group_members: Collection[MemberWithMembershipType],
) -> dict[MemberWithMembershipType, Timeline[bool]]:
    """Per member, a Timeline[bool] where True = constraint is enforced."""

The function already exists in spirit (it was sketched in the abandoned PR #97431 refactor attempt β€” returning list[ValidityPeriod] fragments per member). For the unified-timeline rewrite, return Timeline[bool] instead. The Timeline abstraction in shared.temporal.timeline_v3 gives us composition (AND, OR, walk forward, intersection) that the handlers will exploit.

Override-fan-out semantics (cross-member constraint vs per-member) live inside this function β€” same as today's compute_protected_windows.

Handler signature change

Replace

constraint_validity_period: ValidityPeriod
with
applicability_per_member: dict[MemberWithMembershipType, Timeline[bool]]

Every handler iterates members + queries the per-member timeline instead of using a single constraint_interval.

Loop change

override_constraint.py goes away. The loop becomes:

applicability = compute_applicability_per_member(
    constraint=module_constraint,
    constraint_base_vp=contract_version.validity_period,
    override_timelines=override_timelines,
    enrollment_group_members=enrollment_group_members,
)
constraint_ok, new_periods = module_constraint_handler(
    module_constraint,
    applicability_per_member=applicability,
    enrollment_periods_with_commitment_info=enrollment_periods_with_commitment_info,
    eligibility_timelines=eligibility_timelines,
    input_timelines=input_timelines,
    contract=contract,
    change_request_timelines=change_request_timelines,
    retroactivity_cutoff_date=retroactivity_cutoff_date,
)
# No diff-revert. No protected_windows. No post-processing.

The fundamental distinction: read scope vs write scope

The handler reasons about two independent kinds of scope:

Read scope (calculation input): Which periods does the handler look at when computing what's true? For most constraints this is the full enrollment set β€” you want the real picture. MaxMembershipType counts how many partners exist; the count includes Alice during Sep-Nov even if her override exempts her from being affected. The read is the world as it is.

Write scope (effect target): Which periods can the handler actually modify (stamp commitment-info on, remove, add coverage to)? Two sub-axes:

  1. Time-wise: the per-member applicability timeline. True at date D = constraint may write here for this member.
  2. Member-wise: members the constraint applies to at all (e.g. MaxMembershipType.membership_type). Already baked into existing handler logic.

Write scope = (member-wise filter) ∩ (applicability True at date D for that member).

Why this distinction matters. The current diff-revert design treats write scope and read scope as the same thing β€” it filters the handler's output post-hoc. The unified-timeline keeps them separate: the handler sees the full enrollment (read) but is constrained to act only inside the write scope. That separation is what lets MaxMembershipType count correctly during an override window while still respecting the exemption: it counts Alice (read), can't remove her (write).

The exemption-based semantic falls out naturally. When the count exceeds the max during an override window, the handler must remove the excess from the subset of members whose write scope includes this date. If only Bob and Charlie are write-scope-subject Sep-Nov, both get removed there even if the priority would have spared one. The operator's override on Alice forces the cost onto the non-exempted members β€” that's the product-correct meaning of "exempt Alice from this constraint." If a different product semantic is preferred (e.g. slot-based: "the override creates a free slot, max becomes 2"), it can be expressed as a different handler rule on top of the same read/write abstraction.

Per-handler changes

Eight handlers register against module_constraint_handler. Each needs a targeted edit. Most are mechanical (replace constraint_interval with a per-member timeline query). Two are meaningful:

minimum_duration (chain-style β€” meaningful change)

The handler currently walks contract.versions_timeline() to extend chains across versions, asking "is the same commitment_identifier defined in version V?" via _find_constraint_for_version. That's an applicability question with the wrong source.

Change: walk the per-member applicability timeline for chain extension. Extension stops when the timeline says "not enforced" β€” no need to know whether the gap is from a contract version dropping the constraint or from an override removing it.

Specifically: - _extend_into_subsequent_versions consults applicability_per_member[member] to decide where to stop. - _check_free_removal_for_version also takes overrides into account via the timeline. - The "extension hit the applicability boundary" case is treated as partial commitment OK (operator's explicit signal). No more remove-everything fallback.

Pure _find_constraint_for_version calls that read commitment terms (commitment_duration_in_months, free_removal_settings) stay as they are β€” overrides don't change terms.

MaxMembershipType (cross-counting β€” meaningful change)

Today: iterates enrollment_periods of the constrained membership_type, counts at each date, picks excess members to remove. Doesn't know about applicability.

Change β€” explicit read vs write separation:

  • Read (count timeline): sum across all enrolled members of the constrained membership_type, at every date. Applicability is irrelevant for the count β€” an exempted member is still enrolled and still counts.
  • Write scope per member: the per-member applicability timeline.
  • Violation segments: where count > max_allowed.
  • Per violation segment, pick excess members to remove from the subset whose write scope includes this date. If fewer write-scope members are available than excess requires, all of them get removed (the override-exempt members structurally absorb none of the cost).

Behavioral consequence: a per-member override on one member, on a constraint that would otherwise remove them, shifts the removal onto the non-exempted members in that window rather than just letting the exempted member skip the constraint. This is the principled "exemption-based" semantic and what fixes the latent diff-revert convergence bug.

If a future product change wants different semantics ("slot-based": the override raises the effective max_allowed by 1 in its window), it can be expressed as a different rule on top of the same read/write abstraction β€” only the violation-segment derivation changes.

Mechanical changes (six handlers)

These already use constraint_interval only to filter writes. Replace the single constraint_interval.intersection(period_interval) check with applicability_per_member[period.member].any_true_in(period_interval). Other logic unchanged.

  • at_most_one
  • cover_by_default
  • family_dependency
  • maximum_first_enrollment_age
  • mutually_required
  • primary_dependency

Convergence guarantee

With the unified-timeline: - The applicability is precomputed and stable across loop iterations. - The handler's decisions are a deterministic function of stable inputs. - Iteration 2 sees the result of iteration 1 with the same applicability β†’ the handler reaches the same conclusion β†’ no change β†’ converged.

The diff-revert ping-pong (handler removes β†’ revert restores β†’ handler removes) cannot happen because there's no post-handler restoration.

Implementation plan

Ships as a single PR. Order of work within the PR:

  1. Abstraction. Add compute_applicability_per_member returning dict[Member, Timeline[bool]]. Cross-member fan-out mirrors today's compute_protected_windows. Add helpers on Timeline[bool] for the queries the handlers need ("any True in interval", "first False after date", AND with another Timeline[bool]).

  2. Handler signature change. Replace constraint_validity_period: ValidityPeriod with applicability_per_member: dict[Member, Timeline[bool]] on module_constraint_handler and all 8 registered handlers.

  3. Mechanical handler rewrites (6 files). For at_most_one, cover_by_default, family_dependency, maximum_first_enrollment_age, mutually_required, primary_dependency: replace each constraint_interval write-filter check with applicability_per_member[period.member].any_true_in(...). Existing tests should pass unchanged.

  4. minimum_duration rewrite. Walk the per-member applicability timeline for chain extension. Extension stops at the first applicability boundary (does not skip the gap and resume β€” see "Chain extension semantic" below). Treat applicability-bounded extension as partial-commitment OK. _find_constraint_for_version keeps its commitment-terms lookups (duration, free-removal config); applicability lookups move to the timeline.

  5. MaxMembershipType rewrite. Read-vs-write split: count includes everyone (read), removals restricted to write-scope-subject members per violation segment.

  6. Loop change. apply_constraints_to_enrollment_periods calls compute_applicability_per_member once per (version, constraint) and passes the result to the handler. No diff-revert wrapper.

  7. Cleanup. Delete override_constraint.py, _precompute_protected_windows_by_constraint, _build_module_lookups (was only needed for the service_type/module_slug filter in diff-revert).

  8. Test updates. Update the "Override punches a gap" expectation (Jan-Aug only). Add the new scenarios listed in the test plan.

Chain extension semantic (minimum_duration)

When a chain's extension hits the applicability boundary (the override removes the constraint past this date), extension stops β€” it does not skip the gap and resume at the next True segment.

Concretely for the "Override punches a gap" scenario: - Alice enrolled Jan-Jun, 12-month commitment, override Sep-Nov - Applicability: Jan-Aug True, Sep-Nov False, Dec→ True - Extension walks from Jun-30 → reaches Aug-31 → hits Sep-1 boundary → stops - Chain is 8 months. Treated as partial-commitment OK. Stamp Jan-Aug. - Result: Alice has Jan-Aug. No Dec sliver. No fresh Dec-onwards re-commitment.

Rationale: the override is the operator's explicit "no enforcement past this point" signal. Forcing extension to continue past it would mean the engine creates coverage in the True-again region (Dec onwards) specifically to satisfy a commitment the operator paused. That contradicts the override's intent.

If a future product need wants the alternative ("count only enforced months toward commitment duration, resume after gap"), it'd be a deliberate semantic change captured as a constraint config flag, not the default.

Test plan

New scenarios to add

Convergence under per-member override on removal-style constraints (currently latent bug in diff-revert):

  1. MaxMembershipType(partner, max=1) with Alice + Bob both enrolled, override exempts Alice β†’ expect Alice kept, Bob removed, loop converges in 1 iteration.
  2. MaxMembershipType(partner, max=1) with Alice + Bob + Charlie all enrolled Jan-Dec, override exempts Alice during Sep-Nov β†’ exemption-based outcome:
  3. Alice: Jan-Dec kept (priority spares her where she's in write scope; she's not in write scope Sep-Nov so untouchable there)
  4. Bob: removed entirely (write scope all year; excess 2 in Jan-Aug/Dec drops him by priority; only candidate Sep-Nov alongside Charlie when excess is 2 β€” both must go)
  5. Charlie: removed entirely (same reasoning as Bob)
  6. Loop converges in 1 iteration.

This is the multi-member divergent-applicability case that diff-revert currently handles wrongly (or non-convergent for similar shapes). Pin the expected outcome explicitly in the scenario.

Per-handler override scenarios (most don't have any today):

  1. cover_by_default + override removing the constraint in a window β†’ expect no default fill inside the window.
  2. at_most_one + override + a member enrolled in multiple modules inside the window β†’ expect no removal in the window.
  3. family_dependency (cross-member) + override on the primary β†’ family-wide fan-out as today.
  4. maximum_first_enrollment_age + override + first enrollment past the limit inside the window β†’ expect enrollment kept in the window.
  5. mutually_required + override + a member missing the required module inside the window β†’ expect no removal inside the window.
  6. primary_dependency (cross-member) + override on dependent β†’ family-wide fan-out as today.

minimum_duration scenarios:

  1. "Override punches a gap" (existing scenario) β€” update expectation from Jan-Aug + Dec-Dec to Jan-Aug partial-commitment. The Dec sliver wart goes away; document the new semantic.
  2. Chain spans contract-version-boundary AND override window β€” extension must respect both. Probably extends only up to the applicability boundary, not to the next contract version.
  3. Override that REMOVES (not applies) over a chain β€” applicability timeline should treat removal as a no-op (no segment to subtract).

Filtered overrides (currently 0 test coverage):

  1. Override with non-empty service_types filter β†’ only periods of matching service_type are affected.
  2. Override with non-empty module_slugs filter β†’ only periods of matching module slug are affected.

Tests expected to change

If the rewrite is faithful, most of the ~120 constraint scenarios and 12 override unit tests should pass unchanged. Expected changes:

  • One scenario expectation update: "Override punches a gap through a minimum_duration extension" β€” output becomes Jan-Aug partial, no Dec sliver. Update the scenario's expected table and comments.
  • No other constraint scenarios expected to change. The mechanical handlers produce the same outputs because applicability collapses to the constraint's contract-version VP when there's no override.

Tests NOT expected to change

  • All 49 engine unit tests (internal/engine/tests/).
  • All 12 override painting unit tests (internal/engine/tests/test_overrides.py).
  • All 111 non-override constraint feature scenarios.
  • 10 of 11 override constraint scenarios.

Risk register

  • Filtered overrides (service_types, module_slugs). No current test exercises them. The new abstraction needs to encode the filter somewhere β€” either as a richer timeline value, or by carrying a per-period match check at the handler level. Decide the shape in step 1 of the implementation order.
  • Chain-bridging in minimum_duration. Today's iteration-2 bridging mechanic (where two chains separated by an override window merge into one logical commitment) is a side effect of diff-revert's splitting. The unified-timeline replaces it with "applicability boundary = partial commitment OK; no skip-and-resume." Different semantic; document in the scenario.
  • Single-PR scope. The change touches 8 handlers, the loop, and one new abstraction. Each handler change is small but multiplied by 8 it's a meaningful diff. Run the full constraint suite after each handler is migrated to catch regressions early.