Unified-timeline refactor: replacing diff-revert for overrides¶
Status¶
Not started. Picked up from PR #97431 review (Antoine's architectural pitch). This file captures the plan in enough detail to start implementation in a follow-up PR.
Context¶
ConstraintOverride application currently uses a diff-revert model
(internal/engine/override_constraint.py):
- Handler runs on the full input as if no override exists.
- The loop post-processes: for periods inside the override's protected window, revert handler-introduced modifications back to the pre-handler state.
This has a few real downsides:
- Override knowledge is implicit. The handler doesn't know overrides exist; the loop layer cleans up after it. Works, but the two-step is conceptually opaque.
- Convergence bug for removal-style constraints. When a constraint
would remove the overridden member, the handler removes β diff-revert
restores in the window β next iteration the handler removes again. The
constraint loop never converges (caps out at
MAX_CONSTRAINT_ITERATIONS). Not currently exercised by tests but real. minimum_durationextension leaks. Chain extension walks subsequent contract versions without knowing about overrides. The handler creates extension periods inside override windows; diff-revert removes them post-hoc. Correct outcome, but only by chance β relies on the handler's internal_extend_into_subsequent_versionsconsulting_find_constraint_for_versionwhich itself doesn't see overrides.
Target architecture¶
Build a per-(member, constraint) effective applicability timeline upstream of the constraint loop. Hand that timeline to handlers as the authoritative answer to "is this constraint enforced for this member at this date." Handlers stop having anything to do with overrides.
The new abstraction¶
def compute_applicability_per_member(
*,
constraint: ModuleConstraint,
constraint_base_vp: ValidityPeriod,
override_timelines: OverrideTimelines | None,
enrollment_group_members: Collection[MemberWithMembershipType],
) -> dict[MemberWithMembershipType, Timeline[bool]]:
"""Per member, a Timeline[bool] where True = constraint is enforced."""
The function already exists in spirit (it was sketched in the abandoned
PR #97431 refactor attempt β returning list[ValidityPeriod] fragments
per member). For the unified-timeline rewrite, return Timeline[bool]
instead. The Timeline abstraction in shared.temporal.timeline_v3
gives us composition (AND, OR, walk forward, intersection) that the
handlers will exploit.
Override-fan-out semantics (cross-member constraint vs per-member) live
inside this function β same as today's compute_protected_windows.
Handler signature change¶
Replace
withEvery handler iterates members + queries the per-member timeline instead
of using a single constraint_interval.
Loop change¶
override_constraint.py goes away. The loop becomes:
applicability = compute_applicability_per_member(
constraint=module_constraint,
constraint_base_vp=contract_version.validity_period,
override_timelines=override_timelines,
enrollment_group_members=enrollment_group_members,
)
constraint_ok, new_periods = module_constraint_handler(
module_constraint,
applicability_per_member=applicability,
enrollment_periods_with_commitment_info=enrollment_periods_with_commitment_info,
eligibility_timelines=eligibility_timelines,
input_timelines=input_timelines,
contract=contract,
change_request_timelines=change_request_timelines,
retroactivity_cutoff_date=retroactivity_cutoff_date,
)
# No diff-revert. No protected_windows. No post-processing.
The fundamental distinction: read scope vs write scope¶
The handler reasons about two independent kinds of scope:
Read scope (calculation input):
Which periods does the handler look at when computing what's true? For
most constraints this is the full enrollment set β you want the real
picture. MaxMembershipType counts how many partners exist; the count
includes Alice during Sep-Nov even if her override exempts her from
being affected. The read is the world as it is.
Write scope (effect target): Which periods can the handler actually modify (stamp commitment-info on, remove, add coverage to)? Two sub-axes:
- Time-wise: the per-member applicability timeline. True at date D = constraint may write here for this member.
- Member-wise: members the constraint applies to at all (e.g.
MaxMembershipType.membership_type). Already baked into existing handler logic.
Write scope = (member-wise filter) β© (applicability True at date D for that member).
Why this distinction matters. The current diff-revert design treats
write scope and read scope as the same thing β it filters the handler's
output post-hoc. The unified-timeline keeps them separate: the handler
sees the full enrollment (read) but is constrained to act only inside
the write scope. That separation is what lets MaxMembershipType count
correctly during an override window while still respecting the
exemption: it counts Alice (read), can't remove her (write).
The exemption-based semantic falls out naturally. When the count exceeds the max during an override window, the handler must remove the excess from the subset of members whose write scope includes this date. If only Bob and Charlie are write-scope-subject Sep-Nov, both get removed there even if the priority would have spared one. The operator's override on Alice forces the cost onto the non-exempted members β that's the product-correct meaning of "exempt Alice from this constraint." If a different product semantic is preferred (e.g. slot-based: "the override creates a free slot, max becomes 2"), it can be expressed as a different handler rule on top of the same read/write abstraction.
Per-handler changes¶
Eight handlers register against module_constraint_handler. Each needs a
targeted edit. Most are mechanical (replace constraint_interval with a
per-member timeline query). Two are meaningful:
minimum_duration (chain-style β meaningful change)¶
The handler currently walks contract.versions_timeline() to extend
chains across versions, asking "is the same commitment_identifier
defined in version V?" via _find_constraint_for_version. That's an
applicability question with the wrong source.
Change: walk the per-member applicability timeline for chain extension. Extension stops when the timeline says "not enforced" β no need to know whether the gap is from a contract version dropping the constraint or from an override removing it.
Specifically:
- _extend_into_subsequent_versions consults
applicability_per_member[member] to decide where to stop.
- _check_free_removal_for_version also takes overrides into account
via the timeline.
- The "extension hit the applicability boundary" case is treated as
partial commitment OK (operator's explicit signal). No more
remove-everything fallback.
Pure _find_constraint_for_version calls that read commitment
terms (commitment_duration_in_months, free_removal_settings) stay
as they are β overrides don't change terms.
MaxMembershipType (cross-counting β meaningful change)¶
Today: iterates enrollment_periods of the constrained membership_type,
counts at each date, picks excess members to remove. Doesn't know about
applicability.
Change β explicit read vs write separation:
- Read (
counttimeline): sum across all enrolled members of the constrainedmembership_type, at every date. Applicability is irrelevant for the count β an exempted member is still enrolled and still counts. - Write scope per member: the per-member applicability timeline.
- Violation segments: where
count > max_allowed. - Per violation segment, pick
excessmembers to remove from the subset whose write scope includes this date. If fewer write-scope members are available thanexcessrequires, all of them get removed (the override-exempt members structurally absorb none of the cost).
Behavioral consequence: a per-member override on one member, on a constraint that would otherwise remove them, shifts the removal onto the non-exempted members in that window rather than just letting the exempted member skip the constraint. This is the principled "exemption-based" semantic and what fixes the latent diff-revert convergence bug.
If a future product change wants different semantics ("slot-based": the
override raises the effective max_allowed by 1 in its window), it can
be expressed as a different rule on top of the same read/write
abstraction β only the violation-segment derivation changes.
Mechanical changes (six handlers)¶
These already use constraint_interval only to filter writes. Replace
the single constraint_interval.intersection(period_interval) check with
applicability_per_member[period.member].any_true_in(period_interval).
Other logic unchanged.
at_most_onecover_by_defaultfamily_dependencymaximum_first_enrollment_agemutually_requiredprimary_dependency
Convergence guarantee¶
With the unified-timeline: - The applicability is precomputed and stable across loop iterations. - The handler's decisions are a deterministic function of stable inputs. - Iteration 2 sees the result of iteration 1 with the same applicability β the handler reaches the same conclusion β no change β converged.
The diff-revert ping-pong (handler removes β revert restores β handler removes) cannot happen because there's no post-handler restoration.
Implementation plan¶
Ships as a single PR. Order of work within the PR:
-
Abstraction. Add
compute_applicability_per_memberreturningdict[Member, Timeline[bool]]. Cross-member fan-out mirrors today'scompute_protected_windows. Add helpers onTimeline[bool]for the queries the handlers need ("any True in interval", "first False after date", AND with anotherTimeline[bool]). -
Handler signature change. Replace
constraint_validity_period: ValidityPeriodwithapplicability_per_member: dict[Member, Timeline[bool]]onmodule_constraint_handlerand all 8 registered handlers. -
Mechanical handler rewrites (6 files). For
at_most_one,cover_by_default,family_dependency,maximum_first_enrollment_age,mutually_required,primary_dependency: replace eachconstraint_intervalwrite-filter check withapplicability_per_member[period.member].any_true_in(...). Existing tests should pass unchanged. -
minimum_durationrewrite. Walk the per-member applicability timeline for chain extension. Extension stops at the first applicability boundary (does not skip the gap and resume β see "Chain extension semantic" below). Treat applicability-bounded extension as partial-commitment OK._find_constraint_for_versionkeeps its commitment-terms lookups (duration, free-removal config); applicability lookups move to the timeline. -
MaxMembershipTyperewrite. Read-vs-write split: count includes everyone (read), removals restricted to write-scope-subject members per violation segment. -
Loop change.
apply_constraints_to_enrollment_periodscallscompute_applicability_per_memberonce per(version, constraint)and passes the result to the handler. No diff-revert wrapper. -
Cleanup. Delete
override_constraint.py,_precompute_protected_windows_by_constraint,_build_module_lookups(was only needed for the service_type/module_slug filter in diff-revert). -
Test updates. Update the "Override punches a gap" expectation (Jan-Aug only). Add the new scenarios listed in the test plan.
Chain extension semantic (minimum_duration)¶
When a chain's extension hits the applicability boundary (the override removes the constraint past this date), extension stops β it does not skip the gap and resume at the next True segment.
Concretely for the "Override punches a gap" scenario: - Alice enrolled Jan-Jun, 12-month commitment, override Sep-Nov - Applicability: Jan-Aug True, Sep-Nov False, Decβ True - Extension walks from Jun-30 β reaches Aug-31 β hits Sep-1 boundary β stops - Chain is 8 months. Treated as partial-commitment OK. Stamp Jan-Aug. - Result: Alice has Jan-Aug. No Dec sliver. No fresh Dec-onwards re-commitment.
Rationale: the override is the operator's explicit "no enforcement past this point" signal. Forcing extension to continue past it would mean the engine creates coverage in the True-again region (Dec onwards) specifically to satisfy a commitment the operator paused. That contradicts the override's intent.
If a future product need wants the alternative ("count only enforced months toward commitment duration, resume after gap"), it'd be a deliberate semantic change captured as a constraint config flag, not the default.
Test plan¶
New scenarios to add¶
Convergence under per-member override on removal-style constraints (currently latent bug in diff-revert):
MaxMembershipType(partner, max=1)with Alice + Bob both enrolled, override exempts Alice β expect Alice kept, Bob removed, loop converges in 1 iteration.MaxMembershipType(partner, max=1)with Alice + Bob + Charlie all enrolled Jan-Dec, override exempts Alice during Sep-Nov β exemption-based outcome:- Alice: Jan-Dec kept (priority spares her where she's in write scope; she's not in write scope Sep-Nov so untouchable there)
- Bob: removed entirely (write scope all year; excess 2 in Jan-Aug/Dec drops him by priority; only candidate Sep-Nov alongside Charlie when excess is 2 β both must go)
- Charlie: removed entirely (same reasoning as Bob)
- Loop converges in 1 iteration.
This is the multi-member divergent-applicability case that diff-revert currently handles wrongly (or non-convergent for similar shapes). Pin the expected outcome explicitly in the scenario.
Per-handler override scenarios (most don't have any today):
cover_by_default+ override removing the constraint in a window β expect no default fill inside the window.at_most_one+ override + a member enrolled in multiple modules inside the window β expect no removal in the window.family_dependency(cross-member) + override on the primary β family-wide fan-out as today.maximum_first_enrollment_age+ override + first enrollment past the limit inside the window β expect enrollment kept in the window.mutually_required+ override + a member missing the required module inside the window β expect no removal inside the window.primary_dependency(cross-member) + override on dependent β family-wide fan-out as today.
minimum_duration scenarios:
- "Override punches a gap" (existing scenario) β update expectation
from
Jan-Aug + Dec-DectoJan-Augpartial-commitment. The Dec sliver wart goes away; document the new semantic. - Chain spans contract-version-boundary AND override window β extension must respect both. Probably extends only up to the applicability boundary, not to the next contract version.
- Override that REMOVES (not applies) over a chain β applicability timeline should treat removal as a no-op (no segment to subtract).
Filtered overrides (currently 0 test coverage):
- Override with non-empty
service_typesfilter β only periods of matching service_type are affected. - Override with non-empty
module_slugsfilter β only periods of matching module slug are affected.
Tests expected to change¶
If the rewrite is faithful, most of the ~120 constraint scenarios and 12 override unit tests should pass unchanged. Expected changes:
- One scenario expectation update: "Override punches a gap through a
minimum_duration extension" β output becomes
Jan-Augpartial, no Dec sliver. Update the scenario's expected table and comments. - No other constraint scenarios expected to change. The mechanical handlers produce the same outputs because applicability collapses to the constraint's contract-version VP when there's no override.
Tests NOT expected to change¶
- All 49 engine unit tests (
internal/engine/tests/). - All 12 override painting unit tests (
internal/engine/tests/test_overrides.py). - All 111 non-override constraint feature scenarios.
- 10 of 11 override constraint scenarios.
Risk register¶
- Filtered overrides (
service_types,module_slugs). No current test exercises them. The new abstraction needs to encode the filter somewhere β either as a richer timeline value, or by carrying a per-period match check at the handler level. Decide the shape in step 1 of the implementation order. - Chain-bridging in
minimum_duration. Today's iteration-2 bridging mechanic (where two chains separated by an override window merge into one logical commitment) is a side effect of diff-revert's splitting. The unified-timeline replaces it with "applicability boundary = partial commitment OK; no skip-and-resume." Different semantic; document in the scenario. - Single-PR scope. The change touches 8 handlers, the loop, and one new abstraction. Each handler change is small but multiplied by 8 it's a meaningful diff. Run the full constraint suite after each handler is migrated to catch regressions early.