T15 — End-to-End Verification Report

Date: 2026-05-28

Status: All 6 gates pass against a clean rebuild. Cross-model reviewed across four Codex rounds on T15 itself (round 1 BLOCK → round 2 BLOCK → round 3 APPROVED_WITH_NOTES → round 4 APPROVED on the polish-pass commit). Subsequent rounds covered the MCP server and the presentation pass. The silent fallback defaults flagged in the round-3 notes were hardened in the polish-pass commit: they now raise on missing observations.

Test suite: tests/test_end_to_end.py (15 tests, all green) + 69 pre-existing tests + 19 srdb_mcp tests = 103 passed, 2 skipped at HEAD.

Reproduce

rm -f srdb.duckdb
uv run python scripts/build_all.py        # clean build, 28 steps, ~30-45s
uv run pytest tests/test_end_to_end.py -v
uv run python -m scenarios.cat4_fla_landfall
uv run python -m scenarios.hh_front_up
uv run python -m scenarios.btc_drawdown_vol_spike

Gates

#	Gate	Status	Evidence
G1	All 6 substrate demo queries return non-empty	✅	Row counts: Q1=3, Q2=6, Q3a=1, Q3b=1, Q4=4, Q5=3, Q6=158
G2	Reconcile detects exactly 4 breaks, categories AND targets (`position_id`/`entity`/`instrument`) match manifest	✅	First run: 4 breaks, 4 new. Targets match `breaks_manifest.json` exactly: PRICE on Haynesville 99002 (`d4c9ae29`), QUANTITY on CAT-XOL FL treaty, CLASSIFICATION on Tampa SFR, MISSING_IN_ABOR on crypto deriv `fed8751d`
G3	Break carry-forward — 2nd run shows n_breaks_new=0	✅	Second run: n_breaks_new=0, n_breaks_carried=4
G4	position_lookthrough materialized; LTRe decomposes through SRAM into all 4 asset classes	✅	36 lookthrough rows. LTRe → `{WELL, TREATY, SFR_PORTFOLIO, CRYPTO_DERIV}`. 15 rows traverse SRAM_MULTISTRAT
G5	Cat-4 produces coherent narrative + per-entity rollup including LTRe	✅	`narrative_grounding.event_summary` (>100 chars) + `per_entity_rollup` includes "LongTail Re Ltd". Accounting identity: error=$0.0000
G6	Every position row has non-null `mark_series_id` + `mark_observation_ts`	✅	0 / 350 positions with NULL mark lineage
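
Gates G2 and G3 both depend on treating a break as a (category, target) identity, not just a category. The following is a minimal sketch of that idea, not the project's actual reconcile code: `BreakKey`, `matches_manifest`, and `classify_run` are hypothetical names, and the identifiers are placeholders, not the values in `breaks_manifest.json`. Only the field names `holding_entity_id`, `instrument_id`, `n_breaks_new`, and `n_breaks_carried` are taken from this report.

```python
from typing import NamedTuple


class BreakKey(NamedTuple):
    """Hypothetical identity of a break: its category plus its target."""
    category: str
    holding_entity_id: str
    instrument_id: str


def matches_manifest(detected: set[BreakKey], manifest: set[BreakKey]) -> bool:
    """G2-style check: categories AND targets must agree exactly."""
    return detected == manifest


def classify_run(current: set[BreakKey], prior: set[BreakKey]) -> dict[str, int]:
    """G3-style check: split the current run into new vs. carried breaks."""
    return {
        "n_breaks": len(current),
        "n_breaks_new": len(current - prior),
        "n_breaks_carried": len(current & prior),
    }


# Placeholder identifiers (assumptions for illustration only).
manifest = {
    BreakKey("PRICE", "entity-a", "well-1"),
    BreakKey("QUANTITY", "entity-b", "treaty-1"),
    BreakKey("CLASSIFICATION", "entity-c", "sfr-1"),
    BreakKey("MISSING_IN_ABOR", "entity-d", "deriv-1"),
}

# First run has no history; second run sees the same four breaks again.
first_run = classify_run(manifest, prior=set())
second_run = classify_run(manifest, prior=manifest)

# A retargeted PRICE break keeps the category set intact but changes the key.
retargeted = (manifest - {BreakKey("PRICE", "entity-a", "well-1")}) | {
    BreakKey("PRICE", "entity-a", "seed-well")
}
same_categories = {b.category for b in retargeted} == {b.category for b in manifest}
```

Under these assumptions the first run reports four new breaks and the second reports four carried breaks, while the retargeted set still passes a category-only comparison but fails the exact key comparison.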

Regressions T15 caught and fixed

These were real production-breaking bugs, all fixed:

Round 1 (caught by my initial gate-running)

schema/04_analytics.sql vs pipelines/lookthrough.py — divergent position_lookthrough schemas. Schema file had an 8-column form; pipeline DROP+CREATEd a 12-column form with lookthrough_id PK + direct_* columns. schema/views_lookthrough.sql binds against the 12-col form, so load_schema.py failed on fresh build. Fix: aligned 04_analytics.sql to the pipeline's runtime schema.
Stale hardcoded UUIDs in pipelines/derive_series.py and pipelines/lookthrough.py. EIA/CoinMetrics/Deribit ingestion uses uuid4() each rebuild, so module-level UUID constants drift on every full rebuild. Symptom: well revenue derivation produced 0 inserts; 4 crypto-deriv positions had NULL mark_observation_ts. Fix: runtime resolution by (measure_code, source, semantic_context).
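
The UUID drift and the runtime-resolution fix can be sketched as follows. This is an illustration under stated assumptions, not the code in pipelines/derive_series.py: it uses the standard-library `sqlite3` module as a stand-in for the real database, and the `series` table layout, the `resolve_series_id` helper, and the sample measure codes are all hypothetical. Only the composite key (measure_code, source, semantic_context) and the use of uuid4() at ingestion come from this report.

```python
import sqlite3
import uuid


def build_series_table(conn: sqlite3.Connection) -> None:
    """Simulate a full rebuild: every series gets a fresh uuid4."""
    conn.execute("DROP TABLE IF EXISTS series")
    conn.execute(
        "CREATE TABLE series ("
        "series_id TEXT PRIMARY KEY, measure_code TEXT, "
        "source TEXT, semantic_context TEXT)"
    )
    # Sample rows are placeholders, not real measure codes from the project.
    rows = [
        (str(uuid.uuid4()), "HH_SPOT", "EIA", "henry_hub"),
        (str(uuid.uuid4()), "BTC_PRICE", "CoinMetrics", "btc_usd"),
    ]
    conn.executemany("INSERT INTO series VALUES (?, ?, ?, ?)", rows)


def resolve_series_id(
    conn: sqlite3.Connection,
    measure_code: str,
    source: str,
    semantic_context: str,
) -> str:
    """Look up the current series_id by its stable composite key."""
    rows = conn.execute(
        "SELECT series_id FROM series "
        "WHERE measure_code = ? AND source = ? AND semantic_context = ?",
        (measure_code, source, semantic_context),
    ).fetchall()
    if len(rows) != 1:
        # Fail loudly instead of silently falling back to a stale constant.
        raise LookupError(
            f"expected exactly one series for "
            f"({measure_code}, {source}, {semantic_context}), got {len(rows)}"
        )
    return rows[0][0]


conn = sqlite3.connect(":memory:")
build_series_table(conn)
before = resolve_series_id(conn, "HH_SPOT", "EIA", "henry_hub")

build_series_table(conn)  # second rebuild: all UUIDs change
after = resolve_series_id(conn, "HH_SPOT", "EIA", "henry_hub")
```

In this sketch `before` plays the role of a module-level constant captured on an earlier build: after the second rebuild it no longer matches any row, while the composite-key lookup still resolves.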

Round 2 (caught by Codex cross-model review)

Bitemporal seed reorder retargeted the deliberate PRICE break. My round-1 fix moved the bitemporal seed before abor_feed to avoid a false-positive MISSING_IN_ABOR. Side effect: pick_price_break_position selects the highest-MV well, and the seed well ($5.10M) outweighed the real Haynesville wells, silently retargeting the PRICE break to position 00000000-...0401-...0001. Fix: reverted the reorder and instead added an instrument_id NOT LIKE '00000000-%' exclusion in abor_feed + reconcile_abor, so the seed is properly demo-scoped and doesn't leak into production reconciliation.
Missing txn_to IS NULL filter in abor_feed.py and pipelines/reconcile_abor.py. Both queries used WHERE p.valid_to IS NULL without p.txn_to IS NULL, so bitemporally-superseded rows could be treated as current knowledge. Fix: added txn_to IS NULL to all five srdb position queries in abor_feed.py (run, pick_price, pick_quantity, pick_classification, pick_missing) + the srdb_latest CTE in reconcile_abor.py.
Same UUID-drift pattern across synth/seed_analytics.py, scenarios/hh_front_up.py, scenarios/btc_drawdown_vol_spike.py. Codex spotted that my round-1 fix only patched 2 of 5 affected modules. seed_analytics.py had 6 stale series UUIDs + 6 stale Zillow UUIDs; hh_front_up.py had 3 + a stale instrument_id → series_id map; btc_drawdown_vol_spike.py had 2. Concrete repro of breakage: scenarios.hh_front_up.run_scenario() raised ValueError: No observations found for series_id=6fc2b285-.... Fix: replicated the runtime-resolver pattern in all three modules.
G2 test was too loose to catch the PRICE-break retargeting described above. The original test only checked {break_categories} as a set, so it would have green-lit any retargeting. Fix: added test_g2_break_targets_match_manifest, which asserts that the (category, holding_entity_id, instrument_id) tuple from each detected break matches the manifest exactly.
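
The missing txn_to IS NULL filter from this round is easiest to see on a small example. The sketch below uses the standard-library `sqlite3` module as a stand-in for the real database. The `positions` table and its rows are hypothetical. Only the `valid_to` and `txn_to` column names and the two WHERE clauses reflect what this report describes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE positions ("
    "position_id TEXT, quantity REAL, valid_to TEXT, txn_to TEXT)"
)
conn.executemany(
    "INSERT INTO positions VALUES (?, ?, ?, ?)",
    [
        # Superseded knowledge: open on the valid axis, closed on the txn axis.
        ("p1", 100.0, None, "2026-05-01"),
        # Current knowledge for the same position after a correction.
        ("p1", 120.0, None, None),
        # A position that was never corrected.
        ("p2", 50.0, None, None),
    ],
)

# Pre-fix form: filters on valid time only, so the superseded row leaks through.
loose = conn.execute(
    "SELECT position_id, quantity FROM positions p "
    "WHERE p.valid_to IS NULL "
    "ORDER BY position_id, quantity"
).fetchall()

# Post-fix form: also requires the row to be current on the transaction axis.
strict = conn.execute(
    "SELECT position_id, quantity FROM positions p "
    "WHERE p.valid_to IS NULL AND p.txn_to IS NULL "
    "ORDER BY position_id, quantity"
).fetchall()
```

With these rows the loose query returns two versions of `p1`, while the strict query returns one row per position, which is the shape a reconciliation against current knowledge would expect.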

Note on stability

After this revision, no hardcoded series UUIDs remain in the codebase outside of test data and pipeline-internal lookups. All cross-module references to series go through _ensure_ids/_resolve_series runtime resolution against the live DB, using (measure_code, source, semantic_context) as the stable composite key. Because that key does not change when series UUIDs are regenerated, future rebuilds with fresh EIA/CoinMetrics/Deribit/Zillow/FRED ingestion should not reintroduce the regression.