T15 — End-to-End Verification Report
Date: 2026-05-28
Status: All 6 gates pass against a clean rebuild. Cross-model reviewed across four Codex rounds on T15 itself (round 1 BLOCK → round 2 BLOCK → round 3 APPROVED_WITH_NOTES → round 4 APPROVED on the polish-pass commit). Subsequent rounds covered the MCP server and the presentation pass. The round-3 notes about silent fallback defaults were hardened in the polish-pass commit (now raise on missing observations).
Test suite: tests/test_end_to_end.py (15 tests, all green) + 69 pre-existing tests + 19 srdb_mcp tests = 103 passed, 2 skipped at HEAD.
Reproduce
rm -f srdb.duckdb
uv run python scripts/build_all.py # clean build, 28 steps, ~30-45s
uv run pytest tests/test_end_to_end.py -v
uv run python -m scenarios.cat4_fla_landfall
uv run python -m scenarios.hh_front_up
uv run python -m scenarios.btc_drawdown_vol_spike
Gates
| # | Gate | Status | Evidence |
|---|---|---|---|
| G1 | All 6 substrate demo queries return non-empty | ✅ | Row counts: Q1=3, Q2=6, Q3a=1, Q3b=1, Q4=4, Q5=3, Q6=158 |
| G2 | Reconcile detects exactly 4 breaks, categories AND targets (position_id/entity/instrument) match manifest | ✅ | First run: 4 breaks, 4 new. Targets match breaks_manifest.json exactly: PRICE on Haynesville 99002 (d4c9ae29), QUANTITY on CAT-XOL FL treaty, CLASSIFICATION on Tampa SFR, MISSING_IN_ABOR on crypto deriv fed8751d |
| G3 | Break carry-forward — 2nd run shows n_breaks_new=0 | ✅ | Second run: n_breaks_new=0, n_breaks_carried=4 |
| G4 | position_lookthrough materialized; LTRe decomposes through SRAM into all 4 asset classes | ✅ | 36 lookthrough rows. LTRe → {WELL, TREATY, SFR_PORTFOLIO, CRYPTO_DERIV}. 15 rows traverse SRAM_MULTISTRAT |
| G5 | Cat-4 produces coherent narrative + per-entity rollup including LTRe | ✅ | narrative_grounding.event_summary (>100 chars) + per_entity_rollup includes "LongTail Re Ltd". Accounting identity: error=$0.0000 |
| G6 | Every position row has non-null mark_series_id + mark_observation_ts | ✅ | 0 / 350 positions with NULL mark lineage |
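The G2 row compares targets as well as categories. A minimal sketch of the difference, assuming each break is reduced to a (category, holding_entity_id, instrument_id) tuple. The `BreakKey` type, both helper functions, and every identifier value below are invented for illustration and are not taken from breaks_manifest.json or the test suite:

```python
from typing import NamedTuple


class BreakKey(NamedTuple):
    """Hypothetical reduction of one reconciliation break to its target."""
    category: str
    holding_entity_id: str
    instrument_id: str


def categories_match(detected, manifest):
    # Loose check: only the set of break categories is compared.
    return {b.category for b in detected} == {b.category for b in manifest}


def targets_match(detected, manifest):
    # Strict check: category, entity, and instrument must all agree.
    return set(detected) == set(manifest)


# Placeholder manifest entries (not real IDs).
manifest = [
    BreakKey("PRICE", "entity-a", "instr-well-1"),
    BreakKey("QUANTITY", "entity-b", "instr-treaty-1"),
]

# Same categories, but the PRICE break landed on a different instrument.
retargeted = [
    BreakKey("PRICE", "entity-a", "instr-seed-well"),
    BreakKey("QUANTITY", "entity-b", "instr-treaty-1"),
]

print(categories_match(retargeted, manifest))  # True
print(targets_match(retargeted, manifest))     # False
```

The category-only comparison passes even though a break moved to another instrument; the tuple comparison does not.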
Regressions T15 caught and fixed
These were real production-breaking bugs, all fixed:
Round 1 (caught by my initial gate-running)
- schema/04_analytics.sql vs pipelines/lookthrough.py: divergent position_lookthrough schemas. The schema file had an 8-column form; the pipeline DROP+CREATEd a 12-column form with a lookthrough_id PK + direct_* columns. schema/views_lookthrough.sql binds against the 12-col form, so load_schema.py failed on a fresh build. Fix: aligned 04_analytics.sql to the pipeline's runtime schema.
- Stale hardcoded UUIDs in pipelines/derive_series.py and pipelines/lookthrough.py. EIA/CoinMetrics/Deribit ingestion uses uuid4() on each rebuild, so module-level UUID constants drift on every full rebuild. Symptom: well revenue derivation produced 0 inserts; 4 crypto-deriv positions had NULL mark_observation_ts. Fix: runtime resolution by (measure_code, source, semantic_context).
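The runtime-resolution fix can be sketched as follows. This is an illustration under stated assumptions, not the project's resolver: it uses the standard-library sqlite3 module as a stand-in for the real database, a hypothetical `series` table, a hypothetical `resolve_series_id` helper, and made-up key values ("HH_FRONT", "EIA", "spot"). Only the composite key (measure_code, source, semantic_context) and the uuid4-per-rebuild behaviour come from the report.

```python
import sqlite3
import uuid


def resolve_series_id(conn, measure_code, source, semantic_context):
    """Look up a series_id by its stable composite key (hypothetical helper)."""
    rows = conn.execute(
        "SELECT series_id FROM series "
        "WHERE measure_code = ? AND source = ? AND semantic_context = ?",
        (measure_code, source, semantic_context),
    ).fetchall()
    if len(rows) != 1:
        # Fail loudly instead of falling back to a stale constant.
        raise LookupError(
            f"expected exactly 1 series for "
            f"({measure_code}, {source}, {semantic_context}), found {len(rows)}"
        )
    return rows[0][0]


def build(conn):
    """Simulate a full rebuild: ingestion assigns a fresh uuid4 each time."""
    conn.execute("DROP TABLE IF EXISTS series")
    conn.execute(
        "CREATE TABLE series ("
        "series_id TEXT PRIMARY KEY, measure_code TEXT, "
        "source TEXT, semantic_context TEXT)"
    )
    conn.execute(
        "INSERT INTO series VALUES (?, ?, ?, ?)",
        (str(uuid.uuid4()), "HH_FRONT", "EIA", "spot"),  # placeholder key values
    )


conn = sqlite3.connect(":memory:")
build(conn)
first = resolve_series_id(conn, "HH_FRONT", "EIA", "spot")
build(conn)
second = resolve_series_id(conn, "HH_FRONT", "EIA", "spot")

# The UUID changes across rebuilds, but the composite-key lookup still succeeds.
print(first != second)
```

A module-level constant holding `first` would point at nothing after the second build, while the lookup keeps working because the composite key is what stays stable.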
Round 2 (caught by Codex cross-model review)
- Bitemporal seed reorder retargeted the deliberate PRICE break. My round-1 fix moved the bitemporal seed before abor_feed to avoid a false-positive MISSING_IN_ABOR. Side effect: pick_price_break_position selects the highest-MV well; the seed well ($5.10M) outweighed the real Haynesville wells, silently retargeting the PRICE break to position 00000000-...0401-...0001. Fix: revert the reorder; instead, add an instrument_id NOT LIKE '00000000-%' exclusion in abor_feed + reconcile_abor so the seed is properly demo-scoped and doesn't leak into production reconciliation.
- Missing txn_to IS NULL filter in abor_feed.py and pipelines/reconcile_abor.py. Both queries used WHERE p.valid_to IS NULL without p.txn_to IS NULL, so bitemporally-superseded rows could be treated as current knowledge. Fix: added txn_to IS NULL to all five srdb position queries in abor_feed.py (run, pick_price, pick_quantity, pick_classification, pick_missing) + the srdb_latest CTE in reconcile_abor.py.
- Same UUID-drift pattern across synth/seed_analytics.py, scenarios/hh_front_up.py, scenarios/btc_drawdown_vol_spike.py. Codex spotted that my round-1 fix only patched 2 of 5 affected modules. seed_analytics.py had 6 stale series UUIDs + 6 stale Zillow UUIDs; hh_front_up.py had 3 + a stale instrument_id → series_id map; btc_drawdown_vol_spike.py had 2. Concrete repro of breakage: scenarios.hh_front_up.run_scenario() raised ValueError: No observations found for series_id=6fc2b285-.... Fix: replicated the runtime-resolver pattern in all three modules.
- G2 test was too loose to catch issue #3 (the PRICE-break retargeting, first item of this round). The original test only checked {break_categories} as a set, which would have green-lit any retargeting. Fix: added test_g2_break_targets_match_manifest, which asserts that the (category, holding_entity_id, instrument_id) tuple from each detected break matches the manifest exactly.
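The missing txn_to IS NULL filter is easiest to see on a two-row example. The sketch below is an assumption-laden illustration: sqlite3 stands in for the real database, the `positions` table and its `quantity` column are hypothetical, and the dates are invented. Only the valid_to / txn_to column names and the predicate itself come from the report.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE positions ("
    "position_id TEXT, quantity REAL, valid_to TEXT, txn_to TEXT)"
)
conn.executemany(
    "INSERT INTO positions VALUES (?, ?, ?, ?)",
    [
        # Superseded belief: open on the valid-time axis,
        # but closed on the transaction-time axis.
        ("p1", 100.0, None, "2026-05-01"),
        # Current belief about the same position.
        ("p1", 120.0, None, None),
    ],
)

# Filtering on valid time alone returns both rows.
valid_only = conn.execute(
    "SELECT quantity FROM positions "
    "WHERE valid_to IS NULL ORDER BY quantity"
).fetchall()

# Adding the transaction-time predicate keeps only current knowledge.
current = conn.execute(
    "SELECT quantity FROM positions "
    "WHERE valid_to IS NULL AND txn_to IS NULL"
).fetchall()

print(valid_only)  # [(100.0,), (120.0,)]
print(current)     # [(120.0,)]
```

With only the valid-time predicate, the superseded quantity of 100.0 is still visible to a reconciliation query; the second predicate removes it.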
Note on stability
After this revision, no hardcoded series UUIDs remain in the codebase outside of test data and pipeline-internal lookups. All cross-module references to series go through _ensure_ids/_resolve_series runtime resolution against the live DB, using (measure_code, source, semantic_context) as the stable composite key. Future rebuilds with fresh EIA/CoinMetrics/Deribit/Zillow/FRED ingestion cannot reintroduce the UUID-drift regression, because nothing outside those exceptions depends on a series UUID surviving a rebuild.