Complete Epic #23 presentation evidence gates #29
Pull request overview
This PR completes Epic #23’s “presentation evidence gates” work by (1) introducing an EvidenceLedger-based numeric significance gate, (2) adding deterministic LaTeX sanity checks (unresolved refs, scaffold leakage, cross-run identity, boilerplate repetition) and wiring them into the submission bundle render gate, and (3) adding CPU-only boundary tests to ensure presentation code paths don’t import GPU/execution dependencies.
Changes:
- Add minimal `build_evidence_ledger`, numeric significance gating, and EvidenceLedger traceability/schema checks in `agents/paper_completeness.py`.
- Expand `latex_sanity_check` with deterministic checks and pass `state` from the submission pipeline so cross-run identity gating can work.
- Add milestone/regression tests and fixtures for M1/M2/M4/M5 gates, including submission-bundle blocking tests.
Reviewed changes
Copilot reviewed 26 out of 26 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `agents/paper_completeness.py` | Adds EvidenceLedger builder, numeric significance gating, traceability/schema checks, and deterministic LaTeX sanity checks. |
| `agents/paper_orchestra_pipeline.py` | Passes `state` into `latex_sanity_check` to enable state-aware deterministic gating during bundle generation. |
| `tests/test_paper_completeness_m1.py` | Adds M1 tests for numeric significance gating and the minimal EvidenceLedger builder contract. |
| `tests/test_paper_completeness_m4.py` | Adds M4 tests for EvidenceLedger schema validation and Abstract/Conclusion traceability checks. |
| `tests/test_latex_sanity_m2.py` | Adds M2 tests for deterministic LaTeX sanity rules (unresolved refs, placeholders, cross-run identity, repetition). |
| `tests/test_vnext_manuscript.py` | Adds submission-bundle integration tests ensuring the render gate blocks deterministic LaTeX violations and preserves existing gates. |
| `tests/test_presentation_cpu_boundary_m5.py` | Adds CPU-only boundary tests ensuring presentation modules don't load GPU/execution dependencies and can render/materialize offline. |
| `tests/fixtures/*` | Adds fixtures for M1/M2/M4 deterministic checks and traceability scenarios. |
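
The CPU-only boundary described for `tests/test_presentation_cpu_boundary_m5.py` could be checked along the following lines. This is a sketch under assumptions, not the PR's actual test: `forbidden_modules_loaded` is a hypothetical helper, and the forbidden package names are illustrative.

```python
import json
import subprocess
import sys
from typing import List


def forbidden_modules_loaded(module: str, forbidden: List[str]) -> List[str]:
    """Import `module` in a fresh interpreter and list forbidden packages it loaded.

    Hypothetical helper: a new process is used so that modules already
    imported by the test runner cannot mask or fake a violation.
    """
    probe = (
        "import importlib, json, sys\n"
        f"importlib.import_module({module!r})\n"
        # Report only top-level package names, e.g. 'torch' for 'torch.cuda'.
        "print(json.dumps(sorted({name.split('.')[0] for name in sys.modules})))\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", probe],
        capture_output=True,
        text=True,
        check=True,
    )
    # The probe prints the JSON list last, so read the final stdout line.
    loaded = set(json.loads(result.stdout.strip().splitlines()[-1]))
    return sorted(name for name in forbidden if name in loaded)
```

A test could then assert that `forbidden_modules_loaded("agents.paper_completeness", ["torch"])` is empty. The module path is inferred from the file name in the table, and `torch` is an assumed example of a GPU dependency.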

    def _significance_alpha() -> float:
        raw = os.environ.get("DEEPGRAPH_SIGNIFICANCE_ALPHA")
        alpha = _numeric(raw)
        if alpha is None or alpha <= 0:
            return 0.05
        return alpha


    p_value = _numeric(packet.get("p_value"))
    effect_size = _numeric(_first_present(packet.get("effect_size"), packet.get("effect_pct")))
    metric = _text(_first_present(packet.get("metric_name"), summary.get("primary_metric"), summary.get("metric_name")))


    def _strip_latex_code_blocks(text: str) -> str:
        stripped = re.sub(
            r"\\begin\{(?:verbatim|lstlisting|minted)\}.*?\\end\{(?:verbatim|lstlisting|minted)\}",
            "",
            text or "",
            flags=re.DOTALL | re.IGNORECASE,
        )
        stripped = re.sub(r"```.*?```", "", stripped, flags=re.DOTALL)
        return stripped

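
To show why code blocks are stripped before the sanity rules run, here is a small sketch of an unresolved-reference check. `find_unresolved_refs` is a hypothetical function, and matching `\ref` targets against `\label` definitions is an assumed rule; the PR's actual check may work differently. Only the LaTeX-environment substitution from the excerpt is reproduced, and the Markdown-fence pass is omitted for brevity.

```python
import re
from typing import List


def _strip_latex_code_blocks(text: str) -> str:
    # Mirrors the first substitution in the excerpt above.
    return re.sub(
        r"\\begin\{(?:verbatim|lstlisting|minted)\}.*?\\end\{(?:verbatim|lstlisting|minted)\}",
        "",
        text or "",
        flags=re.DOTALL | re.IGNORECASE,
    )


def find_unresolved_refs(text: str) -> List[str]:
    """Hypothetical check: ref targets with no matching label, ignoring code blocks."""
    prose = _strip_latex_code_blocks(text)
    labels = set(re.findall(r"\\label\{([^}]+)\}", prose))
    refs = re.findall(r"\\ref\{([^}]+)\}", prose)
    return sorted({ref for ref in refs if ref not in labels})


sample = r"""
See Table~\ref{tab:main} and Figure~\ref{fig:missing}.
\begin{table}\label{tab:main}\end{table}
\begin{verbatim}
\ref{only:in:code}
\end{verbatim}
"""
print(find_unresolved_refs(sample))  # ['fig:missing']
```

Without the stripping step, the `\ref{only:in:code}` inside the `verbatim` block would be reported as a false positive.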

    hits.append(
        _line_hit(
            "cross_run_identity",
            token,
            base_line + snippet.count("\n", 0, token_match.start()),
            snippet.splitlines()[0] if snippet.splitlines() else snippet,
        )
    )

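
The line-number arithmetic in this excerpt, `base_line + snippet.count("\n", 0, token_match.start())`, can be tried in isolation. The sketch below is illustrative only: `locate_tokens`, the `run_\d+` pattern, and the sample text are assumptions, not names from the PR.

```python
import re
from typing import List, Tuple


def locate_tokens(snippet: str, pattern: str, base_line: int = 1) -> List[Tuple[str, int]]:
    """Return (token, line_number) for each match of `pattern` in `snippet`."""
    located = []
    for token_match in re.finditer(pattern, snippet):
        # Newlines before the match start give the zero-based line offset.
        offset = snippet.count("\n", 0, token_match.start())
        located.append((token_match.group(0), base_line + offset))
    return located


snippet = "intro line\nresult from run_0042\nsee also run_0007"
print(locate_tokens(snippet, r"run_\d+", base_line=10))
# [('run_0042', 11), ('run_0007', 12)]
```

Note that the excerpt passes the snippet's first line as the hit text, which may differ from the line containing the token when the snippet spans several lines.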
Summary
Completes Epic #23 across the requested issue order on convergence branch `epic-23-evidence-ledger`.

Includes one preliminary branch-only regression fixture stabilization commit so the required baseline tests are green on this convergence branch. `master` was not modified.

Changed Files
- `agents/paper_completeness.py`
- `agents/paper_orchestra_pipeline.py`
- `tests/test_paper_completeness_m1.py`
- `tests/test_paper_completeness_m4.py`
- `tests/test_latex_sanity_m2.py`
- `tests/test_vnext_manuscript.py`
- `tests/test_presentation_cpu_boundary_m5.py`
- `tests/fixtures/*`

Tests Run
Preflight stabilization:
- `pytest tests/test_pipeline_contracts.py` -> 12 passed
- `pytest tests/test_vnext_manuscript.py` -> 5 passed

#24:
- `pytest tests/test_pipeline_contracts.py` -> 12 passed
- `pytest tests/test_vnext_manuscript.py` -> 5 passed
- `pytest tests/test_paper_completeness_m1.py` -> 7 passed
- `pytest tests/test_pipeline_contracts.py` -> 12 passed
- `pytest tests/test_vnext_manuscript.py` -> 5 passed

#27:
- `pytest tests/test_pipeline_contracts.py` -> 12 passed
- `pytest tests/test_vnext_manuscript.py` -> 5 passed
- `pytest tests/test_paper_completeness_m4.py` -> 6 passed
- `pytest tests/test_pipeline_contracts.py` -> 12 passed
- `pytest tests/test_vnext_manuscript.py` -> 5 passed
- `pytest tests/test_paper_completeness_m1.py tests/test_paper_completeness_m4.py` -> 13 passed

#25:
- `pytest tests/test_pipeline_contracts.py` -> 12 passed
- `pytest tests/test_vnext_manuscript.py` -> 5 passed
- `pytest tests/test_latex_sanity_m2.py` -> 8 passed
- `pytest tests/test_pipeline_contracts.py` -> 12 passed
- `pytest tests/test_vnext_manuscript.py` -> 5 passed

#28:
- `pytest tests/test_pipeline_contracts.py` -> 12 passed
- `pytest tests/test_vnext_manuscript.py` -> 5 passed
- `pytest tests/test_vnext_manuscript.py` -> 10 passed
- `pytest tests/test_pipeline_contracts.py` -> 12 passed
- `pytest tests/test_latex_sanity_m2.py tests/test_vnext_manuscript.py` -> 18 passed

#26:
- `pytest tests/test_pipeline_contracts.py` -> 12 passed
- `pytest tests/test_vnext_manuscript.py` -> 10 passed
- `pytest tests/test_presentation_cpu_boundary_m5.py` -> 3 passed
- `pytest tests/test_pipeline_contracts.py` -> 12 passed
- `pytest tests/test_vnext_manuscript.py` -> 10 passed
- `pytest tests/test_paper_completeness_m1.py tests/test_paper_completeness_m4.py tests/test_latex_sanity_m2.py tests/test_presentation_cpu_boundary_m5.py` -> 24 passed

Non-goals / Skips
- `master`.
- `\input{content/conclusion}` traceability.
- `contracts/pipeline.py` or `require_submission_ready()`.