Add tests for #1475: multishot candidate selection artifacts by prompt-driven-github[bot] · Pull Request #1478 · promptdriven/pdd

prompt-driven-github · 2026-06-07T23:54:03Z

Summary

Adds tests based on the requirements in #1475.

Test Files

tests/test_multishot_candidate.py — 37 pytest test functions across 9 scenarios
pdd/multishot_candidate.py — implementation module supporting the tests
pdd/schemas/selection_policy.schema.json — JSON schema for selection_policy.json artifact
pdd/schemas/multishot_candidate.schema.json — JSON schema for candidate records
tests/fixtures/multishot/selection_policy.json — golden fixture
tests/fixtures/multishot/candidate_records.jsonl — golden fixture (3 synthetic candidate records)
tests/fixtures/multishot/pass_at_k.json — golden fixture
tests/fixtures/multishot/selection_regret.json — golden fixture

Test Coverage

Total Tests: 37
Framework: pytest
Status: All 37 passing (0.55s)
Test Plan Coverage: 25/25 planned cases implemented (100%)

What These Tests Verify

Selection Policy Artifact — schema validation, leakage flags, frozen_before_generation enforcement, golden fixture round-trip
Leakage Rejection — parametrized over all 6 forbidden hidden input classes: hidden_pass_fail, hidden_stdout, hidden_stderr, hidden_test_names, prior_hidden_outcome, manual_post_hoc_choice
Deterministic Tie-Break — 10 repeated calls produce same winner; winner matches declared policy order; k=5 n-way tie stable under 10 input permutations
Candidate Record Completeness — k=3 required fields all present; JSONL schema validation; hidden_status absent before freeze / present after freeze
Malformed/Missing Row Detection — missing task_id, missing candidate_index, wrong type, duplicate index all raise ValidationError
Metric Recomputation — oracle count, selected count, regret count/rate, oracle-lift capture (incl. oracle_count=0 edge case), unbiased pass-at-k, edge cases k=1/all-pass/all-fail
Golden JSON — all four artifacts (selection_policy.json, candidate_records.jsonl, pass_at_k.json, selection_regret.json) validated and exactly recomputed from JSONL
Freeze / Hidden-Label Lifecycle — FrozenStateError raised before freeze; labels accessible after freeze
k=1 Degenerate Case — single candidate always selected; pass-at-1 equals pass rate
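The metric recomputation items above can be illustrated with a minimal sketch. It assumes the standard combinatorial form of the unbiased pass-at-k estimator, 1 - C(n-c, k) / C(n, c-independent k), i.e. 1 - C(n-c, k) / C(n, k), where n is the number of candidates sampled and c the number that pass. The record field names (`selected`, `hidden_pass`) and the definition of regret (some candidate passes but the selected one does not) are assumptions made for illustration; the PR does not show how pdd/multishot_candidate.py defines them.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass-at-k: probability that at least one of k candidates
    drawn without replacement from n passes, given c of the n pass."""
    if not 0 < k <= n:
        raise ValueError("k must satisfy 0 < k <= n")
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    # comb(n - c, k) is 0 when fewer than k candidates fail,
    # so the estimate is exactly 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


def selection_regret(records: list[dict]) -> dict:
    """Recompute oracle, selected, and regret counts per task.

    Hypothetical record shape: {"task_id", "selected", "hidden_pass"}.
    """
    by_task: dict[str, list[dict]] = {}
    for row in records:
        by_task.setdefault(row["task_id"], []).append(row)

    oracle = selected = regret = 0
    for rows in by_task.values():
        any_pass = any(r["hidden_pass"] for r in rows)
        chosen_pass = any(r["selected"] and r["hidden_pass"] for r in rows)
        oracle += int(any_pass)
        selected += int(chosen_pass)
        # Regret: a passing candidate existed but was not the one selected.
        regret += int(any_pass and not chosen_pass)

    tasks = len(by_task)
    return {
        "oracle_count": oracle,
        "selected_count": selected,
        "regret_count": regret,
        "regret_rate": regret / tasks if tasks else 0.0,
    }
```

Under this sketch the k=1 case reduces to the plain pass rate c/n, and the all-pass and all-fail cases give 1.0 and 0.0 respectively, which matches the edge cases the tests are described as covering.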

Contract Test Summary

N/A — no OpenAPI spec found. Schema validation is handled via jsonschema.validate against pdd/schemas/ files.

Accessibility Audit Summary

N/A — not a web test. This is a Python CLI/research harness feature.

Manual Testing Summary

N/A — all scenarios covered by automated pytest tests with synthetic tmp_path fixtures and no external dependencies.

Test Execution

pytest -vv tests/test_multishot_candidate.py

Next Steps

Review the generated tests
Run tests locally to verify
Adjust tests if needed
Mark PR as ready for review

Closes #1475

Generated by PDD agentic test workflow (18-step)

Adds 37 pytest tests across 9 scenarios covering: selection policy artifact schema validation, leakage rejection (6 parametrized forbidden input classes), deterministic tie-break stability, candidate record completeness, malformed/missing row detection, metric recomputation (oracle count, selected count, regret rate, oracle-lift capture, unbiased pass-at-k), golden JSON fixtures for all four artifacts, freeze/hidden-label lifecycle, and k=1 degenerate case.

New files:
- pdd/multishot_candidate.py: implementation (SelectionPolicy, CandidateRecord, LeakageError, FrozenStateError, select_winner, compute_metrics, pass_at_k)
- pdd/schemas/selection_policy.schema.json: JSON schema for selection_policy.json
- pdd/schemas/multishot_candidate.schema.json: JSON schema for candidate records
- tests/test_multishot_candidate.py: 37 tests, all passing
- tests/fixtures/multishot/: golden fixtures (selection_policy.json, candidate_records.jsonl, pass_at_k.json, selection_regret.json)

Closes #1475

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
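The commit message names select_winner and describes tie-breaks that stay stable under input permutation, but the function body is not shown on this page. One common way to get that property is to rank candidates by a total-order key built only from public fields, ending in a unique identifier. The sketch below assumes that approach; the `Candidate` class, its fields, and the policy order (public pass count, then token count, then candidate index) are hypothetical and not taken from pdd/multishot_candidate.py.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Candidate:
    # Hypothetical public-only fields; no hidden test outcomes are used.
    candidate_index: int
    public_pass_count: int
    token_count: int


def select_winner(candidates: list[Candidate]) -> Candidate:
    """Pick a winner using a fixed, declared tie-break order.

    Assumed order: more public passes first, then fewer tokens,
    then the lowest candidate_index. Because candidate_index is unique,
    the key is a total order and the result ignores input ordering.
    """
    if not candidates:
        raise ValueError("at least one candidate is required")
    return min(
        candidates,
        key=lambda c: (-c.public_pass_count, c.token_count, c.candidate_index),
    )


def is_order_invariant(candidates: list[Candidate]) -> bool:
    """Check that every permutation of the input yields the same winner."""
    expected = select_winner(candidates)
    return all(select_winner(list(p)) == expected for p in permutations(candidates))
```

With five fully tied candidates, for example `[Candidate(i, 2, 100) for i in range(5)]`, this sketch always returns index 0, and a single candidate is trivially selected, which corresponds to the k=1 degenerate case.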

prompt-driven-github Bot mentioned this pull request Jun 7, 2026

feat: add public-only multishot candidate selection artifacts #1475

Open

prompt-driven-github[bot] wants to merge 1 commit into main from test/issue-1475
