feat: add interactive repair session protocol (#1435) by prompt-driven-github[bot] · Pull Request #1485 · promptdriven/pdd

prompt-driven-github · 2026-06-08T17:41:40Z

Summary

Adds Block 1 (#1435) for interactive prompt repair: the engine-agnostic InteractiveRepairSession protocol, typed RepairOption and ApprovedPatch dataclasses, and a deterministic FakeInteractiveSession for contract tests.

Aligned with the Hybrid decision in #1434 / PR #1497: this module is the shared Python contract; TTY menus and --apply gating land in #1436, patch application in #1437.

Closes #1435

Changes

| Area | Files |
| --- | --- |
| Protocol + fake backend | `pdd/checkup_interactive_session.py` |
| Prompt | `pdd/prompts/checkup_interactive_session_python.prompt` |
| Contract tests (23) | `tests/test_checkup_interactive_session.py` (includes merged coverage from #1496) |
| Docs | `docs/checkup_interactive_session.md` (Hybrid #1434 cross-link), `docs/evidence_manifest.md` |
| Example | `context/checkup_interactive_session_example.py` |
| Registry | `architecture.json` (single module entry only; no unicode normalization churn) |

Review cleanup (this revision)

Rebased onto main; dropped unrelated README / architecture.json bulk diff
ApprovedPatch.finding_id — optional field stamped from record_choice when absent
record_choice rejects duplicate choices per finding (after presented-option validation)
Merged additional contract tests from test/issue-1485 (PR #1496, "Add tests for #1485")
Documented the Hybrid ownership boundary and cross-linked the #1434 spike ("feat(checkup): Pi agent spike for interactive repair session backend")

Test plan

pytest -vv tests/test_checkup_interactive_session.py — 23 tests pass
docs/checkup_interactive_session.md references the Hybrid #1434 context ("feat(checkup): Pi agent spike for interactive repair session backend")
approved_patches() returns only approving kinds; skips skip, custom_no_patch, no_patch
record_choice rejects unpresented options and duplicate finding IDs
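
The last check in this list can be illustrated with a small stand-in. This is a minimal sketch rather than the code in `pdd/checkup_interactive_session.py`: the `TinySession` class, its attribute names, the method signatures, and the error messages are all hypothetical. Only the ordering (presented-option validation first, duplicate check second) is taken from the review notes above.

```python
class TinySession:
    """Hypothetical stand-in; not the real FakeInteractiveSession."""

    def __init__(self) -> None:
        self.presented: dict[str, list[str]] = {}  # finding_id -> option labels shown
        self.recorded: dict[str, str] = {}         # finding_id -> chosen label

    def present_finding(self, finding_id: str, labels: list[str]) -> list[str]:
        self.presented[finding_id] = list(labels)
        return list(labels)

    def record_choice(self, finding_id: str, label: str) -> None:
        # Presented-option validation runs first.
        if label not in self.presented.get(finding_id, []):
            raise ValueError(f"option {label!r} was not presented for {finding_id!r}")
        # Then a second choice for the same finding is rejected.
        if finding_id in self.recorded:
            raise ValueError(f"choice already recorded for {finding_id!r}")
        self.recorded[finding_id] = label
```

With this stand-in, recording `"a"` for a presented finding succeeds, while a second choice for the same finding or a choice for a finding that was never presented raises `ValueError`.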

Merge order

Merge #1434 (PR #1497) first, then this PR.

prompt-driven-github · 2026-06-08T23:00:23Z

🚀 Job Queued!

Job ID: ij43YnpqIYWfGsVl7hb8
Triggered by: @DianaTao
Label: test

Budget cap: none

You can add or change a cap by commenting:

/pdd budget N
/pdd settings
/pdd stop

prompt-driven-github · 2026-06-08T23:03:39Z

Step 1: Duplicate Check

Status: No duplicates found

Search Performed

Searched for: InteractiveRepairSession, checkup_interactive_session, FakeInteractiveSession RepairOption ApprovedPatch, interactive repair session test, test checkup_interactive_session, test 1435
Issues reviewed: ~20 issues across open and closed states
Filtered specifically for pdd-test label matches

Findings

No existing test-generation issues were found targeting the checkup_interactive_session module, the InteractiveRepairSession protocol, or the FakeInteractiveSession fake backend.

Related feature issues exist (#1435, #1423, #1434, #1436, #1437), but these are implementation/feature issues — none are test generation requests for this module. The only pdd-test labeled issues found cover unrelated functionality (PR #1489 user story generation in #1492, and pdd checkup negative acceptance criteria in #1493).

This is a new test request for the checkup_interactive_session.py module introduced in the change tracked by #1435.

Proceeding to Step 2: Documentation Check

prompt-driven-github · 2026-06-08T23:03:48Z

Step 1/18: Search for duplicate test requests

Step 1 completed; no <step_report> block returned by agent. Raw output retained in workflow state.

prompt-driven-github · 2026-06-08T23:05:50Z

Step 2: Documentation Check

Status: Research Complete

Project Structure

Frontend type: Python CLI / library (no frontend — pure Python package)
Test directory: tests/
Existing test framework: pytest (with pytest-mock via conftest mocker fixture)

Components to Test

ApprovedPatch dataclass — pdd/checkup_interactive_session.py
RepairOption dataclass — pdd/checkup_interactive_session.py
InteractiveRepairSession Protocol — pdd/checkup_interactive_session.py
FakeInteractiveSession concrete class — pdd/checkup_interactive_session.py
NON_APPROVING_PATCH_KINDS constant — pdd/checkup_interactive_session.py

Test Infrastructure Found

Config file: tests/conftest.py — HOME isolation via tempdir, mocker fixture, dotenv loading
Existing fixtures: mocker (pytest-mock compatible), tmp_path (standard pytest)
Test patterns:
- Pure unit tests, no fixtures required for this module
- Helper factory functions (_patch(), _option()) to create test objects
- `pytest.raises(ValueError, match="...")` for exception tests
- Direct equality assertions: assert x == y
- Type membership: assert all(isinstance(...) for ...)
- Deepcopy invariant tests (mutate one copy, re-fetch, verify independence)

API Configuration (if API testing)

Base URL source: N/A — no HTTP endpoints involved
Authentication: N/A
API docs: docs/checkup_interactive_session.md — session API contract, R1–R4 rules

External Service Contracts (for Mock Generation)

| Service | Function/Method to Mock | Input Types | Return Shape (from source code) |
| --- | --- | --- | --- |
| N/A (module uses standard library only) | N/A | N/A | N/A |

Source: pdd/checkup_interactive_session.py lines 1–124 — no third-party imports

Key API Details (for test generation)

ApprovedPatch dataclass (pdd/checkup_interactive_session.py):

@dataclass
class ApprovedPatch:
    kind: str
    target: Path          # __post_init__ coerces to Path
    anchor: dict[str, Any]  # __post_init__ copies to dict
    replacement: str
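
The two `__post_init__` comments above can be made concrete with a short sketch. `PatchSketch` is a hypothetical mirror of the four fields listed, not the real `ApprovedPatch`; in particular, using `dict(...)` for the copy is an assumption, and it is a shallow copy, so nested values inside the anchor would still be shared.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any


@dataclass
class PatchSketch:
    """Hypothetical mirror of the ApprovedPatch fields listed above."""

    kind: str
    target: Path
    anchor: dict[str, Any]
    replacement: str

    def __post_init__(self) -> None:
        self.target = Path(self.target)  # accept either str or Path
        self.anchor = dict(self.anchor)  # shallow copy of the caller's mapping


source_anchor = {"line": 3}
patch = PatchSketch("vocab_definition", "prompts/foo.prompt", source_anchor, "x")
source_anchor["line"] = 99  # mutate the caller's dict after construction
```

After the mutation, `patch.anchor["line"]` is still `3` and `patch.target` is a `Path`, which is the behaviour the coercion tests in this thread check for.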

RepairOption dataclass:

@dataclass
class RepairOption:
    label: str
    preview: str
    patch: ApprovedPatch  # non-optional (R1 contract rule)

FakeInteractiveSession constructor:

FakeInteractiveSession(
    options_by_finding: Mapping[str, Sequence[RepairOption]] | None = None,
    answers: Iterable[str] | None = None,
)

Observable state: report, options_by_finding, presented_options, recorded_choices, qa_transcript_summary, _answers (deque)

Contract rules from docs:

R1: Every RepairOption.patch MUST be a non-optional ApprovedPatch
R2: record_choice() MUST raise ValueError if option not previously presented
R3: approved_patches() MUST return only patches whose kind is NOT in NON_APPROVING_PATCH_KINDS = frozenset({"skip", "custom_no_patch", "no_patch"})
R4: Session layer MUST NOT write files, apply patches, or persist artifacts
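
Rule R3 amounts to a membership filter. The sketch below assumes a hypothetical `Choice` record and `approving` helper; only the `NON_APPROVING_PATCH_KINDS` constant and its three members are taken from the rule text.

```python
from dataclasses import dataclass

NON_APPROVING_PATCH_KINDS = frozenset({"skip", "custom_no_patch", "no_patch"})


@dataclass
class Choice:
    """Hypothetical record of one recorded choice."""

    finding_id: str
    kind: str


def approving(choices: list[Choice]) -> list[Choice]:
    # Keep only choices whose kind is not one of the non-approving kinds.
    return [c for c in choices if c.kind not in NON_APPROVING_PATCH_KINDS]


choices = [
    Choice("F1", "vocab_definition"),
    Choice("F2", "skip"),
    Choice("F3", "custom_no_patch"),
    Choice("F4", "no_patch"),
]
kept = approving(choices)
```

Here `kept` contains only the `F1` choice, since the other three kinds are all in the non-approving set.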

Existing test coverage (tests/test_checkup_interactive_session.py, 6 tests, 104 lines):

test_fake_session_presents_seeded_options_and_records_choice — basic present/record flow
test_fake_session_rejects_unpresented_choices — ValueError on unpresented and mismatched options
test_approved_patches_returns_only_typed_approving_patches — non-approving kinds excluded
test_approved_patches_returns_a_fresh_copy — deepcopy semantics
test_scripted_answers_are_consumed_and_summarized — ask() deque drain + QA transcript
test_seed_can_import_mapping_style_findings — seed() with findings list containing dicts

Notes

The module is stdlib-only (no mocking of external services needed)
FakeInteractiveSession IS the fake backend — tests use it directly
InteractiveRepairSession is a typing.Protocol; structural conformance can be asserted with isinstance(session, InteractiveRepairSession) if runtime-checkable, or confirmed by testing all 5 method signatures
seed() supports two import paths: (a) explicit options_by_finding at construction time, (b) mapping-style findings list passed at seed() time
The context/checkup_interactive_session_example.py file demonstrates the golden-path workflow end-to-end
Additional coverage areas not yet tested: ApprovedPatch.__post_init__ Path coercion, anchor dict copy independence, empty options_by_finding / empty answers edge cases, multiple findings in one session, approved_patches() with zero recorded choices
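
The note about `isinstance` on a `typing.Protocol` can be sketched as follows. `SessionLike`, `Complete`, and `Incomplete` are hypothetical names, and the method signatures are assumptions; only the five method names come from this thread. Whether the real `InteractiveRepairSession` is decorated with `@runtime_checkable` is not stated here.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class SessionLike(Protocol):
    """Hypothetical protocol with the five method names; signatures are assumed."""

    def seed(self, report: Any) -> None: ...
    def present_finding(self, finding_id: str) -> list: ...
    def ask(self, question: str) -> str: ...
    def record_choice(self, finding_id: str, option: Any) -> None: ...
    def approved_patches(self) -> list: ...


class Complete:
    def seed(self, report): pass
    def present_finding(self, finding_id): return []
    def ask(self, question): return ""
    def record_choice(self, finding_id, option): pass
    def approved_patches(self): return []


class Incomplete:
    def seed(self, report): pass


complete_ok = isinstance(Complete(), SessionLike)
incomplete_ok = isinstance(Incomplete(), SessionLike)
```

`complete_ok` is `True` and `incomplete_ok` is `False`. A runtime `isinstance` check against a protocol only verifies that the members exist, not that their signatures match, so a static type checker is still needed for the stronger guarantee.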

Proceeding to Step 3: Analyze & Clarify

prompt-driven-github · 2026-06-08T23:06:01Z

Step 2/18: Review codebase to understand what to test

Step 2 completed; no <step_report> block returned by agent. Raw output retained in workflow state.

prompt-driven-github · 2026-06-08T23:06:39Z

Step 3: Analyze & Clarify

Status: Proceed

Summary

We have sufficient information to create comprehensive tests for pdd/checkup_interactive_session.py. The module is stdlib-only with 4 well-documented public types (ApprovedPatch, RepairOption, InteractiveRepairSession, FakeInteractiveSession). The API contract, side-effect boundaries, and fake backend behavior are clearly defined in the docs. The 6 existing tests provide a solid baseline with identified gaps that can now be filled.

No external services, credentials, or environment setup are needed — all tests are pure unit tests.

Test Scenarios Identified

ApprovedPatch.__post_init__ coerces target from a plain string to a Path
ApprovedPatch.__post_init__ copies the anchor mapping so mutations to the input don't affect the dataclass
approved_patches() returns an empty list when no choices have been recorded
Multiple findings can be presented and choices recorded in a single session, producing multiple approved patches
seed() with a non-mapping report stores the report without raising
seed() with findings that use the finding_id key (vs id) correctly imports options
present_finding() returns an independent list each call (not the same list object)
ask() with an exhausted answers queue returns an empty string
record_choice() after re-presenting a finding (overwriting presented_options) allows a new valid choice
FakeInteractiveSession satisfies the InteractiveRepairSession structural protocol
approved_patches() excludes all three non-approving kinds (skip, custom_no_patch, no_patch)
approved_patches() returns deep copies that are independent from internal state

Proceeding to Step 4: Detect Test Type

prompt-driven-github · 2026-06-08T23:06:50Z

Step 3/18: Determine if enough info

Step 3 completed; no <step_report> block returned by agent. Raw output retained in workflow state.

prompt-driven-github · 2026-06-08T23:07:23Z

Step 4: Detect Test Type

Status: Test Type Detected

Test Type Analysis

Type: API
Framework/Stack: Python 3.12+, stdlib-only module, pytest
Test Framework: pytest

Environment Requirements

Base URL/Command: N/A
Environment Variables: None required
Authentication: Not required

Test File Convention

Directory: tests/
Naming: test_*.py

Rationale

The checkup_interactive_session module is a stdlib-only Python library (pdd/checkup_interactive_session.py) providing ApprovedPatch, RepairOption, InteractiveRepairSession (Protocol), and FakeInteractiveSession. It has no UI, no CLI entry points, and no HTTP endpoints — all functionality is exercised through direct Python calls. The codebase already uses pytest for all unit tests, and the existing tests/test_checkup_interactive_session.py follows the test_*.py naming convention with no fixtures or external dependencies. pytest is the clear choice: zero additional dependencies, already installed, and directly exercises the module's public API.

Proceeding to Step 5: Create Test Plan

prompt-driven-github · 2026-06-08T23:07:33Z

Step 4/18: Identify test type

Step 4 completed; no <step_report> block returned by agent. Raw output retained in workflow state.

prompt-driven-github · 2026-06-08T23:08:31Z

Step 5: Create Test Plan

Status: Plan Ready

Test Plan Overview

Total Test Cases: 14 (12 new + 2 that extend existing coverage)
Framework: pytest
Estimated Coverage: All 4 public types (ApprovedPatch, RepairOption, InteractiveRepairSession, FakeInteractiveSession), all public methods, all contract rules R1–R4 from the documentation, and all three non-approving patch kinds

Test Scenarios

Scenario 1: `ApprovedPatch` Dataclass Initialization and Coercions

Description: __post_init__ ensures target is always a Path and anchor is an independent dict copy, regardless of the input types passed to the constructor.
Test Cases:
1. Constructing ApprovedPatch with a plain str target coerces it to Path
2. Mutating the original anchor dict after construction does not affect the stored anchor (copy independence)

Scenario 2: `FakeInteractiveSession.seed()` Behavior

Description: seed() must accept any value without raising, and when the report is a mapping with findings, it should import typed RepairOption objects keyed by both id and finding_id fields.
Test Cases:
1. seed() with a non-mapping report (e.g., a string or list) stores the report without raising
2. seed() with findings using the finding_id key (vs id) correctly imports options into options_by_finding
3. seed() does not overwrite pre-existing options loaded at construction time (uses setdefault)
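
The three `seed()` cases can be sketched with one helper. `import_findings` is a hypothetical function, and the `"options"` key is an assumed name for where a finding carries its options; the `findings`, `id`, and `finding_id` keys and the use of `setdefault` are taken from the plan above.

```python
from collections.abc import Mapping


def import_findings(options_by_finding: dict[str, list], report: object) -> None:
    """Hypothetical sketch of the import step inside seed()."""
    if not isinstance(report, Mapping):
        return  # non-mapping reports are accepted and simply not imported
    for finding in report.get("findings", []):
        if not isinstance(finding, Mapping):
            continue
        finding_id = finding.get("id") or finding.get("finding_id")
        if finding_id is None:
            continue
        # setdefault keeps any options that were already present for this ID.
        options_by_finding.setdefault(str(finding_id), list(finding.get("options", [])))


store = {"F1": ["kept"]}
import_findings(
    store,
    {"findings": [{"id": "F1", "options": ["new"]}, {"finding_id": "F2", "options": ["x"]}]},
)
import_findings(store, "not a mapping")
```

After both calls, `store` still maps `F1` to its original options and has gained `F2`, and the string report has been ignored without raising.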

Scenario 3: `FakeInteractiveSession.present_finding()` Behavior

Description: present_finding() must return a fresh independent list each call (not aliased to internal state), and return an empty list for an unknown finding ID.
Test Cases:
1. Returns a new list object each call — modifying the returned list does not affect subsequent calls
2. Returns an empty list for an unknown finding_id

Scenario 4: `FakeInteractiveSession.ask()` Queue Exhaustion

Description: Consuming all scripted answers and calling ask() again must return an empty string rather than raising.
Test Cases:
1. ask() when the answers deque is empty returns "" (Note: partially covered by existing test; verify explicit empty-session case)
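
Queue exhaustion can be sketched with a `collections.deque`. `ScriptedAnswers` and its `transcript` attribute are hypothetical; the thread only states that scripted answers live in a deque and that an exhausted queue yields an empty string.

```python
from collections import deque
from typing import Iterable, Optional


class ScriptedAnswers:
    """Hypothetical stand-in for the scripted ask() behaviour."""

    def __init__(self, answers: Optional[Iterable[str]] = None) -> None:
        self._answers = deque(answers or [])
        self.transcript: list[tuple[str, str]] = []

    def ask(self, question: str) -> str:
        # Pop the next scripted answer, or fall back to "" once the queue is empty.
        answer = self._answers.popleft() if self._answers else ""
        self.transcript.append((question, answer))
        return answer


session = ScriptedAnswers(["yes"])
first = session.ask("Apply the fix?")
second = session.ask("Anything else?")
```

`first` is `"yes"` and `second` is `""`; both exchanges end up in `session.transcript`.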

Scenario 5: Multi-Finding Session and `approved_patches()` Edge Cases

Description: A session may span multiple findings; approved_patches() must return one patch per approving choice across all findings, exclude all three non-approving kinds, return an empty list before any choices are recorded, and return deep-copied patches that are independent of internal state.
Test Cases:
1. approved_patches() returns an empty list when no choices have been recorded
2. Presenting and recording choices for two different findings produces two approved patches in order
3. record_choice() after re-presenting a finding (overwriting presented_options) accepts the re-presented option as valid
4. approved_patches() excludes patches with kind no_patch (in addition to skip and custom_no_patch already covered)
5. The returned patches are independent deep copies — mutating a returned patch does not affect the internal recorded_choices state
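
Cases 2 and 5 (ordering and deep-copy independence) can be sketched together. The `snapshot` helper and the dict-based patch shape are hypothetical simplifications; the sketch assumes recorded choices are kept in insertion order and copied with `copy.deepcopy` on the way out.

```python
import copy


def snapshot(recorded: dict[str, dict]) -> list[dict]:
    """Hypothetical helper: return deep copies in the order choices were recorded."""
    return [copy.deepcopy(patch) for patch in recorded.values()]


recorded = {
    "F1": {"kind": "vocab_definition", "anchor": {"line": 1}},
    "F2": {"kind": "vocab_definition", "anchor": {"line": 7}},
}

first = snapshot(recorded)
first[0]["anchor"]["line"] = 99  # mutate the returned copy, including nested data
second = snapshot(recorded)
```

The second snapshot still reports lines `1` and `7`, because the mutation touched only the copy. A shallow copy would not give this guarantee for the nested `anchor` dict.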

Scenario 6: `InteractiveRepairSession` Protocol Conformance

Description: FakeInteractiveSession must structurally satisfy InteractiveRepairSession so any orchestration code typed against the protocol accepts it without casting.
Test Cases:
1. isinstance(FakeInteractiveSession(), InteractiveRepairSession) (or runtime_checkable equivalent) confirms structural conformance

Test Configuration

Setup required: None — stdlib-only module, no fixtures or environment variables needed
Fixtures needed: _patch() and _option() factory helpers already exist in the test file
Cleanup: None — all state is in-memory

Mocking Strategy

| Dependency | Function to Mock | Mock Return Value | Error Case |
| --- | --- | --- | --- |
| (none) | N/A (stdlib only) | N/A | N/A |

No external dependencies to mock. The module imports only from the standard library (collections, copy, dataclasses, pathlib, typing).

Contract source files reviewed:

pdd/checkup_interactive_session.py — all types and methods
docs/checkup_interactive_session.md — R1–R4 contract rules and no-write boundary
tests/test_checkup_interactive_session.py — 6 existing tests (baseline coverage)

Proceeding to Step 5b: Enhance Plan

prompt-driven-github · 2026-06-08T23:08:45Z

Step 5/18: Create test plan

Step 5 completed; no <step_report> block returned by agent. Raw output retained in workflow state.

prompt-driven-github · 2026-06-08T23:09:16Z

Step 5b: Enhanced Test Plan

Status: Plan Enhanced

Contract Validation Tests (if applicable)

No OpenAPI/Swagger spec was identified in Step 2. The checkup_interactive_session module is stdlib-only with no HTTP endpoints or REST API surface — contract validation via JSON schema is not applicable.

Contract test cases added: 0

Accessibility Tests (if applicable)

Step 4 detected TEST_TYPE: api (pure Python unit tests), not web. There are no UI pages or browser routes to audit — WCAG accessibility testing via @axe-core/playwright is not applicable.

Accessibility test cases added: 0

Enhanced Plan Summary

Original test cases: 14 (from Step 5)
Contract test cases: 0 (no OpenAPI spec — stdlib-only module)
Accessibility test cases: 0 (TEST_TYPE is api, not web)
Total enhanced plan cases: 14

The original 14-case plan from Step 5 stands as the complete test plan:

Scenario 1: `ApprovedPatch` Dataclass Initialization and Coercions

Constructing ApprovedPatch with a plain str target coerces it to Path
Mutating the original anchor dict after construction does not affect the stored anchor (copy independence)

Scenario 2: `FakeInteractiveSession.seed()` Behavior

seed() with a non-mapping report stores the report without raising
seed() with findings using finding_id key correctly imports options
seed() does not overwrite pre-existing options (uses setdefault)

Scenario 3: `FakeInteractiveSession.present_finding()` Behavior

Returns a new list object each call — modifying returned list does not affect subsequent calls
Returns an empty list for an unknown finding_id

Scenario 4: `FakeInteractiveSession.ask()` Queue Exhaustion

ask() when answers deque is empty returns ""

Scenario 5: Multi-Finding Session and `approved_patches()` Edge Cases

approved_patches() returns empty list when no choices recorded
Two findings produce two approved patches in order
record_choice() after re-presenting a finding accepts the re-presented option
approved_patches() excludes no_patch kind
Returned patches are independent deep copies

Scenario 6: Protocol Conformance

FakeInteractiveSession satisfies InteractiveRepairSession structural protocol

Proceeding to Step 6: Assess Automated Test Coverage

prompt-driven-github · 2026-06-08T23:09:32Z

Step 5.5/18: Enhance plan

Step 5.5 completed; no <step_report> block returned by agent. Raw output retained in workflow state.

prompt-driven-github · 2026-06-08T23:12:15Z

Step 12: Generated Tests

Test Files Modified

tests/test_checkup_interactive_session.py

Test Summary

Total Tests: 20 (6 existing + 14 new)
Test Suites: 1
Framework: pytest

New Test Cases Added

Scenario 1: `ApprovedPatch` Dataclass Coercions

test_approved_patch_coerces_string_target_to_path — __post_init__ converts a plain str target to Path
test_approved_patch_anchor_is_copy_independent_of_input — mutating the input dict after construction does not affect the stored anchor

Scenario 2: `FakeInteractiveSession.seed()` Behaviour

test_seed_non_mapping_report_stores_report_without_raising — non-mapping reports (str, None, list) are stored without raising
test_seed_imports_findings_using_finding_id_key — findings keyed with finding_id (not id) are imported correctly
test_seed_does_not_overwrite_preexisting_options — setdefault preserves options set before seed()

Scenario 3: `FakeInteractiveSession.present_finding()` Behaviour

test_present_finding_returns_independent_list_each_call — mutating one returned list does not affect the next call
test_present_finding_returns_empty_list_for_unknown_finding — unknown finding_id yields []

Scenario 4: `FakeInteractiveSession.ask()` Queue Exhaustion

test_ask_returns_empty_string_when_answers_queue_is_empty — empty deque returns "" and records it in the transcript

Scenario 5: Multi-Finding Session and `approved_patches()` Edge Cases

test_approved_patches_returns_empty_list_when_no_choices_recorded — zero choices → empty list
test_multi_finding_session_produces_ordered_approved_patches — two findings → two patches in order
test_record_choice_after_representing_finding_accepts_newly_presented_option — re-presenting overwrites the valid set; old options are rejected
test_approved_patches_excludes_no_patch_kind — the no_patch kind (the third non-approving kind) is excluded
test_approved_patches_deep_copy_is_independent_of_recorded_choice — mutating a returned copy does not alias the recorded option data

Scenario 6: Protocol Conformance

test_fake_session_satisfies_interactive_repair_session_protocol — FakeInteractiveSession is assignable to InteractiveRepairSession and all protocol methods are callable

Test Code Overview

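# Imports needed by both examples
from pathlib import Path

from pdd.checkup_interactive_session import (
    ApprovedPatch,
    FakeInteractiveSession,
    InteractiveRepairSession,
)

# Example: ApprovedPatch coercions
def test_approved_patch_coerces_string_target_to_path() -> None:
    # A plain str target should come back as a Path after __post_init__.
    patch = ApprovedPatch(kind='vocab_definition', target='prompts/foo.prompt',
                          anchor={}, replacement='x')
    assert isinstance(patch.target, Path)

# Example: Protocol conformance
def test_fake_session_satisfies_interactive_repair_session_protocol() -> None:
    # The annotation is checked statically; at runtime only method presence is asserted.
    session: InteractiveRepairSession = FakeInteractiveSession()
    assert callable(session.seed)
    assert callable(session.present_finding)
    assert callable(session.ask)
    assert callable(session.record_choice)
    assert callable(session.approved_patches)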

Running the Tests

pytest -vv tests/test_checkup_interactive_session.py

All 20 tests pass (6 original + 14 new).

Proceeding to Step 13: Run Tests

prompt-driven-github · 2026-06-08T23:12:25Z

Step 12/18: Generate tests

Step 12 completed; no <step_report> block returned by agent. Raw output retained in workflow state.

prompt-driven-github · 2026-06-08T23:13:19Z

Step 13: Run Tests

Status: All Tests Passing

Test Results

Passed: 20
Failed: 0
Skipped: 0

Execution Details

Duration: 0.34s
Framework: pytest 9.0.3, Python 3.12.3
CWD: worktree root (/tmp/pdd_job_ij43YnpqIYWfGsVl7hb8_wyt4aqef/.pdd/worktrees/test-issue-1485)
Multi-cwd check: .pddrc is in the worktree root — same as CI cwd, second run skipped

Summary

All 20 generated tests pass successfully (6 existing + 14 new).

| Test | Status |
| --- | --- |
| `test_fake_session_presents_seeded_options_and_records_choice` | PASSED |
| `test_fake_session_rejects_unpresented_choices` | PASSED |
| `test_approved_patches_returns_only_typed_approving_patches` | PASSED |
| `test_approved_patches_returns_a_fresh_copy` | PASSED |
| `test_scripted_answers_are_consumed_and_summarized` | PASSED |
| `test_seed_can_import_mapping_style_findings` | PASSED |
| `test_approved_patch_coerces_string_target_to_path` | PASSED |
| `test_approved_patch_anchor_is_copy_independent_of_input` | PASSED |
| `test_seed_non_mapping_report_stores_report_without_raising` | PASSED |
| `test_seed_imports_findings_using_finding_id_key` | PASSED |
| `test_seed_does_not_overwrite_preexisting_options` | PASSED |
| `test_present_finding_returns_independent_list_each_call` | PASSED |
| `test_present_finding_returns_empty_list_for_unknown_finding` | PASSED |
| `test_ask_returns_empty_string_when_answers_queue_is_empty` | PASSED |
| `test_approved_patches_returns_empty_list_when_no_choices_recorded` | PASSED |
| `test_multi_finding_session_produces_ordered_approved_patches` | PASSED |
| `test_record_choice_after_representing_finding_accepts_newly_presented_option` | PASSED |
| `test_approved_patches_excludes_no_patch_kind` | PASSED |
| `test_approved_patches_deep_copy_is_independent_of_recorded_choice` | PASSED |
| `test_fake_session_satisfies_interactive_repair_session_protocol` | PASSED |

Proceeding to Step 15: Validate Tests Against Plan

prompt-driven-github · 2026-06-08T23:13:28Z

Step 13/18: Execute generated tests

Step 13 completed; no <step_report> block returned by agent. Raw output retained in workflow state.

prompt-driven-github · 2026-06-08T23:13:54Z

Step 14: Fix & Iterate

Status: All Tests Passing

Fixes Applied

No fixes were needed. All 20 tests were already passing from Step 13 and continue to pass on re-verification.

Final Test Results

Passed: 20
Failed: 0
Fix Iterations: 0

Proceeding to Step 15: Validate Tests Against Plan

prompt-driven-github · 2026-06-08T23:14:01Z

Step 14/18: Fix failing tests

Step 14 completed; no <step_report> block returned by agent. Raw output retained in workflow state.

prompt-driven-github · 2026-06-08T23:14:50Z

Step 15: Test Plan Validation

Status: Validation Complete

Coverage Analysis

Planned cases: 14
Implemented cases: 14
Coverage ratio: 100%

Implemented Test Cases

| # | Planned Case | Category | Status |
| --- | --- | --- | --- |
| 1 | `test_approved_patch_coerces_string_target_to_path` | behavioral: ApprovedPatch coercions | Implemented |
| 2 | `test_approved_patch_anchor_is_copy_independent_of_input` | behavioral: ApprovedPatch coercions | Implemented |
| 3 | `test_seed_non_mapping_report_stores_report_without_raising` | behavioral: seed() behaviour | Implemented |
| 4 | `test_seed_imports_findings_using_finding_id_key` | behavioral: seed() behaviour | Implemented |
| 5 | `test_seed_does_not_overwrite_preexisting_options` | behavioral: seed() behaviour | Implemented |
| 6 | `test_present_finding_returns_independent_list_each_call` | behavioral: present_finding() behaviour | Implemented |
| 7 | `test_present_finding_returns_empty_list_for_unknown_finding` | behavioral: present_finding() behaviour | Implemented |
| 8 | `test_ask_returns_empty_string_when_answers_queue_is_empty` | behavioral: ask() queue exhaustion | Implemented |
| 9 | `test_approved_patches_returns_empty_list_when_no_choices_recorded` | behavioral: approved_patches() edge cases | Implemented |
| 10 | `test_multi_finding_session_produces_ordered_approved_patches` | behavioral: approved_patches() edge cases | Implemented |
| 11 | `test_record_choice_after_representing_finding_accepts_newly_presented_option` | behavioral: approved_patches() edge cases | Implemented |
| 12 | `test_approved_patches_excludes_no_patch_kind` | behavioral: approved_patches() edge cases | Implemented |
| 13 | `test_approved_patches_deep_copy_is_independent_of_recorded_choice` | behavioral: approved_patches() edge cases | Implemented |
| 14 | `test_fake_session_satisfies_interactive_repair_session_protocol` | behavioral: Protocol conformance | Implemented |

Missing Cases

None — all 14 planned cases are implemented.

Notes

Contract test cases: 0 (no OpenAPI spec — stdlib-only module, not applicable)
Accessibility test cases: 0 (TEST_TYPE is api, not web — not applicable)
Total tests in file: 20 (6 pre-existing + 14 new from plan)
No new files needed — coverage is 100%

Proceeding to Step 16: Run Newly Generated Tests

prompt-driven-github · 2026-06-08T23:15:01Z

Step 15/18: Validate tests against plan

Step 15 completed; no <step_report> block returned by agent. Raw output retained in workflow state.

prompt-driven-github · 2026-06-08T23:15:55Z

Step 17: Draft PR Created

Pull Request

PR #1496: Add tests for #1485

Branch

test/issue-1485

What's Included

Test files at:
- tests/test_checkup_interactive_session.py
Commits: 1

Test Summary

Total Tests: 20 (6 existing + 14 new)
Passing: 20
Framework: pytest
Test Plan Coverage: 14/14 planned cases implemented (100%)

Next Steps for Reviewers

Review the generated tests for completeness
Run tests locally: pytest -vv tests/test_checkup_interactive_session.py
Adjust test selectors/assertions if needed
Mark the PR as ready for review

Running the Tests

pytest -vv tests/test_checkup_interactive_session.py

Test generation complete. A draft PR with the tests has been created and linked to this issue.

prompt-driven-github · 2026-06-08T23:16:04Z

Step 17/18: Create draft PR

Step 17 completed; no <step_report> block returned by agent. Raw output retained in workflow state.

prompt-driven-github · 2026-06-08T23:16:09Z

PDD Execution Successful

Command: test
Duration: 14.0 min
Cost: $0.1399 infra (LLM billed to your API keys)
Pull Request: #1496

prompt-driven-github · 2026-06-09T00:24:55Z

🚀 Job Queued!

Job ID: RE1yWVvM9HwKxOzf2PXA
Triggered by: @DianaTao
Label: checkup

Budget cap: none

You can add or change a cap by commenting:

/pdd budget N
/pdd settings
/pdd stop

prompt-driven-github · 2026-06-09T00:30:47Z

❌ PDD Execution Failed

Execution failed: PDD command failed with exit code 1.
Output: ate_limited credential=claude-oauth-v6

[PROVIDER_FAILURE_SUMMARY] PDD_PROVIDER_FAILURE provider=codex status=lock_busy reason=codex_auth_lock_contention scope=lanes credential=gemini-vertexai

[PROVIDER_FAILURE_SUMMARY] PDD_PROVIDER_FAILURE provider=codex status=lock_busy reason=codex_auth_lock_contention scope=lanes credential=openai-codex

[PROVIDER_FAILURE_SUMMARY] PDD_PROVIDER_FAILURE provider=codex status=lock_busy reason=codex_auth_lock_contention scope=lanes credential=anthropic-api-key

prompt-driven-github · 2026-06-09T01:02:09Z

🚀 Job Queued!

Job ID: 97P0GOf6lxDKVB7d7Kow
Triggered by: @DianaTao
Label: checkup

Model: gemini-3.5-flash

Budget cap: none

You can add or change a cap by commenting:

/pdd budget N
/pdd settings
/pdd stop

prompt-driven-github · 2026-06-09T01:06:47Z

❌ PDD Execution Failed

Execution failed: PDD command failed with exit code 1.
Output:
[PROVIDER_FAILURE_SUMMARY] PDD_PROVIDER_FAILURE provider=anthropic status=429 reason=interactive_preflight_rate_limited credential=claude-oauth-v6

[PROVIDER_FAILURE_SUMMARY] PDD_PROVIDER_FAILURE provider=codex status=lock_busy reason=codex_auth_lock_contention scope=lanes credential=openai-codex

[PROVIDER_FAILURE_SUMMARY] PDD_PROVIDER_FAILURE provider=codex status=lock_busy reason=codex_auth_lock_contention scope=lanes credential=anthropic-api-key
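
When triaging failures like these, the summary lines can be split into fields. The sketch below is an assumption based only on the lines shown in this thread (a fixed prefix followed by space-separated `key=value` tokens); it is not a documented PDD format, and `parse_provider_failure` is a hypothetical helper.

```python
from typing import Optional

PREFIX = "[PROVIDER_FAILURE_SUMMARY] PDD_PROVIDER_FAILURE "


def parse_provider_failure(line: str) -> Optional[dict[str, str]]:
    """Split one summary line into a dict, or return None for other lines."""
    if not line.startswith(PREFIX):
        return None
    fields: dict[str, str] = {}
    for token in line[len(PREFIX):].split():
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens that are not key=value pairs
            fields[key] = value
    return fields


sample = (
    "[PROVIDER_FAILURE_SUMMARY] PDD_PROVIDER_FAILURE provider=anthropic "
    "status=429 reason=interactive_preflight_rate_limited credential=claude-oauth-v6"
)
parsed = parse_provider_failure(sample)
```

For the sample line, `parsed["provider"]` is `"anthropic"` and `parsed["status"]` is `"429"`. Truncated lines that lack the prefix return `None` rather than a partial record.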

pdd-cli

Approved! Looks solid and all CI checks are green.

Rebases Block 1 onto main with review cleanup: finding_id on ApprovedPatch, duplicate record_choice rejection, merged contract tests from #1496, Hybrid #1434 cross-link, and a single architecture.json entry without unicode churn. Co-authored-by: Cursor <cursoragent@cursor.com>

prompt-driven-github · 2026-06-09T17:22:55Z

🚀 Job Queued!

Job ID: YskET1Qz5DgHmVxw2H0G
Triggered by: @DianaTao
Label: checkup

Model: gemini-3.5-flash

Budget cap: none

You can add or change a cap by commenting:

/pdd budget N
/pdd settings
/pdd stop

prompt-driven-github · 2026-06-09T17:31:16Z

Step 7/8: Final Gate Report

PR: #1485
Issue: #1435
final-gate-status: failed
final-gate-stage: layer1
issue_aligned: unknown

Summary

Layer 1 PR checkup failed before Layer 2 review loop could run.

Machine Verdict

{
  "findings": [
    {
      "area": "layer1",
      "finding": "Layer 1 checkup failed before Layer 2: Aborting: 3 consecutive steps failed - agent providers unavailable",
      "location": "",
      "required_fix": "Resolve the Layer 1 checkup failure or push-guard refusal, then re-run the final gate.",
      "severity": "blocker",
      "status": "open"
    }
  ],
  "fresh_final_status": "missing",
  "full_suite_source": "github-checks",
  "issue_aligned": null,
  "issue_url": "https://github.com/promptdriven/pdd/issues/1435",
  "layer1_status": "failed",
  "layer2_status": "skipped",
  "pr_url": "https://github.com/promptdriven/pdd/pull/1485",
  "reason": "Aborting: 3 consecutive steps failed - agent providers unavailable",
  "reviewer_status": {},
  "schema": "pdd.checkup.final_gate.v1",
  "stage": "layer1",
  "status": "failed"
}
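
A verdict like the one above can be inspected programmatically. This is a minimal sketch assuming only the field names visible in the JSON (`findings`, `severity`, `status`, `finding`); the `open_blockers` helper is hypothetical and the sample payload is abbreviated.

```python
import json


def open_blockers(verdict_json: str) -> list[str]:
    """Return the text of every finding that is an open blocker."""
    verdict = json.loads(verdict_json)
    return [
        item["finding"]
        for item in verdict.get("findings", [])
        if item.get("severity") == "blocker" and item.get("status") == "open"
    ]


sample = json.dumps(
    {
        "schema": "pdd.checkup.final_gate.v1",
        "status": "failed",
        "findings": [
            {
                "area": "layer1",
                "finding": "Layer 1 checkup failed before Layer 2",
                "severity": "blocker",
                "status": "open",
            }
        ],
    }
)
blockers = open_blockers(sample)
```

For this sample, `blockers` holds the single Layer 1 finding; a verdict with no `findings` key yields an empty list.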

Issues Summary

| Severity | Module | Description | Fixed |
| --- | --- | --- | --- |
| blocker | layer1 | Layer 1 checkup failed before Layer 2: Aborting: 3 consecutive steps failed - agent providers unavailable | No |

Drop architecture.json unicode re-encoding churn; append only the checkup_interactive_session module entry. Fix example approved_patches assertion after finding_id stamping. Add contract tests pinning stdlib-only protocol layer per #1434 Hybrid decision (no Pi/TTY/llm_invoke in #1435). Co-authored-by: Cursor <cursoragent@cursor.com>

prompt-driven-github · 2026-06-09T17:33:17Z

Step 7/8: Final Gate Report

PR: #1485
Issue: #1435
final-gate-status: failed
final-gate-stage: layer1
issue_aligned: unknown

Summary

Layer 1 PR checkup failed before Layer 2 review loop could run.

Machine Verdict

{
  "findings": [
    {
      "area": "layer1",
      "finding": "Layer 1 checkup failed before Layer 2: Aborting: 3 consecutive steps failed - agent providers unavailable",
      "location": "",
      "required_fix": "Resolve the Layer 1 checkup failure or push-guard refusal, then re-run the final gate.",
      "severity": "blocker",
      "status": "open"
    }
  ],
  "fresh_final_status": "missing",
  "full_suite_source": "github-checks",
  "issue_aligned": null,
  "issue_url": "https://github.com/promptdriven/pdd/issues/1435",
  "layer1_status": "failed",
  "layer2_status": "skipped",
  "pr_url": "https://github.com/promptdriven/pdd/pull/1485",
  "reason": "Aborting: 3 consecutive steps failed - agent providers unavailable",
  "reviewer_status": {},
  "schema": "pdd.checkup.final_gate.v1",
  "stage": "layer1",
  "status": "failed"
}

Issues Summary

Severity	Module	Description	Fixed
blocker	layer1	Layer 1 checkup failed before Layer 2: Aborting: 3 consecutive steps failed - agent providers unavailable	No
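
As a rough illustration of how the machine verdict above could be consumed, the sketch below loads a trimmed copy of it and lists the open blockers. The helper name `open_blockers` is hypothetical and not part of PDD; only field names visible in the report (`findings`, `severity`, `status`, `stage`, `area`, `finding`) are assumed.

```python
import json

# Trimmed copy of the verdict shown in the report; fields not used here are omitted.
VERDICT_JSON = """
{
  "findings": [
    {
      "area": "layer1",
      "finding": "Layer 1 checkup failed before Layer 2: Aborting: 3 consecutive steps failed - agent providers unavailable",
      "severity": "blocker",
      "status": "open"
    }
  ],
  "layer1_status": "failed",
  "layer2_status": "skipped",
  "schema": "pdd.checkup.final_gate.v1",
  "stage": "layer1",
  "status": "failed"
}
"""


def open_blockers(verdict: dict) -> list:
    """Return findings that are blocker-severity and still open (hypothetical helper)."""
    return [
        f for f in verdict.get("findings", [])
        if f.get("severity") == "blocker" and f.get("status") == "open"
    ]


verdict = json.loads(VERDICT_JSON)
blockers = open_blockers(verdict)
for f in blockers:
    # One line per blocker, prefixed with the stage that produced the verdict.
    print(f"[{verdict['stage']}] {f['area']}: {f['finding']}")
```

Here the single `layer1` finding is reported, which matches the one-row Issues Summary table.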

DianaTao · 2026-06-09T17:35:26Z

Diana-6/9

prompt-driven-github · 2026-06-09T17:35:33Z

@DianaTao Noted — your feedback will be incorporated at the next step of the run.

prompt-driven-github · 2026-06-09T18:15:47Z

Step 7/8: Verification & Final Report (Iteration 2)

Test Results After Fixes

Total: 45 tests
Passed: 45
Failed: 0
Previously failing, now passing: 0 (no failures in previous iteration)
New failures: 0

Build Status

Pass — all compile checks, import, example script, and JSON validation succeeded.

Command	Result
`python -m py_compile pdd/checkup_interactive_session.py`	✅ OK
`python -m py_compile tests/test_checkup_interactive_session.py`	✅ OK
`python -m py_compile context/checkup_interactive_session_example.py`	✅ OK
`python -c "import pdd.checkup_interactive_session"`	✅ OK
`python context/checkup_interactive_session_example.py`	✅ OK
`python -m json.tool architecture.json`	✅ valid JSON

Overall Status

All Issues Fixed

Verification Scope

Targeted (full suite not run). Tests run were scoped to <pr_changed_files>: tests/test_checkup_interactive_session.py (26 tests) and tests/test_evidence_manifest.py (19 tests, adjacent to changed docs/evidence_manifest.md). Full GitHub CI was not used as a gate.

Issue Alignment

issue_aligned: true — The PR delivers all deliverables requested in issue #1435: ApprovedPatch and RepairOption dataclasses, InteractiveRepairSession protocol, FakeInteractiveSession backend, 26 contract tests, session artifact schema docs, and evidence manifest cross-reference. No file writes occur in the session layer.
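
To make the listed deliverables concrete, here is a minimal sketch of what a stdlib-only protocol layer of this kind might look like. It is not the code from `pdd/checkup_interactive_session.py`: the field names `kind` and `diff`, the `"apply"` kind, the method signatures, and the `InMemorySession` class are assumptions for illustration. Only the class names, the optional `finding_id` stamping, the duplicate-choice rejection, and the non-approving kinds (`skip`, `custom_no_patch`, `no_patch`) come from the PR description.

```python
from dataclasses import dataclass, replace
from typing import Optional, Protocol, runtime_checkable

# Kinds that approved_patches() is described as skipping.
NON_APPROVING_KINDS = {"skip", "custom_no_patch", "no_patch"}


@dataclass(frozen=True)
class ApprovedPatch:
    kind: str                         # hypothetical field name
    diff: str = ""                    # hypothetical field name
    finding_id: Optional[str] = None  # stamped by record_choice when absent


@dataclass(frozen=True)
class RepairOption:
    label: str
    preview: str
    patch: ApprovedPatch


@runtime_checkable
class InteractiveRepairSession(Protocol):
    # Signatures are assumed; the real protocol may differ.
    def record_choice(self, finding_id: str, option: RepairOption) -> None: ...
    def approved_patches(self) -> list: ...


class InMemorySession:
    """Toy backend: keeps choices in memory and performs no file writes."""

    def __init__(self) -> None:
        self._choices = {}

    def record_choice(self, finding_id: str, option: RepairOption) -> None:
        if finding_id in self._choices:
            raise ValueError(f"duplicate choice for finding {finding_id!r}")
        patch = option.patch
        if patch.finding_id is None:
            # Stamp the finding ID onto a copy; the frozen original is untouched.
            patch = replace(patch, finding_id=finding_id)
        self._choices[finding_id] = patch

    def approved_patches(self) -> list:
        return [p for p in self._choices.values() if p.kind not in NON_APPROVING_KINDS]


session = InMemorySession()
session.record_choice("f1", RepairOption("Apply fix", "diff preview", ApprovedPatch(kind="apply")))
session.record_choice("f2", RepairOption("Skip", "", ApprovedPatch(kind="skip")))
print(isinstance(session, InteractiveRepairSession))
print([p.finding_id for p in session.approved_patches()])
```

Because the protocol is structural, `InMemorySession` conforms without inheriting from it; `runtime_checkable` only verifies that the methods exist, not their signatures.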

Issues Summary

Severity	Category	Module	Description	Fixed
low	artifact	docs/checkup_interactive_session.md	Line 14 references `docs/checkup_interactive_session_spike.md` which does not exist yet (forward ref to #1434 spike output). No runtime impact.	No (out-of-scope forward ref)

Checkup complete.

{
  "success": true,
  "message": "Verification scope: targeted — full suite not run. All 26 PR-scoped contract tests pass (tests/test_checkup_interactive_session.py), all 19 adjacent evidence manifest tests pass, example script runs cleanly, architecture.json is valid JSON. 1 low-severity dangling doc reference (docs/checkup_interactive_session_spike.md forward ref to #1434 spike) remains but is non-blocking. PR correctly implements issue #1435: ApprovedPatch, RepairOption, InteractiveRepairSession protocol, FakeInteractiveSession backend, session artifact schema docs, no filesystem writes in session layer.",
  "tech_stack": ["Python 3.12", "pytest", "dataclasses", "typing.Protocol"],
  "issue_aligned": true,
  "issues": [
    {
      "module": "docs/checkup_interactive_session.md",
      "file": "docs/checkup_interactive_session.md",
      "severity": "low",
      "category": "artifact",
      "description": "Line 14 references docs/checkup_interactive_session_spike.md which does not exist on disk. Forward reference to #1434 spike output. No runtime impact.",
      "fixed": false,
      "scope": "pr",
      "blocking": false,
      "fix_description": ""
    }
  ],
  "changed_files": [
    "architecture.json",
    "context/checkup_interactive_session_example.py",
    "docs/checkup_interactive_session.md",
    "docs/evidence_manifest.md",
    "pdd/checkup_interactive_session.py",
    "pdd/prompts/checkup_interactive_session_python.prompt",
    "tests/test_checkup_interactive_session.py"
  ]
}

PR Push Status

No changes to push.

prompt-driven-github · 2026-06-09T18:21:20Z

Step 7/8: Review Loop Final Report

PR: #1485
Issue: #1435
issue_aligned: false
active-reviewer: codex
reviewer-status: codex=findings gemini=fixer fresh-final=missing
fresh-final-review: missing
verified-head-sha: none
remote-pr-head-sha: none
test-scope: targeted
full-suite-source: github-checks
max-rounds-reached: false
max-cost-reached: false
max-duration-reached: false

Summary

Fixer gemini could not address codex's findings.

Verification scope: targeted with GitHub checks gate.

Per-Reviewer Status

Reviewer	Status
codex	findings
gemini	fixer
fresh-final	missing

Machine Verdict

{
  "active_reviewer": "codex",
  "findings": [
    {
      "area": "prompt",
      "evidence": "The prompt declares ApprovedPatch and RepairOption under module.functions with parseable constructor signatures, and architecture.json mirrors the same shape at architecture.json:10010. A direct local call to pdd.code_generator_main._verify_pdd_interface_signatures against the new prompt and pdd/checkup_interactive_session.py fails with: \"the prompt's <pdd-interface> declares function(s)/method(s) missing from the generated code: ApprovedPatch, RepairOption.\" This reproduces the pdd sync failure already seen in the PR history.",
      "finding": "The source prompt/architecture interface models dataclasses as functions, so PDD's conformance gate treats them as missing callable functions and blocks sync for this module.",
      "key": "blocker|pdd/prompts/checkup_interactive_session_python.prompt:8|the source prompt/architecture interface models dataclasses as functions, so pdd's conformance gate treats them as missing callable functions and blocks sync for this module.|move approvedpatch and repairoption out of module.functions into the repository's dataclasses/classes interface shape, or otherwise use a non-function interface representation that the conformance checker does not parse as required callables; then resync archit",
      "location": "pdd/prompts/checkup_interactive_session_python.prompt:8",
      "required_fix": "Move ApprovedPatch and RepairOption out of module.functions into the repository's dataclasses/classes interface shape, or otherwise use a non-function interface representation that the conformance checker does not parse as required callables; then resync architecture.json. Adding tests alone is insufficient because pdd sync still fails before generation can complete.",
      "reviewer": "codex",
      "round": "1",
      "severity": "blocker",
      "status": "open"
    },
    {
      "area": "api",
      "evidence": "record_choice() validates with `option not in presented`, which uses dataclass equality. A freshly constructed RepairOption with the same label, preview, and ApprovedPatch as a shown option is accepted even though that object was never presented. Probe: present `shown`, then call `record_choice('f', clone)` where `clone == shown` but `clone is not shown`; the call records the clone.",
      "finding": "FakeInteractiveSession does not enforce the presented-option contract by object identity, so unpresented equal clones can be recorded as choices.",
      "key": "medium|pdd/checkup_interactive_session.py:109|fakeinteractivesession does not enforce the presented-option contract by object identity, so unpresented equal clones can be recorded as choices.|validate against the actual presented option objects, for example `any(option is presented_option for presented_option in presented)`, and add a regression test for an equal-but-unpresented clone. this matters because the issue contract requires rejecting choices that were not previously presented for the f",
      "location": "pdd/checkup_interactive_session.py:109",
      "required_fix": "Validate against the actual presented option objects, for example `any(option is presented_option for presented_option in presented)`, and add a regression test for an equal-but-unpresented clone. This matters because the issue contract requires rejecting choices that were not previously presented for the finding.",
      "reviewer": "codex",
      "round": "1",
      "severity": "medium",
      "status": "open"
    },
    {
      "area": "file",
      "evidence": "The new documentation says `See docs/checkup_interactive_session_spike.md for the ownership boundary`, but that file is not present in this PR or the checked-out base tree (`test -f docs/checkup_interactive_session_spike.md` returns false).",
      "finding": "The PR introduces a local documentation reference to a missing file.",
      "key": "low|docs/checkup_interactive_session.md:14|the pr introduces a local documentation reference to a missing file.|either include the referenced spike document, change the reference to an existing issue/pr url, or rebase after the dependency that adds the file lands so the merged repository does not contain a broken docs path.",
      "location": "docs/checkup_interactive_session.md:14",
      "required_fix": "Either include the referenced spike document, change the reference to an existing issue/PR URL, or rebase after the dependency that adds the file lands so the merged repository does not contain a broken docs path.",
      "reviewer": "codex",
      "round": "1",
      "severity": "low",
      "status": "open"
    }
  ],
  "fresh_final_status": "missing",
  "full_suite_source": "github-checks",
  "github_ci_gate_used": true,
  "issue_aligned": false,
  "issue_url": "https://github.com/promptdriven/pdd/issues/1435",
  "max_cost_reached": false,
  "max_duration_reached": false,
  "max_rounds_reached": false,
  "pr_url": "https://github.com/promptdriven/pdd/pull/1485",
  "reason": "Fixer gemini could not address codex's findings.",
  "remote_pr_head_sha": "none",
  "reviewer_status": {
    "codex": "findings",
    "fresh-final": "missing",
    "gemini": "fixer"
  },
  "schema": "pdd.checkup.final_gate.v1",
  "stage": "review-loop",
  "status": "failed",
  "test_scope": "targeted",
  "verified_head_sha": "none"
}

Findings

Severity	Status	Location	Finding	Required fix	Reviewer
blocker	open	pdd/prompts/checkup_interactive_session_python.prompt:8	The source prompt/architecture interface models dataclasses as functions, so PDD's conformance gate treats them as missing callable functions and blocks sync for this module.	Move ApprovedPatch and RepairOption out of module.functions into the repository's dataclasses/classes interface shape, or otherwise use a non-function interface representation that the conformance checker does not parse as required callables; then resync architecture.json. Adding tests alone is insufficient because pdd sync still fails before generation can complete.	codex
medium	open	pdd/checkup_interactive_session.py:109	FakeInteractiveSession does not enforce the presented-option contract by object identity, so unpresented equal clones can be recorded as choices.	Validate against the actual presented option objects, for example `any(option is presented_option for presented_option in presented)`, and add a regression test for an equal-but-unpresented clone. This matters because the issue contract requires rejecting choices that were not previously presented for the finding.	codex
low	open	docs/checkup_interactive_session.md:14	The PR introduces a local documentation reference to a missing file.	Either include the referenced spike document, change the reference to an existing issue/PR URL, or rebase after the dependency that adds the file lands so the merged repository does not contain a broken docs path.	codex
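
The medium-severity finding hinges on the difference between dataclass equality and object identity. The sketch below reproduces that difference in isolation; `Option` is a hypothetical stand-in for `RepairOption`, and the two helper functions are illustrative rather than taken from the PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:  # hypothetical stand-in for RepairOption
    label: str
    preview: str


def was_presented_by_equality(option, presented):
    # `in` falls back to the dataclass-generated __eq__, so equal clones match.
    return option in presented


def was_presented_by_identity(option, presented):
    # Only the exact objects that were shown are accepted.
    return any(option is shown_option for shown_option in presented)


shown = Option("Apply fix", "diff preview")
clone = Option("Apply fix", "diff preview")  # equal fields, different object
presented = [shown]

print(was_presented_by_equality(clone, presented))  # True: the clone slips through
print(was_presented_by_identity(clone, presented))  # False: the clone is rejected
print(was_presented_by_identity(shown, presented))  # True: the shown option passes
```

A regression test for the fix would follow the same pattern: present `shown`, then assert that recording `clone` is rejected.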

Fixer Rationale

pdd/prompts/checkup_interactive_session_python.prompt:8: The source prompt/architecture interface models dataclasses as functions, so PDD's conformance gate treats them as missing callable functions and blocks sync for this module. (fixer=gemini fixer_summary='All agent providers failed: google: Authentication required. Please visit the URL to log in:'; verification=unverified)
pdd/checkup_interactive_session.py:109: FakeInteractiveSession does not enforce the presented-option contract by object identity, so unpresented equal clones can be recorded as choices. (fixer=gemini fixer_summary='All agent providers failed: google: Authentication required. Please visit the URL to log in:'; verification=unverified)
docs/checkup_interactive_session.md:14: The PR introduces a local documentation reference to a missing file. (fixer=gemini fixer_summary='All agent providers failed: google: Authentication required. Please visit the URL to log in:'; verification=unverified)

Fixes Attempted

round=1 fixer=gemini fixer_result=failed push_status=not_attempted local_sha=none pushed_sha=none changed_files=none verification=unverified summary=All agent providers failed: google: Authentication required. Please visit the URL to log in:

prompt-driven-github · 2026-06-09T18:21:25Z

❌ PDD Execution Failed

Execution failed: PDD command failed with exit code 1.
Output: er_summary='All agent providers failed: google: Authentication required. Please visit the URL to log in:'; verification=unverified)

Fixes Attempted

round=1 fixer=gemini fixer_result=failed push_status=not_attempted local_sha=none pushed_sha=none changed_files=none verification=unverified summary=All agent providers failed: google: Authentication required. Please visit the URL to log in: — verdict: not shippable (findings remain or verification is unverified).
Cost: $8.8677
Model: openai

Use the repo dataclasses key for ApprovedPatch and RepairOption so pdd sync conformance no longer treats @dataclass classes as missing functions. Add a regression test for _verify_pdd_interface_signatures. Co-authored-by: Cursor <cursoragent@cursor.com>
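
The blocker this commit addresses can be illustrated with a toy conformance check. The implementation of `_verify_pdd_interface_signatures` does not appear in this thread, so the sketch below only assumes its general shape: a checker that compares declared function names against top-level `def` statements. The names `defined_names` and `missing_functions` are hypothetical.

```python
import ast

# Sample module source; it is parsed, never executed.
SOURCE = '''
from dataclasses import dataclass

@dataclass
class ApprovedPatch:
    finding_id: str = ""

@dataclass
class RepairOption:
    label: str = ""

def helper():
    pass
'''


def defined_names(source):
    """Return (top-level function names, top-level class names) found in source."""
    tree = ast.parse(source)
    funcs = {n.name for n in tree.body
             if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))}
    classes = {n.name for n in tree.body if isinstance(n, ast.ClassDef)}
    return funcs, classes


def missing_functions(declared_functions, source):
    """Names declared as functions that have no matching top-level def."""
    funcs, _ = defined_names(source)
    return sorted(set(declared_functions) - funcs)


# Declaring the dataclasses as functions makes a def-only checker report them missing.
print(missing_functions(["ApprovedPatch", "RepairOption", "helper"], SOURCE))
# Declaring them elsewhere (for example under a classes-style key) leaves nothing missing.
print(missing_functions(["helper"], SOURCE))
```

Under this assumption, moving the two dataclasses out of the function list is enough to clear the reported mismatch without changing the generated code.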

prompt-driven-github Bot mentioned this pull request Jun 8, 2026

feat(checkup): InteractiveRepairSession protocol for prompt repair #1435

Closed

5 tasks

DianaTao self-assigned this Jun 8, 2026

DianaTao added the pdd-test PDD: run tests label Jun 8, 2026

prompt-driven-github Bot mentioned this pull request Jun 8, 2026

Add tests for #1485 #1496

Open

4 tasks

DianaTao added pdd-checkup PDD: run a codebase checkup and removed pdd-test PDD: run tests labels Jun 9, 2026

DianaTao added pdd-gemini-flash PDD: use Gemini 3 Flash (fastest, preview) pdd-checkup PDD: run a codebase checkup and removed pdd-checkup PDD: run a codebase checkup labels Jun 9, 2026

pdd-cli approved these changes Jun 9, 2026

DianaTao force-pushed the change/issue-1435 branch from 918af0d to f99a904 on June 9, 2026 06:45

DianaTao added 2 commits June 8, 2026 23:47

Merge branch 'main' into change/issue-1435

e9d60b7

Merge branch 'main' into change/issue-1435

5a70b15

DianaTao added pdd-checkup PDD: run a codebase checkup and removed pdd-checkup PDD: run a codebase checkup labels Jun 9, 2026

DianaTao added pdd-opus and removed pdd-gemini-flash PDD: use Gemini 3 Flash (fastest, preview) labels Jun 9, 2026

gltanaka merged commit ed39620 into main Jun 11, 2026
10 checks passed
