
feat: add interactive repair session protocol (#1435) #1485

Merged

gltanaka merged 5 commits into main from change/issue-1435 on Jun 11, 2026

Conversation

@prompt-driven-github

prompt-driven-github Bot commented Jun 8, 2026

Contributor

Summary

Adds Block 1 (#1435) for interactive prompt repair: the engine-agnostic InteractiveRepairSession protocol, typed RepairOption and ApprovedPatch dataclasses, and a deterministic FakeInteractiveSession for contract tests.

Aligned with the Hybrid decision in #1434 / PR #1497: this module is the shared Python contract; TTY menus and --apply gating land in #1436, patch application in #1437.

Closes #1435

Changes

| Area | Files |
|---|---|
| Protocol + fake backend | pdd/checkup_interactive_session.py |
| Prompt | pdd/prompts/checkup_interactive_session_python.prompt |
| Contract tests (23) | tests/test_checkup_interactive_session.py (includes merged coverage from #1496) |
| Docs | docs/checkup_interactive_session.md (Hybrid #1434 cross-link), docs/evidence_manifest.md |
| Example | context/checkup_interactive_session_example.py |
| Registry | architecture.json (single module entry only, no unicode normalization churn) |

Review cleanup (this revision)

Test plan

Merge order

Merge #1434 (PR #1497) first, then this PR.

@prompt-driven-github

Contributor Author

🚀 Job Queued!

Job ID: ij43YnpqIYWfGsVl7hb8
Triggered by: @DianaTao
Label: test

View Live Progress

Budget cap: none

You can add or change a cap by commenting:

/pdd budget N
/pdd settings
/pdd stop

@prompt-driven-github

Contributor Author

Step 1: Duplicate Check

Status: No duplicates found

Search Performed

  • Searched for: InteractiveRepairSession, checkup_interactive_session, FakeInteractiveSession RepairOption ApprovedPatch, interactive repair session test, test checkup_interactive_session, test 1435
  • Issues reviewed: ~20 issues across open and closed states
  • Filtered specifically for pdd-test label matches

Findings

No existing test-generation issues were found targeting the checkup_interactive_session module, the InteractiveRepairSession protocol, or the FakeInteractiveSession fake backend.

Related feature issues exist (#1435, #1423, #1434, #1436, #1437), but these are implementation/feature issues — none are test generation requests for this module. The only pdd-test labeled issues found cover unrelated functionality (PR #1489 user story generation in #1492, and pdd checkup negative acceptance criteria in #1493).

This is a new test request for the checkup_interactive_session.py module introduced in the change tracked by #1435.


Proceeding to Step 2: Documentation Check

@prompt-driven-github

Contributor Author

Step 1/18: Search for duplicate test requests

Step 1 completed; no <step_report> block returned by agent. Raw output retained in workflow state.

@prompt-driven-github

Contributor Author

Step 2: Documentation Check

Status: Research Complete

Project Structure

  • Frontend type: Python CLI / library (no frontend — pure Python package)
  • Test directory: tests/
  • Existing test framework: pytest (with pytest-mock via conftest mocker fixture)

Components to Test

  1. ApprovedPatch dataclass — pdd/checkup_interactive_session.py
  2. RepairOption dataclass — pdd/checkup_interactive_session.py
  3. InteractiveRepairSession Protocol — pdd/checkup_interactive_session.py
  4. FakeInteractiveSession concrete class — pdd/checkup_interactive_session.py
  5. NON_APPROVING_PATCH_KINDS constant — pdd/checkup_interactive_session.py

Test Infrastructure Found

  • Config file: tests/conftest.py — HOME isolation via tempdir, mocker fixture, dotenv loading
  • Existing fixtures: mocker (pytest-mock compatible), tmp_path (standard pytest)
  • Test patterns:
    • Pure unit tests, no fixtures required for this module
    • Helper factory functions (_patch(), _option()) to create test objects
    • `pytest.raises(ValueError, match="...")` for exception tests
    • Direct equality assertions: assert x == y
    • Type membership: assert all(isinstance(...) for ...)
    • Deepcopy invariant tests (mutate one copy, re-fetch, verify independence)

API Configuration (if API testing)

  • Base URL source: N/A — no HTTP endpoints involved
  • Authentication: N/A
  • API docs: docs/checkup_interactive_session.md — session API contract, R1–R4 rules

External Service Contracts (for Mock Generation)

| Service | Function/Method to Mock | Input Types | Return Shape (from source code) |
|---|---|---|---|
| N/A (module uses the standard library only) | N/A | N/A | N/A |

Source: pdd/checkup_interactive_session.py lines 1–124 — no third-party imports

Key API Details (for test generation)

ApprovedPatch dataclass (pdd/checkup_interactive_session.py):

@dataclass
class ApprovedPatch:
    kind: str
    target: Path          # __post_init__ coerces to Path
    anchor: dict[str, Any]  # __post_init__ copies to dict
    replacement: str
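The two `__post_init__` comments above can be sketched as follows. This is a minimal illustration, not the module's code: it assumes `dict()` (a shallow copy) is how `anchor` is copied and that `target` accepts anything `Path()` accepts. The shipped class may differ; the later review cleanup also adds a `finding_id` field, which is omitted here.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Any


@dataclass
class ApprovedPatch:
    kind: str
    target: Path
    anchor: dict[str, Any]
    replacement: str

    def __post_init__(self) -> None:
        # Normalize str (or any os.PathLike) targets to Path.
        self.target = Path(self.target)
        # Shallow-copy the anchor so later edits to the caller's dict do not leak in.
        self.anchor = dict(self.anchor)


caller_anchor = {"section": "Vocabulary", "line": 12}
patch = ApprovedPatch(
    kind="vocab_definition",
    target="prompts/foo.prompt",
    anchor=caller_anchor,
    replacement="x",
)
caller_anchor["line"] = 99  # does not affect patch.anchor
```

A shallow copy is enough to isolate top-level keys; nested structures inside `anchor` would still be shared, which is one reason `approved_patches()` hands out deep copies.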

RepairOption dataclass:

@dataclass
class RepairOption:
    label: str
    preview: str
    patch: ApprovedPatch  # non-optional (R1 contract rule)

FakeInteractiveSession constructor:

FakeInteractiveSession(
    options_by_finding: Mapping[str, Sequence[RepairOption]] | None = None,
    answers: Iterable[str] | None = None,
)

Observable state: report, options_by_finding, presented_options, recorded_choices, qa_transcript_summary, _answers (deque)

Contract rules from docs:

  • R1: Every RepairOption.patch MUST be a non-optional ApprovedPatch
  • R2: record_choice() MUST raise ValueError if option not previously presented
  • R3: approved_patches() MUST return only patches whose kind is NOT in NON_APPROVING_PATCH_KINDS = frozenset({"skip", "custom_no_patch", "no_patch"})
  • R4: Session layer MUST NOT write files, apply patches, or persist artifacts
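Rules R1 to R3 are easier to see in code. The following is a minimal sketch, not the module's implementation: `MiniSession` and `make_option` are hypothetical names, option membership is checked by dataclass equality (the real class may compare by identity), and `seed()`/`ask()` are omitted. Nothing in it touches the filesystem, which is what R4 requires of the session layer.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass
from pathlib import Path

NON_APPROVING_PATCH_KINDS = frozenset({"skip", "custom_no_patch", "no_patch"})


@dataclass
class ApprovedPatch:
    kind: str
    target: Path
    anchor: dict
    replacement: str


@dataclass
class RepairOption:
    label: str
    preview: str
    patch: ApprovedPatch  # R1: always a real patch; "do nothing" is kind="skip", never None


class MiniSession:
    """Hypothetical cut-down session: R2 and R3 only, no file I/O (R4)."""

    def __init__(self, options_by_finding: dict[str, list[RepairOption]]) -> None:
        self.options_by_finding = options_by_finding
        self.presented_options: dict[str, list[RepairOption]] = {}
        self.recorded_choices: list[tuple[str, RepairOption]] = []

    def present_finding(self, finding_id: str) -> list[RepairOption]:
        options = list(self.options_by_finding.get(finding_id, []))
        self.presented_options[finding_id] = options  # re-presenting replaces the set
        return list(options)  # the caller gets its own list

    def record_choice(self, finding_id: str, option: RepairOption) -> None:
        # R2: reject anything that was not presented for this finding.
        if option not in self.presented_options.get(finding_id, []):
            raise ValueError(f"{option.label!r} was not presented for {finding_id!r}")
        self.recorded_choices.append((finding_id, option))

    def approved_patches(self) -> list[ApprovedPatch]:
        # R3: drop non-approving kinds; deep-copy so callers cannot mutate session state.
        return [
            copy.deepcopy(option.patch)
            for _, option in self.recorded_choices
            if option.patch.kind not in NON_APPROVING_PATCH_KINDS
        ]


def make_option(label: str, kind: str) -> RepairOption:
    return RepairOption(label, preview="", patch=ApprovedPatch(kind, Path("p.prompt"), {}, "x"))


fix = make_option("Define term", "vocab_definition")
skip = make_option("Skip", "skip")
session = MiniSession({"F1": [fix], "F2": [skip]})
session.present_finding("F1")
session.present_finding("F2")
session.record_choice("F1", fix)
session.record_choice("F2", skip)
```

Here two choices are recorded, but only the `vocab_definition` patch comes back from `approved_patches()`; recording `skip` against `F1` raises `ValueError` because it was never presented for that finding.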

Existing test coverage (tests/test_checkup_interactive_session.py, 6 tests, 104 lines):

  1. test_fake_session_presents_seeded_options_and_records_choice — basic present/record flow
  2. test_fake_session_rejects_unpresented_choices — ValueError on unpresented and mismatched options
  3. test_approved_patches_returns_only_typed_approving_patches — non-approving kinds excluded
  4. test_approved_patches_returns_a_fresh_copy — deepcopy semantics
  5. test_scripted_answers_are_consumed_and_summarized — ask() deque drain + QA transcript
  6. test_seed_can_import_mapping_style_findings — seed() with findings list containing dicts

Notes

  • The module is stdlib-only (no mocking of external services needed)
  • FakeInteractiveSession IS the fake backend — tests use it directly
  • InteractiveRepairSession is a typing.Protocol; structural conformance can be asserted with isinstance(session, InteractiveRepairSession) if runtime-checkable, or confirmed by testing all 5 method signatures
  • seed() supports two import paths: (a) explicit options_by_finding at construction time, (b) mapping-style findings list passed at seed() time
  • The context/checkup_interactive_session_example.py file demonstrates the golden-path workflow end-to-end
  • Additional coverage areas not yet tested: ApprovedPatch.__post_init__ Path coercion, anchor dict copy independence, empty options_by_finding / empty answers edge cases, multiple findings in one session, approved_patches() with zero recorded choices
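If the protocol is (or were) decorated with `typing.runtime_checkable`, conformance becomes an `isinstance` check. This sketch uses a hypothetical stand-in, `RepairSessionLike`, with parameter names assumed from the method list above; it does not claim the shipped `InteractiveRepairSession` is runtime-checkable. Note that a runtime check only verifies that the five methods exist, not their signatures.

```python
from __future__ import annotations

from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class RepairSessionLike(Protocol):
    """Hypothetical stand-in mirroring the five session methods."""

    def seed(self, report: Any) -> None: ...
    def present_finding(self, finding_id: str) -> list[Any]: ...
    def ask(self, question: str) -> str: ...
    def record_choice(self, finding_id: str, option: Any) -> None: ...
    def approved_patches(self) -> list[Any]: ...


class Conforming:
    def seed(self, report: Any) -> None: ...
    def present_finding(self, finding_id: str) -> list[Any]: return []
    def ask(self, question: str) -> str: return ""
    def record_choice(self, finding_id: str, option: Any) -> None: ...
    def approved_patches(self) -> list[Any]: return []


class MissingAsk:
    # Same as Conforming but without ask(), so it should not conform.
    def seed(self, report: Any) -> None: ...
    def present_finding(self, finding_id: str) -> list[Any]: return []
    def record_choice(self, finding_id: str, option: Any) -> None: ...
    def approved_patches(self) -> list[Any]: return []
```

Without `runtime_checkable`, `isinstance` raises `TypeError`, which is why the generated test falls back to annotating a variable with the protocol type and asserting each method is callable.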

Proceeding to Step 3: Analyze & Clarify

@prompt-driven-github

Contributor Author

Step 2/18: Review codebase to understand what to test

Step 2 completed; no <step_report> block returned by agent. Raw output retained in workflow state.

@prompt-driven-github

Contributor Author

Step 3: Analyze & Clarify

Status: Proceed

Summary

We have sufficient information to create comprehensive tests for pdd/checkup_interactive_session.py. The module is stdlib-only with 4 well-documented public types (ApprovedPatch, RepairOption, InteractiveRepairSession, FakeInteractiveSession). The API contract, side-effect boundaries, and fake backend behavior are clearly defined in the docs. The 6 existing tests provide a solid baseline with identified gaps that can now be filled.

No external services, credentials, or environment setup are needed — all tests are pure unit tests.

Test Scenarios Identified

  1. ApprovedPatch.__post_init__ coerces target from a plain string to a Path
  2. ApprovedPatch.__post_init__ copies the anchor mapping so mutations to the input don't affect the dataclass
  3. approved_patches() returns an empty list when no choices have been recorded
  4. Multiple findings can be presented and choices recorded in a single session, producing multiple approved patches
  5. seed() with a non-mapping report stores the report without raising
  6. seed() with findings that use the finding_id key (vs id) correctly imports options
  7. present_finding() returns an independent list each call (not the same list object)
  8. ask() with an exhausted answers queue returns an empty string
  9. record_choice() after re-presenting a finding (overwriting presented_options) allows a new valid choice
  10. FakeInteractiveSession satisfies the InteractiveRepairSession structural protocol
  11. approved_patches() excludes all three non-approving kinds (skip, custom_no_patch, no_patch)
  12. approved_patches() returns deep copies that are independent from internal state

Proceeding to Step 4: Detect Frontend

@prompt-driven-github

Contributor Author

Step 3/18: Determine if enough info

Step 3 completed; no <step_report> block returned by agent. Raw output retained in workflow state.

@prompt-driven-github

Contributor Author

Step 4: Detect Test Type

Status: Test Type Detected

Test Type Analysis

  • Type: API
  • Framework/Stack: Python 3.12+, stdlib-only module, pytest
  • Test Framework: pytest

Environment Requirements

  • Base URL/Command: N/A
  • Environment Variables: None required
  • Authentication: Not required

Test File Convention

  • Directory: tests/
  • Naming: test_*.py

Rationale

The checkup_interactive_session module is a stdlib-only Python library (pdd/checkup_interactive_session.py) providing ApprovedPatch, RepairOption, InteractiveRepairSession (Protocol), and FakeInteractiveSession. It has no UI, no CLI entry points, and no HTTP endpoints — all functionality is exercised through direct Python calls. The codebase already uses pytest for all unit tests, and the existing tests/test_checkup_interactive_session.py follows the test_*.py naming convention with no fixtures or external dependencies. pytest is the clear choice: zero additional dependencies, already installed, and directly exercises the module's public API.


Proceeding to Step 5: Create Test Plan

@prompt-driven-github

Contributor Author

Step 4/18: Identify test type

Step 4 completed; no <step_report> block returned by agent. Raw output retained in workflow state.

@prompt-driven-github

Contributor Author

Step 5: Create Test Plan

Status: Plan Ready

Test Plan Overview

  • Total Test Cases: 14 (12 new + 2 that extend existing coverage)
  • Framework: pytest
  • Estimated Coverage: All 4 public types (ApprovedPatch, RepairOption, InteractiveRepairSession, FakeInteractiveSession), all public methods, all contract rules R1–R4 from the documentation, and all three non-approving patch kinds

Test Scenarios

Scenario 1: ApprovedPatch Dataclass Initialization and Coercions

  • Description: __post_init__ ensures target is always a Path and anchor is an independent dict copy, regardless of the input types passed to the constructor.
  • Test Cases:
    1. Constructing ApprovedPatch with a plain str target coerces it to Path
    2. Mutating the original anchor dict after construction does not affect the stored anchor (copy independence)

Scenario 2: FakeInteractiveSession.seed() Behavior

  • Description: seed() must accept any value without raising, and when the report is a mapping with findings, it should import typed RepairOption objects keyed by both id and finding_id fields.
  • Test Cases:
    1. seed() with a non-mapping report (e.g., a string or list) stores the report without raising
    2. seed() with findings using the finding_id key (vs id) correctly imports options into options_by_finding
    3. seed() does not overwrite pre-existing options loaded at construction time (uses setdefault)
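The three seed() behaviours above (tolerate non-mapping reports, accept `id` or `finding_id`, never clobber constructor options) can be sketched as one helper. `import_findings` is a hypothetical name, and the `"options"` key holding each finding's options is an assumption; only the `"findings"` list and the `id`/`finding_id` keys come from the plan.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any


def import_findings(report: Any, options_by_finding: dict[str, list[Any]]) -> None:
    """Hypothetical seed() helper that merges options from a mapping-style report."""
    if not isinstance(report, Mapping):
        return  # non-mapping reports are tolerated, not rejected
    for finding in report.get("findings") or []:
        if not isinstance(finding, Mapping):
            continue
        # Accept either key; "id" is checked first (an assumption).
        finding_id = finding.get("id") or finding.get("finding_id")
        if not finding_id:
            continue
        # setdefault: options supplied at construction time win over the report.
        options_by_finding.setdefault(str(finding_id), list(finding.get("options") or []))


options = {"F1": ["constructor-option"]}
import_findings(
    {"findings": [
        {"id": "F1", "options": ["report-option"]},
        {"finding_id": "F2", "options": ["other-option"]},
    ]},
    options,
)
import_findings("plain text report", options)
import_findings(None, options)
```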

Scenario 3: FakeInteractiveSession.present_finding() Behavior

  • Description: present_finding() must return a fresh independent list each call (not aliased to internal state), and return an empty list for an unknown finding ID.
  • Test Cases:
    1. Returns a new list object each call — modifying the returned list does not affect subsequent calls
    2. Returns an empty list for an unknown finding_id

Scenario 4: FakeInteractiveSession.ask() Queue Exhaustion

  • Description: Consuming all scripted answers and calling ask() again must return an empty string rather than raising.
  • Test Cases:
    1. ask() when the answers deque is empty returns "" (Note: partially covered by existing test; verify explicit empty-session case)
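The exhaustion behaviour falls out naturally from a `deque` of scripted answers. `ScriptedAsker` and its `transcript` list are hypothetical names for illustration (the real fake exposes `qa_transcript_summary`); the key point is checking for emptiness before `popleft()`, which would otherwise raise `IndexError`.

```python
from __future__ import annotations

from collections import deque
from collections.abc import Iterable


class ScriptedAsker:
    """Hypothetical fragment of a fake session's ask()."""

    def __init__(self, answers: Iterable[str] | None = None) -> None:
        self._answers: deque[str] = deque(answers or ())
        self.transcript: list[tuple[str, str]] = []

    def ask(self, question: str) -> str:
        # Exhausted queue yields "" instead of raising IndexError.
        answer = self._answers.popleft() if self._answers else ""
        self.transcript.append((question, answer))
        return answer


asker = ScriptedAsker(["yes"])
first = asker.ask("Apply the vocab fix?")
second = asker.ask("Anything else?")
```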

Scenario 5: Multi-Finding Session and approved_patches() Edge Cases

  • Description: A session may span multiple findings; approved_patches() must return one patch per approving choice across all findings, exclude all three non-approving kinds, return an empty list before any choices are recorded, and return deep-copied patches that are independent of internal state.
  • Test Cases:
    1. approved_patches() returns an empty list when no choices have been recorded
    2. Presenting and recording choices for two different findings produces two approved patches in order
    3. record_choice() after re-presenting a finding (overwriting presented_options) accepts the re-presented option as valid
    4. approved_patches() excludes patches with kind no_patch (in addition to skip and custom_no_patch already covered)
    5. The returned patches are independent deep copies — mutating a returned patch does not affect the internal recorded_choices state
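Test case 5 matters because a plain list copy would still share the patch objects. This small illustration uses a hypothetical `Patch` stand-in to show the difference between `list(...)` and `copy.deepcopy(...)`:

```python
import copy
from dataclasses import dataclass


@dataclass
class Patch:
    kind: str
    anchor: dict


recorded = [Patch("vocab_definition", {"line": 3})]

shallow = list(recorded)        # new list, same Patch objects
deep = copy.deepcopy(recorded)  # new list, new Patch objects, new anchor dicts

deep[0].anchor["line"] = 99     # isolated from recorded
shallow[0].anchor["line"] = 42  # leaks into recorded
```

After these two mutations, `recorded[0].anchor["line"]` is `42`: the shallow copy leaked, the deep copy did not.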

Scenario 6: InteractiveRepairSession Protocol Conformance

  • Description: FakeInteractiveSession must structurally satisfy InteractiveRepairSession so any orchestration code typed against the protocol accepts it without casting.
  • Test Cases:
    1. isinstance(FakeInteractiveSession(), InteractiveRepairSession) (or runtime_checkable equivalent) confirms structural conformance

Test Configuration

  • Setup required: None — stdlib-only module, no fixtures or environment variables needed
  • Fixtures needed: _patch() and _option() factory helpers already exist in the test file
  • Cleanup: None — all state is in-memory

Mocking Strategy

| Dependency | Function to Mock | Mock Return Value | Error Case |
|---|---|---|---|
| (none) | N/A (stdlib only) | N/A | N/A |

No external dependencies to mock. The module imports only from the standard library (collections, copy, dataclasses, pathlib, typing).

Contract source files reviewed:

  • pdd/checkup_interactive_session.py — all types and methods
  • docs/checkup_interactive_session.md — R1–R4 contract rules and no-write boundary
  • tests/test_checkup_interactive_session.py — 6 existing tests (baseline coverage)

Proceeding to Step 5b: Enhance Plan

@prompt-driven-github

Contributor Author

Step 5/18: Create test plan

Step 5 completed; no <step_report> block returned by agent. Raw output retained in workflow state.

@prompt-driven-github

Contributor Author

Step 5b: Enhanced Test Plan

Status: Plan Enhanced

Contract Validation Tests (if applicable)

No OpenAPI/Swagger spec was identified in Step 2. The checkup_interactive_session module is stdlib-only with no HTTP endpoints or REST API surface — contract validation via JSON schema is not applicable.

Contract test cases added: 0

Accessibility Tests (if applicable)

Step 4 detected TEST_TYPE: api (pure Python unit tests), not web. There are no UI pages or browser routes to audit — WCAG accessibility testing via @axe-core/playwright is not applicable.

Accessibility test cases added: 0

Enhanced Plan Summary

  • Original test cases: 14 (from Step 5)
  • Contract test cases: 0 (no OpenAPI spec — stdlib-only module)
  • Accessibility test cases: 0 (TEST_TYPE is api, not web)
  • Total enhanced plan cases: 14

The original 14-case plan from Step 5 stands as the complete test plan:

Scenario 1: ApprovedPatch Dataclass Initialization and Coercions

  1. Constructing ApprovedPatch with a plain str target coerces it to Path
  2. Mutating the original anchor dict after construction does not affect the stored anchor (copy independence)

Scenario 2: FakeInteractiveSession.seed() Behavior

  1. seed() with a non-mapping report stores the report without raising
  2. seed() with findings using finding_id key correctly imports options
  3. seed() does not overwrite pre-existing options (uses setdefault)

Scenario 3: FakeInteractiveSession.present_finding() Behavior

  1. Returns a new list object each call — modifying returned list does not affect subsequent calls
  2. Returns an empty list for an unknown finding_id

Scenario 4: FakeInteractiveSession.ask() Queue Exhaustion

  1. ask() when answers deque is empty returns ""

Scenario 5: Multi-Finding Session and approved_patches() Edge Cases

  1. approved_patches() returns empty list when no choices recorded
  2. Two findings produce two approved patches in order
  3. record_choice() after re-presenting a finding accepts the re-presented option
  4. approved_patches() excludes no_patch kind
  5. Returned patches are independent deep copies

Scenario 6: Protocol Conformance

  1. FakeInteractiveSession satisfies InteractiveRepairSession structural protocol

Proceeding to Step 6: Assess Automated Test Coverage

@prompt-driven-github

Contributor Author

Step 5.5/18: Enhance plan

Step 5.5 completed; no <step_report> block returned by agent. Raw output retained in workflow state.

@prompt-driven-github

Contributor Author

Step 12: Generated Tests

Test Files Modified

  • tests/test_checkup_interactive_session.py

Test Summary

  • Total Tests: 20 (6 existing + 14 new)
  • Test Suites: 1
  • Framework: pytest

New Test Cases Added

Scenario 1: ApprovedPatch Dataclass Coercions

  • test_approved_patch_coerces_string_target_to_path: __post_init__ converts a plain str target to Path
  • test_approved_patch_anchor_is_copy_independent_of_input — mutating the input dict after construction does not affect the stored anchor

Scenario 2: FakeInteractiveSession.seed() Behaviour

  • test_seed_non_mapping_report_stores_report_without_raising — non-mapping reports (str, None, list) are stored without raising
  • test_seed_imports_findings_using_finding_id_key — findings keyed with finding_id (not id) are imported correctly
  • test_seed_does_not_overwrite_preexisting_options: setdefault preserves options set before seed()

Scenario 3: FakeInteractiveSession.present_finding() Behaviour

  • test_present_finding_returns_independent_list_each_call — mutating one returned list does not affect the next call
  • test_present_finding_returns_empty_list_for_unknown_finding — unknown finding_id yields []

Scenario 4: FakeInteractiveSession.ask() Queue Exhaustion

  • test_ask_returns_empty_string_when_answers_queue_is_empty — empty deque returns "" and records it in the transcript

Scenario 5: Multi-Finding Session and approved_patches() Edge Cases

  • test_approved_patches_returns_empty_list_when_no_choices_recorded — zero choices → empty list
  • test_multi_finding_session_produces_ordered_approved_patches — two findings → two patches in order
  • test_record_choice_after_representing_finding_accepts_newly_presented_option — re-presenting overwrites the valid set; old options are rejected
  • test_approved_patches_excludes_no_patch_kind — the no_patch kind (the third non-approving kind) is excluded
  • test_approved_patches_deep_copy_is_independent_of_recorded_choice — mutating a returned copy does not alias the recorded option data

Scenario 6: Protocol Conformance

  • test_fake_session_satisfies_interactive_repair_session_protocol: FakeInteractiveSession is assignable to InteractiveRepairSession and all protocol methods are callable

Test Code Overview

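from pathlib import Path

from pdd.checkup_interactive_session import (
    ApprovedPatch,
    FakeInteractiveSession,
    InteractiveRepairSession,
)

# Example: ApprovedPatch coercions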
def test_approved_patch_coerces_string_target_to_path() -> None:
    patch = ApprovedPatch(kind='vocab_definition', target='prompts/foo.prompt',
                          anchor={}, replacement='x')
    assert isinstance(patch.target, Path)

# Example: Protocol conformance
def test_fake_session_satisfies_interactive_repair_session_protocol() -> None:
    session: InteractiveRepairSession = FakeInteractiveSession()
    assert callable(session.seed)
    assert callable(session.present_finding)
    assert callable(session.ask)
    assert callable(session.record_choice)
    assert callable(session.approved_patches)

Running the Tests

pytest -vv tests/test_checkup_interactive_session.py

All 20 tests pass (6 original + 14 new).


Proceeding to Step 13: Run Tests

@prompt-driven-github

Contributor Author

Step 12/18: Generate tests

Step 12 completed; no <step_report> block returned by agent. Raw output retained in workflow state.

@prompt-driven-github

Contributor Author

Step 13: Run Tests

Status: All Tests Passing

Test Results

  • Passed: 20
  • Failed: 0
  • Skipped: 0

Execution Details

  • Duration: 0.34s
  • Framework: pytest 9.0.3, Python 3.12.3
  • CWD: worktree root (/tmp/pdd_job_ij43YnpqIYWfGsVl7hb8_wyt4aqef/.pdd/worktrees/test-issue-1485)
  • Multi-cwd check: .pddrc is in the worktree root — same as CI cwd, second run skipped

Summary

All 20 generated tests pass successfully (6 existing + 14 new).

| Test | Status |
|---|---|
| test_fake_session_presents_seeded_options_and_records_choice | PASSED |
| test_fake_session_rejects_unpresented_choices | PASSED |
| test_approved_patches_returns_only_typed_approving_patches | PASSED |
| test_approved_patches_returns_a_fresh_copy | PASSED |
| test_scripted_answers_are_consumed_and_summarized | PASSED |
| test_seed_can_import_mapping_style_findings | PASSED |
| test_approved_patch_coerces_string_target_to_path | PASSED |
| test_approved_patch_anchor_is_copy_independent_of_input | PASSED |
| test_seed_non_mapping_report_stores_report_without_raising | PASSED |
| test_seed_imports_findings_using_finding_id_key | PASSED |
| test_seed_does_not_overwrite_preexisting_options | PASSED |
| test_present_finding_returns_independent_list_each_call | PASSED |
| test_present_finding_returns_empty_list_for_unknown_finding | PASSED |
| test_ask_returns_empty_string_when_answers_queue_is_empty | PASSED |
| test_approved_patches_returns_empty_list_when_no_choices_recorded | PASSED |
| test_multi_finding_session_produces_ordered_approved_patches | PASSED |
| test_record_choice_after_representing_finding_accepts_newly_presented_option | PASSED |
| test_approved_patches_excludes_no_patch_kind | PASSED |
| test_approved_patches_deep_copy_is_independent_of_recorded_choice | PASSED |
| test_fake_session_satisfies_interactive_repair_session_protocol | PASSED |

Proceeding to Step 15: Validate Tests Against Plan

@prompt-driven-github

Contributor Author

Step 13/18: Execute generated tests

Step 13 completed; no <step_report> block returned by agent. Raw output retained in workflow state.

@prompt-driven-github

Contributor Author

Step 14: Fix & Iterate

Status: All Tests Passing

Fixes Applied

No fixes were needed. All 20 tests were already passing from Step 13 and continue to pass on re-verification.

Final Test Results

  • Passed: 20
  • Failed: 0
  • Fix Iterations: 0

Proceeding to Step 15: Validate Tests Against Plan

@prompt-driven-github

Contributor Author

Step 14/18: Fix failing tests

Step 14 completed; no <step_report> block returned by agent. Raw output retained in workflow state.

@prompt-driven-github

Contributor Author

Step 15: Test Plan Validation

Status: Validation Complete

Coverage Analysis

  • Planned cases: 14
  • Implemented cases: 14
  • Coverage ratio: 100%

Implemented Test Cases

| # | Planned Case | Category | Status |
|---|---|---|---|
| 1 | test_approved_patch_coerces_string_target_to_path | behavioral (ApprovedPatch coercions) | Implemented |
| 2 | test_approved_patch_anchor_is_copy_independent_of_input | behavioral (ApprovedPatch coercions) | Implemented |
| 3 | test_seed_non_mapping_report_stores_report_without_raising | behavioral (seed() behaviour) | Implemented |
| 4 | test_seed_imports_findings_using_finding_id_key | behavioral (seed() behaviour) | Implemented |
| 5 | test_seed_does_not_overwrite_preexisting_options | behavioral (seed() behaviour) | Implemented |
| 6 | test_present_finding_returns_independent_list_each_call | behavioral (present_finding() behaviour) | Implemented |
| 7 | test_present_finding_returns_empty_list_for_unknown_finding | behavioral (present_finding() behaviour) | Implemented |
| 8 | test_ask_returns_empty_string_when_answers_queue_is_empty | behavioral (ask() queue exhaustion) | Implemented |
| 9 | test_approved_patches_returns_empty_list_when_no_choices_recorded | behavioral (approved_patches() edge cases) | Implemented |
| 10 | test_multi_finding_session_produces_ordered_approved_patches | behavioral (approved_patches() edge cases) | Implemented |
| 11 | test_record_choice_after_representing_finding_accepts_newly_presented_option | behavioral (approved_patches() edge cases) | Implemented |
| 12 | test_approved_patches_excludes_no_patch_kind | behavioral (approved_patches() edge cases) | Implemented |
| 13 | test_approved_patches_deep_copy_is_independent_of_recorded_choice | behavioral (approved_patches() edge cases) | Implemented |
| 14 | test_fake_session_satisfies_interactive_repair_session_protocol | behavioral (Protocol conformance) | Implemented |

Missing Cases

None — all 14 planned cases are implemented.

Notes

  • Contract test cases: 0 (no OpenAPI spec — stdlib-only module, not applicable)
  • Accessibility test cases: 0 (TEST_TYPE is api, not web — not applicable)
  • Total tests in file: 20 (6 pre-existing + 14 new from plan)
  • No new files needed — coverage is 100%

Proceeding to Step 16: Run Newly Generated Tests

@prompt-driven-github

Contributor Author

Step 15/18: Validate tests against plan

Step 15 completed; no <step_report> block returned by agent. Raw output retained in workflow state.

prompt-driven-github Bot mentioned this pull request on Jun 8, 2026
@prompt-driven-github

Contributor Author

Step 17: Draft PR Created

Pull Request

PR #1496: Add tests for #1485

Branch

test/issue-1485

What's Included

  • Test files at:
    • tests/test_checkup_interactive_session.py
  • Commits: 1

Test Summary

  • Total Tests: 20 (6 existing + 14 new)
  • Passing: 20
  • Framework: pytest
  • Test Plan Coverage: 14/14 planned cases implemented (100%)

Next Steps for Reviewers

  1. Review the generated tests for completeness
  2. Run tests locally: pytest -vv tests/test_checkup_interactive_session.py
  3. Adjust test selectors/assertions if needed
  4. Mark the PR as ready for review

Running the Tests

pytest -vv tests/test_checkup_interactive_session.py

Test generation complete. A draft PR with the tests has been created and linked to this issue.

@prompt-driven-github

Contributor Author

Step 17/18: Create draft PR

Step 17 completed; no <step_report> block returned by agent. Raw output retained in workflow state.

@prompt-driven-github

Contributor Author

PDD Execution Successful

Command: test
Duration: 14.0 min
Cost: $0.1399 infra (LLM billed to your API keys)
Pull Request: #1496

DianaTao added the pdd-checkup label (PDD: run a codebase checkup) and removed the pdd-test label (PDD: run tests) on Jun 9, 2026
@prompt-driven-github

Contributor Author

🚀 Job Queued!

Job ID: RE1yWVvM9HwKxOzf2PXA
Triggered by: @DianaTao
Label: checkup

View Live Progress

Budget cap: none

You can add or change a cap by commenting:

/pdd budget N
/pdd settings
/pdd stop

@prompt-driven-github

Copy link
Copy Markdown
Contributor Author

❌ PDD Execution Failed

Execution failed: PDD command failed with exit code 1.
Output: ate_limited credential=claude-oauth-v6

[PROVIDER_FAILURE_SUMMARY] PDD_PROVIDER_FAILURE provider=codex status=lock_busy reason=codex_auth_lock_contention scope=lanes credential=gemini-vertexai

[PROVIDER_FAILURE_SUMMARY] PDD_PROVIDER_FAILURE provider=codex status=lock_busy reason=codex_auth_lock_contention scope=lanes credential=openai-codex

[PROVIDER_FAILURE_SUMMARY] PDD_PROVIDER_FAILURE provider=codex status=lock_busy reason=codex_auth_lock_contention scope=lanes credential=anthropic-api-key

DianaTao added the pdd-gemini-flash label (PDD: use Gemini 3 Flash, fastest, preview), re-added pdd-checkup (PDD: run a codebase checkup), and removed the previous pdd-checkup label on Jun 9, 2026
@prompt-driven-github

Contributor Author

🚀 Job Queued!

Job ID: 97P0GOf6lxDKVB7d7Kow
Triggered by: @DianaTao
Label: checkup

View Live Progress
Model: gemini-3.5-flash

Budget cap: none

You can add or change a cap by commenting:

/pdd budget N
/pdd settings
/pdd stop

@prompt-driven-github

Copy link
Copy Markdown
Contributor Author

❌ PDD Execution Failed

Execution failed: PDD command failed with exit code 1.
Output:
[PROVIDER_FAILURE_SUMMARY] PDD_PROVIDER_FAILURE provider=anthropic status=429 reason=interactive_preflight_rate_limited credential=claude-oauth-v6

[PROVIDER_FAILURE_SUMMARY] PDD_PROVIDER_FAILURE provider=codex status=lock_busy reason=codex_auth_lock_contention scope=lanes credential=openai-codex

[PROVIDER_FAILURE_SUMMARY] PDD_PROVIDER_FAILURE provider=codex status=lock_busy reason=codex_auth_lock_contention scope=lanes credential=anthropic-api-key

pdd-cli left a comment


Approved! Looks solid and all CI checks are green.

Rebases Block 1 onto main with review cleanup: finding_id on
ApprovedPatch, duplicate record_choice rejection, merged contract
tests from #1496, Hybrid #1434 cross-link, and a single
architecture.json entry without unicode churn.

Co-authored-by: Cursor <cursoragent@cursor.com>
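The commit message mentions "duplicate record_choice rejection" without spelling out the rule. A minimal sketch under one reading (a second choice for a finding that already has one is rejected); `ChoiceLedger` is a hypothetical name, and the real check might instead reject only the exact same option being recorded twice.

```python
from __future__ import annotations


class ChoiceLedger:
    """Hypothetical: at most one recorded choice per finding."""

    def __init__(self) -> None:
        self.recorded: dict[str, str] = {}

    def record_choice(self, finding_id: str, option_label: str) -> None:
        if finding_id in self.recorded:
            raise ValueError(f"a choice is already recorded for {finding_id!r}")
        self.recorded[finding_id] = option_label  # dicts keep insertion order


ledger = ChoiceLedger()
ledger.record_choice("F1", "Define term")
ledger.record_choice("F2", "Skip")
```

Keying by finding also keeps approved patches in recording order, which is what the multi-finding test expects.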
DianaTao force-pushed the change/issue-1435 branch from 918af0d to f99a904 on June 9, 2026 06:45
DianaTao added and removed the pdd-checkup label (PDD: run a codebase checkup)
@prompt-driven-github

Contributor Author

🚀 Job Queued!

Job ID: YskET1Qz5DgHmVxw2H0G
Triggered by: @DianaTao
Label: checkup

View Live Progress
Model: gemini-3.5-flash

Budget cap: none

You can add or change a cap by commenting:

/pdd budget N
/pdd settings
/pdd stop

@prompt-driven-github

Contributor Author

Step 7/8: Final Gate Report

PR: #1485
Issue: #1435
final-gate-status: failed
final-gate-stage: layer1
issue_aligned: unknown

Summary

Layer 1 PR checkup failed before Layer 2 review loop could run.

Machine Verdict

{
  "findings": [
    {
      "area": "layer1",
      "finding": "Layer 1 checkup failed before Layer 2: Aborting: 3 consecutive steps failed - agent providers unavailable",
      "location": "",
      "required_fix": "Resolve the Layer 1 checkup failure or push-guard refusal, then re-run the final gate.",
      "severity": "blocker",
      "status": "open"
    }
  ],
  "fresh_final_status": "missing",
  "full_suite_source": "github-checks",
  "issue_aligned": null,
  "issue_url": "https://github.com/promptdriven/pdd/issues/1435",
  "layer1_status": "failed",
  "layer2_status": "skipped",
  "pr_url": "https://github.com/promptdriven/pdd/pull/1485",
  "reason": "Aborting: 3 consecutive steps failed - agent providers unavailable",
  "reviewer_status": {},
  "schema": "pdd.checkup.final_gate.v1",
  "stage": "layer1",
  "status": "failed"
}

Issues Summary

| Severity | Module | Description | Fixed |
|---|---|---|---|
| blocker | layer1 | Layer 1 checkup failed before Layer 2: Aborting: 3 consecutive steps failed - agent providers unavailable | No |
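The Machine Verdict above is plain JSON tagged with `schema: pdd.checkup.final_gate.v1`, so a script can pull out unresolved blockers. This is a minimal sketch that relies only on field names visible in this report (`schema`, `findings`, `severity`, `status`, `area`, `finding`); `open_blockers` is a hypothetical helper, not part of pdd.

```python
import json
from typing import Any, Dict, List

EXPECTED_SCHEMA = "pdd.checkup.final_gate.v1"


def open_blockers(verdict_text: str) -> List[Dict[str, Any]]:
    """Return findings that are still open and marked as blockers."""
    verdict = json.loads(verdict_text)
    if verdict.get("schema") != EXPECTED_SCHEMA:
        raise ValueError(f"unexpected schema: {verdict.get('schema')!r}")
    return [
        finding
        for finding in verdict.get("findings", [])
        if finding.get("severity") == "blocker" and finding.get("status") == "open"
    ]


# Trimmed copy of the verdict shown above.
verdict_text = """{
  "findings": [
    {"area": "layer1", "severity": "blocker", "status": "open",
     "finding": "Layer 1 checkup failed before Layer 2"}
  ],
  "layer1_status": "failed",
  "layer2_status": "skipped",
  "schema": "pdd.checkup.final_gate.v1",
  "status": "failed"
}"""

for blocker in open_blockers(verdict_text):
    print(f"[{blocker['area']}] {blocker['finding']}")
```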

Drop architecture.json unicode re-encoding churn; append only the
checkup_interactive_session module entry. Fix example approved_patches
assertion after finding_id stamping. Add contract tests pinning stdlib-only
protocol layer per #1434 Hybrid decision (no Pi/TTY/llm_invoke in #1435).

Co-authored-by: Cursor <cursoragent@cursor.com>
@prompt-driven-github

Contributor Author

Step 7/8: Final Gate Report

PR: #1485
Issue: #1435
final-gate-status: failed
final-gate-stage: layer1
issue_aligned: unknown

Summary

Layer 1 PR checkup failed before the Layer 2 review loop could run.

Machine Verdict

{
  "findings": [
    {
      "area": "layer1",
      "finding": "Layer 1 checkup failed before Layer 2: Aborting: 3 consecutive steps failed - agent providers unavailable",
      "location": "",
      "required_fix": "Resolve the Layer 1 checkup failure or push-guard refusal, then re-run the final gate.",
      "severity": "blocker",
      "status": "open"
    }
  ],
  "fresh_final_status": "missing",
  "full_suite_source": "github-checks",
  "issue_aligned": null,
  "issue_url": "https://github.com/promptdriven/pdd/issues/1435",
  "layer1_status": "failed",
  "layer2_status": "skipped",
  "pr_url": "https://github.com/promptdriven/pdd/pull/1485",
  "reason": "Aborting: 3 consecutive steps failed - agent providers unavailable",
  "reviewer_status": {},
  "schema": "pdd.checkup.final_gate.v1",
  "stage": "layer1",
  "status": "failed"
}

Issues Summary

| Severity | Module | Description | Fixed |
|---|---|---|---|
| blocker | layer1 | Layer 1 checkup failed before Layer 2: Aborting: 3 consecutive steps failed - agent providers unavailable | No |

@DianaTao

DianaTao commented Jun 9, 2026

Collaborator

Diana-6/9

@prompt-driven-github

Contributor Author

@DianaTao Noted. Your feedback will be incorporated at the next step of the run.

@prompt-driven-github

Contributor Author

Step 7/8: Verification & Final Report (Iteration 2)

Test Results After Fixes

  • Total: 45 tests
  • Passed: 45
  • Failed: 0
  • Previously failing, now passing: 0 (no failures in the previous iteration)
  • New failures: 0

Build Status

Pass: all compile checks, the import check, the example script, and JSON validation succeeded.

| Command | Result |
|---|---|
| `python -m py_compile pdd/checkup_interactive_session.py` | ✅ OK |
| `python -m py_compile tests/test_checkup_interactive_session.py` | ✅ OK |
| `python -m py_compile context/checkup_interactive_session_example.py` | ✅ OK |
| `python -c "import pdd.checkup_interactive_session"` | ✅ OK |
| `python context/checkup_interactive_session_example.py` | ✅ OK |
| `python -m json.tool architecture.json` | ✅ valid JSON |
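The build checks listed above are ordinary commands, so they can be re-run with a small script. This is a sketch using `subprocess.run`; `run_checks` is a hypothetical helper, the command list mirrors the table, and it assumes the working directory is the repository root.

```python
import subprocess
import sys
from typing import List, Sequence, Tuple

CHECKS: List[List[str]] = [
    [sys.executable, "-m", "py_compile", "pdd/checkup_interactive_session.py"],
    [sys.executable, "-m", "py_compile", "tests/test_checkup_interactive_session.py"],
    [sys.executable, "-m", "py_compile", "context/checkup_interactive_session_example.py"],
    [sys.executable, "-c", "import pdd.checkup_interactive_session"],
    [sys.executable, "context/checkup_interactive_session_example.py"],
    [sys.executable, "-m", "json.tool", "architecture.json"],
]


def run_checks(commands: Sequence[Sequence[str]]) -> List[Tuple[str, bool]]:
    """Run each command and report (command line, exited with status 0)."""
    results = []
    for cmd in commands:
        proc = subprocess.run(list(cmd), capture_output=True, text=True)
        results.append((" ".join(cmd), proc.returncode == 0))
    return results


if __name__ == "__main__":
    for command, ok in run_checks(CHECKS):
        print(("OK    " if ok else "FAIL  ") + command)
```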

Overall Status

All Issues Fixed

Verification Scope

Targeted (full suite not run). Tests were scoped to <pr_changed_files>: tests/test_checkup_interactive_session.py (26 tests) and tests/test_evidence_manifest.py (19 tests, adjacent to the changed docs/evidence_manifest.md). Full GitHub CI was not used as a gate.

Issue Alignment

issue_aligned: true. The PR delivers everything requested in issue #1435: ApprovedPatch and RepairOption dataclasses, the InteractiveRepairSession protocol, the FakeInteractiveSession backend, 26 contract tests, session artifact schema docs, and an evidence manifest cross-reference. No file writes occur in the session layer.

Issues Summary

| Severity | Category | Module | Description | Fixed |
|---|---|---|---|---|
| low | artifact | docs/checkup_interactive_session.md | Line 14 references docs/checkup_interactive_session_spike.md which does not exist yet (forward ref to #1434 spike output). No runtime impact. | No (out-of-scope forward ref) |

Checkup complete.

{
  "success": true,
  "message": "Verification scope: targeted — full suite not run. All 26 PR-scoped contract tests pass (tests/test_checkup_interactive_session.py), all 19 adjacent evidence manifest tests pass, example script runs cleanly, architecture.json is valid JSON. 1 low-severity dangling doc reference (docs/checkup_interactive_session_spike.md forward ref to #1434 spike) remains but is non-blocking. PR correctly implements issue #1435: ApprovedPatch, RepairOption, InteractiveRepairSession protocol, FakeInteractiveSession backend, session artifact schema docs, no filesystem writes in session layer.",
  "tech_stack": ["Python 3.12", "pytest", "dataclasses", "typing.Protocol"],
  "issue_aligned": true,
  "issues": [
    {
      "module": "docs/checkup_interactive_session.md",
      "file": "docs/checkup_interactive_session.md",
      "severity": "low",
      "category": "artifact",
      "description": "Line 14 references docs/checkup_interactive_session_spike.md which does not exist on disk. Forward reference to #1434 spike output. No runtime impact.",
      "fixed": false,
      "scope": "pr",
      "blocking": false,
      "fix_description": ""
    }
  ],
  "changed_files": [
    "architecture.json",
    "context/checkup_interactive_session_example.py",
    "docs/checkup_interactive_session.md",
    "docs/evidence_manifest.md",
    "pdd/checkup_interactive_session.py",
    "pdd/prompts/checkup_interactive_session_python.prompt",
    "tests/test_checkup_interactive_session.py"
  ]
}

PR Push Status

No changes to push.

@prompt-driven-github

Contributor Author

Step 7/8: Review Loop Final Report

PR: #1485
Issue: #1435
issue_aligned: false
active-reviewer: codex
reviewer-status: codex=findings gemini=fixer fresh-final=missing
fresh-final-review: missing
verified-head-sha: none
remote-pr-head-sha: none
test-scope: targeted
full-suite-source: github-checks
max-rounds-reached: false
max-cost-reached: false
max-duration-reached: false

Summary

Fixer gemini could not address codex's findings.

Verification scope: targeted with GitHub checks gate.

Per-Reviewer Status

| Reviewer | Status |
|---|---|
| codex | findings |
| gemini | fixer |
| fresh-final | missing |

Machine Verdict

{
  "active_reviewer": "codex",
  "findings": [
    {
      "area": "prompt",
      "evidence": "The prompt declares ApprovedPatch and RepairOption under module.functions with parseable constructor signatures, and architecture.json mirrors the same shape at architecture.json:10010. A direct local call to pdd.code_generator_main._verify_pdd_interface_signatures against the new prompt and pdd/checkup_interactive_session.py fails with: \"the prompt's <pdd-interface> declares function(s)/method(s) missing from the generated code: ApprovedPatch, RepairOption.\" This reproduces the pdd sync failure already seen in the PR history.",
      "finding": "The source prompt/architecture interface models dataclasses as functions, so PDD's conformance gate treats them as missing callable functions and blocks sync for this module.",
      "key": "blocker|pdd/prompts/checkup_interactive_session_python.prompt:8|the source prompt/architecture interface models dataclasses as functions, so pdd's conformance gate treats them as missing callable functions and blocks sync for this module.|move approvedpatch and repairoption out of module.functions into the repository's dataclasses/classes interface shape, or otherwise use a non-function interface representation that the conformance checker does not parse as required callables; then resync archit",
      "location": "pdd/prompts/checkup_interactive_session_python.prompt:8",
      "required_fix": "Move ApprovedPatch and RepairOption out of module.functions into the repository's dataclasses/classes interface shape, or otherwise use a non-function interface representation that the conformance checker does not parse as required callables; then resync architecture.json. Adding tests alone is insufficient because pdd sync still fails before generation can complete.",
      "reviewer": "codex",
      "round": "1",
      "severity": "blocker",
      "status": "open"
    },
    {
      "area": "api",
      "evidence": "record_choice() validates with `option not in presented`, which uses dataclass equality. A freshly constructed RepairOption with the same label, preview, and ApprovedPatch as a shown option is accepted even though that object was never presented. Probe: present `shown`, then call `record_choice('f', clone)` where `clone == shown` but `clone is not shown`; the call records the clone.",
      "finding": "FakeInteractiveSession does not enforce the presented-option contract by object identity, so unpresented equal clones can be recorded as choices.",
      "key": "medium|pdd/checkup_interactive_session.py:109|fakeinteractivesession does not enforce the presented-option contract by object identity, so unpresented equal clones can be recorded as choices.|validate against the actual presented option objects, for example `any(option is presented_option for presented_option in presented)`, and add a regression test for an equal-but-unpresented clone. this matters because the issue contract requires rejecting choices that were not previously presented for the f",
      "location": "pdd/checkup_interactive_session.py:109",
      "required_fix": "Validate against the actual presented option objects, for example `any(option is presented_option for presented_option in presented)`, and add a regression test for an equal-but-unpresented clone. This matters because the issue contract requires rejecting choices that were not previously presented for the finding.",
      "reviewer": "codex",
      "round": "1",
      "severity": "medium",
      "status": "open"
    },
    {
      "area": "file",
      "evidence": "The new documentation says `See docs/checkup_interactive_session_spike.md for the ownership boundary`, but that file is not present in this PR or the checked-out base tree (`test -f docs/checkup_interactive_session_spike.md` returns false).",
      "finding": "The PR introduces a local documentation reference to a missing file.",
      "key": "low|docs/checkup_interactive_session.md:14|the pr introduces a local documentation reference to a missing file.|either include the referenced spike document, change the reference to an existing issue/pr url, or rebase after the dependency that adds the file lands so the merged repository does not contain a broken docs path.",
      "location": "docs/checkup_interactive_session.md:14",
      "required_fix": "Either include the referenced spike document, change the reference to an existing issue/PR URL, or rebase after the dependency that adds the file lands so the merged repository does not contain a broken docs path.",
      "reviewer": "codex",
      "round": "1",
      "severity": "low",
      "status": "open"
    }
  ],
  "fresh_final_status": "missing",
  "full_suite_source": "github-checks",
  "github_ci_gate_used": true,
  "issue_aligned": false,
  "issue_url": "https://github.com/promptdriven/pdd/issues/1435",
  "max_cost_reached": false,
  "max_duration_reached": false,
  "max_rounds_reached": false,
  "pr_url": "https://github.com/promptdriven/pdd/pull/1485",
  "reason": "Fixer gemini could not address codex's findings.",
  "remote_pr_head_sha": "none",
  "reviewer_status": {
    "codex": "findings",
    "fresh-final": "missing",
    "gemini": "fixer"
  },
  "schema": "pdd.checkup.final_gate.v1",
  "stage": "review-loop",
  "status": "failed",
  "test_scope": "targeted",
  "verified_head_sha": "none"
}

Findings

| Severity | Status | Location | Finding | Required fix | Reviewer |
|---|---|---|---|---|---|
| blocker | open | pdd/prompts/checkup_interactive_session_python.prompt:8 | The source prompt/architecture interface models dataclasses as functions, so PDD's conformance gate treats them as missing callable functions and blocks sync for this module. | Move ApprovedPatch and RepairOption out of module.functions into the repository's dataclasses/classes interface shape, or otherwise use a non-function interface representation that the conformance checker does not parse as required callables; then resync architecture.json. Adding tests alone is insufficient because pdd sync still fails before generation can complete. | codex |
| medium | open | pdd/checkup_interactive_session.py:109 | FakeInteractiveSession does not enforce the presented-option contract by object identity, so unpresented equal clones can be recorded as choices. | Validate against the actual presented option objects, for example `any(option is presented_option for presented_option in presented)`, and add a regression test for an equal-but-unpresented clone. This matters because the issue contract requires rejecting choices that were not previously presented for the finding. | codex |
| low | open | docs/checkup_interactive_session.md:14 | The PR introduces a local documentation reference to a missing file. | Either include the referenced spike document, change the reference to an existing issue/PR URL, or rebase after the dependency that adds the file lands so the merged repository does not contain a broken docs path. | codex |
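The medium finding hinges on the difference between `in`, which falls back to `==` (and dataclasses generate a field-by-field `__eq__`), and an identity check with `is`. A standalone sketch, with RepairOption reduced to the two fields named in the evidence; the helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RepairOption:
    label: str
    preview: str


def presented_by_equality(option: RepairOption, presented: Sequence[RepairOption]) -> bool:
    # `in` matches any element that compares equal, so an equal clone passes.
    return option in presented


def presented_by_identity(option: RepairOption, presented: Sequence[RepairOption]) -> bool:
    # Only the exact object that was shown passes.
    return any(option is shown for shown in presented)


shown = RepairOption(label="Rewrite section", preview="+ new text")
clone = RepairOption(label="Rewrite section", preview="+ new text")

print(clone == shown, clone is shown)         # True False
print(presented_by_equality(clone, [shown]))  # True  (the reported gap)
print(presented_by_identity(clone, [shown]))  # False (the suggested fix)
print(presented_by_identity(shown, [shown]))  # True
```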

Fixer Rationale

  • pdd/prompts/checkup_interactive_session_python.prompt:8: The source prompt/architecture interface models dataclasses as functions, so PDD's conformance gate treats them as missing callable functions and blocks sync for this module. (fixer=gemini fixer_summary='All agent providers failed: google: Authentication required. Please visit the URL to log in:'; verification=unverified)
  • pdd/checkup_interactive_session.py:109: FakeInteractiveSession does not enforce the presented-option contract by object identity, so unpresented equal clones can be recorded as choices. (fixer=gemini fixer_summary='All agent providers failed: google: Authentication required. Please visit the URL to log in:'; verification=unverified)
  • docs/checkup_interactive_session.md:14: The PR introduces a local documentation reference to a missing file. (fixer=gemini fixer_summary='All agent providers failed: google: Authentication required. Please visit the URL to log in:'; verification=unverified)

Fixes Attempted

  • round=1 fixer=gemini fixer_result=failed push_status=not_attempted local_sha=none pushed_sha=none changed_files=none verification=unverified summary=All agent providers failed: google: Authentication required. Please visit the URL to log in:

@prompt-driven-github

Contributor Author

❌ PDD Execution Failed

Execution failed: PDD command failed with exit code 1.
Output: er_summary='All agent providers failed: google: Authentication required. Please visit the URL to log in:'; verification=unverified)

Fixes Attempted

  • round=1 fixer=gemini fixer_result=failed push_status=not_attempted local_sha=none pushed_sha=none changed_files=none verification=unverified summary=All agent providers failed: google: Authentication required. Please visit the URL to log in: — verdict: not shippable (findings remain or verification is unverified).
    Cost: $8.8677
    Model: openai

@DianaTao DianaTao added pdd-opus and removed pdd-gemini-flash PDD: use Gemini 3 Flash (fastest, preview) labels Jun 9, 2026
Use the repo dataclasses key for ApprovedPatch and RepairOption so
pdd sync conformance no longer treats @dataclass classes as missing
functions. Add a regression test for _verify_pdd_interface_signatures.

Co-authored-by: Cursor <cursoragent@cursor.com>
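The blocker came from declaring ApprovedPatch and RepairOption under `module.functions`. A hypothetical checker shows the failure mode: if every name listed as a function must resolve to a plain function, a `@dataclass` class never qualifies, while checking the same name under a separate dataclass list passes. This is an illustration only. It is not pdd's `_verify_pdd_interface_signatures`, and `interface_gaps` and its parameters are invented for the example.

```python
import dataclasses
import inspect
import types
from typing import Iterable, List


def interface_gaps(
    module: types.ModuleType,
    functions: Iterable[str] = (),
    dataclass_names: Iterable[str] = (),
) -> List[str]:
    """Return declared names whose kind does not match what the module defines."""
    gaps = []
    for name in functions:
        # A dataclass is a class, so inspect.isfunction() is False for it.
        if not inspect.isfunction(getattr(module, name, None)):
            gaps.append(name)
    for name in dataclass_names:
        obj = getattr(module, name, None)
        if not (inspect.isclass(obj) and dataclasses.is_dataclass(obj)):
            gaps.append(name)
    return gaps


# Build a throwaway module with one dataclass and one function.
demo = types.ModuleType("demo_session")
demo.ApprovedPatch = dataclasses.make_dataclass("ApprovedPatch", [("finding_id", str)])


def record_choice(finding_id: str, option: object) -> None:
    pass


demo.record_choice = record_choice

print(interface_gaps(demo, functions=["record_choice", "ApprovedPatch"]))
# ['ApprovedPatch']
print(interface_gaps(demo, functions=["record_choice"], dataclass_names=["ApprovedPatch"]))
# []
```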
@gltanaka gltanaka merged commit ed39620 into main Jun 11, 2026
10 checks passed

Labels

pdd-checkup (PDD: run a codebase checkup), pdd-opus

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(checkup): InteractiveRepairSession protocol for prompt repair

3 participants