Skip to content

feat: multi-turn / scripted-clarification for under-specified reports #117

Description

@slowdini

Motivation

A debugging skill's largest real-world value shows up when the bug report is missing detail
the disciplined move is to reproduce / ask before fixing. Today every case is a single
prompt → single dispatch → single response, so we can only test fully-specified reports, which is
the easy case. Under-specified reports are exactly where an undisciplined agent guesses wrong and
a disciplined one asks first — but we can't stage that.

Driver: the investigating-bugs timezone eval would be sharper if the report could be vague
("the date is wrong for some users") and the missing detail ("they're all in US timezones; it's a
date-only field") were delivered only if the agent asks for it.

Proposed surface

Let a case define follow-up turns or canned answers:

{
  "prompt": "The due date is wrong for some users. Fix it.",
  "clarification": {
    "answers": "The affected users are all in US timezones; the field is a date-only value (no time).",
    "deliver_when": "agent_asks"
  }
}
  • deliver_when: "agent_asks" — the canned answers are fed as a second user turn only if the
    agent posed a clarifying question first. Reward asking-before-fixing.
  • deliver_when: "always" — fixed follow-up turn(s) delivered regardless (for staging a known
    back-and-forth).
  • Generalize to an ordered turns list for multi-step scripts if useful.

Semantics

  • Requires a second dispatch turn feeding the agent the follow-up, then grading the final state.
  • A transcript_check/llm_judge can assert the clarifying question preceded any edit (i.e. the
    agent asked before fixing), which is the behavior under test.

Acceptance criteria

  • A case can deliver a scripted follow-up turn after the agent's first response.
  • agent_asks mode delivers the answer only when the agent actually asked, and this is
    observable to assertions.
  • Single-turn cases are unaffected when clarification/turns is absent.

Back-compat

Additive; absence preserves today's one-shot behavior. Document the extra dispatch cost per case.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions