feat: notify_then_route — error observability for all on_error: abort paths #543

@PolyphonyRequiem

Summary

Every on_error: true → abort_run route in polyphony should interpose a type: notification step before routing to the terminal. This converts silent aborts into observable events that platespinner (and operators watching notifications.jsonl) can surface without inspecting events.jsonl.

Pattern: failing_step → (on_error: true) → step_failed_notifier → abort_run

The notification step has full access to {{ failing_step.error.kind }}, {{ failing_step.error.message }}, and {{ failing_step.error.details }} because it sits downstream of the failing step on the error route.

YAML sketch

notifications:
  namespace: polyphony.plan_level
  correlation: [work_item_id]
  types:
    step_failed:
      version: 1
      payload:
        step_name:     { type: string }
        error_kind:    { type: string }
        error_message: { type: string }
        work_item_id:  { type: string }

agents:
  # Gate: a script step that can raise internal.script_error.
  - name: commit_and_push
    type: script
    raises: [internal.script_error]
    routes:
      - to: next_step                # success path
      - to: commit_failed_notifier   # error path goes through the notifier first
        on_error: true

  # Notifier: emits step_failed with the upstream error, then routes to the terminal.
  - name: commit_failed_notifier
    type: notification
    notification: step_failed
    payload:
      step_name:     "commit_and_push"
      error_kind:    "{{ commit_and_push.error.kind }}"
      error_message: "{{ commit_and_push.error.message }}"
      work_item_id:  "{{ workflow.input.work_item_id }}"
    routes:
      - to: abort_run

Scope

  • Applies to all 19 retrofit gates tracked in #536 (Phase 2 on_error: retrofit, restoring the 19 gates removed in #535): Phase 2a immediately; Phase 2b once RFC retry: ships
  • One step_failed notification type per workflow namespace (shared across all gates in that workflow)
  • ~3-4 new nodes per gate, but they are trivial (no LLM, no script, no provider call)
  • Additive only — no existing routes change
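
As a rough illustration of sharing one step_failed type across gates in a workflow, a second gate could reuse the same notification type and differ only in its step name and template source. This is a sketch that follows the syntax of the YAML sketch above; the gate name run_tests and its notifier run_tests_failed_notifier are hypothetical and do not come from the actual workflows.

```yaml
agents:
  # Hypothetical second gate; the name run_tests is illustrative only.
  - name: run_tests
    type: script
    raises: [internal.script_error]
    routes:
      - to: next_step
      - to: run_tests_failed_notifier
        on_error: true

  # Reuses the shared step_failed type declared once under notifications.types.
  # Only step_name and the template source differ from commit_failed_notifier.
  - name: run_tests_failed_notifier
    type: notification
    notification: step_failed
    payload:
      step_name:     "run_tests"
      error_kind:    "{{ run_tests.error.kind }}"
      error_message: "{{ run_tests.error.message }}"
      work_item_id:  "{{ workflow.input.work_item_id }}"
    routes:
      - to: abort_run
```

Because the type is declared once per namespace, a consumer would only need to handle one payload shape and could tell gates apart by step_name.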

Prerequisites

Requires conductor Phase 1 (microsoft/conductor#229) and conductor notifications (microsoft/conductor#213). No polyphony verb changes are needed.

Notes

For the 14 retry+abort gates: once RFC Phase 2 retry: ships, replace step_failed_notifier with retry_exhausted_notifier that fires only on budget exhaustion. The notification type shape stays the same.
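
A minimal sketch of what the replacement notifier might look like, assuming it keeps the step_failed type and payload unchanged as the note above says. How the node is reached only on budget exhaustion depends on the retry: syntax from the RFC, which has not shipped and is deliberately not shown here.

```yaml
agents:
  # Same notification type and payload shape as commit_failed_notifier;
  # only the node name changes. The routing that triggers this node on
  # retry budget exhaustion is an open detail of the retry: RFC.
  - name: retry_exhausted_notifier
    type: notification
    notification: step_failed
    payload:
      step_name:     "commit_and_push"
      error_kind:    "{{ commit_and_push.error.kind }}"
      error_message: "{{ commit_and_push.error.message }}"
      work_item_id:  "{{ workflow.input.work_item_id }}"
    routes:
      - to: abort_run
```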

Full patterns doc: .squad/decisions/inbox/wagner-on-error-notifications-patterns-2026-05-28T23-00-46Z.md

Labels: squad:short-term (Short-term win, ≤1 week); squad:wagner (Owner: Wagner, Workflow Author)
