Deferred from the v1 worker–evaluator dialogue (proposals/v1-implementation-plan.md, Phase 6 / OQ #9). Not on the v1 critical path: Phases 1–5 have landed, and this is the visualization follow-up now that the dialogue produces rich structured data.
Scope
1. Per-task ledger panel. The per-task evaluator ledger (sessions/<id>/ledger/<task_id>.jsonl, Phase 2) is the richest artifact v1 produces — the iteration arc of verdicts, concerns, next-steps, and diff summaries. tilth visualize should render it as a dedicated panel showing the arc over time (verdicts as a timeline, diffs collapsible), not just the inline evaluator_verdict cards.
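A minimal sketch of what the ledger panel could look like. It assumes each ledger line is a JSON object with hypothetical fields `iter`, `verdict`, `concerns`, `next_step`, and `diff_summary` (the real ledger schema from Phase 2 may differ), and that the visualizer emits HTML strings. The timeline is an ordered list sorted by iteration, with each diff summary wrapped in a collapsible `<details>` element.

```python
import html
import json
from typing import Any, Iterable


def parse_ledger(lines: Iterable[str]) -> list[dict[str, Any]]:
    """Parse ledger JSONL (one JSON object per line); blank lines are skipped.

    Accepts any iterable of lines, so an open file handle works directly.
    """
    return [json.loads(line) for line in lines if line.strip()]


def render_ledger_panel(task_id: str, entries: list[dict[str, Any]]) -> str:
    """Render the iteration arc as an HTML timeline (hypothetical field names)."""
    # Sort by iteration so the timeline reads oldest to newest.
    ordered = sorted(entries, key=lambda e: e.get("iter", 0))
    items = []
    for e in ordered:
        verdict = html.escape(str(e.get("verdict", "?")))
        concerns = "".join(
            f"<li>{html.escape(str(c))}</li>" for c in e.get("concerns", [])
        )
        next_step = html.escape(str(e.get("next_step", "")))
        diff = html.escape(str(e.get("diff_summary", "")))
        items.append(
            f'<li class="ledger-iter verdict-{verdict}">'
            f"<b>iter {e.get('iter', '?')}</b>: {verdict}"
            f"<ul>{concerns}</ul>"
            f"<p>next: {next_step}</p>"
            # Diffs stay collapsed until the reader expands them.
            f"<details><summary>diff</summary><pre>{diff}</pre></details>"
            "</li>"
        )
    return (
        f'<section class="ledger-panel"><h3>ledger: {html.escape(task_id)}</h3>'
        f'<ol class="timeline">{"".join(items)}</ol></section>'
    )
```

In use, something like `render_ledger_panel(task_id, parse_ledger(path.open()))` would produce one panel per task file under `sessions/<id>/ledger/`.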
2. Dedicated renderers for the dialogue event types. Several v1 event types fall through to _render_unknown (the generic dim JSON card — acceptable but unpolished). Each deserves a purpose-built renderer in tilth/visualize/render.py's _RENDERERS map:
prompt_assembled — the assembled user message per actor (role ∈ worker | evaluator | self_improve). Large; render collapsed with role + iter + char count, expandable.
ledger_appended — lightweight pointer; could fold into the ledger panel above.
evaluator_parse_error / case_parse_error — raw failing payloads; render as a clearly-flagged error card.
empty_model_response — provider-hiccup marker (streak, finish_reason, tokens); render as a warning card.
(memory_load and hook_run also fall through today; lower priority — they predate the v1 dialogue.)
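A rough sketch of how the new renderers might slot into a dispatch map. Everything here is a stand-in: the renderer signature (event dict in, HTML string out), the event payload fields (`type`, `role`, `iter`, `text`, `raw`, `streak`, `finish_reason`, `tokens`), and the `_render_unknown` body are assumptions, not the actual contents of tilth/visualize/render.py.

```python
import html
import json
from typing import Any, Callable

Event = dict[str, Any]


def _render_unknown(ev: Event) -> str:
    # Stand-in for the existing generic fallback: a dim card with the raw JSON.
    return f'<div class="card dim"><pre>{html.escape(json.dumps(ev, indent=2))}</pre></div>'


def _render_prompt_assembled(ev: Event) -> str:
    # Large payload: collapsed by default; the summary shows role, iter, char count.
    text = str(ev.get("text", ""))
    summary = f"prompt | {ev.get('role', '?')} | iter {ev.get('iter', '?')} | {len(text)} chars"
    return (
        f'<details class="card prompt"><summary>{html.escape(summary)}</summary>'
        f"<pre>{html.escape(text)}</pre></details>"
    )


def _render_parse_error(ev: Event) -> str:
    # Shared by evaluator_parse_error and case_parse_error: flagged card with the raw payload.
    return (
        f'<div class="card error"><b>ERROR: {html.escape(str(ev.get("type")))}</b>'
        f"<pre>{html.escape(str(ev.get('raw', '')))}</pre></div>"
    )


def _render_empty_model_response(ev: Event) -> str:
    # Provider-hiccup marker: warning card listing streak, finish_reason, tokens.
    fields = ", ".join(f"{k}={ev.get(k)}" for k in ("streak", "finish_reason", "tokens"))
    return f'<div class="card warning"><b>empty model response</b> {html.escape(fields)}</div>'


_RENDERERS: dict[str, Callable[[Event], str]] = {
    "prompt_assembled": _render_prompt_assembled,
    "evaluator_parse_error": _render_parse_error,
    "case_parse_error": _render_parse_error,
    "empty_model_response": _render_empty_model_response,
}


def render_event(ev: Event) -> str:
    # Dispatch on event type; unregistered types still fall through to the generic card.
    return _RENDERERS.get(ev.get("type", ""), _render_unknown)(ev)
```

ledger_appended is deliberately left out of the map here, on the assumption that it folds into the ledger panel; memory_load and hook_run keep falling through to the generic card until they get their own entries.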
Current state
_render_evaluator_verdict (Phase 1) already renders the structured verdict (category, evidence, next_step) — the model to follow for the rest.
Why deferred
The ledger is rich data and the visualizer is the natural place to render it, but it's a read-side nicety, not loop mechanics. v1 prioritized the dialogue itself; this makes the output legible.