[chore] add manual run-completion scripts by Thibaut-Fatus · Pull Request #22 · korabench/benchmark

Thibaut-Fatus · 2026-06-11T11:44:32Z

Adds the operator tooling used to finish a run by hand when a target can't be driven automatically (e.g. an app web-runner that hits a transient backend error and auto-skips a scenario). No sibling repo has an equivalent: kora-infra's manualTestsService is the server-side flow that re-simulates the conversation, whereas these scripts run locally and reuse the exact collected transcripts.

scripts/manual-rerun.mjs — human-in-the-loop driver: prints each user (child) turn, the operator pastes the app's reply, and the benchmark's own user-simulator (generateNextUserMessage) produces the next turn, keeping multi-turn conversations faithful. Transcripts persist append-only to RUN_DIR/manual-reruns.json (+ readable .md).
scripts/complete-run.mjs — judges the collected transcripts via kora.runTest with the full conversation as startMessages (turn loop skipped → straight to judges) and overwrites the matching .kora-run-tmp/<hash>.json, so a follow-up kora run aggregates the finished run.
scripts/README.md + a scripts/ line in the project structure.
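
For reviewers who want a mental model of the manual-rerun.mjs flow, here is a minimal sketch of the turn loop and the append-only persistence. It is not the script's actual code: collectConversation, appendTranscript, readAppReply, nextUserMessage and the { role, content } message shape are hypothetical names chosen for illustration, and nextUserMessage merely stands in for the benchmark's generateNextUserMessage, whose real signature is not shown in this PR.

```javascript
// Hypothetical sketch: function names and message shape are assumptions,
// not the actual API of scripts/manual-rerun.mjs.
async function collectConversation({
  firstUserMessage,
  maxTurns,
  readAppReply, // assumed: resolves with the reply the operator pasted
  nextUserMessage, // assumed stand-in for the benchmark's user-simulator
}) {
  const messages = [{ role: "user", content: firstUserMessage }];
  for (let turn = 1; turn <= maxTurns; turn++) {
    // Show the latest user (child) turn and wait for the pasted app reply.
    const latestUserTurn = messages[messages.length - 1].content;
    const reply = await readAppReply(latestUserTurn);
    messages.push({ role: "assistant", content: reply });
    if (turn === maxTurns) break;
    // Let the simulator produce the next user turn from the history so far.
    const next = await nextUserMessage(messages);
    messages.push({ role: "user", content: next });
  }
  return messages;
}

// Append-only persistence: existing entries are left untouched and the new
// transcript is added at the end. Writing the result to disk is omitted here.
function appendTranscript(existingEntries, entry) {
  return [...existingEntries, entry];
}
```

Injecting readAppReply and nextUserMessage keeps the loop independent of the terminal and of the simulator, so it can be exercised with stubs before a real session.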

Both scripts take their run directory from RUN_DIR (the default preserves the original behavior). They import the built packages, so run yarn build first, then launch them with node --env-file=.env.
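
As a rough illustration of the RUN_DIR parameterization, the sketch below resolves the run directory from the environment and derives the transcript paths from it. The helper name resolveRunPaths, the "./run" fallback and the manual-reruns.md file name are assumptions made for this example; the scripts' real default directory is not stated in this PR.

```javascript
// Hypothetical helper: the default directory is a placeholder, not the
// scripts' real default, and the .md file name is assumed.
function resolveRunPaths(env, defaultRunDir = "./run") {
  // Fall back to the default when RUN_DIR is unset or blank.
  const runDir = (env.RUN_DIR && env.RUN_DIR.trim()) || defaultRunDir;
  // Join without doubling the separator when runDir already ends in "/".
  const join = (dir, name) => (dir.endsWith("/") ? dir + name : dir + "/" + name);
  return {
    runDir,
    transcriptsJson: join(runDir, "manual-reruns.json"),
    transcriptsMd: join(runDir, "manual-reruns.md"),
  };
}
```

In a real script the argument would typically be process.env, which node --env-file=.env populates before the script starts.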

manual-rerun.mjs: human-in-the-loop driver to collect app conversations when a target can't be driven automatically — prints each user turn, the operator pastes the app reply, and the benchmark's own user-simulator generates the next turn. complete-run.mjs: judge the collected transcripts (runTest with full startMessages → judges only) and overwrite the matching .kora-run-tmp results so a re-run aggregates the finished run. Both parameterized by RUN_DIR. See scripts/README.md.

Thibaut-Fatus merged commit b7b9764 into main Jun 11, 2026
4 checks passed

Thibaut-Fatus deleted the chore/manual-run-completion-scripts branch June 11, 2026 11:44
