
[chore] add manual run-completion scripts#22

Merged
Thibaut-Fatus merged 1 commit into main from chore/manual-run-completion-scripts on Jun 11, 2026

Conversation

@Thibaut-Fatus

Collaborator

Adds the operator tooling used to finish a run by hand when a target can't be driven automatically (e.g. an app web-runner that hits a transient backend error and auto-skips a scenario). These don't exist in any sibling repo (kora-infra's manualTestsService is the server-side, re-simulating flow; this is local and reuses the exact collected transcripts).

  • scripts/manual-rerun.mjs — human-in-the-loop driver: prints each user (child) turn, the operator pastes the app's reply, and the benchmark's own user-simulator (generateNextUserMessage) produces the next turn, keeping multi-turn conversations faithful. Transcripts persist append-only to RUN_DIR/manual-reruns.json (+ readable .md).
  • scripts/complete-run.mjs — judges the collected transcripts via kora.runTest with the full conversation as startMessages (turn loop skipped → straight to judges) and overwrites the matching .kora-run-tmp/<hash>.json, so a follow-up kora run aggregates the finished run.
  • scripts/README.md + a scripts/ line in the project structure.
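The turn loop described for manual-rerun.mjs can be sketched roughly as follows. This is an illustration only, not the script's actual code: collectConversation, getAppReply and nextUserMessage are hypothetical names, the { role, content } message shape is an assumption, and in the real script the next turn comes from the benchmark's generateNextUserMessage, whose signature is not shown in this PR.

```javascript
// Minimal sketch of a human-in-the-loop conversation driver.
// ASSUMPTIONS: `getAppReply` stands in for the operator pasting the app's
// reply, and `nextUserMessage` stands in for the benchmark's user-simulator.
// Both are injected so the loop itself stays independent of any real API.
async function collectConversation({
  firstUserMessage,
  getAppReply,
  nextUserMessage,
  maxTurns = 5, // hypothetical safety cap on the number of user turns
}) {
  const messages = [];
  let userMessage = firstUserMessage;

  for (let turn = 0; turn < maxTurns && userMessage != null; turn += 1) {
    // 1. Record (and, in a real driver, print) the simulated user turn.
    messages.push({ role: "user", content: userMessage });

    // 2. Wait for the operator to supply the app's reply.
    const reply = await getAppReply(userMessage);
    messages.push({ role: "assistant", content: reply });

    // 3. Ask the simulator for the next user turn; null/undefined ends the run.
    userMessage = await nextUserMessage(messages);
  }

  return messages;
}
```

In an interactive setting, getAppReply would typically wrap a terminal prompt (for example Node's readline), while a test can pass canned replies. Injecting both callbacks keeps the loop easy to exercise without a live app or simulator.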

Both are parameterized by RUN_DIR (the default preserves the original behavior). They import the built packages, so run yarn build first, then invoke them with node --env-file=.env.
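As a rough illustration of the RUN_DIR parameterization and the append-only manual-reruns.json file, a sketch might look like the one below. The names resolveRunDir, appendTranscript and DEFAULT_RUN_DIR are hypothetical, the "./run" default is a placeholder (the scripts' real default is not stated in this PR), and storing the transcripts as a single JSON array is an assumption.

```javascript
// Sketch: resolve RUN_DIR from the environment and append one transcript
// to RUN_DIR/manual-reruns.json without touching earlier entries.
// ASSUMPTION: placeholder default; the real scripts define their own.
const DEFAULT_RUN_DIR = "./run";

function resolveRunDir(env = process.env) {
  const value = env.RUN_DIR;
  return value && value.trim() !== "" ? value : DEFAULT_RUN_DIR;
}

async function appendTranscript(runDir, entry) {
  // Dynamic imports keep this sketch usable from both ESM and CommonJS files.
  const fs = await import("node:fs/promises");
  const path = await import("node:path");
  const file = path.join(runDir, "manual-reruns.json");

  let existing = [];
  try {
    existing = JSON.parse(await fs.readFile(file, "utf8"));
  } catch (err) {
    // Only a missing file counts as "no transcripts yet"; a corrupt file
    // is surfaced to the operator rather than silently overwritten.
    if (err.code !== "ENOENT") throw err;
  }

  existing.push(entry); // earlier entries are kept as-is
  await fs.mkdir(runDir, { recursive: true });
  await fs.writeFile(file, JSON.stringify(existing, null, 2) + "\n");
  return existing.length;
}
```

Under these assumptions, a script would call appendTranscript(resolveRunDir(), transcript) after each finished conversation, so an interrupted session loses at most the conversation in progress.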

manual-rerun.mjs: human-in-the-loop driver to collect app conversations
when a target can't be driven automatically — prints each user turn, the
operator pastes the app reply, and the benchmark's own user-simulator
generates the next turn. complete-run.mjs: judge the collected transcripts
(runTest with full startMessages → judges only) and overwrite the matching
.kora-run-tmp results so a re-run aggregates the finished run. Both
parameterized by RUN_DIR. See scripts/README.md.
@Thibaut-Fatus Thibaut-Fatus merged commit b7b9764 into main Jun 11, 2026
4 checks passed
@Thibaut-Fatus Thibaut-Fatus deleted the chore/manual-run-completion-scripts branch June 11, 2026 11:44