fix(agent): use standard shell code fences as the action format by larstalian · Pull Request #357 · vecna-labs/open-range

larstalian · 2026-06-26T01:27:32Z

What

Shell actions in the agent loop now use the standard markdown code fence a model is trained to emit — ```bash / ```sh / ```shell / ```console / ```zsh — instead of the bespoke ```run_shell token.

Why

parse_action only recognized ```run_shell / ```finish. Any other fence failed the match and fell through to the "no recognized block → finish" path, which ends the episode and discards the action (including a submit). Instruct/cold models reach for ```bash constantly — it has overwhelming pretraining mass, while ```run_shell appears essentially only in our own prompt — so a competent rollout could die at a single step with its training signal lost.

open-range's design is one shell primitive + a finish (no per-verb tool menu), so there is nothing for a custom fence token to disambiguate; it is pure friction against the model's prior. This also matches the CodeAct finding — executable code as the action space outperforms bespoke/JSON tool encodings (Wang et al. 2024).

Changes

parse_action accepts bash|sh|shell|console|zsh as a shell command and finish as the terminal block.
The last recognized block wins, so an illustrative snippet earlier in a reply is not executed in place of the action the model actually settled on.
The internal action identifier stays run_shell; only the wire format the model emits changes. No back-compat alias for the old ```run_shell fence (the only emitters were our own test fixtures).
finish stays a sentinel — there is no standard markdown fence for "done".
Tests updated, plus a new case covering the standard fences and last-block selection.

Test

pytest tests/test_agent_harness.py tests/test_rllm_shim.py → 13 passed, 5 skipped (rllm not installed locally). ruff check, ruff format --check, and mypy clean on the changed files.

🤖 Generated with Claude Code

Models are trained to emit shell commands as standard markdown code fences (```bash / ```sh / ```shell / ```console / ```zsh), not a bespoke ```run_shell token. The strict run_shell-only parser silently coerced a ```bash reply into the "no recognized block -> finish" path, ending the episode and discarding the action (including any submit) — so a competent rollout could die at one step with its training signal lost. Adopt the standard code fence as the action format (CodeAct-style: executable code is the action), drop the bespoke run_shell fence, and parse the *last* recognized block so an illustrative snippet earlier in a reply is not executed in place of the action the model actually settled on. The internal action identifier stays "run_shell"; only the wire format the model emits changes. finish stays a sentinel — there is no standard fence for "done". Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

larstalian merged commit 29f1081 into main Jun 26, 2026
2 checks passed

larstalian deleted the fix/standard-shell-action-fences branch June 26, 2026 01:38

github-actions Bot locked and limited conversation to collaborators Jun 26, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(agent): use standard shell code fences as the action format#357

fix(agent): use standard shell code fences as the action format#357
larstalian merged 1 commit into
mainfrom
fix/standard-shell-action-fences

larstalian commented Jun 26, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

larstalian commented Jun 26, 2026

What

Why

Changes

Test

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant