fix(agent): use standard shell code fences as the action format#357
Merged
Conversation
Models are trained to emit shell commands as standard markdown code fences (```bash / ```sh / ```shell / ```console / ```zsh), not a bespoke ```run_shell token. The strict run_shell-only parser silently coerced a ```bash reply into the "no recognized block -> finish" path, ending the episode and discarding the action (including any submit) — so a competent rollout could die at one step with its training signal lost. Adopt the standard code fence as the action format (CodeAct-style: executable code is the action), drop the bespoke run_shell fence, and parse the *last* recognized block so an illustrative snippet earlier in a reply is not executed in place of the action the model actually settled on. The internal action identifier stays "run_shell"; only the wire format the model emits changes. finish stays a sentinel — there is no standard fence for "done". Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Shell actions in the agent loop now use the standard markdown code fence a model is trained to emit —
```bash/```sh/```shell/```console/```zsh— instead of the bespoke```run_shelltoken.Why
parse_actiononly recognized```run_shell/```finish. Any other fence failed the match and fell through to the "no recognized block → finish" path, which ends the episode and discards the action (including asubmit). Instruct/cold models reach for```bashconstantly — it has overwhelming pretraining mass, while```run_shellappears essentially only in our own prompt — so a competent rollout could die at a single step with its training signal lost.open-range's design is one shell primitive + a finish (no per-verb tool menu), so there is nothing for a custom fence token to disambiguate; it is pure friction against the model's prior. This also matches the CodeAct finding — executable code as the action space outperforms bespoke/JSON tool encodings (Wang et al. 2024).
Changes
parse_actionacceptsbash|sh|shell|console|zshas a shell command andfinishas the terminal block.run_shell; only the wire format the model emits changes. No back-compat alias for the old```run_shellfence (the only emitters were our own test fixtures).finishstays a sentinel — there is no standard markdown fence for "done".Test
pytest tests/test_agent_harness.py tests/test_rllm_shim.py→ 13 passed, 5 skipped (rllm not installed locally).ruff check,ruff format --check, andmypyclean on the changed files.🤖 Generated with Claude Code