feat: add /beutl-loop autonomous board loop (loop engineering)#2018
feat: add /beutl-loop autonomous board loop (loop engineering)#2018yuto-trd wants to merge 2 commits into
Conversation
Introduce a committed "loop engineering" layer that drives Beutl's existing
single-item board-task flow as an autonomous, guardrailed outer loop.
- New skill `/beutl-loop`: orchestrator that drains the Project #9 board by
default (until-empty; runaway backstop BEUTL_LOOP_MAX_ITEMS=50), dispatching a
worktree-isolated sub-agent per tick to keep its own context lean, then
risk-gates a squash merge (low-to-moderate-risk) or leaves the PR for a human.
- New agent `beutl-board-task-runner`: implements one item -> PR in an isolated
worktree and returns a stable structured JSON result; never merges.
- New skill `beutl-resolve-reviews`: addresses + resolves PR reviews; `--auto`
handles bot feedback only (CodeRabbit/Copilot/Codex/Claude) and escalates any
human review to the human.
- New `.claude/scripts/beutl-loop.sh`: conservative headless launcher (scoped
allowlist, no --dangerously-skip-permissions by default, refuses
main/detached/flat branches).
- New guide `docs/ai-workflow/loop-engineering.md`: the 6-part loop contract, the
moderate-risk auto-merge policy, and the anti-loopmaxxing guards.
- Rename skill `beutl-ai-review-task` -> `beutl-board-task` and drop its
non-feature/low-risk selection bias (features and higher-risk items are now in
scope; risk is handled at merge time).
- Wire pointers into AGENTS.md, CLAUDE.md, and
docs/ai-workflow/{README,subagents-and-hooks}.md.
Merge safety is anchored on GitHub's `main` rulesets (required checks
build/dotnet-format, code-owner review via * @yuto-trd, thread resolution,
squash-only, signed commits); the loop self-gates as defense-in-depth and fails
safe to the human when unsure.
Refs: Project #9 "AI Review" board
📝 WalkthroughWalkthroughAdds a Project ChangesProject
Sequence Diagram(s)sequenceDiagram
participant beutlLoop as beutl-loop
participant boardTaskRunner as beutl-board-task-runner
participant resolveReviews as beutl-resolve-reviews
participant githubRulesets as GitHub rulesets
beutlLoop->>boardTaskRunner: run one board item in a worktree
boardTaskRunner-->>beutlLoop: return JSON result
beutlLoop->>resolveReviews: resolve bot review comments with --auto
resolveReviews-->>beutlLoop: return thread status and needs_human
beutlLoop->>githubRulesets: attempt squash merge
githubRulesets-->>beutlLoop: allow or refuse merge
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Pull request overview
This PR introduces “loop engineering” to Beutl’s AI workflow by adding a new autonomous /beutl-loop skill that can drain Project #9 items in a bounded, guardrailed loop, supported by a per-tick worktree sub-agent and a dedicated skill for resolving bot PR reviews. It updates core workflow docs (AGENTS/CLAUDE/docs) to document the contract, safety model, and how to use the new loop tooling.
Changes:
- Add
/beutl-looporchestrator skill, plus a conservative headless launcher script (.claude/scripts/beutl-loop.sh). - Add
beutl-board-task-runnerworktree-isolated sub-agent for “one item → PR” execution, andbeutl-resolve-reviewsskill for resolving bot review feedback. - Update workflow documentation to reflect the new loop contract, guardrails, and expanded Project #9 task scope via
beutl-board-task.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| docs/ai-workflow/subagents-and-hooks.md | Documents the new per-tick runner agent and the board-loop skill surface. |
| docs/ai-workflow/README.md | Updates AI workflow entry points and adds /beutl-loop + related tooling references. |
| docs/ai-workflow/loop-engineering.md | New guide defining the loop contract, guardrails, and risk-gated merge policy. |
| CLAUDE.md | Adds a concise explanation of /beutl-loop behavior and safety/authorization model. |
| AGENTS.md | Adds a top-level “Autonomous board loop” entry and links to loop docs. |
| .claude/skills/beutl-resolve-reviews/SKILL.md | New skill for addressing and resolving PR review feedback (bot-only autonomy in --auto). |
| .claude/skills/beutl-loop/SKILL.md | New orchestrator skill defining the bounded board-draining loop procedure. |
| .claude/skills/beutl-board-task/SKILL.md | Broadens single-item task execution to include features (risk gated downstream). |
| .claude/scripts/beutl-loop.sh | New headless launcher enforcing conservative permission and branch-safety constraints. |
| .claude/agents/beutl-board-task-runner.md | New worktree-isolated agent to implement one Project #9 item and open a PR with structured JSON output. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| --- | ||
| description: | | ||
| Fetch every review and review comment on a pull request (CodeRabbit, GitHub Copilot, Codex, | ||
| Claude code-review, and humans), decide which genuinely need a code change, address the clearly | ||
| actionable ones, then reply and resolve the threads. Two modes: the default is human-in-the-loop | ||
| (confirm each comment via AskUserQuestion, like the global handle-pr-reviews skill); pass `--auto` | ||
| for autonomous resolution — used by /beutl-loop after it opens a PR. Use when the user says | ||
| "PRレビューに対応", "レビュー指摘を反映してResolve", "address and resolve the PR reviews". | ||
| argument-hint: "[--auto] [reply | no-resolve] [PR#]" | ||
| --- |
There was a problem hiding this comment.
Actionable comments posted: 9
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.claude/agents/beutl-board-task-runner.md:
- Around line 43-45: Treat the manual-verification path in the task runner as
blocked rather than completed: update the guidance around the baseline-first
loop so that `manual-verification` does not allow in-scope changes to pass
without tests, and require a matching NUnit test in the corresponding `tests/`
project for new logic. Adjust the `test_status`/verification wording in the
task-runner docs so only testable work can be marked done, while UI or
hard-to-unit-test work remains blocked until covered or the scope changes.
In @.claude/scripts/beutl-loop.sh:
- Around line 33-38: The beutl-loop launcher currently only validates $1 in
beutl-loop.sh and silently ignores any extra arguments, so calls with a second
filter argument are not honored. Update the argument handling around the BUDGET
case to either parse and pass through the supported kind filter from the
launcher into the /beutl-loop invocation path, or explicitly reject unexpected
extra args with a clear error; use the existing beutl-loop.sh entrypoint and the
advertised SKILL.md optional kind filter as the reference points.
In @.claude/skills/beutl-board-task/SKILL.md:
- Around line 76-82: The selection guidance in SKILL.md currently allows
hard-to-unit-test work to proceed with only manual verification, which conflicts
with the test-first rule. Update the wording around the two guards on selection
so that new logic in the relevant workflow must land with an NUnit test in the
matching tests project; if automation is not possible, treat the item as blocked
instead of substituting manual verification. Keep the blocked/stagnation
handling aligned with the existing selection rules and adjust the surrounding
guidance text accordingly.
- Around line 64-66: The candidate filtering in the board-claim flow still has a
TOCTOU gap: even after selecting unclaimed items, another actor can move the
item to In Progress before the update. Update the claim path in the relevant
board mutation logic to re-check the item status immediately before claiming,
and handle an atomic-claim failure or stale-status rejection gracefully so
already-claimed items are not duplicated.
In @.claude/skills/beutl-loop/SKILL.md:
- Around line 155-158: The unresolved-thread check in the merge loop only
queries reviewThreads(first:100), so it can miss open threads past the first
page. Update the check to paginate through all review threads using pageInfo and
after in the gh api GraphQL query, and keep counting unresolved items until
every page has been processed. Use the existing UNRESOLVED logic in the loop to
ensure the merge gate only passes when all review threads are resolved.
In @.claude/skills/beutl-resolve-reviews/SKILL.md:
- Around line 61-63: The GraphQL query in the review-thread lookup only fetches
comments(first:50), so Step 3 can miss the true latest reply on long threads and
use stale context. Update the thread-fetching logic around the reviewThreads
query to paginate the nested comments connection as well, or otherwise ensure
the full comment history is retrieved before resolving threads; use the
reviewThreads/comments structure and databaseId-to-thread mapping as the key
symbols to locate the change.
In `@docs/ai-workflow/loop-engineering.md`:
- Around line 200-201: The dry-run example uses an argument order that does not
match the documented `/beutl-loop` command forms. Update the example in the
loop-engineering guide to use one of the supported shapes from
`beutl-loop/SKILL.md`—for example `/beutl-loop dry-run` with an optional kind
filter—or expand the skill contract first if a numeric dry-run budget is
intended. Keep the example aligned with the `dry-run` invocation syntax so the
docs and `SKILL.md` agree.
- Around line 155-156: The wrapped Project `#9` sentence in the markdown doc is
triggering markdownlint MD018 because `#9` appears at the start of a line.
Reflow the text in the surrounding paragraph so `#9` is not at column 1, or
format it as inline code, keeping the sentence within the existing docs section
that mentions the renamed `beutl-board-task` / `beutl-ai-review-task` note.
- Line 202: The current validation step only performs a syntax check with bash
-n on beutl-loop.sh, so it cannot prove the main-branch guard works. Update the
loop-engineering.md instructions to separate this into two checks: keep the bash
-n parse verification for the script, and add a real execution of the
beutl-loop.sh path that exercises the branch check and confirms it refuses to
run on main. Use the existing beutl-loop.sh reference and the validation list
item in loop-engineering.md so the runtime refusal is explicitly demonstrated.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: dd54d2f5-ca75-42dd-a5df-66addf5a837a
📒 Files selected for processing (10)
.claude/agents/beutl-board-task-runner.md.claude/scripts/beutl-loop.sh.claude/skills/beutl-board-task/SKILL.md.claude/skills/beutl-loop/SKILL.md.claude/skills/beutl-resolve-reviews/SKILL.mdAGENTS.mdCLAUDE.mddocs/ai-workflow/README.mddocs/ai-workflow/loop-engineering.mddocs/ai-workflow/subagents-and-hooks.md
| BUDGET="${1:-until-empty}" | ||
| case "$BUDGET" in | ||
| until-empty) : ;; | ||
| ''|*[!0-9]*) echo "argument must be 'until-empty' or a positive integer (got '$BUDGET')" >&2; exit 2 ;; | ||
| *) [ "$BUDGET" -gt 0 ] 2>/dev/null || { echo "item budget must be a positive integer (got '$BUDGET')" >&2; exit 2; } ;; | ||
| esac |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Reject or forward extra /beutl-loop arguments.
The launcher only reads $1, so a call like .claude/scripts/beutl-loop.sh 1 feature silently degrades to /beutl-loop 1. Since .claude/skills/beutl-loop/SKILL.md:1-13 advertises an optional kind filter, this widens unattended scope instead of honoring the caller’s constraint. Either parse and forward the supported second argument or fail fast on any extra args.
Suggested fail-fast fix
+if [ "$#" -gt 1 ]; then
+ echo "usage: .claude/scripts/beutl-loop.sh [until-empty | N]" >&2
+ echo "unexpected extra arguments: this launcher does not forward additional /beutl-loop args." >&2
+ exit 2
+fi
+
# Default to draining the board; an integer arg sets a tighter budget.
BUDGET="${1:-until-empty}"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| BUDGET="${1:-until-empty}" | |
| case "$BUDGET" in | |
| until-empty) : ;; | |
| ''|*[!0-9]*) echo "argument must be 'until-empty' or a positive integer (got '$BUDGET')" >&2; exit 2 ;; | |
| *) [ "$BUDGET" -gt 0 ] 2>/dev/null || { echo "item budget must be a positive integer (got '$BUDGET')" >&2; exit 2; } ;; | |
| esac | |
| if [ "$#" -gt 1 ]; then | |
| echo "usage: .claude/scripts/beutl-loop.sh [until-empty | N]" >&2 | |
| echo "unexpected extra arguments: this launcher does not forward additional /beutl-loop args." >&2 | |
| exit 2 | |
| fi | |
| BUDGET="${1:-until-empty}" | |
| case "$BUDGET" in | |
| until-empty) : ;; | |
| ''|*[!0-9]*) echo "argument must be 'until-empty' or a positive integer (got '$BUDGET')" >&2; exit 2 ;; | |
| *) [ "$BUDGET" -gt 0 ] 2>/dev/null || { echo "item budget must be a positive integer (got '$BUDGET')" >&2; exit 2; } ;; | |
| esac |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.claude/scripts/beutl-loop.sh around lines 33 - 38, The beutl-loop launcher
currently only validates $1 in beutl-loop.sh and silently ignores any extra
arguments, so calls with a second filter argument are not honored. Update the
argument handling around the BUDGET case to either parse and pass through the
supported kind filter from the launcher into the /beutl-loop invocation path, or
explicitly reject unexpected extra args with a clear error; use the existing
beutl-loop.sh entrypoint and the advertised SKILL.md optional kind filter as the
reference points.
| **Candidate filter.** Only **unclaimed** items are candidates: `Backlog` and `Todo`. Skip `Done` / | ||
| `False positive`, and skip `In Progress` too — on this board `In Progress` means another | ||
| agent/contributor has already claimed it, so treating it as a candidate invites duplicate work. |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major | ⚡ Quick win
Revalidate before claiming.
Filtering on the snapshot alone still leaves a TOCTOU gap: another run or a human can move the item to In Progress before the board update. Add a final status check or atomic-claim failure path right before mutating the board.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.claude/skills/beutl-board-task/SKILL.md around lines 64 - 66, The candidate
filtering in the board-claim flow still has a TOCTOU gap: even after selecting
unclaimed items, another actor can move the item to In Progress before the
update. Update the claim path in the relevant board mutation logic to re-check
the item status immediately before claiming, and handle an atomic-claim failure
or stale-status rejection gracefully so already-claimed items are not
duplicated.
| # Unresolved review threads must be 0 (assumes ≤100; paginate for more — GitHub also enforces this, | ||
| # so the loop only checks to avoid a futile merge attempt): | ||
| UNRESOLVED=$(gh api graphql -f query='query{repository(owner:"b-editor",name:"beutl"){pullRequest(number:'"$PR"'){reviewThreads(first:100){nodes{isResolved}}}}}' \ | ||
| --jq '[.data.repository.pullRequest.reviewThreads.nodes[]|select(.isResolved==false)]|length') |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf '\n## SKILL slice\n'
wc -l .claude/skills/beutl-loop/SKILL.md
sed -n '145,170p' .claude/skills/beutl-loop/SKILL.md
printf '\n## Loop engineering doc slice\n'
if [ -f docs/ai-workflow/loop-engineering.md ]; then
wc -l docs/ai-workflow/loop-engineering.md
sed -n '65,95p' docs/ai-workflow/loop-engineering.md
else
echo 'docs/ai-workflow/loop-engineering.md not found'
fi
printf '\n## Search reviewThreads pagination usage\n'
rg -n "reviewThreads|isResolved|pageInfo|hasNextPage" .claude/skills/beutl-loop docs -g '!**/*.png' -g '!**/*.jpg' -g '!**/*.gif'Repository: b-editor/beutl
Length of output: 4797
🏁 Script executed:
#!/bin/bash
set -euo pipefail
sed -n '112,124p' docs/ai-workflow/loop-engineering.mdRepository: b-editor/beutl
Length of output: 1199
Paginate the unresolved-thread check.
reviewThreads(first:100) can miss unresolved threads beyond the first page, so the merge gate can treat a PR as settled when it still has open review threads. Use pageInfo/after until all threads are counted.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.claude/skills/beutl-loop/SKILL.md around lines 155 - 158, The
unresolved-thread check in the merge loop only queries reviewThreads(first:100),
so it can miss open threads past the first page. Update the check to paginate
through all review threads using pageInfo and after in the gh api GraphQL query,
and keep counting unresolved items until every page has been processed. Use the
existing UNRESOLVED logic in the loop to ensure the merge gate only passes when
all review threads are resolved.
| # Thread ids + resolved state (to resolve later); map databaseId -> thread id. | ||
| # first:100 covers all but pathological PRs; if `pageInfo.hasNextPage` is true, follow the cursor. | ||
| gh api graphql -f query='query{repository(owner:"b-editor",name:"beutl"){pullRequest(number:<PR>){reviewThreads(first:100){pageInfo{hasNextPage endCursor} nodes{id isResolved comments(first:50){nodes{databaseId author{login} path line body}}}}}}}' |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect the referenced skill file and search for the step-3 logic that uses comment order.
git ls-files '.claude/skills/beutl-resolve-reviews/SKILL.md' '**/*.md' '**/*.ts' '**/*.js' '**/*.py' | sed -n '1,200p'
echo '--- FILE ---'
cat -n .claude/skills/beutl-resolve-reviews/SKILL.md | sed -n '1,180p'
echo '--- SEARCH ---'
rg -n --hidden --no-ignore-vcs -S "latest comment|Step 3|comments\\(first:50\\)|reviewThreads|databaseId -> thread id|resolve later" .claude . 2>/dev/null | sed -n '1,200p'Repository: b-editor/beutl
Length of output: 18522
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Locate the exact file and inspect nearby references if present.
fd -a 'SKILL.md' .claudeRepository: b-editor/beutl
Length of output: 1654
Paginate nested thread comments. Step 3 depends on the latest reply in a thread, but this query only loads comments(first:50). On longer threads, the real latest reply can fall past that cap and the resolver will act on stale context.
Paginate the nested comments connection, or document why a thread can never exceed 50 comments.
🧰 Tools
🪛 SkillSpector (2.2.3)
[info] 107: [EA3] Scope Creep: Skill's behavior or capabilities extend beyond its stated purpose. Scope creep allows an agent to perform actions unrelated to its documented functionality, increasing the attack surface.
Remediation: Limit the skill's scope to its documented purpose. Remove instructions that enable the agent to perform actions outside its stated functionality.
(Excessive Agency (EA3))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.claude/skills/beutl-resolve-reviews/SKILL.md around lines 61 - 63, The
GraphQL query in the review-thread lookup only fetches comments(first:50), so
Step 3 can miss the true latest reply on long threads and use stale context.
Update the thread-fetching logic around the reviewThreads query to paginate the
nested comments connection as well, or otherwise ensure the full comment history
is retrieved before resolving threads; use the reviewThreads/comments structure
and databaseId-to-thread mapping as the key symbols to locate the change.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d07ad5aae4
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| ALLOWED_TOOLS="Read Edit Write Grep Glob Task \ | ||
| Bash(gh:*) Bash(git:*) Bash(dotnet:*) Bash(python3:*) Bash(jq:*) Bash(mktemp:*) Bash(date:*) \ | ||
| Bash(grep:*) Bash(rg:*) Bash(ls:*) Bash(cat:*) Bash(sed:*) Bash(find:*) Bash(head:*) Bash(tail:*) Bash(wc:*) Bash(test:*)" |
There was a problem hiding this comment.
Allow the settle loop to sleep unattended
The /beutl-loop procedure requires polling bot reviews every ~90s and enforcing a ~10-minute quiet period, but the headless launcher says commands outside this --allowedTools set stall unattended and Bash(sleep:*) is not included. In a real headless run, the first attempt to wait between gh polls either prompts/stalls or the loop cannot honor the settle window, so low-risk PRs may never progress unattended; add sleep to this allowlist (and the skill frontmatter allowlist) so the documented polling can run.
Useful? React with 👍 / 👎.
| Group inline comments by `in_reply_to_id`; the **latest** comment in a thread is what matters. Skip: | ||
| PR-author comments, pure approvals / praise / 👍, and threads already `isResolved: true` or whose | ||
| latest reply says done/fixed/resolved. Keep the `{thread id ↔ comment databaseId}` map for Step 6. |
There was a problem hiding this comment.
Group review threads by top-level comment id
For PRs with more than one top-level inline review thread, grouping solely by in_reply_to_id collapses all top-level comments under the same null/missing key, so the resolver can inspect only the latest unrelated thread and skip other actionable bot comments. GitHub's REST docs for review-comment replies state that replies are attached to the top-level review comment id, so the grouping key should be in_reply_to_id ?? id (then map that top-level id to the GraphQL thread) rather than in_reply_to_id alone.
Useful? React with 👍 / 👎.
| # Inline (line-anchored, threaded) comments | ||
| gh api "repos/$OWNER_REPO/pulls/<PR>/comments" --paginate \ | ||
| --jq '[.[] | {id, in_reply_to_id, user: .user.login, path, line, body, html_url, diff_hunk}]' | ||
|
|
||
| # General PR (issue) comments | ||
| gh api "repos/$OWNER_REPO/issues/<PR>/comments" --paginate \ | ||
| --jq '[.[] | {id, user: .user.login, body, html_url}]' |
There was a problem hiding this comment.
Preserve comment timestamps for quiet-period checks
The loop relies on last_activity_at to decide whether a PR has been quiet for ~10 minutes, but these comment fetches discard created_at/updated_at; with a bot or human inline/general comment after the last review summary, the resolver cannot report the true newest activity and the orchestrator can treat the PR as settled too early. GitHub's list-review-comments response includes timestamps, so include them for both inline and issue comments (and fetch commit timestamps) before computing last_activity_at.
Useful? React with 👍 / 👎.
| enlarge the diff materially:** do **not** touch. Leave the thread open and mark `needs_human`. | ||
| - **After any edit, re-verify before pushing — this is mandatory:** | ||
| ```bash | ||
| dotnet build Beutl.slnx -f net10.0 # must be clean |
There was a problem hiding this comment.
Drop the forced framework from solution builds
When --auto edits any review comment, this mandatory verification runs against the whole solution with -f net10.0, but Beutl.slnx includes src/Beutl.Engine.SourceGenerators and sdk/Beutl.Extensibility.Sdk, which target only netstandard2.0. For those projects, forcing TargetFramework=net10.0 makes the build/restore red even when the review fix is correct, so the resolver will revert or refuse to push every handled comment; use the repo's documented dotnet build Beutl.slnx or build only affected net10.0 projects.
Useful? React with 👍 / 👎.
| gh pr view ${PR:-} --json number,url,headRefName,baseRefName,title,state,mergeable | ||
| OWNER_REPO=$(gh repo view --json nameWithOwner -q .nameWithOwner) # b-editor/beutl |
There was a problem hiding this comment.
Fetch the PR author before filtering comments
The resolver later has to skip PR-author comments and treat only non-author human feedback as needs_human, but this initial PR metadata does not include the author login. On PRs where the author has left an inline/general comment (or a previous resolver reply is authored by the same token), --auto cannot distinguish that from external human feedback and can unnecessarily mark the PR as needing a human even when all bot feedback was handled; include author in this gh pr view result and compare against it during classification.
Useful? React with 👍 / 👎.
| Bash(gh:*) Bash(git:*) Bash(dotnet:*) Bash(python3:*) Bash(jq:*) Bash(mktemp:*) Bash(date:*) \ | ||
| Bash(grep:*) Bash(rg:*) Bash(ls:*) Bash(cat:*) Bash(sed:*) Bash(find:*) Bash(head:*) Bash(tail:*) Bash(wc:*) Bash(test:*)" | ||
|
|
||
| PERM_ARGS=(--allowedTools "$ALLOWED_TOOLS") |
There was a problem hiding this comment.
Use --tools for the claimed command sandbox
The script describes this list as a conservative envelope where commands outside it stall, but Claude Code's CLI docs define --allowedTools as tools that run without prompting and explicitly say to use --tools to restrict what is available. In headless runs where project/user settings or auto-mode allow more tools, this does not actually remove bare Bash or other commands from the model's toolbox, so the safety contract documented above the launcher is not enforced; add an explicit --tools/deny policy for the intended tool set.
Useful? React with 👍 / 👎.
| and **`reviewDecision == APPROVED`** — a `REVIEW_REQUIRED` (e.g. code-owner approval still pending, | ||
| which is the usual case under `* @yuto-trd`) means **leave for human and do not even attempt the | ||
| merge**. **Otherwise high-risk → leave for human.** When unsure, choose human (fail safe). |
There was a problem hiding this comment.
Permit null reviewDecision when reviews are not required
This gate requires reviewDecision == APPROVED for every auto-merge, but GitHub returns no approved decision for PRs on paths/rulesets where reviews are not required and no human has reviewed. That contradicts the documented case where auto-merge should work once code-owner review is relaxed for some paths: such a low-risk PR can have green checks and be mergeable but still be left for a human forever because reviewDecision is null instead of APPROVED; treat null as acceptable only when the ruleset does not require reviews.
Useful? React with 👍 / 👎.
…ation precision Follow-up to the loop-engineering layer: add six guardrails to the autonomous Project #9 board loop, all in the AI-workflow tooling under .claude/ and docs/ai-workflow/ (no product/runtime code). - Test gate (B): the runner blocks instead of opening a PR when a production change (src/) ships no NUnit test and no manual-verification note. New signals touched_production / test_files_added[_count] / manual_verification_note drive it; the orchestrator re-checks the same condition as defense-in-depth and treats a violating PR as high-risk + no-progress. - Self-review gate (A): a six-point pre-commit check (compiled XAML bindings, no [Obsolete]/v2/compat-overload shim, no leftover TODO/Follow-up, root-cause fix, GPL/MIT boundary, subtree CLAUDE.md). Reported via self_review_passed/findings. - Design two-pass (D): a design-sensitive item is handed back as a pushed draft (draft_ready/draft_branch) instead of a PR; the orchestrator runs @beutl-design-reviewer with up to two rework iterations (runner Rework mode, REWORK/OPEN_PR/design_findings), then opens the PR -- high-risk to a human if the design is still unresolved after the budget. - Recent-merge scan (C): the runner and beutl-board-task skim the last ~15 "Refs: Project #9" merges to avoid reintroducing a just-removed pattern (a soft signal, not a gate). - Stagnation precision (E): the no-progress breaker is raised 2 -> 3 and held open by a PR in the last 3 ticks (journal last_pr_tick); blocked items are split into item-specific (skipped, neutral) vs systemic (counts toward the breaker) via a new blocked_kind signal; a single blocked item no longer halts the drain (it is recorded and reported at the end). - Bot false-positive reply path (F): beutl-resolve-reviews --auto can refute a clear bot false positive with a concrete path:line citation and resolve the thread without a code change (false_positives_resolved); an uncertain case escalates instead. Pointers in AGENTS.md, CLAUDE.md, and docs/ai-workflow/README.md updated to match. Refs: Project #9 "AI Review" board
|
No TODO comments were found. |
Minimum allowed line rate is |
There was a problem hiding this comment.
♻️ Duplicate comments (2)
.claude/agents/beutl-board-task-runner.md (1)
48-54: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winKeep
manual-verificationblocked for production changes.This still lets
src/changes exit as a valid PR without NUnit coverage, which bypasses the repo rule for new logic. Based on learnings, new logic must ship with a NUnit test in the matchingtests/project.Also applies to: 60-64, 162-177
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.claude/agents/beutl-board-task-runner.md around lines 48 - 54, The guidance in the task-runner doc still allows production changes to exit as manual-verification without NUnit coverage, which conflicts with the repo rule. Update the instructions around the baseline-first loop and the Step-8 gate so that any change touching src/ must add a matching test under tests/ and only non-testable UI cases can use manual-verification with a required manual_verification_note. Use the surrounding policy text in beutl-board-task-runner.md to keep the wording consistent.Source: Learnings
.claude/skills/beutl-resolve-reviews/SKILL.md (1)
62-63: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winPaginate nested thread comments too.
This is the same bug flagged in the prior review:
comments(first:50)can miss the latest reply on longer threads, so Step 3 can resolve against stale context.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.claude/skills/beutl-resolve-reviews/SKILL.md around lines 62 - 63, The GraphQL query in the review-resolution workflow only paginates reviewThreads, but nested thread comments are still fetched with a fixed `comments(first:50)`, which can miss newer replies. Update the query used in this step to also paginate each thread’s comments via `pageInfo`/`endCursor`, and make the thread-processing logic follow the cursor until all comments are collected before resolving in Step 3.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In @.claude/agents/beutl-board-task-runner.md:
- Around line 48-54: The guidance in the task-runner doc still allows production
changes to exit as manual-verification without NUnit coverage, which conflicts
with the repo rule. Update the instructions around the baseline-first loop and
the Step-8 gate so that any change touching src/ must add a matching test under
tests/ and only non-testable UI cases can use manual-verification with a
required manual_verification_note. Use the surrounding policy text in
beutl-board-task-runner.md to keep the wording consistent.
In @.claude/skills/beutl-resolve-reviews/SKILL.md:
- Around line 62-63: The GraphQL query in the review-resolution workflow only
paginates reviewThreads, but nested thread comments are still fetched with a
fixed `comments(first:50)`, which can miss newer replies. Update the query used
in this step to also paginate each thread’s comments via `pageInfo`/`endCursor`,
and make the thread-processing logic follow the cursor until all comments are
collected before resolving in Step 3.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 6af7acf2-75eb-4919-9c3c-59ae18b99ee4
📒 Files selected for processing (8)
.claude/agents/beutl-board-task-runner.md.claude/skills/beutl-board-task/SKILL.md.claude/skills/beutl-loop/SKILL.md.claude/skills/beutl-resolve-reviews/SKILL.mdAGENTS.mdCLAUDE.mddocs/ai-workflow/README.mddocs/ai-workflow/loop-engineering.md
✅ Files skipped from review due to trivial changes (4)
- docs/ai-workflow/README.md
- CLAUDE.md
- AGENTS.md
- docs/ai-workflow/loop-engineering.md
🚧 Files skipped from review as they are similar to previous changes (2)
- .claude/skills/beutl-board-task/SKILL.md
- .claude/skills/beutl-loop/SKILL.md
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ee7c8e127e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| - **Bots only.** Auto-address / auto-resolve only feedback from the known bot reviewers. **Any human | ||
| review or comment → set `needs_human` and leave the thread open**; do not edit code for it and do | ||
| not resolve it. (A human's unresolved thread keeps the PR off the auto-merge path, which is the |
There was a problem hiding this comment.
Do not classify approvals as needing human action
With required code-owner review, the successful path to auto-merge includes a non-author human APPROVED review. This rule turns any human review/comment into needs_human, and the loop's merge gate later requires needs_human == false, so the first approval that satisfies the ruleset also forces the PR down the human path forever. Treat pure approvals/praise/no-request reviews as non-blocking and reserve needs_human for human comments or change requests that need action.
Useful? React with 👍 / 👎.
| 1. **Verify it is NOT a false positive** against the *current* code. If it is, set its Status to | ||
| `False positive` (`e6ff360e`) and return with `false_positive: true` (do nothing else). | ||
| 2. **Claim it** immediately (Status → `In Progress` `47fc9ee4`). | ||
| 3. **Branch off `origin/main`** first (before editing), named `$BRANCH_PREFIX/<descriptive-slug>`. |
There was a problem hiding this comment.
Fetch main before branching each loop item
In a multi-item/headless run, the local origin/main ref is not updated after another PR (including one the loop just merged) lands, and this step branches every new item directly from that ref. Without an explicit git fetch origin main before the recent-merge scan/branch, later PRs can be based on stale main and miss or reintroduce the just-merged Project #9 changes, leading to conflicts/red CI.
Useful? React with 👍 / 👎.
| - `touched_public_api` — changed a public type/member in `Beutl.Engine`, `Beutl.Extensibility`, | ||
| `Beutl.NodeGraph`, `Beutl.FFmpegIpc`, or `Beutl.ProjectSystem`. | ||
| - `touched_gpl_mit` — touched the GPL/MIT boundary. |
There was a problem hiding this comment.
Include all design-review public surfaces
design_reviewer_required is derived from this narrower project list, but the committed beutl-design-reviewer scope also covers public changes in src/Beutl.Api/, src/Beutl.Controls/, new abstractions anywhere, and compat shims (.claude/agents/beutl-design-reviewer.md:18-20). A public API change in those omitted surfaces can therefore skip the draft design pass and remain eligible for the low/moderate auto-merge path without the required design review; mirror the reviewer scope here.
Useful? React with 👍 / 👎.
| - Commit addressed changes (`fix(review): …`) and `git push` the existing PR branch. Never | ||
| force-push; never push `main`. |
There was a problem hiding this comment.
Check out the PR head before pushing review fixes
When /beutl-loop invokes this resolver after Dispatch A, the current checkout is still the orchestrator branch, not the PR branch created in the worker's isolated worktree. This git push assumes review edits were committed on the existing PR branch, but Step 1 only reads headRefName and never checks it out, so any auto-addressed bot comment can be committed/pushed to the wrong branch or fail to update the PR; fetch/check out the PR head before Step 5 edits.
Useful? React with 👍 / 👎.
| gh api "repos/$OWNER_REPO/pulls/<PR>/comments/<comment_id>/replies" -f body="Done — <one-line fix>." | ||
| gh api graphql -f query='mutation{resolveReviewThread(input:{threadId:"<THREAD_ID>"}){thread{isResolved}}}' |
There was a problem hiding this comment.
Handle issue comments outside review-thread replies
Step 2 also fetches general PR comments via the Issues comments API, but this reply/resolve path is only valid for pull request review comments (GitHub documents regular PR comments separately: https://docs.github.com/rest/issues/comments). If an actionable bot posts a top-level PR comment, --auto can try to reply to /pulls/.../comments/<id>/replies and resolve a nonexistent review thread, so branch issue comments to an issue-comment response path and do not treat them as resolvable review threads.
Useful? React with 👍 / 👎.
| - `false_positive: true` → the runner already marked the board item `False positive`, so the item | ||
| **leaves the queue** — this is **progress**: reset `consecutive_no_progress` to 0, increment | ||
| `consecutive_false_positives`. Continue. |
There was a problem hiding this comment.
Clear stale failure signatures after progress
When a systemic block sets last_failure_signature and the next item is a false positive, this progress path resets consecutive_no_progress but leaves that signature intact. A later unrelated item that fails with the same first-error line can then trip the Step 0 "back-to-back" repeat guard even though a progress tick occurred in between; clear the stored failure signature on progress/neutral outcomes or compare only against the immediately previous tick.
Useful? React with 👍 / 👎.
| `BEUTL_LOOP_MAX_ITEMS`, default 50) · wall-clock exceeded · `consecutive_no_progress ≥ 3` **and** no | ||
| PR was opened in the last 3 ticks (`items_processed − last_pr_tick > 3` — recent shipping counts as | ||
| progress and holds the breaker open) · `consecutive_false_positives ≥ 3` (the board/selection is |
There was a problem hiding this comment.
Stop after the documented third no-progress tick
With last_pr_tick initialized to 0, the condition items_processed − last_pr_tick > 3 is still false after exactly three no-progress ticks (3 > 3), so the loop runs a fourth item before the advertised three-strike breaker fires; the same off-by-one occurs after any later PR. Use >= 3 or update the documented stop condition so a run does not continue past the intended stagnation budget.
Useful? React with 👍 / 👎.
| The runner handed back a pushed **draft branch** instead of a PR because the change is design-sensitive | ||
| (`design_reviewer_required`). Run up to **two** rework iterations: | ||
|
|
||
| 1. Dispatch **`@beutl-design-reviewer`** (read-only) on `git diff origin/main...<draft_branch>`. |
There was a problem hiding this comment.
Fetch draft branches before design review
The design-sensitive runner pushes draft_branch from its isolated worktree, but the orchestrator's checkout does not automatically have a local ref with that name. In that case git diff origin/main...<draft_branch> fails before @beutl-design-reviewer can run, blocking every design-sensitive item; fetch the pushed branch and diff against origin/<draft_branch> (or otherwise create the local ref) before dispatching the reviewer/rework pass.
Useful? React with 👍 / 👎.
| 5. **Test with the baseline-first loop** and **gate on the exit code, never a console string** | ||
| (the locale trap is real — `Passed!`/`Failed!` vs `成功!`/`失敗!`). **A production change must ship | ||
| an NUnit test** (AGENTS.md rule #3): if your diff touches `src/` but adds no file under `tests/`, | ||
| go back and add one. Only when the change genuinely cannot carry a unit test (typically UI) do you | ||
| set `test_status: "manual-verification"` — and then a concrete `manual_verification_note` is |
There was a problem hiding this comment.
Run a build gate before opening PRs
This normal Dispatch A path gates on tests and formatting but never runs dotnet build, even though the loop contract says build is a binary gate before PR creation. For changes not covered by the selected test project (for example Windows-only targets or non-tested projects), the runner can open a PR that does not compile and only discover it in CI; add a build step to the pre-commit gate before push/PR.
Useful? React with 👍 / 👎.
Description
Introduces loop engineering — a committed
/beutl-looplayer that runs Beutl's existing single-item board-task flow as an autonomous, guardrailed outer loop. Beutl already had every ingredient (durable memory, verification gates, a work queue, a one-shot task runner); this adds the missing outer driver plus an explicit contract and guardrails./beutl-loop(new skill): orchestrator. Drains the Project #9 board by default (until-empty; runaway backstopBEUTL_LOOP_MAX_ITEMS=50), dispatches a worktree-isolated sub-agent per tick to keep its own context lean, autonomously resolves the PR's bot reviews, classifies risk, then attempts a squash merge for low-to-moderate-risk PRs or leaves higher-risk ones for a human.beutl-board-task-runner(new agent): implements one item → PR in an isolated worktree, returns a stable structured JSON result (same keys every outcome; PR fieldsnullwhen absent); never merges.beutl-resolve-reviews(new skill): addresses + resolves PR reviews.--autohandles bot feedback only (CodeRabbit / Copilot / Codex / Claude) and escalates any human review to the human; re-verifies build/test/format after every edit and never pushes red..claude/scripts/beutl-loop.sh(new): conservative headless launcher — scoped--allowedTools(no bareBash), no--dangerously-skip-permissionsby default, refusesmain/ detached / flat branches (useBEUTL_LOOP_BRANCH_PREFIX).docs/ai-workflow/loop-engineering.md(new): the 6-part loop contract (TRIGGER/SCOPE/ACTION/BUDGET/STOP/REPORT), the moderate-risk auto-merge policy, and the anti-loopmaxxing guards (deterministic STOP, stagnation circuit-breaker, hard budget).beutl-ai-review-task→beutl-board-taskand drop its non-feature/low-risk selection bias (features and higher-risk items are now in scope; risk is gated at merge time).AGENTS.md,CLAUDE.md,docs/ai-workflow/{README,subagents-and-hooks}.md.Safety model:
mainis protected by GitHub rulesets (required checksbuild/dotnet-format, code-owner review via* @yuto-trd, review-thread resolution, squash-only, signed commits, linear history), so GitHub — not the loop — is the hard merge gate. The loop self-gates as defense-in-depth and fails safe to the human; under the current code-owner rule it usually prepares PRs for the code owner to merge rather than merging them itself. It never merges a high-risk PR, force-pushesmain, or bypasses the rulesets.Affected areas
AI-workflow tooling under
.claude/+docs/ai-workflow/. No product/runtime code; no.csproj,.github/workflows/,.claude/settings.json, or.gitignorechanges (the loop's run journal lives under the already-ignored.claude/logs/).Breaking changes
None for compiled code or out-of-tree plugins. The internal AI-workflow skill
beutl-ai-review-taskis renamed tobeutl-board-task(no in-repo content referenced the old slug; the directory move + self-references are updated here).Test plan
No compilable C# is added (skills / agent / docs / bash only), so AGENTS.md rule #3 (NUnit) does not apply. Verified:
bash -n .claude/scripts/beutl-loop.sh; argument validation rejects0/00/ non-integers, refusesmain/ detached / flat branches withoutBEUTL_LOOP_BRANCH_PREFIX, andBEUTL_LOOP_YOLOrequires an explicitN≤3(cannot drain).beutl-ai-review-taskreferences, no stale budget / branch-protection wording, scoped headless allowlist (no bareBash), unified runner JSON contract.mainrulesets).Manual smoke (run intentionally — opens a real PR):
/beutl-loop dry-run 1(side-effect-free) first, then/beutl-loop 1to confirm one item → PR → bot-review resolution → merge attempt (left for the code owner under current rulesets), honoring the budget STOP.Fixed issues / References
Summary by CodeRabbit
/beutl-loopautonomous board-draining loop workflow with bounded execution, risk-aware PR handling, and safe handoff to humans when needed.