Skip to content

feat: add /beutl-loop autonomous board loop (loop engineering)#2018

Open
yuto-trd wants to merge 2 commits into
mainfrom
yuto-trd/loop-engineering
Open

feat: add /beutl-loop autonomous board loop (loop engineering)#2018
yuto-trd wants to merge 2 commits into
mainfrom
yuto-trd/loop-engineering

Conversation

@yuto-trd

@yuto-trd yuto-trd commented Jun 25, 2026

Copy link
Copy Markdown
Member

Description

Introduces loop engineering — a committed /beutl-loop layer that runs Beutl's existing single-item board-task flow as an autonomous, guardrailed outer loop. Beutl already had every ingredient (durable memory, verification gates, a work queue, a one-shot task runner); this adds the missing outer driver plus an explicit contract and guardrails.

  • /beutl-loop (new skill): orchestrator. Drains the Project #9 board by default (until-empty; runaway backstop BEUTL_LOOP_MAX_ITEMS=50), dispatches a worktree-isolated sub-agent per tick to keep its own context lean, autonomously resolves the PR's bot reviews, classifies risk, then attempts a squash merge for low-to-moderate-risk PRs or leaves higher-risk ones for a human.
  • beutl-board-task-runner (new agent): implements one item → PR in an isolated worktree, returns a stable structured JSON result (same keys every outcome; PR fields null when absent); never merges.
  • beutl-resolve-reviews (new skill): addresses + resolves PR reviews. --auto handles bot feedback only (CodeRabbit / Copilot / Codex / Claude) and escalates any human review to the human; re-verifies build/test/format after every edit and never pushes red.
  • .claude/scripts/beutl-loop.sh (new): conservative headless launcher — scoped --allowedTools (no bare Bash), no --dangerously-skip-permissions by default, refuses main / detached / flat branches (use BEUTL_LOOP_BRANCH_PREFIX).
  • docs/ai-workflow/loop-engineering.md (new): the 6-part loop contract (TRIGGER/SCOPE/ACTION/BUDGET/STOP/REPORT), the moderate-risk auto-merge policy, and the anti-loopmaxxing guards (deterministic STOP, stagnation circuit-breaker, hard budget).
  • Rename beutl-ai-review-taskbeutl-board-task and drop its non-feature/low-risk selection bias (features and higher-risk items are now in scope; risk is gated at merge time).
  • Pointers added to AGENTS.md, CLAUDE.md, docs/ai-workflow/{README,subagents-and-hooks}.md.

Safety model: main is protected by GitHub rulesets (required checks build/dotnet-format, code-owner review via * @yuto-trd, review-thread resolution, squash-only, signed commits, linear history), so GitHub — not the loop — is the hard merge gate. The loop self-gates as defense-in-depth and fails safe to the human; under the current code-owner rule it usually prepares PRs for the code owner to merge rather than merging them itself. It never merges a high-risk PR, force-pushes main, or bypasses the rulesets.

Affected areas

  • Build / CI / docs only

AI-workflow tooling under .claude/ + docs/ai-workflow/. No product/runtime code; no .csproj, .github/workflows/, .claude/settings.json, or .gitignore changes (the loop's run journal lives under the already-ignored .claude/logs/).

Breaking changes

None for compiled code or out-of-tree plugins. The internal AI-workflow skill beutl-ai-review-task is renamed to beutl-board-task (no in-repo content referenced the old slug; the directory move + self-references are updated here).

Test plan

No compilable C# is added (skills / agent / docs / bash only), so AGENTS.md rule #3 (NUnit) does not apply. Verified:

  • bash -n .claude/scripts/beutl-loop.sh; argument validation rejects 0 / 00 / non-integers, refuses main / detached / flat branches without BEUTL_LOOP_BRANCH_PREFIX, and BEUTL_LOOP_YOLO requires an explicit N≤3 (cannot drain).
  • YAML frontmatter parses for all new/renamed skills and the new agent.
  • Consistency sweep: no stale beutl-ai-review-task references, no stale budget / branch-protection wording, scoped headless allowlist (no bare Bash), unified runner JSON contract.
  • Reviewed across three independent Codex passes; all actionable findings addressed (no false-merge path under the main rulesets).

Manual smoke (run intentionally — opens a real PR): /beutl-loop dry-run 1 (side-effect-free) first, then /beutl-loop 1 to confirm one item → PR → bot-review resolution → merge attempt (left for the code owner under current rulesets), honoring the budget STOP.

Fixed issues / References

  • Project board: b-editor/projects/9 ("AI Review")

Summary by CodeRabbit

  • New Features
    • Added the /beutl-loop autonomous board-draining loop workflow with bounded execution, risk-aware PR handling, and safe handoff to humans when needed.
    • Added support docs for running single board tasks, resolving PR reviews, and orchestrating the full “implement → PR/draft → settle reviews → merge if eligible” flow.
  • Documentation
    • Expanded AI workflow documentation with the new loop engineering contract, updated usage guidance, and refreshed agent/skill instructions.
    • Added a headless launcher script and updated command references to reflect the new loop per-tick workflow and guardrails.

Introduce a committed "loop engineering" layer that drives Beutl's existing
single-item board-task flow as an autonomous, guardrailed outer loop.

- New skill `/beutl-loop`: orchestrator that drains the Project #9 board by
  default (until-empty; runaway backstop BEUTL_LOOP_MAX_ITEMS=50), dispatching a
  worktree-isolated sub-agent per tick to keep its own context lean, then
  risk-gates a squash merge (low-to-moderate-risk) or leaves the PR for a human.
- New agent `beutl-board-task-runner`: implements one item -> PR in an isolated
  worktree and returns a stable structured JSON result; never merges.
- New skill `beutl-resolve-reviews`: addresses + resolves PR reviews; `--auto`
  handles bot feedback only (CodeRabbit/Copilot/Codex/Claude) and escalates any
  human review to the human.
- New `.claude/scripts/beutl-loop.sh`: conservative headless launcher (scoped
  allowlist, no --dangerously-skip-permissions by default, refuses
  main/detached/flat branches).
- New guide `docs/ai-workflow/loop-engineering.md`: the 6-part loop contract, the
  moderate-risk auto-merge policy, and the anti-loopmaxxing guards.
- Rename skill `beutl-ai-review-task` -> `beutl-board-task` and drop its
  non-feature/low-risk selection bias (features and higher-risk items are now in
  scope; risk is handled at merge time).
- Wire pointers into AGENTS.md, CLAUDE.md, and
  docs/ai-workflow/{README,subagents-and-hooks}.md.

Merge safety is anchored on GitHub's `main` rulesets (required checks
build/dotnet-format, code-owner review via * @yuto-trd, thread resolution,
squash-only, signed commits); the loop self-gates as defense-in-depth and fails
safe to the human when unsure.

Refs: Project #9 "AI Review" board
Copilot AI review requested due to automatic review settings June 25, 2026 03:42
@coderabbitai

coderabbitai Bot commented Jun 25, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

Adds a Project #9 board loop surface: a one-item worktree runner, a /beutl-loop orchestrator and launcher, PR review-resolution automation, and matching documentation updates.

Changes

Project #9 board loop automation

Layer / File(s) Summary
Board item selection
.claude/skills/beutl-board-task/SKILL.md
Expands the Project #9 task-selection skill to accept any item kind, keep only unclaimed board items, and add blocked/manual-verification handling for unsupported cases.
Runner execution contract
.claude/agents/beutl-board-task-runner.md
Defines the worktree runner metadata, one-item execution flow, risk-signal output, JSON result shape, and hard constraints around merges, reviews, and signed commits.
Review resolution flow
.claude/skills/beutl-resolve-reviews/SKILL.md
Defines interactive and --auto PR review handling, thread classification, verification steps, reply-and-resolve behavior, and the structured auto-report.
Loop orchestration
.claude/skills/beutl-loop/SKILL.md
Specifies the bounded /beutl-loop contract, per-tick dispatch flow, dry-run mode, auto-merge policy, run journal fields, and stop/report rules.
Headless launcher
.claude/scripts/beutl-loop.sh
Adds the Bash launcher that validates budgets and branches, applies allowlist or YOLO permissions, creates logs, and invokes claude -p "/beutl-loop ...".
Docs and entry points
AGENTS.md, CLAUDE.md, docs/ai-workflow/README.md, docs/ai-workflow/subagents-and-hooks.md
Updates the repository guidance and index pages to mention /beutl-loop, the new subagent, the launcher directory, and the loop reference guide.
Loop design guide
docs/ai-workflow/loop-engineering.md
Describes the loop contract, tick flow, merge policy, review automation, journal behavior, launcher safety envelope, and verification steps.

Sequence Diagram(s)

sequenceDiagram
  participant beutlLoop as beutl-loop
  participant boardTaskRunner as beutl-board-task-runner
  participant resolveReviews as beutl-resolve-reviews
  participant githubRulesets as GitHub rulesets

  beutlLoop->>boardTaskRunner: run one board item in a worktree
  boardTaskRunner-->>beutlLoop: return JSON result
  beutlLoop->>resolveReviews: resolve bot review comments with --auto
  resolveReviews-->>beutlLoop: return thread status and needs_human
  beutlLoop->>githubRulesets: attempt squash merge
  githubRulesets-->>beutlLoop: allow or refuse merge
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

I hopped through /beutl-loop with a twitchy nose,
then nibbled PR reviews in tidy rows.
One board item, one worktree, hop-hop done,
with logs in paw beneath the sun.
Safe merges jingle like carrot bells.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main addition: an autonomous /beutl-loop board loop and its engineering docs.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch yuto-trd/loop-engineering

Comment @coderabbitai help to get the list of available commands.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces “loop engineering” to Beutl’s AI workflow by adding a new autonomous /beutl-loop skill that can drain Project #9 items in a bounded, guardrailed loop, supported by a per-tick worktree sub-agent and a dedicated skill for resolving bot PR reviews. It updates core workflow docs (AGENTS/CLAUDE/docs) to document the contract, safety model, and how to use the new loop tooling.

Changes:

  • Add /beutl-loop orchestrator skill, plus a conservative headless launcher script (.claude/scripts/beutl-loop.sh).
  • Add beutl-board-task-runner worktree-isolated sub-agent for “one item → PR” execution, and beutl-resolve-reviews skill for resolving bot review feedback.
  • Update workflow documentation to reflect the new loop contract, guardrails, and expanded Project #9 task scope via beutl-board-task.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
docs/ai-workflow/subagents-and-hooks.md Documents the new per-tick runner agent and the board-loop skill surface.
docs/ai-workflow/README.md Updates AI workflow entry points and adds /beutl-loop + related tooling references.
docs/ai-workflow/loop-engineering.md New guide defining the loop contract, guardrails, and risk-gated merge policy.
CLAUDE.md Adds a concise explanation of /beutl-loop behavior and safety/authorization model.
AGENTS.md Adds a top-level “Autonomous board loop” entry and links to loop docs.
.claude/skills/beutl-resolve-reviews/SKILL.md New skill for addressing and resolving PR review feedback (bot-only autonomy in --auto).
.claude/skills/beutl-loop/SKILL.md New orchestrator skill defining the bounded board-draining loop procedure.
.claude/skills/beutl-board-task/SKILL.md Broadens single-item task execution to include features (risk gated downstream).
.claude/scripts/beutl-loop.sh New headless launcher enforcing conservative permission and branch-safety constraints.
.claude/agents/beutl-board-task-runner.md New worktree-isolated agent to implement one Project #9 item and open a PR with structured JSON output.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +1 to +10
---
description: |
Fetch every review and review comment on a pull request (CodeRabbit, GitHub Copilot, Codex,
Claude code-review, and humans), decide which genuinely need a code change, address the clearly
actionable ones, then reply and resolve the threads. Two modes: the default is human-in-the-loop
(confirm each comment via AskUserQuestion, like the global handle-pr-reviews skill); pass `--auto`
for autonomous resolution — used by /beutl-loop after it opens a PR. Use when the user says
"PRレビューに対応", "レビュー指摘を反映してResolve", "address and resolve the PR reviews".
argument-hint: "[--auto] [reply | no-resolve] [PR#]"
---

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.claude/agents/beutl-board-task-runner.md:
- Around line 43-45: Treat the manual-verification path in the task runner as
blocked rather than completed: update the guidance around the baseline-first
loop so that `manual-verification` does not allow in-scope changes to pass
without tests, and require a matching NUnit test in the corresponding `tests/`
project for new logic. Adjust the `test_status`/verification wording in the
task-runner docs so only testable work can be marked done, while UI or
hard-to-unit-test work remains blocked until covered or the scope changes.

In @.claude/scripts/beutl-loop.sh:
- Around line 33-38: The beutl-loop launcher currently only validates $1 in
beutl-loop.sh and silently ignores any extra arguments, so calls with a second
filter argument are not honored. Update the argument handling around the BUDGET
case to either parse and pass through the supported kind filter from the
launcher into the /beutl-loop invocation path, or explicitly reject unexpected
extra args with a clear error; use the existing beutl-loop.sh entrypoint and the
advertised SKILL.md optional kind filter as the reference points.

In @.claude/skills/beutl-board-task/SKILL.md:
- Around line 76-82: The selection guidance in SKILL.md currently allows
hard-to-unit-test work to proceed with only manual verification, which conflicts
with the test-first rule. Update the wording around the two guards on selection
so that new logic in the relevant workflow must land with an NUnit test in the
matching tests project; if automation is not possible, treat the item as blocked
instead of substituting manual verification. Keep the blocked/stagnation
handling aligned with the existing selection rules and adjust the surrounding
guidance text accordingly.
- Around line 64-66: The candidate filtering in the board-claim flow still has a
TOCTOU gap: even after selecting unclaimed items, another actor can move the
item to In Progress before the update. Update the claim path in the relevant
board mutation logic to re-check the item status immediately before claiming,
and handle an atomic-claim failure or stale-status rejection gracefully so
already-claimed items are not duplicated.

In @.claude/skills/beutl-loop/SKILL.md:
- Around line 155-158: The unresolved-thread check in the merge loop only
queries reviewThreads(first:100), so it can miss open threads past the first
page. Update the check to paginate through all review threads using pageInfo and
after in the gh api GraphQL query, and keep counting unresolved items until
every page has been processed. Use the existing UNRESOLVED logic in the loop to
ensure the merge gate only passes when all review threads are resolved.

In @.claude/skills/beutl-resolve-reviews/SKILL.md:
- Around line 61-63: The GraphQL query in the review-thread lookup only fetches
comments(first:50), so Step 3 can miss the true latest reply on long threads and
use stale context. Update the thread-fetching logic around the reviewThreads
query to paginate the nested comments connection as well, or otherwise ensure
the full comment history is retrieved before resolving threads; use the
reviewThreads/comments structure and databaseId-to-thread mapping as the key
symbols to locate the change.

In `@docs/ai-workflow/loop-engineering.md`:
- Around line 200-201: The dry-run example uses an argument order that does not
match the documented `/beutl-loop` command forms. Update the example in the
loop-engineering guide to use one of the supported shapes from
`beutl-loop/SKILL.md`—for example `/beutl-loop dry-run` with an optional kind
filter—or expand the skill contract first if a numeric dry-run budget is
intended. Keep the example aligned with the `dry-run` invocation syntax so the
docs and `SKILL.md` agree.
- Around line 155-156: The wrapped Project `#9` sentence in the markdown doc is
triggering markdownlint MD018 because `#9` appears at the start of a line.
Reflow the text in the surrounding paragraph so `#9` is not at column 1, or
format it as inline code, keeping the sentence within the existing docs section
that mentions the renamed `beutl-board-task` / `beutl-ai-review-task` note.
- Line 202: The current validation step only performs a syntax check with bash
-n on beutl-loop.sh, so it cannot prove the main-branch guard works. Update the
loop-engineering.md instructions to separate this into two checks: keep the bash
-n parse verification for the script, and add a real execution of the
beutl-loop.sh path that exercises the branch check and confirms it refuses to
run on main. Use the existing beutl-loop.sh reference and the validation list
item in loop-engineering.md so the runtime refusal is explicitly demonstrated.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: dd54d2f5-ca75-42dd-a5df-66addf5a837a

📥 Commits

Reviewing files that changed from the base of the PR and between a24a93c and d07ad5a.

📒 Files selected for processing (10)
  • .claude/agents/beutl-board-task-runner.md
  • .claude/scripts/beutl-loop.sh
  • .claude/skills/beutl-board-task/SKILL.md
  • .claude/skills/beutl-loop/SKILL.md
  • .claude/skills/beutl-resolve-reviews/SKILL.md
  • AGENTS.md
  • CLAUDE.md
  • docs/ai-workflow/README.md
  • docs/ai-workflow/loop-engineering.md
  • docs/ai-workflow/subagents-and-hooks.md

Comment thread .claude/agents/beutl-board-task-runner.md Outdated
Comment on lines +33 to +38
BUDGET="${1:-until-empty}"
case "$BUDGET" in
until-empty) : ;;
''|*[!0-9]*) echo "argument must be 'until-empty' or a positive integer (got '$BUDGET')" >&2; exit 2 ;;
*) [ "$BUDGET" -gt 0 ] 2>/dev/null || { echo "item budget must be a positive integer (got '$BUDGET')" >&2; exit 2; } ;;
esac

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Reject or forward extra /beutl-loop arguments.

The launcher only reads $1, so a call like .claude/scripts/beutl-loop.sh 1 feature silently degrades to /beutl-loop 1. Since .claude/skills/beutl-loop/SKILL.md:1-13 advertises an optional kind filter, this widens unattended scope instead of honoring the caller’s constraint. Either parse and forward the supported second argument or fail fast on any extra args.

Suggested fail-fast fix
+if [ "$#" -gt 1 ]; then
+  echo "usage: .claude/scripts/beutl-loop.sh [until-empty | N]" >&2
+  echo "unexpected extra arguments: this launcher does not forward additional /beutl-loop args." >&2
+  exit 2
+fi
+
 # Default to draining the board; an integer arg sets a tighter budget.
 BUDGET="${1:-until-empty}"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
BUDGET="${1:-until-empty}"
case "$BUDGET" in
until-empty) : ;;
''|*[!0-9]*) echo "argument must be 'until-empty' or a positive integer (got '$BUDGET')" >&2; exit 2 ;;
*) [ "$BUDGET" -gt 0 ] 2>/dev/null || { echo "item budget must be a positive integer (got '$BUDGET')" >&2; exit 2; } ;;
esac
if [ "$#" -gt 1 ]; then
echo "usage: .claude/scripts/beutl-loop.sh [until-empty | N]" >&2
echo "unexpected extra arguments: this launcher does not forward additional /beutl-loop args." >&2
exit 2
fi
BUDGET="${1:-until-empty}"
case "$BUDGET" in
until-empty) : ;;
''|*[!0-9]*) echo "argument must be 'until-empty' or a positive integer (got '$BUDGET')" >&2; exit 2 ;;
*) [ "$BUDGET" -gt 0 ] 2>/dev/null || { echo "item budget must be a positive integer (got '$BUDGET')" >&2; exit 2; } ;;
esac
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.claude/scripts/beutl-loop.sh around lines 33 - 38, The beutl-loop launcher
currently only validates $1 in beutl-loop.sh and silently ignores any extra
arguments, so calls with a second filter argument are not honored. Update the
argument handling around the BUDGET case to either parse and pass through the
supported kind filter from the launcher into the /beutl-loop invocation path, or
explicitly reject unexpected extra args with a clear error; use the existing
beutl-loop.sh entrypoint and the advertised SKILL.md optional kind filter as the
reference points.

Comment on lines +64 to +66
**Candidate filter.** Only **unclaimed** items are candidates: `Backlog` and `Todo`. Skip `Done` /
`False positive`, and skip `In Progress` too — on this board `In Progress` means another
agent/contributor has already claimed it, so treating it as a candidate invites duplicate work.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Revalidate before claiming.

Filtering on the snapshot alone still leaves a TOCTOU gap: another run or a human can move the item to In Progress before the board update. Add a final status check or atomic-claim failure path right before mutating the board.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.claude/skills/beutl-board-task/SKILL.md around lines 64 - 66, The candidate
filtering in the board-claim flow still has a TOCTOU gap: even after selecting
unclaimed items, another actor can move the item to In Progress before the
update. Update the claim path in the relevant board mutation logic to re-check
the item status immediately before claiming, and handle an atomic-claim failure
or stale-status rejection gracefully so already-claimed items are not
duplicated.

Comment thread .claude/skills/beutl-board-task/SKILL.md
Comment on lines +155 to +158
# Unresolved review threads must be 0 (assumes ≤100; paginate for more — GitHub also enforces this,
# so the loop only checks to avoid a futile merge attempt):
UNRESOLVED=$(gh api graphql -f query='query{repository(owner:"b-editor",name:"beutl"){pullRequest(number:'"$PR"'){reviewThreads(first:100){nodes{isResolved}}}}}' \
--jq '[.data.repository.pullRequest.reviewThreads.nodes[]|select(.isResolved==false)]|length')

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

printf '\n## SKILL slice\n'
wc -l .claude/skills/beutl-loop/SKILL.md
sed -n '145,170p' .claude/skills/beutl-loop/SKILL.md

printf '\n## Loop engineering doc slice\n'
if [ -f docs/ai-workflow/loop-engineering.md ]; then
  wc -l docs/ai-workflow/loop-engineering.md
  sed -n '65,95p' docs/ai-workflow/loop-engineering.md
else
  echo 'docs/ai-workflow/loop-engineering.md not found'
fi

printf '\n## Search reviewThreads pagination usage\n'
rg -n "reviewThreads|isResolved|pageInfo|hasNextPage" .claude/skills/beutl-loop docs -g '!**/*.png' -g '!**/*.jpg' -g '!**/*.gif'

Repository: b-editor/beutl

Length of output: 4797


🏁 Script executed:

#!/bin/bash
set -euo pipefail
sed -n '112,124p' docs/ai-workflow/loop-engineering.md

Repository: b-editor/beutl

Length of output: 1199


Paginate the unresolved-thread check.
reviewThreads(first:100) can miss unresolved threads beyond the first page, so the merge gate can treat a PR as settled when it still has open review threads. Use pageInfo/after until all threads are counted.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.claude/skills/beutl-loop/SKILL.md around lines 155 - 158, The
unresolved-thread check in the merge loop only queries reviewThreads(first:100),
so it can miss open threads past the first page. Update the check to paginate
through all review threads using pageInfo and after in the gh api GraphQL query,
and keep counting unresolved items until every page has been processed. Use the
existing UNRESOLVED logic in the loop to ensure the merge gate only passes when
all review threads are resolved.

Comment on lines +61 to +63
# Thread ids + resolved state (to resolve later); map databaseId -> thread id.
# first:100 covers all but pathological PRs; if `pageInfo.hasNextPage` is true, follow the cursor.
gh api graphql -f query='query{repository(owner:"b-editor",name:"beutl"){pullRequest(number:<PR>){reviewThreads(first:100){pageInfo{hasNextPage endCursor} nodes{id isResolved comments(first:50){nodes{databaseId author{login} path line body}}}}}}}'

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect the referenced skill file and search for the step-3 logic that uses comment order.
git ls-files '.claude/skills/beutl-resolve-reviews/SKILL.md' '**/*.md' '**/*.ts' '**/*.js' '**/*.py' | sed -n '1,200p'
echo '--- FILE ---'
cat -n .claude/skills/beutl-resolve-reviews/SKILL.md | sed -n '1,180p'
echo '--- SEARCH ---'
rg -n --hidden --no-ignore-vcs -S "latest comment|Step 3|comments\\(first:50\\)|reviewThreads|databaseId -> thread id|resolve later" .claude . 2>/dev/null | sed -n '1,200p'

Repository: b-editor/beutl

Length of output: 18522


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Locate the exact file and inspect nearby references if present.
fd -a 'SKILL.md' .claude

Repository: b-editor/beutl

Length of output: 1654


Paginate nested thread comments. Step 3 depends on the latest reply in a thread, but this query only loads comments(first:50). On longer threads, the real latest reply can fall past that cap and the resolver will act on stale context.

Paginate the nested comments connection, or document why a thread can never exceed 50 comments.

🧰 Tools
🪛 SkillSpector (2.2.3)

[info] 107: [EA3] Scope Creep: Skill's behavior or capabilities extend beyond its stated purpose. Scope creep allows an agent to perform actions unrelated to its documented functionality, increasing the attack surface.

Remediation: Limit the skill's scope to its documented purpose. Remove instructions that enable the agent to perform actions outside its stated functionality.

(Excessive Agency (EA3))

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.claude/skills/beutl-resolve-reviews/SKILL.md around lines 61 - 63, The
GraphQL query in the review-thread lookup only fetches comments(first:50), so
Step 3 can miss the true latest reply on long threads and use stale context.
Update the thread-fetching logic around the reviewThreads query to paginate the
nested comments connection as well, or otherwise ensure the full comment history
is retrieved before resolving threads; use the reviewThreads/comments structure
and databaseId-to-thread mapping as the key symbols to locate the change.

Comment thread docs/ai-workflow/loop-engineering.md
Comment thread docs/ai-workflow/loop-engineering.md Outdated
Comment thread docs/ai-workflow/loop-engineering.md

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d07ad5aae4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +65 to +67
ALLOWED_TOOLS="Read Edit Write Grep Glob Task \
Bash(gh:*) Bash(git:*) Bash(dotnet:*) Bash(python3:*) Bash(jq:*) Bash(mktemp:*) Bash(date:*) \
Bash(grep:*) Bash(rg:*) Bash(ls:*) Bash(cat:*) Bash(sed:*) Bash(find:*) Bash(head:*) Bash(tail:*) Bash(wc:*) Bash(test:*)"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Allow the settle loop to sleep unattended

The /beutl-loop procedure requires polling bot reviews every ~90s and enforcing a ~10-minute quiet period, but the headless launcher says commands outside this --allowedTools set stall unattended and Bash(sleep:*) is not included. In a real headless run, the first attempt to wait between gh polls either prompts/stalls or the loop cannot honor the settle window, so low-risk PRs may never progress unattended; add sleep to this allowlist (and the skill frontmatter allowlist) so the documented polling can run.

Useful? React with 👍 / 👎.

Comment on lines +76 to +78
Group inline comments by `in_reply_to_id`; the **latest** comment in a thread is what matters. Skip:
PR-author comments, pure approvals / praise / 👍, and threads already `isResolved: true` or whose
latest reply says done/fixed/resolved. Keep the `{thread id ↔ comment databaseId}` map for Step 6.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Group review threads by top-level comment id

For PRs with more than one top-level inline review thread, grouping solely by in_reply_to_id collapses all top-level comments under the same null/missing key, so the resolver can inspect only the latest unrelated thread and skip other actionable bot comments. GitHub's REST docs for review-comment replies state that replies are attached to the top-level review comment id, so the grouping key should be in_reply_to_id ?? id (then map that top-level id to the GraphQL thread) rather than in_reply_to_id alone.

Useful? React with 👍 / 👎.

Comment on lines +53 to +59
# Inline (line-anchored, threaded) comments
gh api "repos/$OWNER_REPO/pulls/<PR>/comments" --paginate \
--jq '[.[] | {id, in_reply_to_id, user: .user.login, path, line, body, html_url, diff_hunk}]'

# General PR (issue) comments
gh api "repos/$OWNER_REPO/issues/<PR>/comments" --paginate \
--jq '[.[] | {id, user: .user.login, body, html_url}]'

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve comment timestamps for quiet-period checks

The loop relies on last_activity_at to decide whether a PR has been quiet for ~10 minutes, but these comment fetches discard created_at/updated_at; with a bot or human inline/general comment after the last review summary, the resolver cannot report the true newest activity and the orchestrator can treat the PR as settled too early. GitHub's list-review-comments response includes timestamps, so include them for both inline and issue comments (and fetch commit timestamps) before computing last_activity_at.

Useful? React with 👍 / 👎.

enlarge the diff materially:** do **not** touch. Leave the thread open and mark `needs_human`.
- **After any edit, re-verify before pushing — this is mandatory:**
```bash
dotnet build Beutl.slnx -f net10.0 # must be clean

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Drop the forced framework from solution builds

When --auto edits any review comment, this mandatory verification runs against the whole solution with -f net10.0, but Beutl.slnx includes src/Beutl.Engine.SourceGenerators and sdk/Beutl.Extensibility.Sdk, which target only netstandard2.0. For those projects, forcing TargetFramework=net10.0 makes the build/restore red even when the review fix is correct, so the resolver will revert or refuse to push every handled comment; use the repo's documented dotnet build Beutl.slnx or build only affected net10.0 projects.

Useful? React with 👍 / 👎.

Comment on lines +37 to +38
gh pr view ${PR:-} --json number,url,headRefName,baseRefName,title,state,mergeable
OWNER_REPO=$(gh repo view --json nameWithOwner -q .nameWithOwner) # b-editor/beutl

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Fetch the PR author before filtering comments

The resolver later has to skip PR-author comments and treat only non-author human feedback as needs_human, but this initial PR metadata does not include the author login. On PRs where the author has left an inline/general comment (or a previous resolver reply is authored by the same token), --auto cannot distinguish that from external human feedback and can unnecessarily mark the PR as needing a human even when all bot feedback was handled; include author in this gh pr view result and compare against it during classification.

Useful? React with 👍 / 👎.

Bash(gh:*) Bash(git:*) Bash(dotnet:*) Bash(python3:*) Bash(jq:*) Bash(mktemp:*) Bash(date:*) \
Bash(grep:*) Bash(rg:*) Bash(ls:*) Bash(cat:*) Bash(sed:*) Bash(find:*) Bash(head:*) Bash(tail:*) Bash(wc:*) Bash(test:*)"

PERM_ARGS=(--allowedTools "$ALLOWED_TOOLS")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Use --tools for the claimed command sandbox

The script describes this list as a conservative envelope where commands outside it stall, but Claude Code's CLI docs define --allowedTools as tools that run without prompting and explicitly say to use --tools to restrict what is available. In headless runs where project/user settings or auto-mode allow more tools, this does not actually remove bare Bash or other commands from the model's toolbox, so the safety contract documented above the launcher is not enforced; add an explicit --tools/deny policy for the intended tool set.

Useful? React with 👍 / 👎.

Comment on lines +143 to +145
and **`reviewDecision == APPROVED`** — a `REVIEW_REQUIRED` (e.g. code-owner approval still pending,
which is the usual case under `* @yuto-trd`) means **leave for human and do not even attempt the
merge**. **Otherwise high-risk → leave for human.** When unsure, choose human (fail safe).

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Permit null reviewDecision when reviews are not required

This gate requires reviewDecision == APPROVED for every auto-merge, but GitHub returns no approved decision for PRs on paths/rulesets where reviews are not required and no human has reviewed. That contradicts the documented case where auto-merge should work once code-owner review is relaxed for some paths: such a low-risk PR can have green checks and be mergeable but still be left for a human forever because reviewDecision is null instead of APPROVED; treat null as acceptable only when the ruleset does not require reviews.

Useful? React with 👍 / 👎.

…ation precision

Follow-up to the loop-engineering layer: add six guardrails to the autonomous
Project #9 board loop, all in the AI-workflow tooling under .claude/ and
docs/ai-workflow/ (no product/runtime code).

- Test gate (B): the runner blocks instead of opening a PR when a production
  change (src/) ships no NUnit test and no manual-verification note. New
  signals touched_production / test_files_added[_count] / manual_verification_note
  drive it; the orchestrator re-checks the same condition as defense-in-depth and
  treats a violating PR as high-risk + no-progress.
- Self-review gate (A): a six-point pre-commit check (compiled XAML bindings, no
  [Obsolete]/v2/compat-overload shim, no leftover TODO/Follow-up, root-cause fix,
  GPL/MIT boundary, subtree CLAUDE.md). Reported via self_review_passed/findings.
- Design two-pass (D): a design-sensitive item is handed back as a pushed draft
  (draft_ready/draft_branch) instead of a PR; the orchestrator runs
  @beutl-design-reviewer with up to two rework iterations (runner Rework mode,
  REWORK/OPEN_PR/design_findings), then opens the PR -- high-risk to a human if
  the design is still unresolved after the budget.
- Recent-merge scan (C): the runner and beutl-board-task skim the last ~15
  "Refs: Project #9" merges to avoid reintroducing a just-removed pattern (a soft
  signal, not a gate).
- Stagnation precision (E): the no-progress breaker is raised 2 -> 3 and held open
  by a PR in the last 3 ticks (journal last_pr_tick); blocked items are split into
  item-specific (skipped, neutral) vs systemic (counts toward the breaker) via a
  new blocked_kind signal; a single blocked item no longer halts the drain (it is
  recorded and reported at the end).
- Bot false-positive reply path (F): beutl-resolve-reviews --auto can refute a
  clear bot false positive with a concrete path:line citation and resolve the
  thread without a code change (false_positives_resolved); an uncertain case
  escalates instead.

Pointers in AGENTS.md, CLAUDE.md, and docs/ai-workflow/README.md updated to match.

Refs: Project #9 "AI Review" board
@github-actions

Copy link
Copy Markdown
Contributor

No TODO comments were found.

@github-actions

Copy link
Copy Markdown
Contributor

Code Coverage

Package Line Rate Branch Rate Complexity Health
Beutl.Api 9% 4% 1171
Beutl.Configuration 41% 22% 350
Beutl.Controls 0% 0% 5513
Beutl.Core 62% 56% 3069
Beutl.Editor 78% 70% 1742
Beutl.Editor.Components 1% 0% 8552
Beutl.Embedding.MediaFoundation 6% 8% 1376
Beutl.Engine 63% 52% 18123
Beutl.Engine.SourceGenerators 59% 44% 540
Beutl.Extensibility 38% 59% 112
Beutl.Extensions.AVFoundation 5% 12% 246
Beutl.Extensions.FFmpeg 6% 7% 853
Beutl.FFmpegIpc 22% 28% 805
Beutl.Language 7% 50% 1348
Beutl.NodeGraph 24% 15% 2474
Beutl.ProjectSystem 58% 43% 1061
Beutl.Threading 100% 90% 137
Beutl.Utilities 94% 87% 358
Summary 35% (43401 / 124630) 31% (10824 / 35488) 47830

Minimum allowed line rate is 0%

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
.claude/agents/beutl-board-task-runner.md (1)

48-54: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Keep manual-verification blocked for production changes.

This still lets src/ changes exit as a valid PR without NUnit coverage, which bypasses the repo rule for new logic. Based on learnings, new logic must ship with a NUnit test in the matching tests/ project.

Also applies to: 60-64, 162-177

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.claude/agents/beutl-board-task-runner.md around lines 48 - 54, The guidance
in the task-runner doc still allows production changes to exit as
manual-verification without NUnit coverage, which conflicts with the repo rule.
Update the instructions around the baseline-first loop and the Step-8 gate so
that any change touching src/ must add a matching test under tests/ and only
non-testable UI cases can use manual-verification with a required
manual_verification_note. Use the surrounding policy text in
beutl-board-task-runner.md to keep the wording consistent.

Source: Learnings

.claude/skills/beutl-resolve-reviews/SKILL.md (1)

62-63: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Paginate nested thread comments too.

This is the same bug flagged in the prior review: comments(first:50) can miss the latest reply on longer threads, so Step 3 can resolve against stale context.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.claude/skills/beutl-resolve-reviews/SKILL.md around lines 62 - 63, The
GraphQL query in the review-resolution workflow only paginates reviewThreads,
but nested thread comments are still fetched with a fixed `comments(first:50)`,
which can miss newer replies. Update the query used in this step to also
paginate each thread’s comments via `pageInfo`/`endCursor`, and make the
thread-processing logic follow the cursor until all comments are collected
before resolving in Step 3.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In @.claude/agents/beutl-board-task-runner.md:
- Around line 48-54: The guidance in the task-runner doc still allows production
changes to exit as manual-verification without NUnit coverage, which conflicts
with the repo rule. Update the instructions around the baseline-first loop and
the Step-8 gate so that any change touching src/ must add a matching test under
tests/ and only non-testable UI cases can use manual-verification with a
required manual_verification_note. Use the surrounding policy text in
beutl-board-task-runner.md to keep the wording consistent.

In @.claude/skills/beutl-resolve-reviews/SKILL.md:
- Around line 62-63: The GraphQL query in the review-resolution workflow only
paginates reviewThreads, but nested thread comments are still fetched with a
fixed `comments(first:50)`, which can miss newer replies. Update the query used
in this step to also paginate each thread’s comments via `pageInfo`/`endCursor`,
and make the thread-processing logic follow the cursor until all comments are
collected before resolving in Step 3.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 6af7acf2-75eb-4919-9c3c-59ae18b99ee4

📥 Commits

Reviewing files that changed from the base of the PR and between d07ad5a and ee7c8e1.

📒 Files selected for processing (8)
  • .claude/agents/beutl-board-task-runner.md
  • .claude/skills/beutl-board-task/SKILL.md
  • .claude/skills/beutl-loop/SKILL.md
  • .claude/skills/beutl-resolve-reviews/SKILL.md
  • AGENTS.md
  • CLAUDE.md
  • docs/ai-workflow/README.md
  • docs/ai-workflow/loop-engineering.md
✅ Files skipped from review due to trivial changes (4)
  • docs/ai-workflow/README.md
  • CLAUDE.md
  • AGENTS.md
  • docs/ai-workflow/loop-engineering.md
🚧 Files skipped from review as they are similar to previous changes (2)
  • .claude/skills/beutl-board-task/SKILL.md
  • .claude/skills/beutl-loop/SKILL.md

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ee7c8e127e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +102 to +104
- **Bots only.** Auto-address / auto-resolve only feedback from the known bot reviewers. **Any human
review or comment → set `needs_human` and leave the thread open**; do not edit code for it and do
not resolve it. (A human's unresolved thread keeps the PR off the auto-merge path, which is the

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Do not classify approvals as needing human action

With required code-owner review, the successful path to auto-merge includes a non-author human APPROVED review. This rule turns any human review/comment into needs_human, and the loop's merge gate later requires needs_human == false, so the first approval that satisfies the ruleset also forces the PR down the human path forever. Treat pure approvals/praise/no-request reviews as non-blocking and reserve needs_human for human comments or change requests that need action.

Useful? React with 👍 / 👎.

1. **Verify it is NOT a false positive** against the *current* code. If it is, set its Status to
`False positive` (`e6ff360e`) and return with `false_positive: true` (do nothing else).
2. **Claim it** immediately (Status → `In Progress` `47fc9ee4`).
3. **Branch off `origin/main`** first (before editing), named `$BRANCH_PREFIX/<descriptive-slug>`.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Fetch main before branching each loop item

In a multi-item/headless run, the local origin/main ref is not updated after another PR (including one the loop just merged) lands, and this step branches every new item directly from that ref. Without an explicit git fetch origin main before the recent-merge scan/branch, later PRs can be based on stale main and miss or reintroduce the just-merged Project #9 changes, leading to conflicts/red CI.

Useful? React with 👍 / 👎.

Comment on lines +88 to +90
- `touched_public_api` — changed a public type/member in `Beutl.Engine`, `Beutl.Extensibility`,
`Beutl.NodeGraph`, `Beutl.FFmpegIpc`, or `Beutl.ProjectSystem`.
- `touched_gpl_mit` — touched the GPL/MIT boundary.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Include all design-review public surfaces

design_reviewer_required is derived from this narrower project list, but the committed beutl-design-reviewer scope also covers public changes in src/Beutl.Api/, src/Beutl.Controls/, new abstractions anywhere, and compat shims (.claude/agents/beutl-design-reviewer.md:18-20). A public API change in those omitted surfaces can therefore skip the draft design pass and remain eligible for the low/moderate auto-merge path without the required design review; mirror the reviewer scope here.

Useful? React with 👍 / 👎.

Comment on lines +127 to +128
- Commit addressed changes (`fix(review): …`) and `git push` the existing PR branch. Never
force-push; never push `main`.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Check out the PR head before pushing review fixes

When /beutl-loop invokes this resolver after Dispatch A, the current checkout is still the orchestrator branch, not the PR branch created in the worker's isolated worktree. This git push assumes review edits were committed on the existing PR branch, but Step 1 only reads headRefName and never checks it out, so any auto-addressed bot comment can be committed/pushed to the wrong branch or fail to update the PR; fetch/check out the PR head before Step 5 edits.

Useful? React with 👍 / 👎.

Comment on lines +136 to +137
gh api "repos/$OWNER_REPO/pulls/<PR>/comments/<comment_id>/replies" -f body="Done — <one-line fix>."
gh api graphql -f query='mutation{resolveReviewThread(input:{threadId:"<THREAD_ID>"}){thread{isResolved}}}'

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Handle issue comments outside review-thread replies

Step 2 also fetches general PR comments via the Issues comments API, but this reply/resolve path is only valid for pull request review comments (GitHub documents regular PR comments separately: https://docs.github.com/rest/issues/comments). If an actionable bot posts a top-level PR comment, --auto can try to reply to /pulls/.../comments/<id>/replies and resolve a nonexistent review thread, so branch issue comments to an issue-comment response path and do not treat them as resolvable review threads.

Useful? React with 👍 / 👎.

Comment on lines +115 to +117
- `false_positive: true` → the runner already marked the board item `False positive`, so the item
**leaves the queue** — this is **progress**: reset `consecutive_no_progress` to 0, increment
`consecutive_false_positives`. Continue.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Clear stale failure signatures after progress

When a systemic block sets last_failure_signature and the next item is a false positive, this progress path resets consecutive_no_progress but leaves that signature intact. A later unrelated item that fails with the same first-error line can then trip the Step 0 "back-to-back" repeat guard even though a progress tick occurred in between; clear the stored failure signature on progress/neutral outcomes or compare only against the immediately previous tick.

Useful? React with 👍 / 👎.

Comment on lines +92 to +94
`BEUTL_LOOP_MAX_ITEMS`, default 50) · wall-clock exceeded · `consecutive_no_progress ≥ 3` **and** no
PR was opened in the last 3 ticks (`items_processed − last_pr_tick > 3` — recent shipping counts as
progress and holds the breaker open) · `consecutive_false_positives ≥ 3` (the board/selection is

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Stop after the documented third no-progress tick

With last_pr_tick initialized to 0, the condition items_processed − last_pr_tick > 3 is still false after exactly three no-progress ticks (3 > 3), so the loop runs a fourth item before the advertised three-strike breaker fires; the same off-by-one occurs after any later PR. Use >= 3 or update the documented stop condition so a run does not continue past the intended stagnation budget.

Useful? React with 👍 / 👎.

The runner handed back a pushed **draft branch** instead of a PR because the change is design-sensitive
(`design_reviewer_required`). Run up to **two** rework iterations:

1. Dispatch **`@beutl-design-reviewer`** (read-only) on `git diff origin/main...<draft_branch>`.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Fetch draft branches before design review

The design-sensitive runner pushes draft_branch from its isolated worktree, but the orchestrator's checkout does not automatically have a local ref with that name. In that case git diff origin/main...<draft_branch> fails before @beutl-design-reviewer can run, blocking every design-sensitive item; fetch the pushed branch and diff against origin/<draft_branch> (or otherwise create the local ref) before dispatching the reviewer/rework pass.

Useful? React with 👍 / 👎.

Comment on lines +48 to +52
5. **Test with the baseline-first loop** and **gate on the exit code, never a console string**
(the locale trap is real — `Passed!`/`Failed!` vs `成功!`/`失敗!`). **A production change must ship
an NUnit test** (AGENTS.md rule #3): if your diff touches `src/` but adds no file under `tests/`,
go back and add one. Only when the change genuinely cannot carry a unit test (typically UI) do you
set `test_status: "manual-verification"` — and then a concrete `manual_verification_note` is

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Run a build gate before opening PRs

This normal Dispatch A path gates on tests and formatting but never runs dotnet build, even though the loop contract says build is a binary gate before PR creation. For changes not covered by the selected test project (for example Windows-only targets or non-tested projects), the runner can open a PR that does not compile and only discover it in CI; add a build step to the pre-commit gate before push/PR.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants