docs(maintenance): design spec for workstation maintenance system#4
docs(maintenance): design spec for workstation maintenance system#4X-McKay wants to merge 1 commit into
Conversation
Add the design spec for a layered, AI-provider-agnostic workstation maintenance system: scheduled systemd-timer scans, on-demand slash commands, SessionStart nudges, and a journal under ~/.local/state/maintenance. Substrate (bash scanners + skills) lives in this repo so it survives any agent change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Code Review
This pull request introduces a design specification for a Workstation Maintenance System, a provider-agnostic framework for scanning and managing workstation health via bash scripts, systemd timers, and AI agent skills. The feedback focuses on technical refinements to the proposed implementation: improving the reliability of default branch detection in the dev-cruft scanner, correcting the JSON structure for Claude hook configurations, and suggesting dynamic path resolution in systemd unit files to avoid hardcoded repository locations.
| | `disk.sh` | `~/Downloads` size + week-over-week delta; files >100MB under `$HOME` (excluding `~/.cache`, `~/.local/share`, any `node_modules`, any `.git/objects`); `node_modules` / `target/` / `.venv` directories untouched 90+ days; `/tmp` files older than 30 days. | | ||
| | `docker-k8s.sh` | `docker images -f dangling=true`, `docker volume ls -f dangling=true`, total reclaimable space from `docker system df`. For each context in `~/.kube/config`, with `kubectl --request-timeout=5s`: evicted pods, terminating pods stuck >1h. Contexts that time out are reported as a single `info` item rather than failing the scan. | | ||
| | `system-health.sh` | `systemctl --user --failed` and `systemctl --failed`; `journalctl -p err --since "7 days ago"` count per unit; `apt list --upgradable` filtered to security updates; new listening ports vs. last week's snapshot (snapshot kept under `~/.local/state/maintenance/ports-prev.txt`). | | ||
| | `dev-cruft.sh` | `~/.claude/sessions/` JSONLs >30 days; `~/.claude/plans/` and `~/.claude/todos/` >60 days; isolation-mode worktrees untouched 30+ days that are not the cwd of any session in the last 30 days of `sessions.log`; for each tracked git repo under `~/git/`, local branches fully merged into the repo's default branch (detected via `git rev-parse --abbrev-ref origin/HEAD`); cwds appearing in `sessions.log` (last 30 days) with lingering untracked files per `git status --porcelain`. | |
There was a problem hiding this comment.
Using origin/HEAD to detect the default branch is unreliable as it is not always set in local clones (it requires git remote set-head origin -a). A more robust approach would be to use git symbolic-ref --short refs/remotes/origin/HEAD or a fallback mechanism to identify the primary branch (e.g., checking for main then master).
| "SessionStart": [{ "hooks": [{ "type": "command", "command": "bash ~/.claude/hooks/session-start-nudge.sh", "timeout": 2 }] }], | ||
| "SessionEnd": [{ "hooks": [{ "type": "command", "command": "bash ~/.claude/hooks/session-end-cwd-log.sh", "timeout": 2 }] }] |
There was a problem hiding this comment.
The JSON structure for the Claude hooks contains an unnecessary nested hooks key. Claude Code expects an array of hook objects directly under the event name (e.g., SessionStart).
| "SessionStart": [{ "hooks": [{ "type": "command", "command": "bash ~/.claude/hooks/session-start-nudge.sh", "timeout": 2 }] }], | |
| "SessionEnd": [{ "hooks": [{ "type": "command", "command": "bash ~/.claude/hooks/session-end-cwd-log.sh", "timeout": 2 }] }] | |
| "SessionStart": [{ "type": "command", "command": "bash ~/.claude/hooks/session-start-nudge.sh", "timeout": 2 }], | |
| "SessionEnd": [{ "type": "command", "command": "bash ~/.claude/hooks/session-end-cwd-log.sh", "timeout": 2 }] |
|
|
||
| [Service] | ||
| Type=oneshot | ||
| ExecStart=%h/git/setup/bootstrap/scripts/maintenance/scan.sh |
There was a problem hiding this comment.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4e4da2f8cf
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
|
||
| | Scanner | What it checks | | ||
| |---|---| | ||
| | `setup-doctor.sh` | Shells out to `setup doctor` and `setup drift --dry-run`. Captures their summary lines as `info` or `warn` items. | |
There was a problem hiding this comment.
When Phase 1 implements setup-doctor.sh from this spec, setup drift --dry-run will fail because the current DriftArgs only defines --json, --dotfiles, --profiles, --profile, and the diff/sync/adopt subcommands. Since the scanner contract treats nonzero exits as scanner failures, every scheduled scan would report the setup scanner as broken instead of surfacing config drift; use the read-only default setup drift or setup drift --json here.
Useful? React with 👍 / 👎.
Summary
Scope of this PR
Spec only. No code, scripts, hooks, systemd units, or docs/MAINTENANCE.md yet — those land in follow-up PRs once the spec is approved. The implementation plan will be drafted (via the writing-plans skill) only after the spec is reviewed here.
What's in the spec
.agents/skills/, per-provider shims under.claude/commands/setup-doctor,disk,docker-k8s,system-health,dev-cruft,config-driftRandomizedDelaySecandPersistent=true)maintenance-audit,maintenance-act) and three commands (/maintenance,/clean-disk,/clean-dev)~/.local/state/maintenance/(XDG-compliant, provider-neutral)setup doctorhang risk, SessionStart latency budget, deferred-list growth)Review focus
setupCLI doesn't?Test plan
~/.local/state) looks right🤖 Generated with Claude Code