fix: parallelize project discovery and batch archived-repo checks [LAC-2038]#42
Conversation
|
Warning You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again! |
Code review — request changesThanks for the speedup work — the design is sound (filesystem scan → parallel git metadata → batched GraphQL) and the measured ~45s → ~12s on 92 repos is a real win. One blocking regression and a couple of smaller notes below. 🔴 Blocking: non-git directories with a
|
|
Fixed the blocking regression in 6d43c6e: prepended `git rev-parse --git-dir >/dev/null 2>&1 || exit 1` to `GIT_META_SCRIPT` so the script exits non-zero for non-git directories, causing `collectGitMeta` to resolve with `isGitRepo: false` via the error path. On the non-blocking notes:
|
lacymorrow
left a comment
There was a problem hiding this comment.
Code review — approve ✅
(Posting as comment because GitHub blocks self-approval; this is the Code Reviewer agent's verdict.)
The blocking regression from the previous review is fixed in 6d43c6e and the rest of the PR holds up.
Verification of the fix
GIT_META_SCRIPT now leads with:
git rev-parse --git-dir >/dev/null 2>&1 || exit 1Manually re-ran the script in two dirs:
- Non-git tmpdir → exits 1, no output.
execFilecallback receiveserr,collectGitMetaresolves{ isGitRepo: false, … }, and theif (!g.isGitRepo) continue;indiscoverProjectscorrectly drops the candidate. - Real git repo → exits 0, 5 metadata lines printed. Tag / change count / branch / dirty / remote all parse cleanly into the
GitMetarecord.
This restores the semantics of the previous if (!isGitRepo(fullPath)) continue; gate without losing the parallelization win.
Correctness pass on the rest of the diff
- Parallel worker pool (
parallel<T>) is a standard bounded-concurrency pattern; results are written by index so order is preserved, andPromise.allresolves only after every worker drains the queue. slugsByIndex→r${i}mapping incheckArchivedBatchis consistent with Map iteration order (insertion-order preserving), so the alias index matches the position inslugs.- Phase 3 only updates
projects[idx].archivedfor entries present inslugsByIndex— projects without a GitHub remote correctly stayarchived: false. - GraphQL injection is mitigated via the
safeallowlist ([A-Za-z0-9._-]) added inabf9a24. Staticsh -cbody, no interpolation. Per-callcwd. Timeouts on bothsh(10s) andgh api(15s).stdio: "pipe". - Local types (
PkgCandidate,GitMeta) are file-private; no leakage into the publiccli.tsre-export surface. - CI green on Node 20/22 across macOS and Ubuntu.
Follow-ups (already acknowledged, not blocking)
The author committed to filing separate issues for these — please do create them so they don't get lost:
- Shared slug-regex helper (and the pre-existing
([^/.]+)bug that drops dots in repo names, e.g.lacy.sh). - Chunked GraphQL batching for workspaces that approach GitHub's 500-node / ~50KB limits.
- Fixture-based test for
discoverProjectscovering the git vs non-git dir filter — that's the exact regression we just shipped a fix for, and one test would have caught it.
Verdict
Approved. Ready for merge.
…C-2038] isRepoArchived() was calling `gh repo view` (network HTTP request) sequentially for every discovered project. With 92+ repos this added ~31s of scan time. Three optimizations: - Replace N individual REST calls with a single GitHub GraphQL query (~340ms) - Combine 5 sequential git commands per project into one shell script - Run git metadata collection in parallel with bounded concurrency (16 workers) Scanning time for 92 repos: ~45s → ~12s. Co-Authored-By: Paperclip <noreply@paperclip.ing>
Address code review findings: - Sanitize owner/name in buildArchivedQuery to prevent GraphQL injection - Rename resolve→res in collectGitMeta to avoid shadowing node:path import Co-Authored-By: Paperclip <noreply@paperclip.ing>
…x [LAC-2038] - getGithubSlug() and collectGitMeta() regex now handles dots in repo names (e.g. vibe.rehab was parsed as just "vibe") - checkArchivedBatch() reads stdout from failed gh calls so partial GraphQL results (some repos not found) don't discard all results - CLAUDE.md: fix discoverProjects sort order docs (alphabetical, not by change count) Co-Authored-By: Paperclip <noreply@paperclip.ing>
6d43c6e to
ac79cc1
Compare
Summary
isRepoArchived()indiscover.ts(from LAC-1949) calledgh repo view <slug> --json isArchived— a synchronous HTTP request — for every discovered project. With 92+ repos, this added ~31s of sequential network requests to scanning.gh repo viewREST calls with a single GitHub GraphQL query; combine 5 sequential git commands per project into one shell script; run git metadata collection in parallel with bounded concurrency (16 workers)Test plan
bun run typecheckpassesbun run build && node dist/cli.js --helpsmoke test passesgit.test.tsunrelated to this change)discoverProjects("/Users/lacy/repo")consistently returns 92 projects in ~12sbun run src/cli.ts --multi --dry-runfrom~/repoto verify full UX flow