feat(extensions): W7 — unify extension failure surfaces (swamp-club#342)#1383
Conversation
…istries.failures[] into sourceDetails[] (swamp-club#342) Removes the legacy `registries.<kind>.failures[]` dual failure surface and makes `aggregateState.sourceDetails[]` the single authoritative source of extension failure state in `swamp doctor extensions`. Core changes: - Extract shared `recordSourceFailure` helper for failure recording - Route both warm-start and cold-start `buildIndex` failures through the Extension aggregate via `repository.saveAll()` - Remove `markCatalogValidationFailed` — validation failures now throw `ValidationError` caught by the shared helper - Remove `DoctorRegistryFailure`, `partitionForRegistry`, and the `getWarnings`/`resetState` deps from doctor - Derive per-registry status from `sourceDetails[]` failure-state tags - Migrate log-mode renderer to read failure details from `sourceDetails[]` - Split `extension_load_warnings.ts` — keep type-extraction stderr warnings, remove failure-recording pipeline Catalog fixes: - Fix pre-existing SQL bug where `upsert` ON CONFLICT UPDATE overwrote `state` and `last_error` when callers omitted them — now conditionally excluded from the UPDATE clause, preserving existing values - Add `last_error` TEXT column to catalog schema with idempotent migration — persists error messages across process invocations - Preserve `ValidationError` type through cold-start `load()` path via `originalError` field on `ExtensionLoadResult.failed` entries JSON schema changes (doctor extensions --json): 1. Removed: `registries.<kind>.failures[]` field 2. Added: `sourceDetails[].lastError` for failure states (backward compatible addition) Note: pre-migration catalog rows in failure states get empty `lastError` until the next reconcile pass touches them. Self-heals on first run. Known behavior changes (follow-ups filed): - Type-extraction warnings (non-literal type fields) no longer flip doctor exit code — #351 - `kind-completed` streaming events carry preliminary status during scan phase, final status computed from aggregate state — #352 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…alization The cold-start buildIndex handler was reverted due to Windows test failures where `loadAll()` dropped catalog rows as orphans. The root cause: `populateCatalogFromDir` wrote `source_path` with native backslashes on Windows, but `deriveExtensionIdentity` splits on forward slashes — causing false orphan detection. Fix: canonicalize `source_path` via `canonicalizePath()` in `populateCatalogFromDir` before writing to the catalog. This matches the canonicalization that `makeSourceLocation` applies in the reconcile and warm-start paths. Restores the cold-start handler that routes `load()` failures through `recordSourceFailure` → `repository.saveAll()`, closing the ADV-2 warm-start/cold-start asymmetry. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…-trip Two regression tests ensure the write-site canonicalization in populateCatalogFromDir survives future edits: - deriveExtensionIdentity: Windows backslash paths return null (callers must canonicalize) — pins the contract that raw Windows paths are rejected, forcing write sites to canonicalize before storing. - ExtensionCatalogStore: canonicalized source_path resolves via deriveExtensionIdentity — verifies that a Windows-style path canonicalized via canonicalizePathFor produces a catalog row that deriveExtensionIdentity successfully maps to @local/<repo>. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All path-taking lookup/delete methods on ExtensionCatalogStore now canonicalize their input via canonicalizePath() before querying: findBySourcePath, removeBySourcePath, removeBySourcePrefix, updateExtensionIdentity. This closes the write-read mismatch where populateCatalogFromDir stored canonical (forward-slash) paths but findStaleFiles looked up rows using native (backslash on Windows) paths — causing lookup misses, false "new file" detection, and missing fromCache warnings. Regression tests pin the contract: a row written with a canonical path is findable via a differently-cased Windows-style lookup. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
removeBySourcePath, removeBySourcePrefix, and updateExtensionIdentity receive paths from DB rows (already canonical) or from internal code that has already canonicalized. Re-canonicalizing lowercases on Windows, breaking WHERE lookups against mixed-case stored paths. Only findBySourcePath receives raw filesystem paths from callers like findStaleFiles — that's the one chokepoint that needs canonicalization. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Code Review
Well-structured PR that successfully consolidates extension failure surfaces into a single authoritative source (sourceDetails[]). The shared recordSourceFailure helper eliminates the consistency gap between reconcile and buildIndex, and the SQL state-preservation fix is a clean solution.
Blocking Issues
- Missing unit tests for
source_failure_recorder.ts— This is a new 208-line domain file with non-trivial branching logic (ValidationError vs. other error types, existing vs. new sources, state-transition detection,kindDirToExtensionKind/extensionKindToKindDirmapping). CLAUDE.md requires "Comprehensive unit test coverage" with "Unit tests live next to source files:foo.ts→foo_test.ts". Nosource_failure_recorder_test.tsexists. TherecordSourceFailurefunction alone has 4+ distinct code paths that should be directly tested.
Suggestions
-
Dead
_quietparameter (src/cli/mod.ts) — Thequietparameter inloadUserModels,loadUserVaults,loadUserDrivers,loadUserDatastores, andloadUserReportsis now unused (renamed to_quiet). Callers at lines 316-359 still pass the value. Could clean up the parameter from both the function signatures and their call sites in a follow-up. -
kind-completedalways emitsstatus: "pass"(src/libswamp/extensions/doctor.ts) — This is noted in the PR body as a known behavior change, but worth calling out: thekind-completedevents are now semantically misleading until thecompletedevent fires. Since the renderer is the sole consumer and only uses the finalcompletedreport, this is acceptable — but if other consumers are added, the preliminary status could cause confusion. Consider naming itpreliminaryor omitting the field fromkind-completedevents entirely. -
DDD layer violation tracked —
source_failure_recorder.ts(domain) importsExtensionKindfromextension_catalog_store.ts(infrastructure). The ratchet bump from 24→25 is correct. A follow-up to liftExtensionKindinto the domain layer would bring this back down.
There was a problem hiding this comment.
CLI UX Review
Blocking
1. Per-registry scan icons always show ✓ even for failing registries
File: src/presentation/renderers/doctor_extensions.ts:233–235 and src/libswamp/extensions/doctor.ts:325–328
The kind-completed event is now always yielded with status: "pass" (hardcoded), because aggregateState isn't built until after all registries finish loading. The log renderer uses iconFor(r.status) from this event to render the per-registry row, so every registry prints ✓ registry-name regardless of whether it will ultimately fail.
The user sees this sequence:
✓ model ← misleading: model will fail
✓ vault
✓ driver
✓ datastore
✓ report
• /path/bad.ts ValidationFailed: missing version
1 passed, 4 failed — OVERALL: FAIL ← contradicts all the ✓ above
The summary count (1 passed, 4 failed) correctly uses the final report.registries statuses, but the per-registry rows — the most prominent output during the scan phase — all show green checkmarks. A user reading the scan output sees nothing wrong, then gets a FAIL summary with no clear mapping of which registries failed.
The PR description acknowledges this is an intentional architectural decision (follow-up #352), but the renderer hasn't adapted to it. A simple fix: don't render an icon at kind-completed time at all (just • registry-name as an in-progress indicator), and re-render each row with its final ✓/✗ icon inside the completed handler using e.report.registries[name].status.
2. Loader errors silently dropped from log-mode output
File: src/libswamp/extensions/doctor.ts:318–323 and src/presentation/renderers/doctor_extensions.ts:237–259
When ensureLoaded() throws, the error message is stored in loaderErrors but never put into the DoctorExtensionsReport. The completed log renderer only shows failure bullets from aggregateState.sourceDetails (catalog-driven). Since a loader crash may not have touched the catalog at all, sourceDetails will have zero failure entries for that registry.
Result: registry shows ✓ model during scan, shows no failure details at completion, and the user only learns the registry failed from the N failed summary count — with no error message explaining why.
The old code created a synthetic <model loader>: error message bullet for exactly this scenario. The new code doesn't surface the loader error in log output at all.
Fix: either include loaderErrors in the DoctorExtensionsReport so the renderer can surface them, or render them before the completed yield.
Suggestions
-
The failure bullets in the
completedhandler appear without a header (just a blank line then bullets). Previously they appeared under the failing registry row. Consider adding a small label likeFailures:or using the same⚠warning style used for orphan files so users can orient themselves in the output. -
In the
completedhandler, the failure bullets showsourcePath stateTag: lastErrorbut don't indicate which registry the source belongs to. When a user has models and vaults both failing, it may not be obvious which registry owns a given source path. Thekindfield is already present onDoctorSourceDetail— including it in the bullet (model /path/bad.ts ValidationFailed: …) would help.
Verdict
NEEDS CHANGES — the per-registry scan icons always show ✓ even for failing registries, creating a direct visual contradiction with the summary line; and loader-level errors are silently dropped from log-mode output, a regression from the previous behavior.
There was a problem hiding this comment.
Adversarial Review
Critical
-
dropValidationFailedColumn()recreate-table omitslast_errorcolumn — guaranteed crash on upgrade from pre-W1b databasesFile:
src/infrastructure/persistence/extension_catalog_store.ts:405-431The PR adds
last_error TEXT NOT NULL DEFAULT ''tocreateSchema()(line 210),addNewColumnsIfMissing()(lines 304-308),upsert(),upsertWithIdentity(), andmapRow(). However,dropValidationFailedColumn()(lines 405-431) was not updated — itsCREATE TABLE bundle_types_newandINSERT INTO...SELECT FROMstatements do not includelast_error.Breaking scenario: A user with a pre-W1b database (has
validation_failedcolumn, no drop-migration marker) upgrades to a version containing this PR:addNewColumnsIfMissing()addslast_errorvia ALTER TABLE.dropValidationFailedColumn()recreates the table withoutlast_error.- The
last_errorcolumn is silently dropped from the table. - Any subsequent
upsert()orupsertWithIdentity()call executesINSERT INTO bundle_types (..., last_error)— SQLite throws "table bundle_types has no column named last_error". - The process crashes.
Self-heals on restart (
addNewColumnsIfMissing()re-adds the column, drop-migration marker is already set so it skips), but the first run after upgrade is a hard crash.Fix: Add
last_error TEXT NOT NULL DEFAULT ''to thebundle_types_newCREATE TABLE indropValidationFailedColumn(), and addlast_errorto both the INSERT INTO and SELECT column lists. Add a test that exercises the happy path ofdropValidationFailedColumnon a table that already has alast_errorcolumn and verifies the column survives.
Medium
-
Cold-start
buildIndexfailure routing resolves paths against wrong base directoryFile:
src/domain/extensions/extension_loader.ts:396In the cold-start failure routing (lines 383-438),
resolve(dir, failure.file)usesdir(the primary directory) to reconstruct the absolute path. Butfailure.fileis relative to thebaseDirit was discovered in — which could be one ofadditionalDirsrather thandir.Breaking scenario: When
additionalDirsis passed tobuildIndex, and a file from an additional directory fails validation, the reconstructedabsolutePathpoints to a non-existent location underdir.makeSourceLocationcomputes a wrong canonical path,coldIndex.get()misses, and the failure silently goes unrecorded in the aggregate state.Fix: Either propagate
baseDirin the failure record fromload(), or skip aggregate-state recording when the source can't be located in the index (as a defensive fallback). -
kind-completedevents always emitstatus: "pass"even when the loader threwFile:
src/libswamp/extensions/doctor.ts:325-328The yield at line 326 unconditionally emits
status: "pass". WhenensureLoaded()throws (caught at line 318), the error is stashed inloaderErrorsbut the event still says pass. Any streaming consumer that renders per-registry status fromkind-completedevents will show green checkmarks for failing registries.The PR description acknowledges this as a known behavior change (#352). The log renderer happens to work because it re-computes final status in the
completedhandler. But the JSON streaming protocol's per-event contract is violated — akind-completedwithstatus: "pass"is an affirmative claim that the registry passed, not a "pending" signal.
Low
-
Repeated same-state failures suppress transition events
File:
src/domain/extensions/source_failure_recorder.ts:123When a source already in
ValidationFailedfails again (e.g., with a different error message),fromState === toStateevaluates true and no transition is emitted. The updatedlastErroris persisted but the transition is invisible toswamp doctor extensionsoutput. This means a changed error message (e.g., a different validation error after editing) produces no visible diagnostic output. -
Cold-start
makeLocalExtensionuses adapter kind as basenameFile:
src/domain/extensions/extension_loader.ts:422-425When a cold-start failure can't be matched to an existing extension, a new extension is created with
basename: this.adapter.kind(e.g.,"model"), producing@local/modelinstead of@local/<repo-name>. This is a minor identity inconsistency that the reconcile pass would correct, but the transient state could confuse diagnostic output.
Verdict
FAIL — Finding #1 is a guaranteed crash on the upgrade path from pre-W1b databases. The dropValidationFailedColumn() recreate-table must include the last_error column.
1. Add source_failure_recorder_test.ts — 13 unit tests covering all recordSourceFailure branches (ValidationError vs generic, existing vs new source, transition emission, kind mappings). 2. Fix per-registry scan icons — kind-completed renders plain bullet during scan; completed handler re-renders each registry with its correct status icon from the final report. 3. Surface loader errors — loaderErrors map added to DoctorExtensionsReport; rendered as bullets in log mode and as a plain object in JSON mode. 4. Fix dropValidationFailedColumn crash — last_error column added to the recreate-table CREATE TABLE, INSERT, and SELECT column lists. Without this, upgrading from a pre-W1b database would crash on the first upsert after the table recreation. 5. Fix cold-start baseDir resolution — propagate baseDir on failure records so the cold-start handler resolves paths against the correct directory, not always the primary dir. 6. Make loadAll() read-only — destructive orphan-row deletion moved to explicit loadAllWithPruning(). resolveIdentity takes a pruneOrphans flag; loadAll passes false, loadAllWithPruning passes true. Reconcile's own diff logic handles orphan detection independently. Prevents source-mounted extension rows from being incorrectly dropped during read-only aggregate loading. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
CLI UX Review
Blocking
None.
Suggestions
-
--verboseper-source detail omitslastError(src/presentation/renderers/doctor_extensions.ts:133-146)
InrenderAggregateStateLog, the "Per-Source Detail" rows rendersourcePath + stateTag + fingerprint + bundlePathbut skiplastError. The failure bullets in thecompletedhandler above DO surface it, so users aren't totally blind — but someone reaching for--verboseto dig into a broken source won't see the error message inline next to the path. Consider appendinglastErrorto the per-source rows the same way the failure bullets do:: ${detail.lastError}. -
loaderErrorsis absent-when-empty; its peers are always-present (src/presentation/renderers/doctor_extensions.ts:388-390)
orphanFilesandrecentTransitionsare always in the JSON output (empty array when nothing to report).loaderErrorsis conditionally omitted when empty. Scripts have to guard withif (output.loaderErrors)rather than iterating directly, which is a subtle inconsistency. Emitting{}unconditionally (or{}only whenaggregateStateis present) would match the shape of the other always-present fields. -
Breaking JSON schema change communicated only in the PR body (
registries.<kind>.failures[]removed)
Documented in the PR summary, but nothing in-band warns existing scripts. Scripts that key onresult.failures.length > 0(rather thanresult.status === "fail") will silently stop detecting failures —undefined.lengthin JS throws;null | lengthin jq returns 0.result.statusstill works, so the impact is narrow, but a note in the release changelog or a prominent migration note would help operators auditing their CI pipelines.
Verdict
PASS — the core failure-surface unification is solid, error messages are actionable, both output modes are consistent, and exit codes are correct. The three items above are quality-of-life tweaks that don't block merge.
There was a problem hiding this comment.
Code Review
Well-structured refactoring that unifies extension failure surfaces through a single sourceDetails[]-based model, eliminating the dual registries.<kind>.failures[] reporting path. The extraction of source_failure_recorder.ts as a shared domain service is clean DDD — both reconcile and buildIndex now route failures through the same pure function.
Blocking Issues
None found.
Suggestions
-
_quietparameter cleanup (src/cli/mod.ts): FiveloadUser*functions now prefix thequietparameter with_sincerecordLoadFailureswas removed. This is correct for lint suppression, but if the parameter is truly dead (no consumer inside the function body), consider removing it from both the function signatures and call sites in a follow-up to reduce confusion. -
kind-completedalways emits"pass"(src/libswamp/extensions/doctor.ts:332): Thekind-completedevent now always carriesstatus: "pass"during the scan phase — final status is only determined in thecompletedevent. The PR description documents this as a known behavior change and the log renderer handles it correctly (just shows registry name during scan, re-renders with correct icons at completion). Worth noting that any externalkind-completedconsumer would see misleading status — the JSDoc on the event type could mention this caveat. -
loadAllWithPruning()is only called in tests (src/infrastructure/persistence/extension_repository.ts:171): The newloadAllWithPruning()method distinguishes read-only vs. side-effecting loads, which is a good semantic improvement. Currently only tests exercise it directly. If the doctor aggregate builder or other paths should eventually use it, that's fine as a follow-up. -
Duplicate failure-routing logic in
extension_loader.ts: The warm-start path (lines ~336-369) and cold-start path (lines ~391-444) both buildsourcePathIndex/coldIndexmaps and callrecordSourceFailurewith nearly identical setup (fingerprint computation, stat, error wrapping). If these patterns diverge it's fine, but if they stay identical, extracting a small helper could reduce the surface area for inconsistencies.
There was a problem hiding this comment.
Adversarial Review
Critical / High
-
HIGH — Cold-start multi-failure data loss silently overwrites earlier failure states (
src/domain/extensions/extension_loader.ts:391-444)In the cold-start branch of
buildIndex,coldExtensionsis loaded once (line 392) and never updated within the failure-recording loop. When two failures target the same extension, the second iteration reads the original extension fromcoldExtensions[extIdx](line 406), builds a new modified extension from that stale base, and callsrepository.saveAll()— which overwrites the first failure's persisted state viaapplyDiffForExtension.Concrete breaking scenario: A repo with
extensions/models/a.tsandextensions/models/b.ts, both belonging to@local/myrepo. On first run (cold-start), both fail validation.- Iteration 1:
coldExtensions[0]has sources A(Indexed) + B(Indexed).recordSourceFailureproduces A(ValidationFailed) + B(Indexed).saveAllwrites this. - Iteration 2:
coldExtensions[0]is still A(Indexed) + B(Indexed) — the in-memory array was never updated.recordSourceFailureproduces A(Indexed) + B(BundleBuildFailed).saveAllwrites this, overwriting A's ValidationFailed back to Indexed.
The warm-start path (line 365) correctly does
extensions[extIdx] = failResult.extension;— this line is missing from the cold-start path.Similarly, when
extisundefined(no existing catalog extension), the fallback creates a throwawaymakeLocalExtension({ basename: this.adapter.kind })on each iteration. Each creates@local/model(or@local/vault, etc.) with only one failure source. The secondsaveAllcall'sapplyDiffForExtensionsees the first failure's row under@local/model, finds it's not in the new extension's sources, and deletes it.Suggested fix: Add
coldExtensions[extIdx] = failResult.extension;after line 443, and for theext === undefinedcase, accumulate failures per-extension before a singlesaveAll. - Iteration 1:
Medium
-
MEDIUM —
kind-completedevents always emitstatus: "pass"during scan (src/libswamp/extensions/doctor.ts:330-333)Every
kind-completedevent unconditionally reportsstatus: "pass". The real pass/fail is only computed at thecompletedevent (line 360-366). The PR documents this as a known behavior change (#352) and notes the renderer is the sole consumer — the renderer correctly ignores the preliminary status and re-renders frome.report.registries. However, any third-party streaming consumer watchingkind-completedevents (e.g., a CI integration or custom TUI) would receive wrong intermediate results. -
MEDIUM —
loadAllWithPruning()is dead code in production (src/infrastructure/persistence/extension_repository.ts:171)loadAllWithPruning()is defined, tested, and the test was modified in this PR — but it has zero production callers. All productionloadAll()calls passpruneOrphans: false, meaning orphan rows (pulled extensions removed from the lockfile) are never cleaned up except via the reconcile service's explicit tombstoning. IfloadAllWithPruningwas introduced to close a pruning gap, it needs a production call site. If it's infrastructure for a future feature, that's fine but the test change in this PR may give a false impression of coverage.
Low
-
LOW — Cold-start fallback extension name mismatch (
src/domain/extensions/extension_loader.ts:429-432)When
extis undefined in the cold-start failure path,makeLocalExtension({ basename: this.adapter.kind })creates an extension named@local/model(or@local/vault, etc.). The reconcile service creates the "real" local extension as@local/<repo-directory-name>. These never collide (different names), but the cold-start-created@local/modelrows become permanent orphans since no subsequent code path adopts or tombstones them. The next reconcile pass creates the proper@local/<repo>extension independently.
Verdict
FAIL — The cold-start multi-failure data loss (finding #1) silently drops failure states when multiple sources in the same extension fail during the first run. The warm-start path has the fix pattern already (line 365) — it just needs to be applied to the cold-start path as well.
…enderer improvements
HIGH: Fix cold-start multi-failure data loss where second failure on
the same extension overwrote the first. coldExtensions[extIdx] is now
updated after each recordSourceFailure call. For the fallback case
(no existing extension), a single localFallback accumulates all
failures. One saveAll at the end instead of per-failure.
Additional fixes from review:
- kind-completed events now emit status: "fail" when ensureLoaded threw
- Failure bullets include detail.kind for registry context
- Verbose per-source detail includes lastError inline
- loaderErrors always present in JSON output (empty {} when no errors)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
CLI UX Review
Blocking
None.
Suggestions
-
Verbose mode: failed sources appear twice. In
--verbosemode, a failed source is shown in the failure-bullet block (lines 262–281 ofsrc/presentation/renderers/doctor_extensions.ts, rendered immediately after the registry status list) and again in the "Per-Source Detail" section insiderenderAggregateStateLog. Both entries includelastError, so the information is redundant. One option: skip failure-state entries from Per-Source Detail in verbose mode; another is to drop the pre-aggregate failure bullets when verbose is on (the Per-Source Detail is more complete anyway). Minor, but worth a follow-up. -
Two-pass registry listing in log mode. The new rendering emits each registry name twice: first as
• model(dim, during scan) and then as✓/✗ model(bold, at completion). The code comment explains why (loader errors are only available in thecompletedevent), so the approach is architecturally sound. For slow scans the dim-bullet phase provides useful progress feedback; for fast scans both sets of lines appear in rapid succession and the double-listing can look like noise. Consider suppressing the scan-phase bullets when stdout is not a TTY (piped/CI output), mirroring the pattern other swamp commands use for progress indicators. -
Breaking JSON schema change — no in-band notice.
registries.<kind>.failures[]is removed from the--jsonoutput; scripts usingparsed.registries.model.failureswill silently getundefined. The PR description documents the migration path (aggregateState.sourceDetails[]filtered by failure state tags, plus the newloaderErrorsobject), but CLI consumers won't see that context. A one-time deprecation cycle (emit the field as[]for a release, add a_deprecatednote) or a mention in the command's--helpexamples would reduce surprise for existing automation. Not blocking given the explicit schema-change notice in the PR, but worth filing a follow-up.
Verdict
PASS — no blocking issues. The unified failure surface is a genuine UX improvement: lastError in failure bullets gives users actionable context without having to dig into logs, and the single sourceDetails[] source of truth is easier to script against than the dual-surface the prior design required.
There was a problem hiding this comment.
Code Review
Clean, well-structured PR that successfully unifies the extension failure surface from dual registries.failures[] + sourceDetails[] into a single source of truth. The source_failure_recorder.ts extraction is well-designed, tests are thorough (374-line unit test file, integration lifecycle test for lastError), and the SQL migration is backward-compatible.
Blocking Issues
None.
Suggestions
-
Stale docstring in
extension_catalog_store.ts:660-666— theupsertJSDoc still referencesmarkCatalogValidationFailed(removed in this PR) and describesstateas always being in the SET clause. The behavior is now conditional (stateis only in the SET list whenrow.state !== undefined), which is correctly documented in theExtensionTypeRow.stateJSDoc (lines 99-108) but not in the upsert method's comment. -
loadAllWithPruning()is defined and tested but never called in production —loadAll()was changed from pruning to read-only, andloadAllWithPruning()was added as the explicit cleanup path. However, no production code calls it yet, so orphan rows that were previously cleaned up on everyloadAll()will now accumulate. Consider wiring it into the reconcile path or filing a follow-up. -
Dead
_quietparameters insrc/cli/mod.ts— fiveloadUser*functions now have_quiet = falsethat's never read after removingrecordLoadFailures. The_prefix satisfies the linter, but the parameter itself is dead code. Minor cleanup for a follow-up.
Overall: well-executed. The domain model is clean, the aggregate-state-driven registry status derivation is more principled than the old warning-partition approach, and the last_error persistence across process restarts is a solid UX improvement.
There was a problem hiding this comment.
Adversarial Review
Critical / High
No critical or high-severity findings.
Medium
-
Warm-start
saveAllinside catch block is not try-caught —src/domain/extensions/extension_loader.ts:366In the warm-start path,
repository.saveAll([failResult.extension])is called inside thecatchblock of the stale-files loop (line 366). This call is not wrapped in its own try-catch. If it throws (SQLite lock timeout exhaustingbusy_timeout,DuplicateTypeErrorfrom a concurrent process modifying the same catalog, or disk-full), the error propagates out of the for-loop's catch block, out ofbuildIndex, and to the caller. The function never returns itsresultobject, even though types from successful earlier iterations are already registered in-memory viathis.adapter.register. The caller receives a confusing SQLite or constraint error instead of the expectedExtensionLoadResult.Breaking scenario: Two
swampprocesses running simultaneously. Process A's warm-start loop hits a failure, callssaveAll, and the SQLite busy_timeout expires because process B holds a long write lock.buildIndexthrows; the CLI command fails with a raw SQLite error.Compare: The cold-start handler (lines 452-459) correctly accumulates failures into
coldExtensions/localFallbackand does a singlesaveAllafter the loop. The warm-start should either (a) wrap thesaveAllin try-catch with a warning log, or (b) accumulate and save once at the end like the cold-start does.Suggested fix: Wrap lines 339-367 in an inner try-catch that logs a warning and continues:
try { // ... the existing failure-recording + saveAll logic } catch (persistError) { this.logger.warn`Failed to persist failure state for ${relativePath}: ${persistError}`; }
-
Per-failure
saveAllin warm-start is O(N) transactions —src/domain/extensions/extension_loader.ts:366Each failed stale file triggers its own
saveAll, which opens a SQLite transaction, computes the diff, runs I-Repo-1 over the entire catalog, and commits. With N failures, this is N full-catalog scans + N transactions. The cold-start path (lines 452-459) batches into a singlesaveAll. The warm-start should batch similarly.Breaking scenario: A repo with 50 stale files that all fail (e.g., after a Deno version bump that breaks bundling). Each one triggers a full-catalog I-Repo-1 scan. On a catalog with 1000+ rows, this multiplies startup latency significantly.
Low
-
_quietparameters are dead but still accepted —src/cli/mod.ts:454,538,601,663The
quietparameter is renamed to_quiet(unused) inloadUserModels,loadUserVaults,loadUserDrivers,loadUserDatastores. Callers still passquiet: true(verified in the CLI bootstrap), but the value has no effect sincerecordLoadFailureswas removed. Warnings now go through LogTape only. Not a regression — LogTape has its own verbosity controls — but the dead parameter is confusing for future maintainers. Consider removing the parameter from the signature and callers in a follow-up. -
kind-completedevent status may differ from final status —src/libswamp/extensions/doctor.ts:330-338During the scan phase,
kind-completedevents carry a status derived only fromloaderErrors, not fromregistryHasFailures(which requires the aggregate state built later). A registry whereensureLoaded()succeeds butsourceDetailscontains failure-state entries getsstatus: "pass"in the stream event butstatus: "fail"in the finalcompletedevent. The log renderer ignores the stream-event status (renders a plain bullet), so users don't see the inconsistency. But external stream consumers (e.g., a future TUI or CI integration reading events) would see a misleading interim status. This is documented as known behavior (#352). -
removeBySourcePath/removeBySourcePrefixdo not canonicalize input paths —src/infrastructure/persistence/extension_catalog_store.ts:724-742findBySourcePathcanonicalizes its input (line 715), butremoveBySourcePathandremoveBySourcePrefixdo not. The PR's fifth commit deliberately scoped canonicalization tofindBySourcePathonly, noting that remove callers always pass already-canonical paths. This is correct today —applyDiffForExtensionpassessource.id.canonicalPath, anddeleteBySourcePathsreceives canonical paths fromDoctorCatalogOrphan. But the asymmetric contract is fragile: a future caller passing a raw Windows path toremoveBySourcePathwould silently delete nothing. Pre-existing design issue, not introduced by this PR.
Verdict
PASS — The changes are well-structured: failure recording is extracted into a shared pure function with good test coverage, the SQL conditional-update fix correctly preserves failure states, and the cold-start multi-failure fix accumulates correctly. The warm-start per-failure saveAll without try-catch is a robustness gap worth addressing, but it's a non-common failure mode (SQLite lock contention) and doesn't block the merge. The lastError lifecycle (populate on failure, clear on recovery) is correct end-to-end. The registryHasFailures derivation from sourceDetails is a clean replacement for the removed failures[] field.
Summary
Removes the legacy
registries.<kind>.failures[]dual failure surface and makesaggregateState.sourceDetails[]the single authoritative source of extension failure state inswamp doctor extensions. Closes the W1-W6 rearchitecture story — W7 removes the last vestige of the old failure-reporting model.What changed
src/domain/extensions/source_failure_recorder.ts) — extracted from reconcile's catch block. Both reconcile and buildIndex call the same helper, eliminating the consistency gap where buildIndex bypassed the aggregate.recordSourceFailure→repository.saveAll(). Cold-start failures also routed, withoriginalErrorpreservingValidationErrortype for correct state-tag discrimination.markCatalogValidationFailedremoved — validation failures now throwValidationError, caught by the shared helper. No more direct catalog upserts that bypass the aggregate.registries.failures[]removed — per-registry status derived fromsourceDetails[]failure-state tags (ValidationFailed,BundleBuildFailed,EntryPointUnreadable).extension_load_warnings.tssplit — keptemitTypeExtractionFailurefor stderr regex-mismatch warnings; removedrecordLoadFailures,getExtensionLoadWarnings, and the warnings array accumulation.last_errorcatalog column — schema migration addslast_error TEXTtobundle_types. Error messages persist across process invocations. Clears on recovery to Indexed.upsertON CONFLICT UPDATE now conditionally excludesstateandlast_errorwhen callers don't provide them, preventingpopulateCatalogFromDirfrom overwriting reconcile's failure states withIndexed.JSON schema changes (doctor extensions --json)
registries.<kind>.failures[]— the field no longer exists onDoctorRegistryResultsourceDetails[].lastError?: string— populated for failure states (ValidationFailed,BundleBuildFailed,EntryPointUnreadable), absent for healthy statesMigration note
Pre-migration catalog rows in failure states get empty
lastErroruntil the next reconcile pass. Self-heals on first run after upgrade — not a bug, just a transient state.Known behavior changes (follow-ups filed)
typefields (e.g.const TYPE = "...") produce stderr warnings but no longer flip doctor exit code. They don't register and don't appear insourceDetails[]. Previously surfaced via the removedregistries.failures[]pipeline.kind-completedevents carry preliminarystatus: "pass"during the scan phase; final status is computed from aggregate state in thecompletedevent. Internal-only (renderer is the sole consumer).Test plan
deno check/deno lint/deno fmtclean./swamp:failuresfield,lastErrorpresent🤖 Generated with Claude Code