Skip to content

Add incremental build support and watch mode to leadtype generate#119

Open
BurnedChris wants to merge 1 commit into
mainfrom
watch-mode-incremental-builds
Open

Add incremental build support and watch mode to leadtype generate#119
BurnedChris wants to merge 1 commit into
mainfrom
watch-mode-incremental-builds

Conversation

@BurnedChris

Copy link
Copy Markdown
Contributor
  • Introduced --watch option for leadtype generate, enabling automatic rebuilds on changes to documentation sources and configuration files.
  • Implemented incremental builds by caching file inputs and skipping unchanged files, improving performance during repeated runs.
  • Updated the library API to support a new cache option in convertAllMdx, allowing for dependency tracking and cache invalidation.
  • Enhanced documentation to reflect these new features, including detailed explanations of the caching mechanism and watch mode behavior.

These changes significantly enhance the efficiency and usability of the leadtype CLI for documentation generation.

- Introduced `--watch` option for `leadtype generate`, enabling automatic rebuilds on changes to documentation sources and configuration files.
- Implemented incremental builds by caching file inputs and skipping unchanged files, improving performance during repeated runs.
- Updated the library API to support a new `cache` option in `convertAllMdx`, allowing for dependency tracking and cache invalidation.
- Enhanced documentation to reflect these new features, including detailed explanations of the caching mechanism and watch mode behavior.

These changes significantly enhance the efficiency and usability of the `leadtype` CLI for documentation generation.
@coderabbitai

coderabbitai Bot commented Jul 2, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • leadtype generate now supports watch mode for automatic rebuilds when content or configuration changes.
    • Repeated runs are faster thanks to incremental builds that skip unchanged work and clean up removed outputs.
    • --force is available to bypass cached results and rebuild everything.
  • Documentation

    • Updated CLI reference with the new watch option and expanded guidance on incremental build behavior and rebuild triggers.

Walkthrough

This PR adds --watch and --force flags to leadtype generate, introduces a new debounced filesystem watcher module, and implements incremental conversion caching in convertAllMdx with content-hash manifests, dependency tracking (including type-table TypeScript sources), stale output pruning, tests, and documentation.

Changes

Watch mode and incremental builds

Layer / File(s) Summary
File watcher module
packages/leadtype/src/cli/watch.ts, packages/leadtype/src/cli/watch.test.ts
New watchInputs API debounces and deduplicates filesystem change events, ignores noise paths (.git, node_modules, .DS_Store), and returns a WatchController with close(); tested for batching, ignoring, and non-existent paths.
CLI watch/force flags and generate routing
packages/leadtype/src/cli/generate.ts
GenerateArgs gains watch/force; parseGenerateArgs and usage text updated; runGenerateCommand routes to a new runGenerateWatch loop using watchInputs; executeGenerate returns {code, watchPaths} and resolves cache options via resolveConvertCache.
Incremental cache core module
packages/leadtype/src/convert/incremental.ts
New module with cache manifest types, SHA-256 content hashing with per-run memoization, atomic manifest load/save, and dependency-freshness validation.
convertAllMdx caching, dependency capture, and pruning
packages/leadtype/src/convert/convert.ts
MdxToMarkdownOptions/ConversionPrepareOptions gain cache/onDependency; per-file conversion checks cache validity, records dependency hashes, prunes stale outputs, saves the manifest, and logs cached/pruned counts.
Type-table dependency reporting
packages/leadtype/src/markdown/plugins/type-table.ts
TypeTableOptions gains onDependency; extractTypeFromFile reports resolved and dependency file paths via a new getFileDependencyReporter fallback wired through _compiler.addDependency.
Incremental cache test suite
packages/leadtype/src/convert/incremental.test.ts
Tests cover cache hits, edited-source rebuilds, include-dependency rebuilds, pruning after deletion, force rebuilds, fingerprint invalidation, missing output regeneration, and corrupt manifest recovery.
Documentation and changeset
docs/reference/cli.mdx, .changeset/watch-mode-incremental-builds.md
CLI reference documents -w/--watch, incremental build caching, and pruning; changeset records the minor release with the new CLI and convertAllMdx API behavior.

Estimated code review effort: 4 (Complex) | ~60 minutes

Sequence Diagram(s)

sequenceDiagram
  participant User
  participant CLI as generate CLI
  participant Watcher as watchInputs
  participant Convert as convertAllMdx
  participant Cache as Cache Manifest

  User->>CLI: leadtype generate --watch
  CLI->>Convert: initial conversion run
  Convert->>Cache: load manifest, hash sources
  Cache-->>Convert: cached/stale entries
  Convert->>Convert: skip cached, rebuild changed, prune stale
  Convert->>Cache: save updated manifest
  CLI->>Watcher: arm watcher(watchPaths)
  Watcher-->>CLI: debounced change event
  CLI->>Convert: rerun conversion
  Convert->>Cache: revalidate & update manifest
Loading

Possibly related PRs

  • inthhq/leadtype#58: Both PRs modify the MDX conversion pipeline wiring in convertAllMdx, overlapping at the same conversion-option code paths.
  • inthhq/leadtype#95: Both PRs extend MdxToMarkdownOptions/conversion preparation around git enrichment used in cache fingerprinting.
  • inthhq/leadtype#111: Both PRs thread new options through ConversionPrepareOptions/prepareMdxConversion, overlapping at the same plumbing.

Poem

A rabbit watches, ears held high,
For every change in files nearby.
Hash by hash, the cache stays neat,
Stale outputs pruned, no repeat.
🐇 --watch, --force, incremental cheer,
Build fast and true, the burrow's clear!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: incremental builds and watch mode for leadtype generate.
Description check ✅ Passed The description matches the changeset and accurately describes watch mode, incremental builds, cache support, and docs updates.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch watch-mode-incremental-builds

Comment @coderabbitai help to get the list of available commands.

@pullfrog pullfrog Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ No new issues found.

Reviewed changes — adds --watch/-w and default incremental builds to leadtype generate, with a content-hash manifest cache, dependency tracking for <include> targets and type-table TypeScript sources, and stale-output pruning.

  • CLI flags & docsparseGenerateArgs gains watch and force; help text and docs/reference/cli.mdx document incremental builds and watch mode; .changeset marks a minor bump.
  • Incremental cache (convert/incremental.ts) — per-file manifest under node_modules/.cache/leadtype/, recording sourceHash, per-dependency hashes, serialized git enrichment, and the output path. Every manifest miss (missing, corrupt, version or fingerprint mismatch) degrades to a full rebuild rather than erroring; the save path writes-then-renames to stay crash-safe.
  • convertAllMdx caching — skips a file only when its source hash, git enrichment, and all dependency hashes match and the prior output still exists; pruneStaleOutputs deletes outputs of deleted sources, scoped strictly inside outDir.
  • Dependency protocol — reuses the _compiler.addDependency file-data channel (already emitted by the include plugin) and extends TypeTableOptions with onDependency so type-table TypeScript reads invalidate the cache too.
  • Cache fingerprintresolveConvertCache keys the cache on leadtype version, docs-config content hash, and conversion-relevant flags; caching is skipped entirely when there is no node_modules/.
  • Watch loop (cli/watch.ts, runGenerateWatch) — debounced recursive fs.watch (fine given engines >= 22), ignores outDir/.git/node_modules/.DS_Store, serializes reruns, re-arms when a config edit changes the docs-source set, and stops cleanly on SIGINT/SIGTERM.

I verified the load-bearing seam: downstream site artifacts (search index, llms.txt, agents) read the converted markdown from outDir on disk rather than from convertAllMdx's return value, so a cached skip that leaves the output in place stays consistent — no stale-artifact bug. Dependency paths are recorded as real source paths (mapped back through gitSourcePath), so they survive the temp-mirror churn between runs. Test coverage is strong. The points below are informational only; nothing blocks merge.

ℹ️ --force is sticky across every watch rebuild

In watch mode each rebuild re-invokes generation with the original parsed args, so leadtype generate --watch --force reconverts every file on every change for the life of the session rather than forcing once and then rebuilding incrementally. That is defensible (force means force), but a user reaching for "clean start, then fast iteration" gets a full rebuild on each keystroke-triggered change instead — worth a deliberate decision.

Technical details
# `--force` is sticky across every watch rebuild

## Affected sites
- `packages/leadtype/src/cli/generate.ts``runGenerateWatch``rerun` calls `executeGenerate(args, io)` with the same `args.force` on every change; `resolveConvertCache` sets `force: true` each run, so `previousEntries` is always `{}`.

## Required outcome
- Decide whether `--force` should apply only to the initial watch build (then fall back to incremental) or intentionally to every rebuild. Current behavior is the latter.

## Suggested approach (optional)
- If one-shot force is desired: clear `args.force` (or pass a per-run flag) after the first `executeGenerate` in `runGenerateWatch`, leaving subsequent reruns incremental.

## Open questions for the human
- Is "force every rebuild in watch" intended, or should force be first-build-only? The CLI docs currently say "restart with `--force`", which reads as one-shot.

ℹ️ No incremental test for type-table source invalidation

The changeset calls out type-table TypeScript sources as a newly tracked dependency, and the plugin now reports them through onDependency. The incremental test suite covers source edits, include-target edits, pruning, force, fingerprint, and corrupt-manifest recovery — but not the type-table path, so a regression in getFileDependencyReporter or the extractTypeFromFile dependency reporting would pass CI silently.

Technical details
# No incremental test for type-table source invalidation

## Affected sites
- `packages/leadtype/src/convert/incremental.test.ts` — has "rebuilds a file when its include target changes" but no equivalent for a type-table `.ts` source change.
- `packages/leadtype/src/markdown/plugins/type-table.ts:236-254`, `798-826` — the new dependency-reporting code paths (fresh extraction and cached-extraction branch) are untested for cache invalidation.

## Required outcome
- Add an incremental test asserting that editing a `.ts` file consumed by a type table rebuilds the page that renders it (and leaves an unrelated page cached), mirroring the existing include-target test.

ℹ️ Nitpicks

  • packages/leadtype/src/convert/incremental.ts:125saveConvertCacheManifest uses a fixed temp path ${filePath}.tmp; two concurrent generate processes on one project (e.g. a --watch session plus a manual run) could race on it. Impact is bounded (a corrupt manifest just triggers a rebuild), but a pid- or mkdtemp-suffixed temp name would remove the race.

Pullfrog  | View workflow run | Using Claude Opus𝕏

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/leadtype/src/cli/generate.ts (1)

2629-2722: 🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Watch session can't detect a docs directory created after startup.

When executeGenerate fails because a configured docs directory doesn't exist yet, the returned watchPaths still includes that nonexistent path. watchInputs (see watch.ts) silently skips watching paths that don't exist at arm-time, so if the user subsequently creates the missing docs directory, no change event fires for it — only edits to the config file (if it resolved) would trigger a re-arm. The watch session is then stuck requiring a manual restart, with no in-app hint.

🩹 Suggested mitigation: fall back to watching the nearest existing ancestor
+function nearestExistingAncestor(target: string): string {
+  let dir = target;
+  while (!existsSync(dir)) {
+    const parent = path.dirname(dir);
+    if (parent === dir) {
+      return dir;
+    }
+    dir = parent;
+  }
+  return dir;
+}

Use nearestExistingAncestor(candidate) when building watchPaths for any docs directory that doesn't yet exist, so its parent's creation events are observed and can trigger a re-check.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/leadtype/src/cli/generate.ts` around lines 2629 - 2722, The watch
path selection in executeGenerate returns a nonexistent docs directory directly,
so watchInputs never arms a watcher for it and later creation is missed. Update
executeGenerate to detect missing docsDirs and replace each missing path in
watchPaths with nearestExistingAncestor(candidate) so the parent directory
creation can retrigger the generate flow. Keep the change localized around the
docs directory existence check and watchPaths assembly, using
nearestExistingAncestor and executeGenerate as the key symbols.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/leadtype/src/cli/generate.ts`:
- Around line 2432-2483: The incremental cache is only enabled when
resolveConvertCache finds a node_modules directly under srcDir, which disables
caching for hoisted workspace layouts. Update resolveConvertCache in generate.ts
to locate the nearest usable node_modules by walking up from opts.srcDir
(Node-style resolution) instead of checking only the immediate child, while
keeping the existing cacheId/fingerprint logic and the ConvertCacheOptions file
path generation intact.
- Around line 2547-2607: The watch rerun path in runGenerateWatch keeps reusing
the same args object, so --force remains enabled for every rebuild and bypasses
the cache indefinitely. Update the watch flow so the force flag is only honored
for the initial executeGenerate call, then clear or override args.force before
subsequent reruns triggered by scheduleRerun/rerun. Make the change in
runGenerateWatch and keep the behavior aligned with resolveConvertCache so
incremental watch rebuilds can use the cache after the first pass.

In `@packages/leadtype/src/convert/convert.ts`:
- Around line 1357-1372: The dependency hashing in convert() is still done
sequentially inside the cacheState.nextEntries block, which unnecessarily
serializes I/O for dependencyPaths. Update the hashing logic near deps and
hashFileCached to mirror depsUnchanged by collecting all dependency hash
promises and awaiting them together with Promise.all, then build the deps record
from the resolved results while preserving the empty-string fallback for
unreadable or missing files.
- Around line 1270-1284: The cache manifest handling in convert.ts is dropping
prune history whenever config.cache.force is set or loadConvertCacheManifest
returns null on a fingerprint mismatch, which causes stale outputs to survive
forever. Update the cacheState setup so prune candidates are loaded
independently of force/fingerprint invalidation, and keep a separate history
source for pruneStaleOutputs instead of reusing previousEntries for both reuse
checks and pruning. Add a regression test in incremental.test.ts covering a
deleted source whose output is pruned across a force run or
fingerprint-invalidated run.

In `@packages/leadtype/src/convert/incremental.test.ts`:
- Around line 1-246: Add a test in incremental.test.ts that covers pruning after
a force run or fingerprint change using createCacheProject and runConvert. The
issue is that the current prune test only exercises the normal cache path, so it
misses the case where cacheState.previousEntries is cleared by force/fingerprint
invalidation and deleted outputs may not be removed. Reuse the existing helpers
(runConvert, tamper, existsSync, rm) to delete a source after an initial
conversion, rerun with force or a different fingerprint, and assert the deleted
file’s output is pruned while the remaining output stays intact.

In `@packages/leadtype/src/convert/incremental.ts`:
- Around line 79-116: The manifest validator in isManifestShape only checks that
entries exists as an object, so malformed ConvertCacheEntry data can slip
through and later crash pruneStaleOutputs when it reads entry.output. Tighten
isManifestShape in incremental.ts to validate every entry’s shape, including
required string/object fields such as sourceHash, deps, enrichment, and output,
and only return true when all entries match ConvertCacheEntry. Keep
loadConvertCacheManifest’s behavior unchanged so invalid manifests still fall
back to null and a full rebuild.

---

Outside diff comments:
In `@packages/leadtype/src/cli/generate.ts`:
- Around line 2629-2722: The watch path selection in executeGenerate returns a
nonexistent docs directory directly, so watchInputs never arms a watcher for it
and later creation is missed. Update executeGenerate to detect missing docsDirs
and replace each missing path in watchPaths with
nearestExistingAncestor(candidate) so the parent directory creation can
retrigger the generate flow. Keep the change localized around the docs directory
existence check and watchPaths assembly, using nearestExistingAncestor and
executeGenerate as the key symbols.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 487a745a-9c1b-47a1-b20d-6a660a72251e

📥 Commits

Reviewing files that changed from the base of the PR and between b79b8ca and 558f1f8.

📒 Files selected for processing (9)
  • .changeset/watch-mode-incremental-builds.md
  • docs/reference/cli.mdx
  • packages/leadtype/src/cli/generate.ts
  • packages/leadtype/src/cli/watch.test.ts
  • packages/leadtype/src/cli/watch.ts
  • packages/leadtype/src/convert/convert.ts
  • packages/leadtype/src/convert/incremental.test.ts
  • packages/leadtype/src/convert/incremental.ts
  • packages/leadtype/src/markdown/plugins/type-table.ts
📜 Review details
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

**/*.{ts,tsx}: Use explicit types for function parameters and return values when they enhance clarity
Prefer unknown over any when the type is genuinely unknown
Use const assertions (as const) for immutable values and literal types
Leverage TypeScript's type narrowing instead of type assertions

Files:

  • packages/leadtype/src/cli/watch.test.ts
  • packages/leadtype/src/cli/watch.ts
  • packages/leadtype/src/convert/incremental.test.ts
  • packages/leadtype/src/convert/incremental.ts
  • packages/leadtype/src/markdown/plugins/type-table.ts
  • packages/leadtype/src/convert/convert.ts
  • packages/leadtype/src/cli/generate.ts
**/*.{js,ts,jsx,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

**/*.{js,ts,jsx,tsx}: Use meaningful variable names instead of magic numbers - extract constants with descriptive names
Use arrow functions for callbacks and short functions
Prefer for...of loops over .forEach() and indexed for loops
Use optional chaining (?.) and nullish coalescing (??) for safer property access
Prefer template literals over string concatenation
Use destructuring for object and array assignments
Use const by default, let only when reassignment is needed, never var
Always await promises in async functions - don't forget to use the return value
Use async/await syntax instead of promise chains for better readability
Handle errors appropriately in async code with try-catch blocks
Don't use async functions as Promise executors
Remove console.log, debugger, and alert statements from production code
Throw Error objects with descriptive messages, not strings or other values
Use try-catch blocks meaningfully - don't catch errors just to rethrow them
Prefer early returns over nested conditionals for error cases
Extract complex conditions into well-named boolean variables
Use early returns to reduce nesting
Prefer simple conditionals over nested ternary operators
Don't use eval() or assign directly to document.cookie
Avoid spread syntax in accumulators within loops
Use top-level regex literals instead of creating them in loops
Prefer specific imports over namespace imports
Use descriptive names for functions, variables, and types for meaningful naming
Add comments for complex logic, but prefer self-documenting code

Files:

  • packages/leadtype/src/cli/watch.test.ts
  • packages/leadtype/src/cli/watch.ts
  • packages/leadtype/src/convert/incremental.test.ts
  • packages/leadtype/src/convert/incremental.ts
  • packages/leadtype/src/markdown/plugins/type-table.ts
  • packages/leadtype/src/convert/convert.ts
  • packages/leadtype/src/cli/generate.ts
**/*.{test,spec}.{js,ts,jsx,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

**/*.{test,spec}.{js,ts,jsx,tsx}: Write assertions inside it() or test() blocks
Avoid done callbacks in async tests - use async/await instead
Don't use .only or .skip in committed code
Keep test suites reasonably flat - avoid excessive describe nesting

Files:

  • packages/leadtype/src/cli/watch.test.ts
  • packages/leadtype/src/convert/incremental.test.ts
🧠 Learnings (1)
📚 Learning: 2026-06-09T18:30:08.038Z
Learnt from: KayleeWilliams
Repo: inthhq/leadtype PR: 97
File: .changeset/search-prototype-safety-and-scaling.md:5-5
Timestamp: 2026-06-09T18:30:08.038Z
Learning: In this repo, `.changeset/*.md` files must not start the body with an H1/first-line heading (`#`) immediately after the YAML frontmatter. The changesets tool inlines the body as bullet entries into `CHANGELOG.md` during release, and a leading `#` heading would break the generated changelog format. As a result, MD041 (`first-line-heading`) warnings for files under `.changeset/` are expected false positives and should be ignored.

Applied to files:

  • .changeset/watch-mode-incremental-builds.md
🪛 ast-grep (0.44.0)
packages/leadtype/src/convert/convert.ts

[warning] Importing child_process exposes a command-execution surface; ensure any command/argument built from input is validated, and prefer execFile/spawn with an argument array over exec.
Context: import { execFile } from "node:child_process";
Note: [CWE-78] Improper Neutralization of Special Elements used in an OS Command ('OS Command Injection').

(detect-child-process-typescript)


[warning] Importing child_process exposes a command-execution surface; ensure any command/argument built from input is validated, and prefer execFile/spawn with an argument array over exec.
Context: import { execFile } from "node:child_process";
Note: [CWE-78] Improper Neutralization of Special Elements used in an OS Command ('OS Command Injection').

(detect-child-process-typescript)


[warning] Importing child_process exposes a command-execution surface; ensure any command/argument built from input is validated, and prefer execFile/spawn with an argument array over exec.
Context: import { execFile } from "node:child_process";
Note: [CWE-78] Improper Neutralization of Special Elements used in an OS Command ('OS Command Injection').

(detect-child-process-typescript)


[warning] Importing child_process exposes a command-execution surface; ensure any command/argument built from input is validated, and prefer execFile/spawn with an argument array over exec.
Context: import { execFile } from "node:child_process";
Note: [CWE-78] Improper Neutralization of Special Elements used in an OS Command ('OS Command Injection').

(detect-child-process-typescript)


[warning] Importing child_process exposes a command-execution surface; ensure any command/argument built from input is validated, and prefer execFile/spawn with an argument array over exec.
Context: import { execFile } from "node:child_process";
Note: [CWE-78] Improper Neutralization of Special Elements used in an OS Command ('OS Command Injection').

(detect-child-process-typescript)


[warning] Importing child_process exposes a command-execution surface; ensure any command/argument built from input is validated, and prefer execFile/spawn with an argument array over exec.
Context: import { execFile } from "node:child_process";
Note: [CWE-78] Improper Neutralization of Special Elements used in an OS Command ('OS Command Injection').

(detect-child-process-typescript)


[warning] Importing child_process exposes a command-execution surface; ensure any command/argument built from input is validated, and prefer execFile/spawn with an argument array over exec.
Context: import { execFile } from "node:child_process";
Note: [CWE-78] Improper Neutralization of Special Elements used in an OS Command ('OS Command Injection').

(detect-child-process-typescript)


[warning] Importing child_process exposes a command-execution surface; ensure any command/argument built from input is validated, and prefer execFile/spawn with an argument array over exec.
Context: import { execFile } from "node:child_process";
Note: [CWE-78] Improper Neutralization of Special Elements used in an OS Command ('OS Command Injection').

(detect-child-process-typescript)


[warning] Importing child_process exposes a command-execution surface; ensure any command/argument built from input is validated, and prefer execFile/spawn with an argument array over exec.
Context: import { execFile } from "node:child_process";
Note: [CWE-78] Improper Neutralization of Special Elements used in an OS Command ('OS Command Injection').

(detect-child-process-typescript)


[warning] Importing child_process exposes a command-execution surface; ensure any command/argument built from input is validated, and prefer execFile/spawn with an argument array over exec.
Context: import { execFile } from "node:child_process";
Note: [CWE-78] Improper Neutralization of Special Elements used in an OS Command ('OS Command Injection').

(detect-child-process-typescript)

🪛 markdownlint-cli2 (0.22.1)
.changeset/watch-mode-incremental-builds.md

[warning] 5-5: First line in a file should be a top-level heading

(MD041, first-line-heading, first-line-h1)

🔇 Additional comments (13)
.changeset/watch-mode-incremental-builds.md (1)

1-12: 📐 Maintainability & Code Quality

MD041 warning is an expected false positive for this repo's changeset files.

Based on learnings, .changeset/*.md files intentionally must not start with an H1 heading immediately after frontmatter, since the changesets tool inlines the body as bullet entries into CHANGELOG.md; a leading # would break that format. The static-analysis MD041 hint here should be ignored.

Source: Learnings

packages/leadtype/src/cli/watch.ts (1)

1-133: LGTM!

packages/leadtype/src/cli/watch.test.ts (1)

1-158: LGTM!

packages/leadtype/src/cli/generate.ts (5)

1-11: LGTM!

Also applies to: 77-77


183-186: LGTM!

Also applies to: 354-366, 396-397, 445-448


2401-2413: LGTM!


2536-2540: LGTM!

Also applies to: 2609-2634, 2694-2699, 2721-2721, 2779-2789, 3161-3165


2414-2430: 🗄️ Data Integrity & Integration

Drop the self-reference warning for leadtype/package.json. packages/leadtype/package.json already exports "./package.json", so createRequire(import.meta.url) can resolve it and this fallback won’t be triggered by package exports alone.

			> Likely an incorrect or invalid review comment.
docs/reference/cli.mdx (1)

60-79: LGTM!

packages/leadtype/src/convert/convert.ts (2)

1458-1497: 🎯 Functional Correctness

Covered by the pruning-history comment above.

The entry.output traversal-guard here is solid, but this function's inputs need the fix proposed on Lines 1270-1284 (using history-preserving prune candidates and per-entry validation) to avoid silently losing prune opportunities and to avoid crashing on malformed manifest entries.


3-3: LGTM!

Also applies to: 37-47, 347-352, 374-379, 842-852, 1286-1356, 1401-1452

packages/leadtype/src/convert/incremental.ts (1)

1-77: LGTM!

Also applies to: 118-147

packages/leadtype/src/markdown/plugins/type-table.ts (1)

192-212: LGTM!

Also applies to: 236-255, 910-932, 1091-1113

Comment on lines +2432 to +2483
/**
* Build the incremental-cache options for this run, or undefined when there
* is no conventional cache home (`node_modules/`) to write into.
*
* The fingerprint invalidates the whole cache when anything the per-file
* checks can't see changes: the leadtype version, the docs config file
* (transformers/flatteners are functions, so the config's content hash
* stands in for them), or flags that alter conversion output.
*/
async function resolveConvertCache(opts: {
srcDir: string;
outDir: string;
args: GenerateArgs;
configPath?: string;
}): Promise<ConvertCacheOptions | undefined> {
const nodeModulesDir = path.join(opts.srcDir, "node_modules");
if (!existsSync(nodeModulesDir)) {
return;
}
const cacheId = hashText(
JSON.stringify({
outDir: opts.outDir,
docsDirs: opts.args.docsDirs,
include: opts.args.include,
exclude: opts.args.exclude,
bundle: opts.args.bundle,
})
).slice(0, 16);
let configHash = "no-config";
if (opts.configPath) {
try {
configHash = hashText(await readFile(opts.configPath, "utf8"));
} catch {
configHash = "unreadable-config";
}
}
const fingerprint = [
readLeadtypeVersion(),
configHash,
JSON.stringify({ enrichGit: opts.args.enrichGit }),
].join("|");
return {
file: path.join(
nodeModulesDir,
".cache",
"leadtype",
`generate-${cacheId}.json`
),
fingerprint,
...(opts.args.force ? { force: true } : {}),
};
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀 Performance & Scalability | 🔵 Trivial | ⚡ Quick win

Cache silently disabled for hoisted monorepo node_modules layouts.

resolveConvertCache only checks existsSync(path.join(opts.srcDir, "node_modules")). In workspace setups where dependencies are hoisted and --src points at a subpackage without its own node_modules, incremental caching is disabled entirely, even though a usable node_modules exists at an ancestor directory. This is documented as expected behavior in cli.mdx, but a short upward walk (similar to Node's own module resolution) would extend the feature's coverage to more repo layouts.

♻️ Suggested upward-walk helper
+function findNearestNodeModules(startDir: string): string | undefined {
+  let dir = startDir;
+  for (;;) {
+    const candidate = path.join(dir, "node_modules");
+    if (existsSync(candidate)) {
+      return candidate;
+    }
+    const parent = path.dirname(dir);
+    if (parent === dir) {
+      return undefined;
+    }
+    dir = parent;
+  }
+}
+
 async function resolveConvertCache(opts: {
   srcDir: string;
   outDir: string;
   args: GenerateArgs;
   configPath?: string;
 }): Promise<ConvertCacheOptions | undefined> {
-  const nodeModulesDir = path.join(opts.srcDir, "node_modules");
-  if (!existsSync(nodeModulesDir)) {
+  const nodeModulesDir = findNearestNodeModules(opts.srcDir);
+  if (!nodeModulesDir) {
     return;
   }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
/**
* Build the incremental-cache options for this run, or undefined when there
* is no conventional cache home (`node_modules/`) to write into.
*
* The fingerprint invalidates the whole cache when anything the per-file
* checks can't see changes: the leadtype version, the docs config file
* (transformers/flatteners are functions, so the config's content hash
* stands in for them), or flags that alter conversion output.
*/
async function resolveConvertCache(opts: {
srcDir: string;
outDir: string;
args: GenerateArgs;
configPath?: string;
}): Promise<ConvertCacheOptions | undefined> {
const nodeModulesDir = path.join(opts.srcDir, "node_modules");
if (!existsSync(nodeModulesDir)) {
return;
}
const cacheId = hashText(
JSON.stringify({
outDir: opts.outDir,
docsDirs: opts.args.docsDirs,
include: opts.args.include,
exclude: opts.args.exclude,
bundle: opts.args.bundle,
})
).slice(0, 16);
let configHash = "no-config";
if (opts.configPath) {
try {
configHash = hashText(await readFile(opts.configPath, "utf8"));
} catch {
configHash = "unreadable-config";
}
}
const fingerprint = [
readLeadtypeVersion(),
configHash,
JSON.stringify({ enrichGit: opts.args.enrichGit }),
].join("|");
return {
file: path.join(
nodeModulesDir,
".cache",
"leadtype",
`generate-${cacheId}.json`
),
fingerprint,
...(opts.args.force ? { force: true } : {}),
};
}
/**
* Build the incremental-cache options for this run, or undefined when there
* is no conventional cache home (`node_modules/`) to write into.
*
* The fingerprint invalidates the whole cache when anything the per-file
* checks can't see changes: the leadtype version, the docs config file
* (transformers/flatteners are functions, so the config's content hash
* stands in for them), or flags that alter conversion output.
*/
function findNearestNodeModules(startDir: string): string | undefined {
let dir = startDir;
for (;;) {
const candidate = path.join(dir, "node_modules");
if (existsSync(candidate)) {
return candidate;
}
const parent = path.dirname(dir);
if (parent === dir) {
return undefined;
}
dir = parent;
}
}
async function resolveConvertCache(opts: {
srcDir: string;
outDir: string;
args: GenerateArgs;
configPath?: string;
}): Promise<ConvertCacheOptions | undefined> {
const nodeModulesDir = findNearestNodeModules(opts.srcDir);
if (!nodeModulesDir) {
return;
}
const cacheId = hashText(
JSON.stringify({
outDir: opts.outDir,
docsDirs: opts.args.docsDirs,
include: opts.args.include,
exclude: opts.args.exclude,
bundle: opts.args.bundle,
})
).slice(0, 16);
let configHash = "no-config";
if (opts.configPath) {
try {
configHash = hashText(await readFile(opts.configPath, "utf8"));
} catch {
configHash = "unreadable-config";
}
}
const fingerprint = [
readLeadtypeVersion(),
configHash,
JSON.stringify({ enrichGit: opts.args.enrichGit }),
].join("|");
return {
file: path.join(
nodeModulesDir,
".cache",
"leadtype",
`generate-${cacheId}.json`
),
fingerprint,
...(opts.args.force ? { force: true } : {}),
};
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/leadtype/src/cli/generate.ts` around lines 2432 - 2483, The
incremental cache is only enabled when resolveConvertCache finds a node_modules
directly under srcDir, which disables caching for hoisted workspace layouts.
Update resolveConvertCache in generate.ts to locate the nearest usable
node_modules by walking up from opts.srcDir (Node-style resolution) instead of
checking only the immediate child, while keeping the existing
cacheId/fingerprint logic and the ConvertCacheOptions file path generation
intact.

Comment on lines +2547 to +2607
async function runGenerateWatch(
args: GenerateArgs,
io: GenerateIo
): Promise<number> {
const outDir = path.resolve(args.outDir);
let outcome = await executeGenerate(args, io);
let watcher: ReturnType<typeof watchInputs> | undefined;
let running = false;
let queued = false;

const armWatcher = (): ReturnType<typeof watchInputs> =>
watchInputs({
paths: outcome.watchPaths,
ignorePaths: [outDir],
onChange: (changedPaths) => {
logger.info({
human: {
message: `change detected (${changedPaths.length} path${changedPaths.length === 1 ? "" : "s"}) — regenerating`,
},
json: {
event: "generate.watch_rebuild",
fields: { changed: changedPaths.slice(0, 20) },
},
});
scheduleRerun();
},
});

const scheduleRerun = (): void => {
rerun().catch((error: unknown) => {
const message = error instanceof Error ? error.message : String(error);
logger.error({
human: { message: `watch rebuild failed: ${message}` },
json: { event: "generate.watch_error", fields: { message } },
});
});
};

const rerun = async (): Promise<void> => {
if (running) {
queued = true;
return;
}
running = true;
try {
const previousWatchPaths = outcome.watchPaths.join("\0");
outcome = await executeGenerate(args, io);
// A config edit can add or remove docs sources — re-arm so the new
// set is observed.
if (outcome.watchPaths.join("\0") !== previousWatchPaths) {
watcher?.close();
watcher = armWatcher();
}
} finally {
running = false;
if (queued) {
queued = false;
scheduleRerun();
}
}
};

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

--force bypasses the cache on every watch-triggered rebuild, not just the initial run.

args is never mutated between reruns, so combining --watch --force causes resolveConvertCache to set force: true on every single rerun triggered by scheduleRerun, defeating the fast incremental-rebuild benefit that watch mode is meant to provide after the first build. Consider clearing args.force after the first successful run, or documenting this combination's behavior explicitly in GENERATE_USAGE/cli.mdx.

🩹 Suggested one-line fix
   const outDir = path.resolve(args.outDir);
   let outcome = await executeGenerate(args, io);
+  // Only force the very first build; watch-triggered rebuilds should stay
+  // incremental even when the session started with --force.
+  args = { ...args, force: false };
   let watcher: ReturnType<typeof watchInputs> | undefined;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
async function runGenerateWatch(
args: GenerateArgs,
io: GenerateIo
): Promise<number> {
const outDir = path.resolve(args.outDir);
let outcome = await executeGenerate(args, io);
let watcher: ReturnType<typeof watchInputs> | undefined;
let running = false;
let queued = false;
const armWatcher = (): ReturnType<typeof watchInputs> =>
watchInputs({
paths: outcome.watchPaths,
ignorePaths: [outDir],
onChange: (changedPaths) => {
logger.info({
human: {
message: `change detected (${changedPaths.length} path${changedPaths.length === 1 ? "" : "s"}) — regenerating`,
},
json: {
event: "generate.watch_rebuild",
fields: { changed: changedPaths.slice(0, 20) },
},
});
scheduleRerun();
},
});
const scheduleRerun = (): void => {
rerun().catch((error: unknown) => {
const message = error instanceof Error ? error.message : String(error);
logger.error({
human: { message: `watch rebuild failed: ${message}` },
json: { event: "generate.watch_error", fields: { message } },
});
});
};
const rerun = async (): Promise<void> => {
if (running) {
queued = true;
return;
}
running = true;
try {
const previousWatchPaths = outcome.watchPaths.join("\0");
outcome = await executeGenerate(args, io);
// A config edit can add or remove docs sources — re-arm so the new
// set is observed.
if (outcome.watchPaths.join("\0") !== previousWatchPaths) {
watcher?.close();
watcher = armWatcher();
}
} finally {
running = false;
if (queued) {
queued = false;
scheduleRerun();
}
}
};
async function runGenerateWatch(
args: GenerateArgs,
io: GenerateIo
): Promise<number> {
const outDir = path.resolve(args.outDir);
let outcome = await executeGenerate(args, io);
// Only force the very first build; watch-triggered rebuilds should stay
// incremental even when the session started with --force.
args = { ...args, force: false };
let watcher: ReturnType<typeof watchInputs> | undefined;
let running = false;
let queued = false;
const armWatcher = (): ReturnType<typeof watchInputs> =>
watchInputs({
paths: outcome.watchPaths,
ignorePaths: [outDir],
onChange: (changedPaths) => {
logger.info({
human: {
message: `change detected (${changedPaths.length} path${changedPaths.length === 1 ? "" : "s"}) — regenerating`,
},
json: {
event: "generate.watch_rebuild",
fields: { changed: changedPaths.slice(0, 20) },
},
});
scheduleRerun();
},
});
const scheduleRerun = (): void => {
rerun().catch((error: unknown) => {
const message = error instanceof Error ? error.message : String(error);
logger.error({
human: { message: `watch rebuild failed: ${message}` },
json: { event: "generate.watch_error", fields: { message } },
});
});
};
const rerun = async (): Promise<void> => {
if (running) {
queued = true;
return;
}
running = true;
try {
const previousWatchPaths = outcome.watchPaths.join("\0");
outcome = await executeGenerate(args, io);
// A config edit can add or remove docs sources — re-arm so the new
// set is observed.
if (outcome.watchPaths.join("\0") !== previousWatchPaths) {
watcher?.close();
watcher = armWatcher();
}
} finally {
running = false;
if (queued) {
queued = false;
scheduleRerun();
}
}
};
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/leadtype/src/cli/generate.ts` around lines 2547 - 2607, The watch
rerun path in runGenerateWatch keeps reusing the same args object, so --force
remains enabled for every rebuild and bypasses the cache indefinitely. Update
the watch flow so the force flag is only honored for the initial executeGenerate
call, then clear or override args.force before subsequent reruns triggered by
scheduleRerun/rerun. Make the change in runGenerateWatch and keep the behavior
aligned with resolveConvertCache so incremental watch rebuilds can use the cache
after the first pass.

Comment on lines +1270 to +1284
const cacheState = config.cache
? {
options: config.cache,
previousEntries: config.cache.force
? {}
: ((
await loadConvertCacheManifest(
config.cache.file,
config.cache.fingerprint
)
)?.entries ?? {}),
nextEntries: {} as Record<string, ConvertCacheEntry>,
hashCache: createFileHashCache(),
}
: undefined;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | 🏗️ Heavy lift

Pruning history is permanently lost whenever the cache is invalidated (--force or a fingerprint change).

previousEntries is used for two different purposes: (1) deciding whether a file can be reused, and (2) knowing which previously-recorded outputs to prune via pruneStaleOutputs (called later at line ~1408 with this same previousEntries). When config.cache.force is true, or whenever loadConvertCacheManifest returns null due to a fingerprint mismatch (e.g. a leadtype version bump — see incremental.ts:109), previousEntries becomes {}.

Since nextEntries (built from only the current source files) is what gets written back to the manifest at the end of the run, any entry for a file that was deleted before this force/fingerprint-invalidated run is dropped from the manifest with no chance to prune its stale output — and that history is now gone forever, since the next manifest write no longer contains it. In practice: any leadtype version bump combined with a previously-removed doc leaves an orphaned .md file in outDir permanently.

♻️ Proposed fix: load prune candidates independently of `force`/fingerprint
+// in incremental.ts
+export async function loadConvertCacheEntriesForPrune(
+  filePath: string
+): Promise<Record<string, ConvertCacheEntry>> {
+  let raw: string;
+  try {
+    raw = await readFile(filePath, "utf8");
+  } catch {
+    return {};
+  }
+  try {
+    const parsed: unknown = JSON.parse(raw);
+    if (isManifestShape(parsed)) {
+      return parsed.entries;
+    }
+  } catch {
+    // Corrupt manifest — nothing to prune from.
+  }
+  return {};
+}
   const cacheState = config.cache
     ? {
         options: config.cache,
         previousEntries: config.cache.force
           ? {}
           : ((
               await loadConvertCacheManifest(
                 config.cache.file,
                 config.cache.fingerprint
               )
             )?.entries ?? {}),
+        pruneEntries: await loadConvertCacheEntriesForPrune(config.cache.file),
         nextEntries: {} as Record<string, ConvertCacheEntry>,
         hashCache: createFileHashCache(),
       }
     : undefined;
     pruned = await pruneStaleOutputs(
-      cacheState.previousEntries,
+      cacheState.pruneEntries,
       cacheState.nextEntries,
       currentKeys,
       outDir
     );

Also worth adding a regression test in incremental.test.ts for "prunes a deleted source's output across a force/fingerprint-invalidated run" — none of the current tests combine deletion with force/fingerprint changes.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const cacheState = config.cache
? {
options: config.cache,
previousEntries: config.cache.force
? {}
: ((
await loadConvertCacheManifest(
config.cache.file,
config.cache.fingerprint
)
)?.entries ?? {}),
nextEntries: {} as Record<string, ConvertCacheEntry>,
hashCache: createFileHashCache(),
}
: undefined;
const cacheState = config.cache
? {
options: config.cache,
previousEntries: config.cache.force
? {}
: ((
await loadConvertCacheManifest(
config.cache.file,
config.cache.fingerprint
)
)?.entries ?? {}),
pruneEntries: await loadConvertCacheEntriesForPrune(config.cache.file),
nextEntries: {} as Record<string, ConvertCacheEntry>,
hashCache: createFileHashCache(),
}
: undefined;
🧰 Tools
🪛 ast-grep (0.44.0)

[warning] Importing child_process exposes a command-execution surface; ensure any command/argument built from input is validated, and prefer execFile/spawn with an argument array over exec.
Context: import { execFile } from "node:child_process";
Note: [CWE-78] Improper Neutralization of Special Elements used in an OS Command ('OS Command Injection').

(detect-child-process-typescript)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/leadtype/src/convert/convert.ts` around lines 1270 - 1284, The cache
manifest handling in convert.ts is dropping prune history whenever
config.cache.force is set or loadConvertCacheManifest returns null on a
fingerprint mismatch, which causes stale outputs to survive forever. Update the
cacheState setup so prune candidates are loaded independently of
force/fingerprint invalidation, and keep a separate history source for
pruneStaleOutputs instead of reusing previousEntries for both reuse checks and
pruning. Add a regression test in incremental.test.ts covering a deleted source
whose output is pruned across a force run or fingerprint-invalidated run.

Comment on lines +1357 to +1372
if (cacheState && sourceHash) {
const deps: Record<string, string> = {};
for (const dependencyPath of dependencyPaths ?? []) {
// Unreadable/missing deps record an empty hash: the entry stays
// invalid until the file appears, so a broken reference that gets
// fixed later is picked up without --force.
deps[dependencyPath] =
(await hashFileCached(dependencyPath, cacheState.hashCache)) ?? "";
}
cacheState.nextEntries[cacheKey] = {
sourceHash,
deps,
enrichment: enrichmentKey,
output: normalizeRelativePath(relative(outDir, outputPath)),
};
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀 Performance & Scalability | 🔵 Trivial | ⚡ Quick win

Sequential dependency hashing — consider parallelizing like depsUnchanged.

Each dependency path is hashed one-at-a-time via a for...of + await loop, unlike depsUnchanged which hashes all deps in parallel via Promise.all. For files with many dependencies (e.g. type-table pages pulling in several TypeScript sources), this serializes I/O unnecessarily.

⚡ Proposed fix
       if (cacheState && sourceHash) {
-        const deps: Record<string, string> = {};
-        for (const dependencyPath of dependencyPaths ?? []) {
-          // Unreadable/missing deps record an empty hash: the entry stays
-          // invalid until the file appears, so a broken reference that gets
-          // fixed later is picked up without --force.
-          deps[dependencyPath] =
-            (await hashFileCached(dependencyPath, cacheState.hashCache)) ?? "";
-        }
+        const deps = Object.fromEntries(
+          await Promise.all(
+            Array.from(dependencyPaths ?? [], async (dependencyPath) => [
+              dependencyPath,
+              // Unreadable/missing deps record an empty hash: the entry
+              // stays invalid until the file appears, so a broken reference
+              // that gets fixed later is picked up without --force.
+              (await hashFileCached(dependencyPath, cacheState.hashCache)) ??
+                "",
+            ])
+          )
+        );
         cacheState.nextEntries[cacheKey] = {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (cacheState && sourceHash) {
const deps: Record<string, string> = {};
for (const dependencyPath of dependencyPaths ?? []) {
// Unreadable/missing deps record an empty hash: the entry stays
// invalid until the file appears, so a broken reference that gets
// fixed later is picked up without --force.
deps[dependencyPath] =
(await hashFileCached(dependencyPath, cacheState.hashCache)) ?? "";
}
cacheState.nextEntries[cacheKey] = {
sourceHash,
deps,
enrichment: enrichmentKey,
output: normalizeRelativePath(relative(outDir, outputPath)),
};
}
if (cacheState && sourceHash) {
const deps = Object.fromEntries(
await Promise.all(
Array.from(dependencyPaths ?? [], async (dependencyPath) => [
dependencyPath,
// Unreadable/missing deps record an empty hash: the entry
// stays invalid until the file appears, so a broken reference
// that gets fixed later is picked up without --force.
(await hashFileCached(dependencyPath, cacheState.hashCache)) ??
"",
])
)
);
cacheState.nextEntries[cacheKey] = {
sourceHash,
deps,
enrichment: enrichmentKey,
output: normalizeRelativePath(relative(outDir, outputPath)),
};
}
🧰 Tools
🪛 ast-grep (0.44.0)

[warning] Importing child_process exposes a command-execution surface; ensure any command/argument built from input is validated, and prefer execFile/spawn with an argument array over exec.
Context: import { execFile } from "node:child_process";
Note: [CWE-78] Improper Neutralization of Special Elements used in an OS Command ('OS Command Injection').

(detect-child-process-typescript)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/leadtype/src/convert/convert.ts` around lines 1357 - 1372, The
dependency hashing in convert() is still done sequentially inside the
cacheState.nextEntries block, which unnecessarily serializes I/O for
dependencyPaths. Update the hashing logic near deps and hashFileCached to mirror
depsUnchanged by collecting all dependency hash promises and awaiting them
together with Promise.all, then build the deps record from the resolved results
while preserving the empty-string fallback for unreadable or missing files.

Comment on lines +1 to +246
import { existsSync } from "node:fs";
import { mkdir, mkdtemp, readFile, rm, writeFile } from "node:fs/promises";
import { tmpdir } from "node:os";
import path from "node:path";
import { afterEach, describe, expect, it } from "vitest";
import { includeMarkdown } from "../markdown";
import { convertAllMdx } from "./convert";
import { loadConvertCacheManifest } from "./incremental";

const tempDirs: string[] = [];
const TAMPER_MARKER = "<!-- tampered by test -->";

afterEach(async () => {
await Promise.all(
tempDirs.splice(0).map(async (dir) => {
await rm(dir, { recursive: true, force: true });
})
);
});

type CacheProject = {
root: string;
srcDir: string;
outDir: string;
cacheFile: string;
};

async function createCacheProject(): Promise<CacheProject> {
const root = await mkdtemp(path.join(tmpdir(), "leadtype-incremental-"));
tempDirs.push(root);
const srcDir = path.join(root, "docs");
const outDir = path.join(root, "out");
await mkdir(srcDir, { recursive: true });
return {
root,
srcDir,
outDir,
cacheFile: path.join(root, "cache", "convert.json"),
};
}

function runConvert(
project: CacheProject,
overrides: { fingerprint?: string; force?: boolean } = {}
): Promise<void> {
return convertAllMdx({
srcDir: project.srcDir,
outDir: project.outDir,
markdownTransforms: [includeMarkdown],
cache: {
file: project.cacheFile,
fingerprint: overrides.fingerprint ?? "test-fingerprint",
...(overrides.force ? { force: true } : {}),
},
});
}

async function tamper(outputPath: string): Promise<void> {
const current = await readFile(outputPath, "utf8");
await writeFile(outputPath, `${current}\n${TAMPER_MARKER}\n`);
}

async function isTampered(outputPath: string): Promise<boolean> {
const current = await readFile(outputPath, "utf8");
return current.includes(TAMPER_MARKER);
}

describe("convertAllMdx incremental cache", () => {
it("skips unchanged files on the second run", async () => {
const project = await createCacheProject();
await writeFile(
path.join(project.srcDir, "alpha.mdx"),
"---\ntitle: Alpha\n---\n\n# Alpha\n"
);
await writeFile(
path.join(project.srcDir, "beta.mdx"),
"---\ntitle: Beta\n---\n\n# Beta\n"
);

await runConvert(project);
const alphaOut = path.join(project.outDir, "alpha.md");
const betaOut = path.join(project.outDir, "beta.md");
expect(existsSync(alphaOut)).toBe(true);
expect(existsSync(betaOut)).toBe(true);

// A cached skip must leave the existing output untouched — tampering the
// outputs makes a rewrite observable.
await tamper(alphaOut);
await tamper(betaOut);
await runConvert(project);
expect(await isTampered(alphaOut)).toBe(true);
expect(await isTampered(betaOut)).toBe(true);
});

it("rebuilds only the edited file", async () => {
const project = await createCacheProject();
const alphaSrc = path.join(project.srcDir, "alpha.mdx");
await writeFile(alphaSrc, "---\ntitle: Alpha\n---\n\n# Alpha\n");
await writeFile(
path.join(project.srcDir, "beta.mdx"),
"---\ntitle: Beta\n---\n\n# Beta\n"
);

await runConvert(project);
const alphaOut = path.join(project.outDir, "alpha.md");
const betaOut = path.join(project.outDir, "beta.md");
await tamper(alphaOut);
await tamper(betaOut);

await writeFile(alphaSrc, "---\ntitle: Alpha\n---\n\n# Alpha edited\n");
await runConvert(project);

expect(await isTampered(alphaOut)).toBe(false);
expect(await readFile(alphaOut, "utf8")).toContain("Alpha edited");
expect(await isTampered(betaOut)).toBe(true);
});

it("rebuilds a file when its include target changes", async () => {
const project = await createCacheProject();
const snippetPath = path.join(project.srcDir, "snippet.mdx");
await writeFile(snippetPath, "Shared snippet v1.\n");
await writeFile(
path.join(project.srcDir, "page.mdx"),
'---\ntitle: Page\n---\n\n<include src="./snippet.mdx" />\n'
);
await writeFile(
path.join(project.srcDir, "other.mdx"),
"---\ntitle: Other\n---\n\n# Other\n"
);

await runConvert(project);
const pageOut = path.join(project.outDir, "page.md");
const otherOut = path.join(project.outDir, "other.md");
expect(await readFile(pageOut, "utf8")).toContain("Shared snippet v1.");
await tamper(pageOut);
await tamper(otherOut);

await writeFile(snippetPath, "Shared snippet v2.\n");
await runConvert(project);

expect(await readFile(pageOut, "utf8")).toContain("Shared snippet v2.");
expect(await isTampered(pageOut)).toBe(false);
expect(await isTampered(otherOut)).toBe(true);
});

it("records include targets as dependencies in the manifest", async () => {
const project = await createCacheProject();
const snippetPath = path.join(project.srcDir, "snippet.mdx");
await writeFile(snippetPath, "Shared snippet.\n");
await writeFile(
path.join(project.srcDir, "page.mdx"),
'---\ntitle: Page\n---\n\n<include src="./snippet.mdx" />\n'
);

await runConvert(project);
const manifest = await loadConvertCacheManifest(
project.cacheFile,
"test-fingerprint"
);
expect(manifest).not.toBeNull();
const entry = manifest?.entries["page.mdx"];
expect(entry).toBeDefined();
expect(Object.keys(entry?.deps ?? {})).toContain(snippetPath);
});

it("prunes the output of a deleted source file", async () => {
const project = await createCacheProject();
const alphaSrc = path.join(project.srcDir, "alpha.mdx");
await writeFile(alphaSrc, "---\ntitle: Alpha\n---\n\n# Alpha\n");
await writeFile(
path.join(project.srcDir, "beta.mdx"),
"---\ntitle: Beta\n---\n\n# Beta\n"
);

await runConvert(project);
const alphaOut = path.join(project.outDir, "alpha.md");
expect(existsSync(alphaOut)).toBe(true);

await rm(alphaSrc);
await runConvert(project);

expect(existsSync(alphaOut)).toBe(false);
expect(existsSync(path.join(project.outDir, "beta.md"))).toBe(true);
});

it("rebuilds everything with force", async () => {
const project = await createCacheProject();
await writeFile(
path.join(project.srcDir, "alpha.mdx"),
"---\ntitle: Alpha\n---\n\n# Alpha\n"
);

await runConvert(project);
const alphaOut = path.join(project.outDir, "alpha.md");
await tamper(alphaOut);

await runConvert(project, { force: true });
expect(await isTampered(alphaOut)).toBe(false);
});

it("rebuilds everything when the fingerprint changes", async () => {
const project = await createCacheProject();
await writeFile(
path.join(project.srcDir, "alpha.mdx"),
"---\ntitle: Alpha\n---\n\n# Alpha\n"
);

await runConvert(project);
const alphaOut = path.join(project.outDir, "alpha.md");
await tamper(alphaOut);

await runConvert(project, { fingerprint: "different-fingerprint" });
expect(await isTampered(alphaOut)).toBe(false);
});

it("regenerates a missing output even when the source is unchanged", async () => {
const project = await createCacheProject();
await writeFile(
path.join(project.srcDir, "alpha.mdx"),
"---\ntitle: Alpha\n---\n\n# Alpha\n"
);

await runConvert(project);
const alphaOut = path.join(project.outDir, "alpha.md");
await rm(alphaOut);

await runConvert(project);
expect(existsSync(alphaOut)).toBe(true);
});

it("survives a corrupt manifest by rebuilding", async () => {
const project = await createCacheProject();
await writeFile(
path.join(project.srcDir, "alpha.mdx"),
"---\ntitle: Alpha\n---\n\n# Alpha\n"
);

await runConvert(project);
await writeFile(project.cacheFile, "{not json");
const alphaOut = path.join(project.outDir, "alpha.md");
await tamper(alphaOut);

await runConvert(project);
expect(await isTampered(alphaOut)).toBe(false);
});
});

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

Good coverage of the core cache hit/miss/prune/force/fingerprint/corruption paths.

Missing test: pruning across a force/fingerprint-invalidated run. All prune-related assertions (Lines 166-184) run with the same fingerprint and without force, so they don't exercise the scenario where previousEntries is emptied by force/fingerprint change (see the major issue raised in convert.ts around cacheState initialization) — a deleted source's output would remain un-pruned and this suite wouldn't catch it.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/leadtype/src/convert/incremental.test.ts` around lines 1 - 246, Add
a test in incremental.test.ts that covers pruning after a force run or
fingerprint change using createCacheProject and runConvert. The issue is that
the current prune test only exercises the normal cache path, so it misses the
case where cacheState.previousEntries is cleared by force/fingerprint
invalidation and deleted outputs may not be removed. Reuse the existing helpers
(runConvert, tamper, existsSync, rm) to delete a source after an initial
conversion, rerun with force or a different fingerprint, and assert the deleted
file’s output is pruned while the remaining output stays intact.

Comment on lines +79 to +116
function isManifestShape(value: unknown): value is ConvertCacheManifest {
if (typeof value !== "object" || value === null) {
return false;
}
const manifest = value as Record<string, unknown>;
return (
manifest.version === CONVERT_CACHE_VERSION &&
typeof manifest.fingerprint === "string" &&
typeof manifest.entries === "object" &&
manifest.entries !== null
);
}

/**
* Read a manifest, returning null when it's missing, unreadable, from a
* different cache version, or built with a different fingerprint — every
* miss degrades to a full rebuild, never an error.
*/
export async function loadConvertCacheManifest(
filePath: string,
fingerprint: string
): Promise<ConvertCacheManifest | null> {
let raw: string;
try {
raw = await readFile(filePath, "utf8");
} catch {
return null;
}
try {
const parsed: unknown = JSON.parse(raw);
if (isManifestShape(parsed) && parsed.fingerprint === fingerprint) {
return parsed;
}
} catch {
// Corrupt manifest — treat as absent.
}
return null;
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

isManifestShape only validates top-level fields, not per-entry shape — malformed entries can crash pruning.

isManifestShape checks that entries is a non-null object but never validates that each ConvertCacheEntry actually has the expected sourceHash/deps/enrichment/output string/object fields. A manifest that's syntactically valid JSON but has a partially-written or hand-edited entry (e.g. missing output) will pass this check and be accepted as a real manifest.

Downstream, pruneStaleOutputs (convert.ts) calls resolve(outDir, entry.output) on every entry that isn't in the current key set — if entry.output is undefined/non-string, path.resolve throws TypeError, crashing the whole convertAllMdx() run. This directly contradicts the module's own documented guarantee that manifest issues should "degrade to a full rebuild, never an error."

🛡️ Proposed fix: validate entry shape in `isManifestShape`
 function isManifestShape(value: unknown): value is ConvertCacheManifest {
   if (typeof value !== "object" || value === null) {
     return false;
   }
   const manifest = value as Record<string, unknown>;
-  return (
+  if (
     manifest.version === CONVERT_CACHE_VERSION &&
     typeof manifest.fingerprint === "string" &&
     typeof manifest.entries === "object" &&
     manifest.entries !== null
-  );
+  ) {
+    return Object.values(
+      manifest.entries as Record<string, unknown>
+    ).every(
+      (entry) =>
+        typeof entry === "object" &&
+        entry !== null &&
+        typeof (entry as ConvertCacheEntry).sourceHash === "string" &&
+        typeof (entry as ConvertCacheEntry).output === "string" &&
+        typeof (entry as ConvertCacheEntry).enrichment === "string" &&
+        typeof (entry as ConvertCacheEntry).deps === "object" &&
+        (entry as ConvertCacheEntry).deps !== null
+    );
+  }
+  return false;
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
function isManifestShape(value: unknown): value is ConvertCacheManifest {
if (typeof value !== "object" || value === null) {
return false;
}
const manifest = value as Record<string, unknown>;
return (
manifest.version === CONVERT_CACHE_VERSION &&
typeof manifest.fingerprint === "string" &&
typeof manifest.entries === "object" &&
manifest.entries !== null
);
}
/**
* Read a manifest, returning null when it's missing, unreadable, from a
* different cache version, or built with a different fingerprint every
* miss degrades to a full rebuild, never an error.
*/
export async function loadConvertCacheManifest(
filePath: string,
fingerprint: string
): Promise<ConvertCacheManifest | null> {
let raw: string;
try {
raw = await readFile(filePath, "utf8");
} catch {
return null;
}
try {
const parsed: unknown = JSON.parse(raw);
if (isManifestShape(parsed) && parsed.fingerprint === fingerprint) {
return parsed;
}
} catch {
// Corrupt manifest — treat as absent.
}
return null;
}
function isManifestShape(value: unknown): value is ConvertCacheManifest {
if (typeof value !== "object" || value === null) {
return false;
}
const manifest = value as Record<string, unknown>;
if (
manifest.version === CONVERT_CACHE_VERSION &&
typeof manifest.fingerprint === "string" &&
typeof manifest.entries === "object" &&
manifest.entries !== null
) {
return Object.values(manifest.entries as Record<string, unknown>).every(
(entry) =>
typeof entry === "object" &&
entry !== null &&
typeof (entry as ConvertCacheEntry).sourceHash === "string" &&
typeof (entry as ConvertCacheEntry).output === "string" &&
typeof (entry as ConvertCacheEntry).enrichment === "string" &&
typeof (entry as ConvertCacheEntry).deps === "object" &&
(entry as ConvertCacheEntry).deps !== null
);
}
return false;
}
/**
* Read a manifest, returning null when it's missing, unreadable, from a
* different cache version, or built with a different fingerprint every
* miss degrades to a full rebuild, never an error.
*/
export async function loadConvertCacheManifest(
filePath: string,
fingerprint: string
): Promise<ConvertCacheManifest | null> {
let raw: string;
try {
raw = await readFile(filePath, "utf8");
} catch {
return null;
}
try {
const parsed: unknown = JSON.parse(raw);
if (isManifestShape(parsed) && parsed.fingerprint === fingerprint) {
return parsed;
}
} catch {
// Corrupt manifest — treat as absent.
}
return null;
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/leadtype/src/convert/incremental.ts` around lines 79 - 116, The
manifest validator in isManifestShape only checks that entries exists as an
object, so malformed ConvertCacheEntry data can slip through and later crash
pruneStaleOutputs when it reads entry.output. Tighten isManifestShape in
incremental.ts to validate every entry’s shape, including required string/object
fields such as sourceHash, deps, enrichment, and output, and only return true
when all entries match ConvertCacheEntry. Keep loadConvertCacheManifest’s
behavior unchanged so invalid manifests still fall back to null and a full
rebuild.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant