Summary
convertAllMdx writes a .md file into outDir for each source page, but it never removes outputs whose source was deleted or renamed. Over time outDir accumulates orphaned files that no longer correspond to any source page, and they keep getting served/published.
Repro
- Author guides/old-thing.mdx and run convertAllMdx → outDir/guides/old-thing.md is written.
- Delete or rename guides/old-thing.mdx and run convertAllMdx again.
outDir/guides/old-thing.md is still there. It's now a dead URL with stale content, and it leaks into anything that walks outDir (sitemaps, link checks, search indexing of the raw files, etc.).
Why it matters
Renames are common during docs work, and a stale .md is worse than a 404 — it's a live page with outdated content and no source of truth. Every consumer that wants a clean outDir has to implement the same garbage-collection step: compute the expected output set from the resolved pages, walk outDir, and delete any .md not in the set. That's easy to get wrong (e.g. accidentally deleting non-generated files that share the directory).
Current workaround (illustrative)
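import * as path from 'node:path'
import { readdir, rm } from 'node:fs/promises'
const expected = new Set(pages.map((p) => path.join(outDir, p.markdownPath)))
// readdir with recursive: true (Node 18.17+/20.1+) yields paths relative to outDir
for (const rel of await readdir(outDir, { recursive: true })) {
  const file = path.join(outDir, rel)
  if (file.endsWith('.md') && !expected.has(file)) await rm(file, { force: true })
}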
Proposed change
Add an opt-in prune option to convertAllMdx (default false to preserve current behavior):
await convertAllMdx({ srcDir, outDir, prune: true /* , remarkPlugins */ })
When prune: true, after writing the current set of outputs, remove any .md under outDir that wasn't produced by this run. leadtype already knows the authoritative set of output paths, so it can prune safely (and scope strictly to files it owns) in a way a consumer can't do as reliably from outside. Happy to PR.
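Here is a minimal sketch of what the prune step could look like inside convertAllMdx. It assumes the converter collects the path of every output it writes during the run. The names staleOutputs, listFiles and pruneOutDir are hypothetical and not part of leadtype's API. The walk does not follow symlinks and only considers regular .md files, so directories and other assets in outDir are never touched.

```typescript
import { readdir, rm } from 'node:fs/promises'
import * as path from 'node:path'

// Pure selection rule: every .md file under outDir that this run did not
// write is stale. Paths must be absolute and normalized on both sides.
export function staleOutputs(existing: Iterable<string>, written: ReadonlySet<string>): string[] {
  const stale: string[] = []
  for (const file of existing) {
    if (file.endsWith('.md') && !written.has(file)) stale.push(file)
  }
  return stale.sort()
}

// Recursively list regular files under dir. Symlinks are not followed and
// directories are never returned, so neither can be selected for deletion.
async function listFiles(dir: string): Promise<string[]> {
  const files: string[] = []
  for (const entry of await readdir(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name)
    if (entry.isDirectory()) files.push(...(await listFiles(full)))
    else if (entry.isFile()) files.push(full)
  }
  return files
}

// Hypothetical prune step for convertAllMdx({ ..., prune: true }).
// `written` is the set of output paths produced by the current run.
export async function pruneOutDir(outDir: string, written: Iterable<string>): Promise<string[]> {
  const root = path.resolve(outDir)
  const keep = new Set(Array.from(written, (p) => path.resolve(p)))
  const stale = staleOutputs(await listFiles(root), keep)
  for (const file of stale) await rm(file, { force: true })
  return stale // the files that were removed
}
```

Keeping staleOutputs pure lets the selection rule be unit-tested without touching the filesystem. A stricter variant could narrow the candidates further, for example to paths recorded by a previous run, so that hand-written .md files sharing outDir are never deleted.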