28 changes: 28 additions & 0 deletions .changeset/concurrent-generate-safety.md
@@ -0,0 +1,28 @@
---
"leadtype": patch
---

Make generation safe to invoke concurrently against a shared `outDir`.

Parallel task graphs (lint, typecheck, and build each depending on "docs are
generated") used to race on the shared output directory, causing intermittent
partial reads, ENOENT on files another run had just replaced, and half-written
artifacts.

- Every generated artifact (converted `docs/*.md`, `llms.txt`, `llms-full.txt`,
search index, sitemaps, robots, feeds, MCP card, NLWeb, skills, sync
manifests) is now written to a temp sibling and atomically renamed into
place, so concurrent readers see the old content or the new content — never
a truncated file.
- Delete-then-recreate windows are gone: the agent-skills surface and mounted
markdown mirrors now write the new files first and prune stale ones after,
instead of `rm -rf`-ing a live directory before rebuilding it.
- `leadtype generate` runs are single-flight per output directory via a
cross-process lock stored under the system temp dir (keyed by the resolved
`--out` path). Concurrent invocations wait for the in-flight run. Abandoned
locks recover fast: interrupted runs (SIGINT/SIGTERM) release on the way
out, hard-killed runs are reclaimed as soon as their recorded pid is gone,
and unidentifiable locks are reclaimed after 10 minutes. Waiting runs fail
loudly after 15 minutes instead of hanging CI (`LEADTYPE_LOCK_TIMEOUT_MS`
overrides). Set `LEADTYPE_NO_LOCK=1` to opt out. Temp files leaked by a
hard-killed run are swept at the start of the next locked run.
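
The temp-sibling-plus-rename pattern described in the first bullet can be sketched roughly as follows. This is a minimal illustration, not Leadtype's actual `writeFileAtomic`: the helper name `writeFileAtomicSketch`, the `.tmp-<pid>-<random>` suffix, and the synchronous style are all assumptions made for the example. It relies on `rename` replacing the target in one step when the temp file and the target are on the same filesystem.

```typescript
import { randomBytes } from "node:crypto";
import {
  mkdtempSync,
  readdirSync,
  readFileSync,
  renameSync,
  rmSync,
  writeFileSync,
} from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper (not the real Leadtype API): write to a temp sibling,
// then rename it over the target so readers never observe a partial file.
function writeFileAtomicSketch(targetPath: string, data: string): void {
  // The temp file lives next to the target so the rename stays on one filesystem.
  const suffix = randomBytes(6).toString("hex");
  const tempPath = `${targetPath}.tmp-${process.pid}-${suffix}`;
  try {
    writeFileSync(tempPath, data);
    renameSync(tempPath, targetPath);
  } catch (error) {
    // Best-effort cleanup so a failed write does not leave a temp file behind.
    rmSync(tempPath, { force: true });
    throw error;
  }
}

// Usage: overwrite the same artifact twice and inspect the result.
const atomicDemoDir = mkdtempSync(join(tmpdir(), "atomic-sketch-"));
const atomicDemoFile = join(atomicDemoDir, "llms.txt");
writeFileAtomicSketch(atomicDemoFile, "first version\n");
writeFileAtomicSketch(atomicDemoFile, "second version\n");
const atomicDemoContent = readFileSync(atomicDemoFile, "utf8");
const atomicDemoEntries = readdirSync(atomicDemoDir);
rmSync(atomicDemoDir, { force: true, recursive: true });
```

A process killed between the write and the rename would still leave the temp file on disk, which is the case the sweep at the start of the next locked run is described as handling.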
8 changes: 8 additions & 0 deletions docs/changelog/0-4.mdx
@@ -14,3 +14,11 @@ Leadtype 0.4 is the next minor release after the 0.3 line. These notes are being
- Batched Git frontmatter enrichment during `convertAllMdx`. When `enrichFrontmatterFromGit` is enabled, conversion now reads Git history once for the docs tree and maps results back to each converted file instead of spawning `git log` per file. In a 120-file synthetic docs benchmark, the Git metadata read dropped from ~2.36s of per-file process spawning to ~12ms for the batched read; full conversion with enrichment added ~27ms over no enrichment.
- Preserved best-effort Git behavior for shallow clones, missing Git, and untracked files. `lastModified` still comes from the latest file commit, while `lastAuthor` now falls back to the latest non-bot author when the newest commit was authored by automation.
- Cached repeated `<include>` and `<import>` resolution within a conversion run. Pages that reuse the same partial now share one raw file read and one parsed markdown AST, while section anchors such as `file.mdx#setup` still extract from cloned ASTs. The current Leadtype docs and c15t fixture do not contain repeated real include nodes, but a synthetic 200-page repeated-include benchmark cut include expansion from ~400ms to ~68ms.

## Concurrent generation safety

- Made generation safe to invoke concurrently against a shared `outDir`. Parallel task graphs where lint, typecheck, and build each depend on docs generation used to race on the output directory, causing intermittent partial reads, ENOENT on files another run had just replaced, and half-written artifacts.
- Every generated artifact — converted `docs/*.md`, `llms.txt`, `llms-full.txt`, the search index, sitemaps, robots, feeds, the MCP server card, NLWeb artifacts, skills, and sync manifests — is now written to a temp sibling and atomically renamed into place, so concurrent readers (including a sibling `tsc` or framework build reading `public/`) see the old content or the new content, never a truncated file.
- Removed delete-then-recreate windows: the agent-skills surface and mounted markdown mirrors write the new files first and prune stale entries after, instead of `rm -rf`-ing a live directory before rebuilding it.
- `leadtype generate` runs are now single-flight per output directory via a cross-process lock stored under the system temp dir, keyed by the resolved `--out` path. Concurrent invocations wait for the in-flight run. Abandoned locks recover fast: interrupted runs release on the way out, hard-killed runs are reclaimed as soon as their recorded process is gone, and unidentifiable locks are reclaimed after 10 minutes. Waiting runs fail loudly after 15 minutes instead of hanging CI (`LEADTYPE_LOCK_TIMEOUT_MS` overrides). Set `LEADTYPE_NO_LOCK=1` to opt out.
- Overhead is negligible: the atomic write adds one rename per file (~0.1–0.2ms) and the lock a fixed ~8ms per run — about 1–2% end to end on a 300-page site.
11 changes: 11 additions & 0 deletions docs/pipeline/generate-static-artifacts.mdx
@@ -158,6 +158,17 @@ npx leadtype lint docs --error-unknown
npx leadtype generate --src . --out public --base-url https://docs.example.com
```

## Concurrent invocation

Parallel task graphs often invoke generation more than once at the same time — lint, typecheck, and build each declaring "docs are generated" as a prerequisite. `leadtype generate` is safe under that pattern:

- **Runs are single-flight per output directory.** A cross-process lock (keyed by the resolved `--out` path, stored under the system temp dir — never inside the output directory) serializes concurrent runs. Later invocations wait for the in-flight run, then regenerate. Abandoned locks recover fast: an interrupted run (Ctrl-C, `SIGTERM`) releases its lock on the way out, a hard-killed run's lock is reclaimed as soon as its recorded process is gone, and an unidentifiable lock is reclaimed after 10 minutes without refresh. Waiting runs fail loudly after 15 minutes rather than hanging CI (`LEADTYPE_LOCK_TIMEOUT_MS` overrides this for very large sites).
- **Every artifact is replaced atomically.** Files are written to a temp sibling and renamed into place, so a sibling build step reading the output directory (`tsc`, `next build`, a dev server watching `public/`) sees the old artifact or the new one — never a truncated file, and never a missing file for pages that still exist. Temp files leaked by a hard-killed run are swept at the start of the next locked run, so they never linger in a deployed `public/`.

Set `LEADTYPE_NO_LOCK=1` to skip the lock — for example on network filesystems where directory-based locking is unreliable, or when your task runner already serializes generation.

Even though concurrent runs are safe, they are redundant: each waits its turn and regenerates the same output. If your task graph supports it, prefer a single generation task that lint, typecheck, and build all depend on.
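
To make the lock behaviour above more concrete, here is a rough sketch of a directory-based, cross-process lock with pid-based reclaim. It is not the Leadtype implementation: the function names, the `generate-lock-sketch-` prefix, the hashed key, and the `owner.json` file are assumptions for illustration. Waiting, timeouts, refresh, and a race-free reclaim are deliberately left out.

```typescript
import { createHash } from "node:crypto";
import { mkdirSync, readFileSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join, resolve } from "node:path";

// Hypothetical: derive a lock directory under the system temp dir from the
// resolved output path, so the lock never lives inside the output directory.
function lockDirFor(outDir: string): string {
  const key = createHash("sha256").update(resolve(outDir)).digest("hex");
  return join(tmpdir(), `generate-lock-sketch-${key.slice(0, 16)}`);
}

function isProcessAlive(pid: number): boolean {
  try {
    process.kill(pid, 0); // signal 0 only checks whether the process exists
    return true;
  } catch (error) {
    // EPERM means the process exists but belongs to another user.
    return (error as { code?: string }).code === "EPERM";
  }
}

// Returns a release function, or undefined when another run holds the lock.
function tryAcquire(outDir: string): (() => void) | undefined {
  const lockDir = lockDirFor(outDir);
  const ownerFile = join(lockDir, "owner.json");
  for (let attempt = 0; attempt < 2; attempt++) {
    try {
      mkdirSync(lockDir); // fails with EEXIST when the lock is already held
      writeFileSync(ownerFile, JSON.stringify({ pid: process.pid }));
      return () => rmSync(lockDir, { force: true, recursive: true });
    } catch (error) {
      if ((error as { code?: string }).code !== "EEXIST") {
        throw error;
      }
    }
    let ownerPid: unknown;
    try {
      ownerPid = JSON.parse(readFileSync(ownerFile, "utf8")).pid;
    } catch {
      ownerPid = undefined; // unidentifiable owner: treat the lock as held
    }
    const ownerIsGone =
      typeof ownerPid === "number" &&
      Number.isInteger(ownerPid) &&
      ownerPid > 0 &&
      !isProcessAlive(ownerPid);
    if (!ownerIsGone) {
      return undefined;
    }
    // The recorded process no longer exists: reclaim the lock and retry once.
    rmSync(lockDir, { force: true, recursive: true });
  }
  return undefined;
}

// Usage: a second acquire fails while the first is held, then succeeds.
const lockDemoOut = join(tmpdir(), `lock-sketch-out-${process.pid}-${Date.now()}`);
const lockDemoFirst = tryAcquire(lockDemoOut);
const lockDemoSecond = tryAcquire(lockDemoOut);
lockDemoFirst?.();
const lockDemoThird = tryAcquire(lockDemoOut);
lockDemoThird?.();
```

A waiting caller would poll `tryAcquire` until it succeeds or a deadline passes, which corresponds to the 15-minute failure described above.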

## Use library APIs for custom pipelines

The CLI is the happy path. Use the library APIs directly when you need custom plugin order, filters, or generated JSON paths. Keep conversion first — LLM files, search, and Agent Readability artifacts read the generated markdown:
104 changes: 99 additions & 5 deletions packages/leadtype/src/cli/generate.ts
@@ -1,5 +1,5 @@
import { existsSync } from "node:fs";
-import { cp, mkdir, mkdtemp, readFile, rm, writeFile } from "node:fs/promises";
+import { cp, mkdir, mkdtemp, readFile, rm, rmdir } from "node:fs/promises";
import { tmpdir } from "node:os";
import path from "node:path";
import { pathToFileURL } from "node:url";
@@ -8,13 +8,22 @@ import type { Pluggable, PluggableList } from "unified";
import { convertAllMdx } from "../convert";
import { type DocsFeedConfig, generateFeedArtifacts } from "../feed";
import { type DocsI18nManifest, normalizeDocsI18nConfig } from "../i18n";
import {
copyFileAtomic,
sweepLeakedTempFiles,
writeFileAtomic,
} from "../internal/atomic-fs";
import {
type DocsPathMount,
normalizeBaseUrl,
normalizeDocsPath,
normalizeUrlPrefix,
} from "../internal/docs-url";
import { parseFrontmatter } from "../internal/frontmatter";
import {
acquireGenerateLock,
type GenerateLock,
} from "../internal/generate-lock";
import {
logger,
setLogFormat,
@@ -2142,24 +2151,82 @@ async function copyMountedMarkdownMirrors(
`Mounted URL prefix "${urlPrefix}" must resolve inside the output directory.`
);
}
-await rm(targetDir, { force: true, recursive: true });
// A mount whose urlPrefix resolves inside its own source subtree (e.g.
// pathPrefix "guides" with urlPrefix "/docs/guides/public") nests
// targetDir under sourceDir. Exclude the mirror from the source glob so
// a previous run's mirror output is never re-mirrored into itself.
const targetRelativeToSource = path.relative(sourceDir, targetDir);
const targetInsideSource =
targetRelativeToSource.length > 0 &&
!targetRelativeToSource.startsWith("..") &&
!path.isAbsolute(targetRelativeToSource);
const files = await fg("**/*.md", {
absolute: false,
cwd: sourceDir,
ignore: targetInsideSource
? [`${normalizeDocsPath(targetRelativeToSource)}/**`]
: [],
onlyFiles: true,
});
await Promise.all(
files.map(async (file) => {
const sourcePath = path.join(sourceDir, file);
const targetPath = path.join(targetDir, file);
await mkdir(path.dirname(targetPath), { recursive: true });
-await cp(sourcePath, targetPath);
+await copyFileAtomic(sourcePath, targetPath);
})
);
// Prune mirror files whose source pages no longer exist. Pruning after
// the copy (instead of rm -rf on the whole mirror before it) keeps the
// mirror readable throughout — a concurrent reader never sees the
// directory disappear mid-generation.
const currentFiles = new Set(files);

Review comment (P2): Prevent mirrors from reading their own output

When a mount's urlPrefix resolves inside its own source subtree, such as pathPrefix: "guides" with urlPrefix: "/docs/guides/public", leaving the mirror in place means the earlier glob over sourceDir also sees files from the previous mirror. This keep-set then treats those mirror files as current and the copy step writes them under public/public/... on every run; the old rm(targetDir) avoided that, so exclude targetDir from the source glob or reject nested targets before pruning.

Author reply:

Fixed in 6b4dcc3. When the mirror target resolves inside its own source subtree, the target is now excluded from the source glob (ignore: ["<target>/**"]), so a previous run's mirror output is never picked up as source content and re-mirrored into public/public/....
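
The exclusion discussed in this thread can be sketched in isolation. The helper names `isStrictlyInside` and `mirrorIgnoreGlobs` are hypothetical, and the separator normalization below stands in for the project's own `normalizeDocsPath`; the containment check mirrors the `path.relative` test used in the diff.

```typescript
import { isAbsolute, join, relative, sep } from "node:path";

// Hypothetical helper: true when `target` is a strict descendant of `source`.
function isStrictlyInside(source: string, target: string): boolean {
  const rel = relative(source, target);
  // An empty result means the same directory; a leading ".." or an absolute
  // result (for example a different drive on Windows) means outside.
  return rel.length > 0 && !rel.startsWith("..") && !isAbsolute(rel);
}

// Hypothetical helper: ignore globs to pass to the source glob so that a
// mirror nested inside its own source is never re-mirrored.
function mirrorIgnoreGlobs(source: string, target: string): string[] {
  if (!isStrictlyInside(source, target)) {
    return [];
  }
  // Glob patterns use forward slashes regardless of platform.
  const rel = relative(source, target).split(sep).join("/");
  return [`${rel}/**`];
}

// Usage: the nested case from the review comment and two non-nested ones.
const mirrorDemoSource = join("/", "site", "public", "docs", "guides");
const mirrorDemoNested = mirrorIgnoreGlobs(
  mirrorDemoSource,
  join(mirrorDemoSource, "public")
);
const mirrorDemoSibling = mirrorIgnoreGlobs(
  mirrorDemoSource,
  join("/", "site", "public", "docs", "api")
);
const mirrorDemoSame = mirrorIgnoreGlobs(mirrorDemoSource, mirrorDemoSource);
```

With these inputs only the nested target produces an ignore pattern, so ordinary mounts keep globbing their full source tree.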

const mirroredFiles = await fg("**/*.md", {
absolute: false,
cwd: targetDir,
onlyFiles: true,
});
const staleFiles = mirroredFiles.filter(
(file) => !currentFiles.has(file)
);
await Promise.all(
staleFiles.map((file) =>
rm(path.join(targetDir, file), { force: true })
)
);
await removeEmptyMirrorDirs(targetDir, staleFiles);
})
);
}

/**
* Remove directories left empty after pruning stale mirror files, walking
* each pruned file's parent chain up to (but never including) the mirror
* root. A non-empty directory stops the walk — everything above it is
* non-empty too.
*/
async function removeEmptyMirrorDirs(
targetDir: string,
prunedFiles: string[]
): Promise<void> {
const parents = new Set(
prunedFiles.map((file) => path.dirname(path.join(targetDir, file)))
);
for (const parent of [...parents].sort(
(left, right) => right.length - left.length
)) {
let current = parent;
while (current.startsWith(`${targetDir}${path.sep}`)) {
try {
await rmdir(current);
} catch {
break;
}
current = path.dirname(current);
}
}
}

async function hasMarkdownFiles(dir: string): Promise<boolean> {
if (!existsSync(dir)) {
return false;
@@ -2203,7 +2270,7 @@ async function copyDefaultLocaleMarkdownAliases(
const sourcePath = path.join(defaultLocaleDir, file);
const targetPath = path.join(docsDir, file);
await mkdir(path.dirname(targetPath), { recursive: true });
-await cp(sourcePath, targetPath);
+await copyFileAtomic(sourcePath, targetPath);
})
);
await rm(defaultLocaleDir, { force: true, recursive: true });
@@ -2248,7 +2315,7 @@ async function writeI18nManifest(
}
const outputPath = path.join(outDir, DEFAULT_DOCS_DIR, "i18n-manifest.json");
await mkdir(path.dirname(outputPath), { recursive: true });
-await writeFile(outputPath, `${JSON.stringify(manifest, null, 2)}\n`);
+await writeFileAtomic(outputPath, `${JSON.stringify(manifest, null, 2)}\n`);
return outputPath;
}

@@ -2512,8 +2579,34 @@ export async function runGenerateCommand(
return 1;
}

// Serialize concurrent generate runs targeting the same outDir (parallel CI
// task graphs commonly fan out lint/typecheck/build, each regenerating docs).
// Atomic per-file writes keep individual artifacts readable at all times;
// the lock keeps whole runs from interleaving their read-back phases.
let generateLock: GenerateLock | undefined;
if (process.env.LEADTYPE_NO_LOCK !== "1") {
try {
const waitTimeoutMs = Number(process.env.LEADTYPE_LOCK_TIMEOUT_MS);
generateLock = await acquireGenerateLock(
outDir,
Number.isFinite(waitTimeoutMs) && waitTimeoutMs > 0
? { waitTimeoutMs }
: {}
);
} catch (error) {
const message = error instanceof Error ? error.message : String(error);
reportFailure(message);
return 1;
}
}

let sourceMirror: SourceMirror | undefined;
try {
if (generateLock) {
// With the lock held, no other run is in flight — safe to sweep temp
// files leaked into the output tree by a previous hard-killed run.
await sweepLeakedTempFiles(outDir);
}
const metadata = await resolveGenerateMetadata(
srcDir,
loadedConfig,
@@ -2944,6 +3037,7 @@ export async function runGenerateCommand(
return 1;
} finally {
await sourceMirror?.cleanup();
await generateLock?.release();
}
return 0;
}
95 changes: 95 additions & 0 deletions packages/leadtype/src/convert/convert-concurrency.test.ts
@@ -0,0 +1,95 @@
import {
mkdir,
mkdtemp,
readdir,
readFile,
rm,
writeFile,
} from "node:fs/promises";
import { tmpdir } from "node:os";
import path from "node:path";
import { afterEach, describe, expect, it } from "vitest";
import { convertAllMdx } from "./convert";

const tempDirs: string[] = [];

async function createTempProject(): Promise<string> {
const dir = await mkdtemp(path.join(tmpdir(), "leadtype-convert-race-"));
tempDirs.push(dir);
return dir;
}

afterEach(async () => {
await Promise.all(
tempDirs.splice(0).map((dir) => rm(dir, { force: true, recursive: true }))
);
});

describe("convertAllMdx concurrency", () => {
it("keeps every output complete while concurrent runs share an outDir", async () => {
const dir = await createTempProject();
const srcDir = path.join(dir, "docs");
const outDir = path.join(dir, "public", "docs");
const fileCount = 24;
const concurrentRuns = 3;
// Large enough that a truncating write would be observable mid-flight.
const filler =
"Some paragraph text that pads the document body.\n\n".repeat(200);

await mkdir(srcDir, { recursive: true });
await Promise.all(
Array.from({ length: fileCount }, (_, index) =>
writeFile(
path.join(srcDir, `doc-${index}.mdx`),
`---\ntitle: "Doc ${index}"\n---\n\n# Doc ${index}\n\n${filler}\nEND-OF-DOC-${index}\n`
)
)
);

let runsSettled = false;
const runs = Promise.all(
Array.from({ length: concurrentRuns }, () =>
convertAllMdx({ srcDir, outDir })
)
).finally(() => {
runsSettled = true;
});

// Concurrent reader modeling a sibling build step (tsc, next build)
// reading the shared output directory while generation is in flight:
// every successfully read file must be complete, and a file must never
// disappear once it has been observed.
const seen = new Set<number>();
const reader = (async () => {
while (!runsSettled) {
for (let index = 0; index < fileCount; index++) {
const outputPath = path.join(outDir, `doc-${index}.md`);
try {
const content = await readFile(outputPath, "utf8");
seen.add(index);
expect(content.startsWith("---")).toBe(true);
expect(content).toContain(`END-OF-DOC-${index}`);
} catch (error) {
expect((error as NodeJS.ErrnoException).code).toBe("ENOENT");
expect(seen.has(index)).toBe(false);
}
}
}
})();

await Promise.all([runs, reader]);

// Final state: every output present and complete, no temp files leaked.
const entries = await readdir(outDir);
expect(entries.sort()).toEqual(
Array.from({ length: fileCount }, (_, index) => `doc-${index}.md`).sort()
);
for (let index = 0; index < fileCount; index++) {
const content = await readFile(
path.join(outDir, `doc-${index}.md`),
"utf8"
);
expect(content).toContain(`END-OF-DOC-${index}`);
}
});
});
7 changes: 4 additions & 3 deletions packages/leadtype/src/convert/convert.ts
@@ -1,6 +1,6 @@
import { execFile } from "node:child_process";
import { existsSync } from "node:fs";
-import { mkdir, readFile, realpath, writeFile } from "node:fs/promises";
+import { mkdir, readFile, realpath } from "node:fs/promises";
import { cpus } from "node:os";
import { basename, dirname, join, relative, resolve, sep } from "node:path";
import { performance } from "node:perf_hooks";
@@ -9,6 +9,7 @@ import type { Root } from "mdast";
import { mdxToMdast } from "satteri";
import { glob as fg } from "tinyglobby";
import type { PluggableList } from "unified";
import { writeFileAtomic } from "../internal/atomic-fs";
import {
deriveDocContext,
resolvePlaceholderStrings,
@@ -1117,7 +1118,7 @@ async function processMdxFile(
}

await mkdir(dirname(outputPath), { recursive: true });
-await writeFile(outputPath, markdown);
+await writeFileAtomic(outputPath, markdown);

if (!writeToStdout) {
const ms = Date.now() - startedAt;
@@ -1264,7 +1265,7 @@ export async function convertAllMdx(
}
);
const outputPath = deriveOutputPath(mdxFilePath, srcDir, outDir);
-await writeFile(outputPath, markdown);
+await writeFileAtomic(outputPath, markdown);
logger.debug({
human: {
message: `convert ${mdxFilePath} → ${outputPath} (${Date.now() - fileStartedAt}ms)`,
5 changes: 3 additions & 2 deletions packages/leadtype/src/feed/index.ts
@@ -1,8 +1,9 @@
import { existsSync } from "node:fs";
-import { mkdir, readFile, stat, writeFile } from "node:fs/promises";
+import { mkdir, readFile, stat } from "node:fs/promises";
import path from "node:path";
import { glob as fg } from "tinyglobby";
import { type DocsI18nConfig, normalizeDocsI18nConfig } from "../i18n";
import { writeFileAtomic } from "../internal/atomic-fs";
import {
type DocsPathMount,
normalizeBaseUrl,
@@ -429,7 +430,7 @@ export async function generateFeedArtifacts(
entries,
});
await mkdir(path.dirname(outputPath), { recursive: true });
-await writeFile(outputPath, rendered);
+await writeFileAtomic(outputPath, rendered);
feedFiles[format] = outputPath;
}
files[feed.id] = feedFiles;