veniceai · joshua-mo-143 · Jun 7, 2026 · Jun 7, 2026 · Jun 7, 2026 · Jun 8, 2026
diff --git a/README.md b/README.md
@@ -46,6 +46,24 @@ The documentation will be available at `http://localhost:3000`.
 - Place images and assets in the corresponding directories
 - Reference the OpenAPI specification in `swagger.yaml` for API details
 
+### Copy Markdown Button
+
+Use the shared snippet when you want a page-level control that copies the current page's Mintlify Markdown export:
+
+```mdx
+import { CopyMarkdownButton } from "/snippets/CopyMarkdownButton.jsx";
+
+<CopyMarkdownButton />
+```
+
+By default, the button fetches the current page URL with a `.md` extension. If a page needs to copy a different Markdown export, pass `sourcePath`:
+
+```mdx
+<CopyMarkdownButton sourcePath="/guides/overview.md" label="Copy guide Markdown" />
+```
+
+Mintlify's local preview does not serve `.md` exports, and a browser running the local preview cannot fetch the export from the deployed docs unless the deployed site allows cross-origin reads. In local preview, the button displays `Unavailable locally`; test the full copy flow on deployed docs.
+
 ## 📖 Documentation Features
 
 - 🎨 Clean, modern UI with customizable theming

diff --git a/guides/projects/private-rag-bot.mdx b/guides/projects/private-rag-bot.mdx
@@ -1,17 +1,20 @@
 ---
 title: "Building a Private RAG Bot"
 slug: private-rag-bot-venice-qdrant-reranking
 "og:title": "Building a Private RAG Bot with Venice, Qdrant, and Re-ranking"
 "og:description": "A practical guide to building a modern private RAG bot with Venice embeddings, Qdrant vector search, FastEmbed re-ranking, and Venice chat completions."
 
 ---
 import { AuthorByline } from "/snippets/authorByline.jsx";
+import { CopyMarkdownButton } from "/snippets/CopyMarkdownButton.jsx";
 
 <AuthorByline name="Joshua Mo" date="29 April 2026"/>
 
+<CopyMarkdownButton />
+
 Retrieval-augmented generation, or RAG, is one of the most useful patterns for building AI applications that need to answer from your own documents. Instead of asking a model to rely on memory alone, you retrieve relevant source material first, send that context to the model, and ask it to answer with citations.
 
 In this tutorial, we'll build a private RAG bot using Python, Venice for embeddings and chat completions, Qdrant for vector search, and FastEmbed for local re-ranking. By the end, you'll have the core pieces for a local document assistant that can ingest your files, retrieve relevant chunks, re-rank them, and answer with citations.

 ![The RAG bot in action](/images/guides/private-rag-bot/screenshot.png)

@@ -32,7 +35,7 @@
 | Load | Read local Markdown, text, or reStructuredText files |
 | Chunk | Split long documents into overlapping sections |
 | Embed | Use Venice embeddings to turn chunks into vectors |
 | Store | Save vectors and source metadata in Qdrant |
 | Retrieve | Embed the user's question and run vector search |
 | Re-rank | Use a cross-encoder to rescore the best candidates |
 | Answer | Send the best context to a Venice chat model with citation instructions |
@@ -41,7 +44,7 @@

 ## Installing the Dependencies

 We'll use the OpenAI Python SDK because Venice exposes an OpenAI-compatible API. We'll also use Qdrant's Python client with FastEmbed support:

 ```bash
 pip install "openai>=1.0.0" "qdrant-client[fastembed]>=1.14.1"
@@ -106,7 +109,7 @@
  -H "Authorization: Bearer $VENICE_API_KEY"
 ```

 ## Creating the Venice and Qdrant Clients

 Create one OpenAI-compatible Venice client for both embeddings and chat completions:

@@ -117,21 +120,21 @@
 )
 ```

 For Qdrant, you have three useful modes:

 | Mode | When to use it |
 | --- | --- |
 | `QdrantClient(":memory:")` | Quick local demos and tests |
 | `QdrantClient(path="./qdrant_data")` | Local persistent storage |
 | `QdrantClient(url=..., api_key=...)` | A remote or managed Qdrant cluster |

 For a private local bot, start with an on-disk local Qdrant path:

 ```python
 qdrant = QdrantClient(path="./qdrant_data")
 ```

 There are a few different ways to handle deployment in production. However, if you use a remote Qdrant deployment, remember that your document chunks and metadata will be stored there. Venice can keep the inference layer private, but you should still choose the right Qdrant deployment for your data.

 ## Loading and Chunking Documents

@@ -219,9 +222,9 @@

 Batching matters. Embedding one chunk at a time is simple, but it adds avoidable latency. Keep the batch size configurable so you can tune throughput based on your workload.
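
As a rough illustration of configurable batching, the sketch below splits a list of chunks into fixed-size batches. The `batched` helper and the sample data are hypothetical and not part of the tutorial's code; in the real script, each batch would be passed to whatever function calls the Venice embeddings endpoint.

```python
from typing import Iterator, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements.

    Hypothetical helper for illustration; the tutorial's own code may differ.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# 70 fake chunks split with the tutorial's default batch size of 32.
chunks = [f"chunk {i}" for i in range(70)]
batch_sizes = [len(batch) for batch in batched(chunks, batch_size=32)]
print(batch_sizes)
```

Keeping `batch_size` as a parameter means the same helper can back a CLI flag such as `--embedding-batch-size` without any other code changes.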

 ## Storing Vectors in Qdrant

 Before inserting points, create a Qdrant collection with the right vector size. The easiest way to know the vector size is to embed the first batch, then use `len(embeddings[0])`.

 ```python
 qdrant.create_collection(
@@ -251,11 +254,11 @@
 qdrant.upsert(collection_name=COLLECTION_NAME, points=points)
 ```

 Use deterministic UUIDs derived from `source`, `chunk_index`, and content. That makes repeated ingestion idempotent for unchanged chunks.
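
One way to derive such IDs is `uuid.uuid5`, which hashes a namespace and a name into a stable UUID. The sketch below is a minimal example under assumptions: the `point_id` function and the `POINT_NAMESPACE` value are hypothetical names chosen for illustration, and the reference implementation may build its IDs differently.

```python
import hashlib
import uuid

# Hypothetical namespace for this sketch. Any fixed UUID works,
# as long as it never changes between ingestion runs.
POINT_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/private-rag-bot")


def point_id(source: str, chunk_index: int, content: str) -> str:
    """Return a UUID string that is stable for the same source, index, and content."""
    # Hash the content so very long chunks do not end up inside the UUID name.
    content_digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    name = f"{source}:{chunk_index}:{content_digest}"
    return str(uuid.uuid5(POINT_NAMESPACE, name))


first = point_id("docs/readme.md", 0, "Hello world")
second = point_id("docs/readme.md", 0, "Hello world")
changed = point_id("docs/readme.md", 0, "Hello world, edited")
print(first == second, first == changed)
```

Because an unchanged chunk always maps to the same ID, upserting it again overwrites the existing point instead of creating a duplicate. An edited chunk gets a new ID, so stale points from the old content may still need to be cleaned up separately.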

 ## Retrieving Candidate Chunks

 At question time, the bot embeds the user's question and asks Qdrant for the top vector matches:

 ```python
 query_vector = embed([question])[0]
@@ -366,7 +369,7 @@
  --question "What does this project do?"
 ```

 To keep a local Qdrant collection on disk and start an interactive chat:

 ```bash
 python rag_bot.py \
@@ -405,10 +408,10 @@
 | `--chunk-size` | `1000` | Maximum chunk size before overlap |
 | `--chunk-overlap` | `150` | Characters repeated between neighboring chunks |
 | `--embedding-batch-size` | `32` | Number of chunks per Venice embeddings request |
 | `--qdrant-path` | unset | Local persistent Qdrant storage path |
 | `--qdrant-url` | unset | Remote Qdrant URL |
 | `--skip-ingest` | `false` | Query an existing collection without reloading docs |
 | `--recreate-collection` | `false` | Delete and rebuild the Qdrant collection |

 For repeated local development, a common flow is:

@@ -437,11 +440,11 @@
 | --- | --- |
 | Venice embeddings | Document chunks are sent to Venice to create vectors |
 | Venice chat | Retrieved context is sent to Venice to answer the question |
 | Qdrant local | Vectors and payloads stay on your machine |
 | Qdrant remote | Vectors and payloads are stored wherever your Qdrant server runs |
 | FastEmbed re-ranker | Re-ranking runs locally after the model is available |

 The most private default for this tutorial is Venice for inference, local Qdrant on disk, and local FastEmbed re-ranking. That gives you a practical RAG bot without sending your vector database payloads to a third-party vector store.

 ## Common Errors to Handle Up Front

@@ -450,16 +453,16 @@
 | `Set VENICE_API_KEY before running this example.` | The environment variable is missing | Export `VENICE_API_KEY` before running the script |
 | `Document path does not exist` | A path passed to `--docs` is wrong | Check the file or folder path |
 | Empty retrieval results | Nothing was ingested, or the wrong collection is being queried | Remove `--skip-ingest` or confirm `--collection` and `--qdrant-path` |
 | Qdrant vector size error | The collection was created with a different embedding model | Recreate the collection after changing embedding models |
 | Slow first re-rank | FastEmbed may be downloading or initializing the cross-encoder | Let the first run finish; subsequent runs should be faster |

 If you change embedding models, recreate the Qdrant collection. Different embedding models can produce vectors with different dimensions, and Qdrant collections expect a fixed vector size.

 ## Where to Go Next

 Once you have the baseline running, the highest-impact improvements are usually:

 - Add document-specific loaders for PDFs, HTML, tickets, or internal wiki pages.
 - Store richer metadata such as titles, headings, dates, owners, and URLs.
 - Tune `candidate_k`, `top_k`, chunk size, and overlap on real questions.
 - Add evaluation questions so you can measure retrieval quality before and after changes.
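
For the evaluation idea in the last bullet, a simple starting metric is hit rate at `k`: the fraction of questions whose expected source shows up in the top `k` retrieved chunks. The sketch below assumes you have already collected the retrieved source paths per question; the function name and the sample data are hypothetical.

```python
def hit_rate_at_k(
    retrieved: dict[str, list[str]],
    expected: dict[str, str],
    k: int,
) -> float:
    """Fraction of questions whose expected source is among the top-k retrieved sources."""
    if not expected:
        return 0.0
    hits = sum(
        1
        for question, source in expected.items()
        if source in retrieved.get(question, [])[:k]
    )
    return hits / len(expected)


# Hypothetical evaluation set: each question maps to the file that should answer it.
expected = {"How do I install it?": "install.md", "What is the license?": "license.md"}
# Hypothetical retrieval output: ranked source paths per question.
retrieved = {
    "How do I install it?": ["install.md", "faq.md"],
    "What is the license?": ["faq.md", "readme.md", "license.md"],
}

print(hit_rate_at_k(retrieved, expected, k=2))
print(hit_rate_at_k(retrieved, expected, k=3))
```

Running the same question set before and after a change to chunk size, overlap, or re-ranking gives a quick signal on whether retrieval improved.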

diff --git a/guides/projects/private-research-agent.mdx b/guides/projects/private-research-agent.mdx
@@ -6,9 +6,12 @@
 
 ---
 import { AuthorByline } from "/snippets/authorByline.jsx";
+import { CopyMarkdownButton } from "/snippets/CopyMarkdownButton.jsx";
 
 <AuthorByline name="Joshua Mo" date="7 May 2026"/>
 
+<CopyMarkdownButton />
+
 Research agents are useful when you want more than a single search result or a quick model answer. A good research agent can turn a broad topic into search queries, collect sources, extract the important evidence, follow up on gaps, and write a cited briefing that you can inspect afterward.
 
 In this tutorial, we'll build a private research agent using Python and the Venice API. By the end, you'll have a CLI that can research a topic, scrape public pages into Markdown, summarize source chunks, run gap-aware follow-up research passes, and generate a cited report with optional local JSONL artifacts.
@@ -29,10 +32,10 @@
 | --- | --- |
 | CLI | Accepts a research topic, model, providers, depth settings, output path, and artifact directory |
 | Venice client | Calls chat completions, streaming chat completions, and `POST /augment/scrape` |
 | Search layer | Searches DuckDuckGo by default, with optional arXiv paper discovery |
 | Data models | Tracks source URLs, canonical URLs, chunks, evidence, notes, errors, and reports |
 | Research agent | Plans searches, reads sources, extracts evidence, analyzes gaps, generates follow-up queries, and writes the final report |
 | Artifact writer | Stores auditable JSONL records for queries, research gaps, results, fetches, chunks, source notes, report drafts, errors, and reports |

 The flow looks like this:

@@ -40,7 +43,7 @@

 1. Ask Venice to generate diverse search queries for the topic.
 2. Search the web with one or more providers.
 3. Deduplicate URLs before reading them.
 4. Use Venice's scrape endpoint to turn each public source page into Markdown.
 5. Split long pages into chunks.
 6. Ask Venice to extract evidence from each chunk.
@@ -48,7 +51,7 @@
 8. Identify research gaps and source-balance issues before generating follow-up queries.
 9. Ask Venice to synthesize the final report with footnote-style citations.

 This is "private" in the practical sense that the agent keeps the orchestration, source notes, artifacts, and final reports on your machine. Venice handles the model calls and scraping through its API. The default reference implementation still sends search queries to DuckDuckGo or arXiv, so treat provider choice as part of your privacy design.

 ## Setting Up the Project

@@ -197,7 +200,7 @@

 `canonical_url` lets the agent avoid reading the same source repeatedly when search results differ only by tracking parameters or fragments. `content_hash` helps catch duplicate pages even when they live at different URLs. `chunks` lets us summarize long pages in smaller pieces instead of losing useful evidence to context limits.

 Add the helper functions below the dataclasses:

 ```python
 def utc_now() -> str:
@@ -440,7 +443,7 @@

 ## Adding Search Providers

 The search layer has two jobs: find source URLs and fetch those URLs through the Venice scraper. The reference implementation uses DuckDuckGo's HTML endpoint for general web search and arXiv's Atom API for papers.

 Create `research_agent/web.py`:

@@ -509,7 +512,7 @@
        return results
 ```

 And arXiv:

 ```python
 class ArxivProvider(SearchProvider):
@@ -671,7 +674,7 @@

 ## Writing Local Artifacts

 For research workflows, auditability matters. If the final report says something surprising, you should be able to inspect which source led to it.

 Create `research_agent/artifacts.py`:

@@ -751,7 +754,7 @@

 The system prompt is the core behavioral guardrail. We don't want the model to produce an impressive-sounding report from memory. We want it to use the source material and call out uncertainty when the evidence is thin.

 We also need two final dataclasses in `models.py` if you have not added them yet:

 ```python
 @dataclass(frozen=True)
@@ -865,7 +868,7 @@
        )
 ```

 The two `seen_*` sets are what keep the agent from wasting time on duplicate sources. URL dedupe catches repeated links. Content hash dedupe catches mirrors, syndicated posts, and pages that redirect to the same final content.
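
The two-set idea can be sketched in isolation as follows. This is a simplified illustration, not the reference implementation: the names `seen_urls`, `seen_hashes`, `content_hash`, and `should_read` are assumptions, and the real agent checks URLs before fetching and content hashes after, rather than in a single call.

```python
import hashlib


def content_hash(text: str) -> str:
    """Hash page text after collapsing whitespace so trivial formatting differences match."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def should_read(url: str, text: str, seen_urls: set[str], seen_hashes: set[str]) -> bool:
    """Return True only for a URL and page content that have not been seen before."""
    digest = content_hash(text)
    if url in seen_urls or digest in seen_hashes:
        return False
    seen_urls.add(url)
    seen_hashes.add(digest)
    return True


seen_urls: set[str] = set()
seen_hashes: set[str] = set()

results = [
    should_read("https://example.com/a", "Same article body", seen_urls, seen_hashes),
    should_read("https://example.com/a", "Same article body", seen_urls, seen_hashes),
    should_read("https://mirror.example.org/a", "Same   article body", seen_urls, seen_hashes),
    should_read("https://example.com/b", "A different article", seen_urls, seen_hashes),
]
print(results)
```

The third call is rejected even though the URL is new, because the mirrored page hashes to the same content.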

 ## Planning Initial and Follow-up Searches

@@ -1096,7 +1099,7 @@
                    notes.append(note)
 ```

 Individual search and fetch failures should not stop the whole run. The public web is messy. Some pages block scraping, some return PDFs, some are down, and some redirect to unexpected places. A research agent should keep moving and record what failed.
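
The "keep moving and record what failed" pattern can be sketched on its own like this. The `fetch_all` function, the fake fetcher, and the error record shape are hypothetical and only illustrate the control flow; the reference agent records errors through its own data models and artifact writer.

```python
from typing import Callable


def fetch_all(
    urls: list[str],
    fetch: Callable[[str], str],
) -> tuple[dict[str, str], list[dict[str, str]]]:
    """Fetch every URL, collecting failures instead of aborting the run."""
    pages: dict[str, str] = {}
    errors: list[dict[str, str]] = []
    for url in urls:
        try:
            pages[url] = fetch(url)
        except Exception as exc:  # broad on purpose: one bad source must not stop the run
            errors.append({"url": url, "error": f"{type(exc).__name__}: {exc}"})
    return pages, errors


def fake_fetch(url: str) -> str:
    """Stand-in for a real scraper call; fails for one URL to show the error path."""
    if "blocked" in url:
        raise RuntimeError("scraping blocked")
    return f"# Markdown for {url}"


pages, errors = fetch_all(
    ["https://example.com/ok", "https://example.com/blocked", "https://example.com/also-ok"],
    fake_fetch,
)
print(len(pages), errors)
```

Passing the fetcher in as a parameter also makes the loop easy to test without live web requests.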

 Here is the source-reading method:

@@ -1526,7 +1529,7 @@

 Use `brief` for a concise source-backed briefing, `standard` for a fuller survey, and `deep` for the staged outline/section/editor workflow.

 Save auditable artifacts:

 ```bash
 uv run python main.py "privacy tradeoffs in hosted LLM APIs" \
@@ -1564,7 +1567,7 @@
 | Layer | What sees the data |
 | --- | --- |
 | Local CLI | Topic, configuration, source notes, artifacts, and final reports stay on your machine |
 | Search provider | Search queries are sent to the provider you choose, such as DuckDuckGo or arXiv |
 | Venice scrape | Public source URLs are sent to Venice's scrape endpoint |
 | Venice chat completions | Prompts, source chunks, source notes, and report-generation instructions are sent to Venice |
 | Output files | Markdown reports and JSONL artifacts are written locally |
@@ -1581,9 +1584,9 @@

 ## Testing the Pieces

 You do not need live web requests or Venice calls to test most of the system. The reference repo uses fake Venice and fake web classes to test the research loop, dedupe behavior, artifacts, and report prompts.

 A useful first test is URL canonicalization:

 ```python
 from research_agent.models import canonicalize_url
@@ -1633,7 +1636,7 @@

 - Add a Venice search provider using `POST /augment/search`.
 - Store reports and artifacts in a small SQLite database instead of JSONL files.
 - Add source allowlists or blocklists for trusted research domains.
 - Add PDF support by combining Venice scrape with document parsing for sources that do not expose clean HTML.
 - Add an evaluation set of topics and expected source types so you can compare research quality after prompt changes.
 - Add a review step that asks Venice to find unsupported claims in the final report before saving it.
@@ -1644,4 +1647,4 @@

 Thanks for reading! Hopefully this helped you build a practical private research agent with Python and the Venice API.

 The useful pattern here is not just "ask a model to research something." It is breaking research into auditable steps: plan searches, collect sources, extract evidence, write source notes, follow up on gaps, and synthesize with citations. By keeping those steps explicit, we get a research workflow that is easier to inspect, test, and improve over time.
diff --git a/guides/projects/security-code-reviewer.mdx b/guides/projects/security-code-reviewer.mdx
@@ -2,16 +2,19 @@
 title: "Building a Codebase Security Reviewer"
 slug: security-code-reviewer-venice
 "og:title": "Building a Codebase Security Reviewer with Venice AI"
 "og:description": "A practical guide to building a Python security agent that finds atomic vulnerabilities and chains them into exploit paths using Venice AI, an AST repo map, and Pydantic guardrails."
 
 ---
 import { AuthorByline } from "/snippets/authorByline.jsx";
+import { CopyMarkdownButton } from "/snippets/CopyMarkdownButton.jsx";
 
 <AuthorByline name="Joshua Mo" date="2 June 2026"/>
 
+<CopyMarkdownButton />
+
 Most static security tools find bugs in isolation. They scan one file, list the issues, and move on. The problem is that the most damaging vulnerabilities in modern codebases are rarely a single bug. They're a chain: a hardcoded signing key, a missing authorization check, and a SQL injection, each of which looks manageable on its own. Together they're an account-takeover path.
 
 This is exactly the kind of cross-cutting reasoning LLMs are good at, if you give them the right structure. In this article, we'll build a two-agent security code reviewer using Python and the Venice AI API. By the end, you'll have a CLI you can point at any Python codebase to produce a Markdown report with atomic findings and exploit chains.

 Interested in the full code implementation? Check out [the GitHub repo.](https://github.com/joshua-mo-143/venice-security-agent-demo)

@@ -27,35 +30,35 @@

 | Part | What it does |
 | --- | --- |
 | Pydantic models | Define `Evidence`, `Finding`, and `Chain`, and give us a hard validation boundary between the LLM and the rest of the program |
 | Venice client | Wraps the OpenAI Python SDK pointed at Venice's OpenAI-compatible endpoint |
 | AST repo map | Walks the target tree with Python's `ast` module and builds a deterministic map of every module's public symbols and import edges |
 | Scanner agent | Reads one Python file at a time plus a per-file neighbourhood slice of the repo map, and emits atomic vulnerability findings with file:line evidence |
 | Chainer agent | Reads the union of findings plus a condensed full repo map, and emits exploit chains that combine two or more findings |
 | Reference validator | Drops any chain that references a finding ID the Scanner did not produce, or names a file none of its referenced findings actually came from |
 | Markdown report | Renders findings and chains into a human-readable report |
 | CLI | Wires everything together with Typer |

 The flow looks like this:

 1. Walk the target directory for `.py` files.
 2. Build a deterministic repo map (imports, public symbols, signatures).
 3. For each file, send the Scanner its source plus a per-file neighbourhood slice of the map and collect atomic findings.
 4. Send the union of findings plus the condensed repo map to the Chainer and collect exploit chains.
 5. Drop any chain that references a finding ID the Scanner did not produce, or that names a file none of its referenced findings actually came from.
 6. Write a Markdown report.

 Two design decisions are worth flagging before we start writing code.

 The first is **why two agents instead of one**. A single-agent scanner that tries to do everything in one prompt has to balance being thorough about per-file bugs against being clever about combinatorial reasoning. Splitting the work means the Scanner can be relentless and noisy, and the Chainer can be selective and quiet. Adding one extra LLM call dedicated to combining findings unlocks an entire class of bug for very little extra code.

 The second is **why a repo map**. Real codebases live across many files. A bug that consists of "the validator runs but doesn't apply per-iteration in the fetcher, and the fetcher's response ends up in the renderer" is invisible to a per-file scanner. Before any LLM call, we walk the target tree with Python's `ast` and build a structural map. The Scanner sees a per-file *neighbourhood* (who imports from this file, what this file imports, signatures of those external symbols). The Chainer sees a *condensed* full map (every module, every public symbol, every import edge, no source). That's the smallest amount of context engineering we have found that lets the Chainer construct chains whose data flow crosses module boundaries, without paying the token cost of stuffing the whole codebase into every prompt.

 ## Pre-requisites

 - Python 3.12+
 - A Venice API key from [venice.ai](https://venice.ai)
 - Basic familiarity with Pydantic, Python's `ast` module, and the OpenAI Python SDK

 The reference repo uses [`uv`](https://docs.astral.sh/uv/) for dependency management, but a regular virtual environment works just as well.

@@ -162,12 +165,12 @@

 A few things worth noting:

 - We default to `zai-org-glm-5` because it's a strong general-purpose Venice model, but you can override it with the `VENICE_MODEL` environment variable. For larger or more nuanced codebases, swapping in a stronger model can make the Chainer notably better at narrative quality.
 - `build_client` returns the client *and* the model id, so callers don't have to read environment variables themselves and tests can inject a fake config without monkeypatching.

 ## Defining the Data Models

 The whole point of using Pydantic here, rather than passing raw dicts around, is that we get a hard validation boundary between the LLM and the rest of the program. If the model returns malformed JSON or invents a finding ID that doesn't exist, parsing fails loudly and we never propagate the hallucination into the report.

 Create `src/venice_security_reviewer/models.py`:

@@ -257,15 +260,15 @@
    return valid, dropped
 ```

 This is the deterministic guardrail that keeps the Chainer honest. A chain can only reference findings the Scanner actually produced, and it can only list files that at least one of those findings came from. Returning the dropped chains rather than silently filtering them lets the CLI surface a warning when the model tries to invent something.

 ## Building the AST Repo Map

 The repo map is the structural skeleton of a Python codebase: every module's public surface, every import edge, and a reverse index from "module M" to "modules that import from M". It's built once per scan run with Python's `ast`, never via execution, so it's safe to run on adversarial code: the parser doesn't import or invoke anything from the scanned tree.

 We'll consume the map in two shapes. The Scanner gets a per-file *neighbourhood* slice so its prompts stay bounded in size. The Chainer gets a *condensed* full map so it can construct chains across files.
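
To make the "parse, never execute" idea concrete, here is a minimal sketch that extracts top-level public symbols and import targets from a source string using only `ast.parse`. The `summarize_module` function and its output shape are hypothetical and much simpler than the repo map models that follow.

```python
import ast


def summarize_module(source: str) -> dict[str, list[str]]:
    """Collect top-level public names and import targets without running the code."""
    tree = ast.parse(source)
    public_symbols: list[str] = []
    imports: list[str] = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Treat names without a leading underscore as the public surface.
            if not node.name.startswith("_"):
                public_symbols.append(node.name)
        elif isinstance(node, ast.Import):
            imports.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # Keep leading dots so relative imports stay recognizable.
            imports.append("." * node.level + (node.module or ""))
    return {"public_symbols": public_symbols, "imports": imports}


example_source = '''
import os
from .validators import is_safe_url

def fetch(url): ...
def _helper(): ...
class Preview: ...
'''

summary = summarize_module(example_source)
print(summary)
```

Nothing in the example source is imported or called, so `os` and `.validators` never load. That is what makes this approach safe to run against untrusted code.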

 Create `src/venice_security_reviewer/repo_map.py` and start with the Pydantic models that describe the map:

 ```python
 from __future__ import annotations
@@ -441,7 +444,7 @@

 `neighborhood(path)` is what the Scanner calls for each file. It returns a `ModuleNeighborhood` object containing the module itself, every other module that imports from it, and every in-repo symbol it imports from elsewhere (with their resolved signatures). That gives the Scanner enough context to flag findings that are only obvious in cross-file context, without dragging the whole codebase into the prompt.

 `condensed_dict()` is what the Chainer gets. Snippets and signatures are dropped; only paths, module names, public exports, and import edges remain. That's the smallest representation that still lets the Chainer reason about cross-module data flow.

 Finally, the entry point that builds the whole thing:

@@ -526,12 +529,12 @@
 9. If the file contains no vulnerabilities, return `{"findings": []}`.
 ````

 The full prompt in the [reference repo](https://github.com/joshua-mo-143/venice-security-agent-demo/blob/main/prompts/scanner.md) also contains a "What to look for" section listing common vulnerability classes (hardcoded secrets, SQL injection, command injection, SSRF, insecure deserialization, etc.) and a "How to use the neighborhood" section explaining how the model should consume the cross-file context.

 A few prompt design notes:

 - We tell the model to emit JSON only, with no prose or fences. The OpenAI SDK supports a `response_format={"type": "json_object"}` parameter that enforces this on the API side, but reinforcing it in the prompt cuts down on edge cases.
 - We explicitly forbid the Scanner from producing cross-file chains. Chains are the Chainer's job, and asking the Scanner to do both blurs the responsibility.
 - We require the snippet to be copied verbatim. This means the report can quote the exact bytes the model claims to have seen, and a reviewer can spot-check a finding by comparing the snippet to the source.

 Now the agent code. Create `src/venice_security_reviewer/scanner.py` and start with the file walker and prompt loader:
@@ -609,7 +612,7 @@
    )
 ```

 Now the parser. We deserialise the JSON, validate each finding through Pydantic, and drop individual malformed findings rather than failing the whole file. One bad finding shouldn't lose us the good ones:

 ```python
 def _parse_findings(raw: str, *, source_file: Path) -> list[Finding]:
@@ -630,7 +633,7 @@
    return findings
 ```

 The Scanner emits IDs like `F-001` per file, but the Chainer needs to reference findings across the whole repo. We re-issue the IDs against a monotonic counter so they're globally unique:

 ```python
 def _renumber_findings(findings: list[Finding], offset: int) -> tuple[list[Finding], int]:
@@ -709,7 +712,7 @@
 Two details worth highlighting:

 - We patch the evidence file path to be relative to `repo_root` *after* parsing, since the model echoes back whatever filename we gave it but we want a single canonical form throughout the report.
 - `temperature=0.1` is intentionally low. We want the Scanner to be conservative and consistent across runs; creativity is the Chainer's job.

 Finally, the orchestrator that scans every eligible file under the root:

@@ -740,16 +743,16 @@

 The repo map gets built once by the caller and reused for every file, so the Scanner sees a consistent global structure even when individual files fail to parse or get skipped.

 ## Writing the Chainer Agent

 The Chainer takes the union of Scanner findings plus the condensed repo map and asks Venice whether any of the findings combine into a real exploit chain. Two deterministic guardrails sit between the LLM and the report:

 1. Every chain must reference only finding IDs the Scanner produced.
 2. Every chain must claim only files that at least one referenced finding's evidence touches.

 Chains that violate either rule get dropped at parse time. This stops the model from hallucinating chains "just in case" and from claiming a chain spans files it has no evidence for.

 The Chainer prompt lives at `prompts/chainer.md`. The core of it looks like this:

 ````markdown
 You are a senior offensive security engineer. You are given a list of atomic
@@ -812,9 +815,9 @@
    return (here.parents[2] / "prompts" / name).read_text(encoding="utf-8")
 ```

 `MAX_REPO_MAP_CHARS = 8000` is a soft ceiling for the JSON-rendered repo map block in the Chainer prompt. At roughly 4 chars per token, that's ~2000 tokens, which sits comfortably inside any Venice model's context window even with findings and the narrative budget on top.
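
The arithmetic behind that estimate can be written down as a small sketch. The 4-characters-per-token ratio is only a rough heuristic, and the `estimate_tokens` and `fits_budget` helpers are hypothetical names that do not appear in the reference code.

```python
MAX_REPO_MAP_CHARS = 8000
CHARS_PER_TOKEN = 4  # rough heuristic; real tokenizers vary by model and content


def estimate_tokens(text: str, chars_per_token: int = CHARS_PER_TOKEN) -> int:
    """Approximate the token count by ceiling-dividing the character count."""
    return -(-len(text) // chars_per_token)


def fits_budget(rendered_map: str, max_chars: int = MAX_REPO_MAP_CHARS) -> bool:
    """Return True when the rendered repo map stays within the character ceiling."""
    return len(rendered_map) <= max_chars


rendered_map = "x" * MAX_REPO_MAP_CHARS  # stand-in for a JSON-rendered repo map
print(estimate_tokens(rendered_map), fits_budget(rendered_map))
```

Checking characters rather than tokens keeps the budget check free of any tokenizer dependency, at the cost of some imprecision.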

 We serialise findings into a compact JSON block. Note we strip the `snippet` from evidence here on purpose: the Chainer doesn't need raw bytes to decide whether two findings combine, and including them roughly doubles the token cost on real codebases:

 ```python
 def _findings_to_input_json(findings: list[Finding]) -> str:
@@ -836,7 +839,7 @@
    return json.dumps(payload, indent=2)
 ```

 For larger codebases the full condensed repo map can blow past our character budget. When that happens, we prune to finding-bearing modules plus their direct neighbours. That preserves enough structure for the Chainer to reason about chains we have evidence for, and discards the rest:

 ```python
 def _prune_for_budget(
@@ -897,9 +900,9 @@
    return json.dumps(payload, indent=2)
 ```

 The pruning strategy is intentionally simple: keep the modules our findings live in, and keep their direct import-graph neighbours. Anything further out has no plausible role in a chain we currently have evidence for, so it can be dropped without losing reasoning power. We also annotate the payload with `_pruned`, `_kept`, and `_total` markers, so the Chainer prompt can warn the model when the map has been trimmed.

 Parsing the response is the same shape as the Scanner: deserialise, validate each chain through Pydantic, drop malformed entries:

 ```python
 def _parse_chains(raw: str) -> list[Chain]:
@@ -986,14 +989,14 @@
 A few things worth pointing out:

 - We bail out before calling the model when there are fewer than two findings. You can't chain a single finding, and skipping the call means we don't burn tokens on a guaranteed-empty result.
 - `temperature=0.2` is slightly higher than the Scanner's `0.1`. The Chainer benefits from a touch more creativity to spot non-obvious combinations, but we still want it grounded in the findings and map it was given.
 - After parsing, `validate_chain_references` runs the deterministic cross-reference check we wrote earlier. Anything that survives is safe to render; anything that doesn't gets logged so the operator knows the model tried to invent something.

 That cross-reference check is the most important piece of the whole project. It's the boundary between "useful security tool" and "occasionally confidently wrong AI report." With it in place, even if the model hallucinates, the wrong chain never reaches the report.

 ## Rendering the Markdown Report

 Keeping rendering separate from agent logic means the same `Finding` and `Chain` objects can later be fed into other formats (JSON, SARIF, HTML) without touching the Scanner or Chainer.

 We'll use Jinja2 with a small template file. Create `src/venice_security_reviewer/templates/report.md.j2`:

@@ -1095,11 +1098,11 @@
    )
 ```

 Autoescape stays off for the Markdown template (Markdown isn't HTML), but we leave it enabled for any future `.html` templates by extension.

 ## Wiring the CLI

 The CLI is the orchestrator: build the repo map, scan, chain, render. We'll use Typer to handle argument parsing and Rich to print a nice summary table.

 Create `src/venice_security_reviewer/cli.py`:

@@ -1225,7 +1228,7 @@

 ## Testing the Guardrails

 We've leaned hard on one idea throughout this build: the deterministic guardrails are what separate a useful security tool from a confidently wrong one. That claim is only worth making if we can prove the guardrails actually hold, so the most valuable tests in this project don't call Venice at all. They lock down the Pydantic boundary and the prompt-assembly plumbing, which means they run offline, in milliseconds, with no API key and no token cost.

 Add the dev dependencies first:

@@ -1290,7 +1293,7 @@

 Each of these mirrors a constraint we put on the models earlier: an inverted line range, an ID that doesn't match the `F-###` pattern, and a "chain" of a single finding. If any of them ever stops raising, a whole class of hallucination has quietly become possible again.

 The most important test covers the cross-reference validator, since that's the function that actually drops invented chains:

 ```python
 def test_validate_chain_references_drops_unknown_ids() -> None:
@@ -1316,7 +1319,7 @@

 `F-999` was never produced by the Scanner, so the chain that references it lands in `dropped` and never reaches the report. The companion test in the reference repo, `test_validate_chain_references_drops_unknown_files`, does the same for a chain that claims a file none of its findings came from.

 The second thing worth testing is the plumbing that feeds the Chainer. It's easy to refactor the prompt assembly and silently stop passing cross-file context, at which point the Chainer would keep working but quietly get worse. This test builds a two-module fixture, renders the prompt, and asserts the cross-file information is actually present, again without a Venice round-trip. Create `tests/test_cross_file_chain.py`:

 ```python
 from __future__ import annotations
@@ -1376,7 +1379,7 @@
    assert "{findings_json}" not in prompt and "{repo_map}" not in prompt
    assert "F-001" in prompt and "F-002" in prompt
    assert "validators.py" in prompt and "fetcher.py" in prompt
    assert "is_safe_url" in prompt
 ```

 If this test passes, the Chainer is being handed a prompt that contains both findings, both file paths, and the import edge between them. Whether the *model* uses that information well is a separate, out-of-band evaluation; this test only guards the plumbing that gets the information into the prompt in the first place.
@@ -1399,7 +1402,7 @@
 uv run venice-security-reviewer scan path/to/your/code
 ```

 Or install it into your virtualenv with `pip install -e .` and run `venice-security-reviewer scan path/to/your/code`.

 The output looks roughly like this:

@@ -1422,10 +1425,10 @@

 The Markdown report shows each chain at the top with its narrative, then every individual finding underneath with severity, CWE, file location, description, and the verbatim snippet the model claims to have read.

 The reference repo also ships with four bundled demo targets that each exercise a different shape of reasoning the Chainer has to do:

 - `examples/vulnerable_app` — a multi-file Flask app with three "low" findings, two of which combine into a critical privilege-escalation chain across files. Tests whether the Chainer is selective about what it combines.
 - `examples/url_preview` — a multi-file URL-fetcher with a defensive allowlist that doesn't apply per-iteration. Tests cross-file data-flow reasoning combined with deployment topology (link-local IPs are cloud-credential gateways).
 - `examples/csv_query` — a single-file CSV filter with an `eval` sandbox escape via `__class__.__base__.__subclasses__()`. Tests language-level reasoning rather than HTTP flow.
 - `examples/webhook_handler` — a single-file HMAC verifier with a JSON parser-differential vulnerability. Tests reasoning across multiple specifications.

@@ -1436,18 +1439,18 @@
 uv run venice-security-reviewer scan examples/csv_query
 ```

 If you ever see the CLI log `chainer referenced N unknown finding id(s) or file(s); chains dropped`, that's the cross-reference validator catching the model in the act of inventing a chain. The dropped chains never make it into the report; you just get a warning that you can use to adjust the prompt or sample additional Chainer runs.
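
To make that warning concrete, here is a minimal sketch of the kind of cross-reference check that sits behind it. It is not the reference repo's implementation: `Finding`, `Chain`, and `split_chains` are hypothetical names, and plain dataclasses stand in for the Pydantic models so the example runs with the standard library alone. It assumes a chain may only cite finding IDs the Scanner produced and files those findings came from.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    # Hypothetical stand-in for a Scanner finding.
    id: str
    file: str


@dataclass(frozen=True)
class Chain:
    # Hypothetical stand-in for a Chainer result.
    title: str
    finding_ids: tuple[str, ...]
    files: tuple[str, ...]


def split_chains(
    findings: list[Finding], chains: list[Chain]
) -> tuple[list[Chain], list[Chain]]:
    """Return (kept, dropped); a chain is kept only if every reference resolves."""
    file_by_id = {f.id: f.file for f in findings}
    kept: list[Chain] = []
    dropped: list[Chain] = []
    for chain in chains:
        ids_known = all(fid in file_by_id for fid in chain.finding_ids)
        # A chain may only cite files that its own (known) findings came from.
        allowed_files = {
            file_by_id[fid] for fid in chain.finding_ids if fid in file_by_id
        }
        files_known = all(path in allowed_files for path in chain.files)
        (kept if ids_known and files_known else dropped).append(chain)
    return kept, dropped


findings = [Finding("F-001", "validators.py"), Finding("F-002", "fetcher.py")]
chains = [
    Chain("real", ("F-001", "F-002"), ("validators.py", "fetcher.py")),
    Chain("invented id", ("F-001", "F-999"), ("validators.py",)),
    Chain("invented file", ("F-001", "F-002"), ("secrets.py",)),
]
kept, dropped = split_chains(findings, chains)
print([c.title for c in kept], [c.title for c in dropped])
```

The point of the design is that the decision is made by set membership, not by the model, so an invented ID or file can never reach the report no matter how plausible the narrative around it sounds.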

 ## Extending This Example

 The two-agent shape generalises well. A few directions worth exploring:

 - **More languages.** The Scanner is language-agnostic at the prompt level; the AST builder is what's Python-specific. Swap in `tree-sitter` and you can build the same neighbourhood/condensed-map shapes for TypeScript, Go, or Rust.
 - **A third agent for fixes.** Once you have a chain, asking a Patcher agent to draft a unified diff that defangs *one* of the constituent findings is a small step. Pydantic-validate the diff against the same evidence-file set and you get the same hallucination guard for free.
 - **Output formats.** `render_report` is the only place that knows about Markdown. Add a SARIF renderer and the same findings can drop into GitHub code scanning. Add a JSON renderer and you can pipe results into a downstream system.
 - **Caching by file hash.** The Scanner's per-file calls are independent and idempotent. Caching by `(file_hash, prompt_hash, model)` means re-scanning a repo where one file changed only re-runs the Scanner on that one file.
 - **Sampling for the Chainer.** For high-stakes runs, call the Chainer N times at slightly higher temperature and intersect the results. Chains the model finds consistently are more likely to be real; chains it finds once and never again are likely noise.
 - **Stronger models.** `zai-org-glm-5` is the default because it strikes a good balance between cost and quality for combinatorial reasoning, but for harder codebases swapping in a stronger Venice model (set via `VENICE_MODEL`) can make the Chainer's narratives noticeably tighter.
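
Two of these directions are small enough to sketch. The snippet below is illustrative only: `scan_cache_key`, `cached_scan`, and `consistent_chains` are hypothetical helpers that are not part of the reference repo, the cache is a plain in-memory dict, and a chain is modelled as nothing more than a frozenset of finding IDs.

```python
from __future__ import annotations

import hashlib
from typing import Callable


def scan_cache_key(file_bytes: bytes, prompt: str, model: str) -> str:
    """Key on (file_hash, prompt_hash, model) so a change to any one invalidates the entry."""
    file_hash = hashlib.sha256(file_bytes).hexdigest()
    prompt_hash = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    return f"{file_hash}:{prompt_hash}:{model}"


def cached_scan(
    cache: dict[str, list[str]],
    file_bytes: bytes,
    prompt: str,
    model: str,
    scan: Callable[[bytes], list[str]],
) -> list[str]:
    """Run `scan` only when this exact (file, prompt, model) triple is not cached."""
    key = scan_cache_key(file_bytes, prompt, model)
    if key not in cache:
        cache[key] = scan(file_bytes)
    return cache[key]


def consistent_chains(runs: list[set[frozenset[str]]]) -> set[frozenset[str]]:
    """Keep only the chains that appear in every Chainer run."""
    return set.intersection(*runs) if runs else set()


# Usage: a fake scanner that records how often it is actually invoked.
calls: list[bytes] = []


def fake_scan(file_bytes: bytes) -> list[str]:
    calls.append(file_bytes)
    return [f"finding for {len(file_bytes)} bytes"]


cache: dict[str, list[str]] = {}
for source in (b"import os", b"import os", b"import sys"):
    cached_scan(cache, source, "scanner prompt v1", "zai-org-glm-5", fake_scan)
print(len(calls))  # the unchanged file is scanned once, so two calls in total

runs = [
    {frozenset({"F-001", "F-002"}), frozenset({"F-002", "F-003"})},
    {frozenset({"F-001", "F-002"})},
    {frozenset({"F-001", "F-002"}), frozenset({"F-001", "F-003"})},
]
stable = consistent_chains(runs)
print(stable)
```

A strict intersection is the most conservative choice; a majority vote across runs would be a reasonable alternative if it turns out to discard too many real chains.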

 ## Finishing Up


diff --git a/snippets/CopyMarkdownButton.jsx b/snippets/CopyMarkdownButton.jsx
@@ -0,0 +1,131 @@
+export const CopyMarkdownButton = (props = {}) => {
+  const {
+    className = "",
+    label = "Copy article as Markdown",
+    sourcePath,
+  } = props;
+  const statusText = {
+    loading: "Copying...",
+    copied: "Copied!",
+    error: "Could not copy",
+    localUnavailable: "Unavailable locally",
+  };
+
+  const getMarkdownUrl = () => {
+    const baseUrl = new URL(window.location.href);
+
+    if (sourcePath) {
+      return new URL(sourcePath, baseUrl).toString();
+    }
+
+    const pathname = baseUrl.pathname.replace(/\/$/, "") || "/";
+    const markdownPath =
+      pathname === "/"
+        ? "/index.md"
+        : `${pathname.replace(/\.(html|mdx?)$/i, "")}.md`;
+
+    return new URL(markdownPath, baseUrl.origin).toString();
+  };
+
+  const isLocalPreview = () => {
+    const { hostname } = window.location;
+
+    return (
+      hostname === "localhost" ||
+      hostname === "0.0.0.0" ||
+      hostname === "[::1]" ||
+      hostname.startsWith("127.")
+    );
+  };
+
+  const fetchMarkdown = async (url, credentials = "same-origin") => {
+    const response = await fetch(url, {
+      headers: {
+        Accept: "text/markdown, text/plain;q=0.9, */*;q=0.1",
+      },
+      credentials,
+    });
+
+    if (!response.ok) {
+      throw new Error(`Markdown request failed: ${response.status}`);
+    }
+
+    return response.text();
+  };
+
+  const copyToClipboard = async (text) => {
+    if (navigator.clipboard?.writeText && window.isSecureContext) {
+      await navigator.clipboard.writeText(text);
+      return;
+    }
+
+    const textarea = document.createElement("textarea");
+    textarea.value = text;
+    textarea.setAttribute("readonly", "");
+    textarea.style.position = "fixed";
+    textarea.style.top = "-9999px";
+    textarea.style.left = "-9999px";
+    document.body.appendChild(textarea);
+    textarea.select();
+
+    try {
+      if (!document.execCommand("copy")) throw new Error("Fallback copy command failed");
+    } finally {
+      document.body.removeChild(textarea);
+    }
+  };
+
+  const updateButton = (button, text, disabled = false) => {
+    const labelNode = button.querySelector(
+      ".venice-copy-markdown-button-label",
+    );
+
+    if (labelNode) {
+      labelNode.textContent = text;
+    }
+
+    button.disabled = disabled;
+  };
+
+  const handleCopy = async (event) => {
+    const button = event.currentTarget;
+    updateButton(button, statusText.loading, true);
+
+    try {
+      if (isLocalPreview()) {
+        throw new Error("Mintlify local preview does not serve Markdown exports");
+      }
+
+      const markdown = await fetchMarkdown(getMarkdownUrl());
+      await copyToClipboard(markdown);
+      updateButton(button, statusText.copied);
+      window.setTimeout(() => updateButton(button, label), 2000);
+    } catch (error) {
+      console.error("Failed to copy page Markdown:", error);
+      updateButton(
+        button,
+        isLocalPreview() ? statusText.localUnavailable : statusText.error,
+      );
+      window.setTimeout(() => updateButton(button, label), 2500);
+    }
+  };
+
+  return (
+    <button
+      type="button"
+      className={`venice-copy-markdown-button ${className}`.trim()}
+      onClick={handleCopy}
+      aria-live="polite"
+    >
+      <svg
+        className="venice-copy-markdown-button-icon"
+        viewBox="0 0 24 24"
+        aria-hidden="true"
+      >
+        <rect x="9" y="9" width="13" height="13" rx="2" />
+        <path d="M5 15H4a2 2 0 0 1-2-2V4a2 2 0 0 1 2-2h9a2 2 0 0 1 2 2v1" />
+      </svg>
+      <span className="venice-copy-markdown-button-label">{label}</span>
+    </button>
+  );
+};
diff --git a/style.css b/style.css
@@ -2430,3 +2430,62 @@
   }
 
 }
+
+.venice-copy-markdown-button {
+  display: inline-flex;
+  align-items: center;
+  gap: 8px;
+  margin: -0.75rem 0 1.5rem;
+  padding: 8px 12px;
+  border: 1px solid rgba(82, 82, 91, 0.22);
+  border-radius: 999px;
+  background: rgba(82, 82, 91, 0.06);
+  color: #52525b;
+  font-size: 13px;
+  font-weight: 650;
+  line-height: 1;
+  cursor: pointer;
+  transition:
+    background 150ms ease,
+    border-color 150ms ease,
+    color 150ms ease,
+    opacity 150ms ease;
+}
+
+.venice-copy-markdown-button:hover:not(:disabled) {
+  border-color: rgba(82, 82, 91, 0.36);
+  background: rgba(82, 82, 91, 0.1);
+}
+
+.venice-copy-markdown-button:disabled {
+  cursor: wait;
+  opacity: 0.72;
+}
+
+.venice-copy-markdown-button:focus-visible {
+  outline: 2px solid rgba(82, 82, 91, 0.35);
+  outline-offset: 2px;
+}
+
+.venice-copy-markdown-button-icon {
+  width: 15px;
+  height: 15px;
+  fill: none;
+  stroke: currentColor;
+  stroke-width: 2;
+  stroke-linecap: round;
+  stroke-linejoin: round;
+}
+
+.dark .venice-copy-markdown-button,
+[data-theme="dark"] .venice-copy-markdown-button {
+  border-color: rgba(161, 161, 170, 0.24);
+  background: rgba(161, 161, 170, 0.08);
+  color: #a1a1aa;
+}
+
+.dark .venice-copy-markdown-button:hover:not(:disabled),
+[data-theme="dark"] .venice-copy-markdown-button:hover:not(:disabled) {
+  border-color: rgba(161, 161, 170, 0.4);
+  background: rgba(161, 161, 170, 0.14);
+}