NVIDIA · edknv · May 27, 2026 · May 28, 2026 · May 28, 2026 · May 28, 2026
@@ -9,7 +9,7 @@ This benchmark summarizes 3-Tier Evaluation from NVSkills-Eval results for the s
 - Skill: `nemo-retriever`
 - Evaluation date: 2026-05-29
 - NVSkills-Eval profile: `external`
-- Overall verdict: FAIL
+- Overall verdict: PASS
 - Tier 3 live agent evaluation: not available in this report
 
 ## Agents Used
@@ -40,7 +40,7 @@ Tier 3 dimension rollup was not available in this report.
 
 ## Tier 1: Static Validation Summary
 
-Tier 1 validation passed with observations. NVSkills-Eval ran 9 checks and found 20 total findings.
+Tier 1 validation passed with observations. NVSkills-Eval ran 9 checks and found 21 total findings.
 
 Top findings:
 
@@ -52,14 +52,13 @@ Top findings:
 
 ## Tier 2: Deduplication Summary
 
-Tier 2 validation reported findings. NVSkills-Eval ran 2 checks and found 1 total findings.
+Tier 2 validation passed. NVSkills-Eval ran 2 checks and found 0 total findings.
 
-Top findings:
+Notable observations:
 
-- HIGH DUPLICATE/duplicate: Duplicate content found across references/cli/query.md and references/pitfalls.md:
-  "## Common failure modes" in references/cli/query.md (lines 78-90)
-  vs "## Failure modes (expected, not errors)" in references/pitfalls.md (lines 18-26) (`references/cli/query.md:78`)
+- Context Deduplication: Collected 9 file(s)
+- Inter-Skill Deduplication: Parsed skill 'nemo-retriever': 276 char description
 
 ## Publication Recommendation
 
-The skill should be reviewed before NVSkills-Eval publication. Skill owners should address the findings above and rerun NVSkills-Eval to refresh this benchmark.
+The skill is suitable to proceed toward NVSkills-Eval publication based on this benchmark. Skill owners should keep this file with the skill and refresh it when the evaluation dataset, skill behavior, or target agents materially change.
@@ -26,16 +26,16 @@ If `command -v retriever` returns nothing, follow `references/install.md` to ins
 | Turn type | Read this once | Then execute |
 | :--- | :--- | :--- |
 | **Setup turn** (first turn — `./lancedb/nv-ingest.lance` doesn't exist) | `references/setup.md` | Build the index |
-| **Query turn** (every subsequent turn — user asks a question) | `references/query.md` | One `retriever query` call, then `Write` `./output.json` |
+| **Query turn** (every subsequent turn — user asks a question) | `references/query.md` | One `retriever query` call, then synthesize the answer |
 | Anything errored or returned empty | `references/pitfalls.md` | Apply the named recovery; do not improvise |
 
 For the full `retriever ingest` / `retriever query` CLI specs, see `references/cli/ingest.md` and `references/cli/query.md`. You do not need these for routine turns — `<RETRIEVER_VENV>/bin/retriever <subcommand> --help` is faster.
 
 ## Hard limits (apply to every turn)
 
 - **Setup turn**: build the index in one shell command (see `references/setup.md`). STOP after the index lands.
-- **Query turn**: at most **2 Bash calls** — 1 `retriever query`, +1 optional targeted text-extract per `references/query.md`. Then `Write` `./output.json` and STOP.
-- **No narration between tool calls.** Tokens you emit between calls become input + cached input for every later turn — quadratic cost. Go straight from reading the summary to writing the JSON file.
+- **Query turn**: at most **2 Bash calls** — 1 `retriever query`, +1 optional targeted text-extract per `references/query.md`. Then STOP.
+- **No narration between tool calls.** Tokens you emit between calls become input + cached input for every later turn — quadratic cost. Go straight from reading the summary to producing the answer.
 - **Banned**: `TodoWrite`, Glob, Grep, `Read` of whole PDFs, re-running setup, spawning subagents, speculative "confirmation" calls.
 
 Long query turns (5+ tool calls, 1M+ cache-read tokens) cost ~5× a disciplined turn and almost always still produce the wrong answer. **Answering partially beats timing out.**
@@ -17,12 +17,11 @@ For an unlisted subcommand: `<RETRIEVER_VENV>/bin/retriever <subcommand> --help`
 
 ## Failure modes (expected, not errors)
 
-- **First `ingest` takes ~60s+** — vLLM warmup. Expected.
-- **First `query` takes ~10–15s** — embedder cold-start. Expected.
-- **Empty result** — ingest didn't run. Use the fallback above.
+- **Empty result** — ingest didn't run. **Use the fallback above** (don't re-ingest).
 - **`Clamping num_partitions ...`** — informational on tiny corpora, not an error.
-- **Low-relevance top hit on tiny corpus** — look at `_distance` *gaps* between hits, not absolute values.
-- **Page-element-detection warnings during ingest** — non-fatal as long as the embedding step itself succeeds (and they're silenced by `--quiet` on a successful run).
+- **Page-element-detection warnings during ingest** — non-fatal; silenced by `--quiet`.
+
+For cold-start latencies, `Table not found` errors, and low-relevance diagnostics, see `references/cli/query.md` and `references/cli/ingest.md`.
 
 ## You ran more than 2 Bash calls on a query turn
 

@@ -2,7 +2,7 @@
 
 ## Filename fast path — try BEFORE `retriever query`
 
-If the user's question literally contains a PDF basename from `./pdfs/` (stem ≥6 chars, with or without `.pdf`, case-insensitive), skip semantic search. Direct pdfium extraction on the named file is faster and avoids semantic-search misses — the right doc is given, and pages rank by query-token overlap.
+If the user's question literally contains a PDF basename from `./pdfs/` **including the `.pdf` extension** (stem ≥6 chars, case-insensitive), skip semantic search. Direct pdfium extraction on the named file is faster and avoids semantic-search misses: the document is already identified, and its pages are ranked by query-token overlap.
 
 ```bash
 <RETRIEVER_VENV>/bin/python <skill_dir>/scripts/filename_fast_path.py "<the user's question>"

@@ -1,8 +1,8 @@
 """Query-turn filename fast path for the nemo-retriever skill.
 
 Reads `./pdfs/` from the current working directory. If the query string
-literally contains any PDF basename (with or without the `.pdf` extension,
-stem ≥6 chars, case-insensitive), runs `retriever pdf stage page-elements`
+literally contains any PDF basename including the `.pdf` extension
+(stem ≥6 chars, case-insensitive), runs `retriever pdf stage page-elements`
 on each matched file via pdfium, ranks pages by query-token frequency,
 and emits a top-10 ranking + the top page's raw text.
 
@@ -21,8 +21,10 @@
                                     pages, up to 10), followed by the top-
                                     ranked page's raw text (first 4000 chars).
 
-Exit code is 0 in all three success outcomes; non-zero only on hard errors
-(missing ./pdfs, page-elements subprocess failure, malformed sidecar JSON).
+Exit code is 0 in all three success outcomes; non-zero only when `./pdfs/` is
+missing or unreadable. Per-file errors (extraction subprocess failure, malformed
+sidecar JSON) log a warning to stderr and are skipped — if every match is bad,
+the script falls through to `NO_TEXT`.
 """
 
 from __future__ import annotations
@@ -47,43 +49,51 @@
 
 
 def find_matches(query_lower: str, basenames: list[str]) -> list[str]:
-    """Return PDF basenames whose name (with or without .pdf) appears verbatim
-    in the lowercased query. Skip stems shorter than MIN_STEM_LEN."""
+    """Return PDF basenames whose full name (including the `.pdf` extension)
+    appears verbatim in the lowercased query. Skip stems shorter than MIN_STEM_LEN.
+    Requiring the extension avoids false positives on common English words that
+    happen to appear as PDF stems (e.g. `report.pdf`, `market.pdf`)."""
     matches = []
     for name in basenames:
         stem, ext = os.path.splitext(name)
         if ext.lower() != ".pdf" or len(stem) < MIN_STEM_LEN:
             continue
-        if name.lower() in query_lower or stem.lower() in query_lower:
+        if name.lower() in query_lower:
             matches.append(name)
     return matches
 
 
 def extract_pages(retriever_bin: str, matches: list[str]) -> None:
+    """Extract each matched PDF; log per-file failures and continue so a single
+    bad PDF doesn't block remaining matches."""
     os.makedirs(EXTRACT_OUT, exist_ok=True)
     for m in matches:
-        subprocess.run(
-            [
-                retriever_bin,
-                "pdf",
-                "stage",
-                "page-elements",
-                f"{PDF_DIR}/{m}",
-                "--method",
-                "pdfium",
-                "--json-output-dir",
-                EXTRACT_OUT,
-                "--compact-json",
-            ],
-            check=True,
-        )
+        try:
+            subprocess.run(
+                [
+                    retriever_bin,
+                    "pdf",
+                    "stage",
+                    "page-elements",
+                    f"{PDF_DIR}/{m}",
+                    "--method",
+                    "pdfium",
+                    "--json-output-dir",
+                    EXTRACT_OUT,
+                    "--compact-json",
+                ],
+                check=True,
+                stdout=subprocess.DEVNULL,
+            )
+        except subprocess.CalledProcessError as exc:
+            print(f"WARN: page-elements failed on {m}: exit {exc.returncode}", file=sys.stderr)
 
 
 def sidecar_path(pdf_name: str) -> str | None:
     stem = os.path.splitext(pdf_name)[0]
     candidates = (
         f"{EXTRACT_OUT}/{pdf_name}.pdf_extraction.json",
-        f"{EXTRACT_OUT}/{stem}.pdf.pdf_extraction.json",
+        f"{EXTRACT_OUT}/{stem}.pdf_extraction.json",
     )
     for c in candidates:
         if os.path.exists(c):
@@ -92,7 +102,12 @@ def sidecar_path(pdf_name: str) -> str | None:
 
 
 def page_records(sidecar: str) -> list[dict]:
-    data = json.load(open(sidecar))
+    try:
+        with open(sidecar) as fh:
+            data = json.load(fh)
+    except (OSError, ValueError) as exc:  # ValueError covers JSONDecodeError and UnicodeDecodeError
+        print(f"WARN: skipping unreadable or malformed sidecar {sidecar!r}: {exc}", file=sys.stderr)
+        return []
     if isinstance(data, list):
         return data
     if isinstance(data, dict):
@@ -138,7 +153,11 @@ def main() -> int:
     ql = query.lower()
     retriever_bin = os.path.join(os.path.dirname(sys.executable), "retriever")
 
-    basenames = sorted(p for p in os.listdir(PDF_DIR) if p.lower().endswith(".pdf"))
+    try:
+        basenames = sorted(p for p in os.listdir(PDF_DIR) if p.lower().endswith(".pdf"))
+    except (FileNotFoundError, PermissionError) as exc:
+        print(f"ERROR: cannot list {PDF_DIR}: {exc}", file=sys.stderr)
+        return 1
     matches = find_matches(ql, basenames)
     if not matches:
         print("NO_MATCH")
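
The behavioral effect of the `find_matches` change can be illustrated with a minimal standalone sketch. It copies the post-change function body from the hunk above; `MIN_STEM_LEN = 6` is an assumption inferred from the "stem ≥6 chars" wording, and the file names and queries are hypothetical.

```python
import os

# Assumed value, inferred from the "stem >= 6 chars" wording in the docstring.
MIN_STEM_LEN = 6


def find_matches(query_lower: str, basenames: list[str]) -> list[str]:
    """Post-change behavior: match only on the full name, extension included."""
    matches = []
    for name in basenames:
        stem, ext = os.path.splitext(name)
        if ext.lower() != ".pdf" or len(stem) < MIN_STEM_LEN:
            continue
        if name.lower() in query_lower:
            matches.append(name)
    return matches


# Hypothetical corpus: one stem is a common English word.
basenames = ["report.pdf", "Annual_Summary.PDF"]

# The bare word "report" no longer triggers the fast path.
print(find_matches("what does the report say about revenue?", basenames))

# A query that names the file with its extension still matches, case-insensitively.
print(find_matches("summarize annual_summary.pdf for me", basenames))
```

Under these assumptions the first call finds nothing and the second returns the original-cased basename, so a query that merely mentions a common word falls through to `retriever query` instead of the fast path.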

@@ -9,7 +9,7 @@ NVIDIA <br>
 ### License/Terms of Use: <br>
 Apache 2.0 <br>
 ## Use Case: <br>
-Developers and engineers who need to search, index, or answer questions across PDF and document collections using RAG and vector search via the retriever CLI. <br>
+Developers and engineers who need to search, index, or answer questions over collections of PDFs and other documents using a local RAG/vector-search pipeline powered by the retriever CLI. <br>
 
 ### Deployment Geography for Use: <br>
 Global <br>
@@ -20,22 +20,22 @@ Mitigation: Review and scan skill before deployment. <br>
 
 ## Reference(s): <br>
 - [NeMo Retriever Library Documentation](https://docs.nvidia.com/nemo/retriever/latest/extraction/overview/) <br>
-- [Install Guide](references/install.md) <br>
-- [Setup Guide](references/setup.md) <br>
-- [Query Workflow](references/query.md) <br>
-- [Pitfalls and Recovery](references/pitfalls.md) <br>
-- [CLI: ingest](references/cli/ingest.md) <br>
-- [CLI: query](references/cli/query.md) <br>
+- [CLI reference: retriever ingest](references/cli/ingest.md) <br>
+- [CLI reference: retriever query](references/cli/query.md) <br>
+- [Installation guide](references/install.md) <br>
+- [Query workflow](references/query.md) <br>
+- [Setup guide](references/setup.md) <br>
+- [Pitfalls and recovery](references/pitfalls.md) <br>
 
 
 ## Skill Output: <br>
-**Output Type(s):** [Shell commands, JSON] <br>
-**Output Format:** [JSON] <br>
+**Output Type(s):** [Shell commands, JSON, Synthesized answers] <br>
+**Output Format:** [Markdown with inline bash code blocks and JSON query results] <br>
 **Output Parameters:** [1D] <br>
-**Other Properties Related to Output:** [None] <br>
+**Other Properties Related to Output:** [Query results are JSON arrays sorted by vector distance; final answers are synthesized from retrieved context] <br>
 
 ## Evaluation Tasks: <br>
-NVSkills-Eval 3-Tier evaluation (external profile); Tier 1 static validation (9 checks, 20 findings), Tier 2 deduplication (2 checks, 1 finding). Tier 3 live agent evaluation not available in this report. <br>
+Evaluated through the NVSkills-Eval 3-Tier framework (profile: external). Tier 1: 9 static validation checks (21 findings, passed with observations). Tier 2: 2 deduplication checks (0 findings, passed). Overall verdict: PASS. <br>
 
 ## Evaluation Metrics Used: <br>
 Reported benchmark dimensions: <br>
@@ -48,7 +48,7 @@ Reported benchmark dimensions: <br>
 
 
 ## Skill Version(s): <br>
-3fa00d94 (source: git SHA, committed 2026-05-28) <br>
+25.3.0-1014-gb7fdbb45 (source: git describe) <br>
 
 ## Ethical Considerations: <br>
 NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal team to ensure this skill meets requirements for the relevant industry and use case and addresses unforeseen product misuse. <br>