NVIDIA · mahikaw · May 11, 2026 · May 12, 2026 · Jun 9, 2026 · Jun 9, 2026
@@ -259,6 +259,85 @@ hits = retriever.query(query)
 {'text': '| Table | 1 |\n| This | table | describes | some | animals, | and | some | activities | they | might | be | doing | in | specific |\n| locations. |\n| Animal | Activity | Place |\n| Giraffe | Driving | a | car | At | the | beach |\n| Lion | Putting | on | sunscreen | At | the | park |\n| Cat | Jumping | onto | a | laptop | In | a | home | office |\n| Dog | Chasing | a | squirrel | In | the | front | yard |\n| Chart | 1 |', 'metadata': '{"page_number": 1, "pdf_page": "multimodal_test_1", "page_elements_v3_num_detections": 9, "page_elements_v3_counts_by_label": {"table": 1, "chart": 1, "title": 3, "text": 4}, "ocr_table_detections": 1, "ocr_chart_detections": 1, "ocr_infographic_detections": 0}', 'source': '{"source_id": "/home/dev/projects/NeMo-Retriever/data/multimodal_test.pdf"}', 'page_number': 1, '_distance': 1.614684820175171}
 ```
 
+### Agentic retrieval evaluation
+
+Agentic retrieval replaces the single dense-retrieval pass with an LLM-driven
+ReAct loop: the agent issues several retrieval sub-queries, fuses the candidates
+with reciprocal rank fusion, and selects a final ranking. You evaluate it the
+same way as standard retrieval: by scoring the ranked results against ground
+truth.
+
+`retriever pipeline run` ingests your corpus exactly as in
+[Ingest a test corpus (CLI)](#ingest-a-test-corpus-cli), then scores agentic
+retrieval against that ground truth. Add `--retrieval-mode agentic` and use
+`--agentic-llm-model` to name the chat model that drives the agent. The simplest
+form scores against your own query CSV (columns `query` and `golden_answer`), so
+no dataset loader is needed:
+
+```bash
+retriever pipeline run ./data \
+  --vdb-op lancedb \
+  --vdb-kwargs-json '{"uri":"lancedb","table_name":"nemo-retriever"}' \
+  --evaluation-mode recall \
+  --retrieval-mode agentic \
+  --query-csv ./queries.csv \
+  --recall-match-mode pdf_page \
+  --agentic-llm-model nvidia/llama-3.3-nemotron-super-49b-v1.5
+```
+
+`--recall-match-mode` is `pdf_page` or `pdf_only`, depending on whether
+`golden_answer` names a page or a whole document.
+
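The difference between the two match modes can be illustrated with a small hit-at-k check. This is a minimal sketch, not the pipeline's scoring code: it assumes page keys follow the `<pdf>_<page>` shape of the `pdf_page` metadata value in the query output above (`multimodal_test_1`), and `to_match_key` and `hit_at_k` are hypothetical helpers introduced only for illustration.

```python
from typing import List


def to_match_key(pdf_page_key: str, match_mode: str) -> str:
    """Reduce a '<pdf>_<page>' key to the granularity being scored (assumed key shape)."""
    if match_mode == "pdf_page":
        return pdf_page_key
    if match_mode == "pdf_only":
        # Drop the trailing '_<page>' suffix and keep only the document name.
        return pdf_page_key.rsplit("_", 1)[0]
    raise ValueError(f"unknown match mode: {match_mode!r}")


def hit_at_k(ranked_keys: List[str], golden_answer: str, k: int, match_mode: str) -> bool:
    """Return True when the golden answer appears among the top-k ranked results."""
    top_k = [to_match_key(key, match_mode) for key in ranked_keys[:k]]
    return golden_answer in top_k


ranked = ["other_doc_3", "multimodal_test_2", "multimodal_test_1"]

# Page-level scoring: the exact page must be in the top k.
print(hit_at_k(ranked, "multimodal_test_1", k=2, match_mode="pdf_page"))
# Document-level scoring: any page of the document counts.
print(hit_at_k(ranked, "multimodal_test", k=2, match_mode="pdf_only"))
```

Under these assumptions, `pdf_only` is the more lenient mode: a query can miss at page level and still hit at document level for the same ranking.
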
+#### Optional extras
+
+- **Remote inference (no local GPU)** — drive the agent and embedder through NIM
+  endpoints instead of local models:
+  ```bash
+  --agentic-invoke-url http://<llm-endpoint>/v1/chat/completions \
+  --embed-invoke-url http://<embed-endpoint>/v1
+  ```
+- **BEIR-style datasets** — score against a registered benchmark instead of a
+  query CSV. HuggingFace-hosted sets (the `vidore_hf` loader) need the `datasets`
+  package, which the `benchmarks` extra provides:
+  ```bash
+  uv pip install "nemo-retriever[local,benchmarks]==26.05-RC1"
+  ```
+  ```bash
+  --evaluation-mode beir \
+  --beir-loader vidore_hf \
+  --beir-dataset-name <dataset-name> \
+  --beir-doc-id-field pdf_basename
+  ```
+  Built-in loaders include `vidore_hf` (HuggingFace download) and
+  `financebench_json`. `--beir-split` and `--beir-query-language` select the
+  split and language.
+- **Image + text corpora** — for page-image benchmarks, ingest rendered pages so
+  the agent retrieves over page images, matching the
+  [ViDoRe Harness Sweep](#vidore-harness-sweep):
+  ```bash
+  --embed-model-name nvidia/llama-nemotron-embed-vl-1b-v2 \
+  --embed-modality text_image \
+  --embed-granularity page \
+  --extract-page-as-image \
+  --extract-infographics
+  ```
+- **Tune the agent** — each flag controls a different stage of the loop:
+  - `--agentic-react-max-steps` (default `50`) — how many think → retrieve rounds
+    the agent may take per query before it has to answer.
+  - `--agentic-backend-top-k` (default `20`) — how many candidates each retrieval
+    call pulls from the vector DB (the pool the agent reasons over and fuses).
+  - `--agentic-text-truncation` (default `0`) — max characters of each candidate's
+    text shown to the agent; `0` sends the full text.
+  - `--agentic-reasoning-effort` (default `high`) — the OpenAI-compatible
+    reasoning depth (`low`/`medium`/`high`) requested per LLM call.
+  - `--agentic-num-concurrent` (default `1`) — how many queries are evaluated in
+    parallel, bounded by the LLM endpoint's throughput.
+- **Logging** — per-query agent progress is logged at `INFO` by default (`--quiet`
+  suppresses it, `--debug` adds detail). There is no default log file — output
+  goes to the console; pass `--log-file ./run.log` to also write it to a file.
+  Pass `--runtime-metrics-dir ./out` to write a JSON summary of the metrics and
+  timing alongside the run.
+
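The optional fragments above are appended to the base command. As a minimal sketch, assuming you launch the CLI from a Python script, the snippet below assembles one argument list from the base command plus a few tuning and logging flags. It uses only flags named in this section; the chosen values are illustrative, and `build_command` is a hypothetical helper.

```python
import shlex
from typing import Dict, List

# Base command copied from the example earlier in this section.
BASE_COMMAND = [
    "retriever", "pipeline", "run", "./data",
    "--vdb-op", "lancedb",
    "--vdb-kwargs-json", '{"uri":"lancedb","table_name":"nemo-retriever"}',
    "--evaluation-mode", "recall",
    "--retrieval-mode", "agentic",
    "--query-csv", "./queries.csv",
    "--recall-match-mode", "pdf_page",
    "--agentic-llm-model", "nvidia/llama-3.3-nemotron-super-49b-v1.5",
]


def build_command(extras: Dict[str, str]) -> List[str]:
    """Append optional flag/value pairs to a copy of the base command."""
    argv = list(BASE_COMMAND)
    for flag, value in extras.items():
        argv += [flag, value]
    return argv


argv = build_command({
    "--agentic-react-max-steps": "8",   # illustrative value, not a recommendation
    "--agentic-num-concurrent": "4",    # illustrative value
    "--log-file": "./run.log",
})

# Print a shell-safe rendering; subprocess.run(argv, check=True) would launch it.
print(shlex.join(argv))
```

Passing the list form to `subprocess.run` avoids shell-quoting problems with the JSON value of `--vdb-kwargs-json`.
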
 ###  Generate a query answer using an LLM
 The above retrieval results are often feedable directly to an LLM for answer generation.
 

@@ -0,0 +1,140 @@
+# Agentic Retrieval Mode
+
+Agentic retrieval mode is a retrieval strategy for the main NeMo Retriever
+pipeline. It is not a separate evaluation benchmark. The evaluation mode still
+answers "how do we score results?", while retrieval mode answers "how do we
+produce ranked results?".
+
+The first integration supports:
+
+```bash
+--evaluation-mode recall --retrieval-mode agentic
+```
+
+In this mode, the pipeline ingests documents and uploads them to the configured
+vector database exactly as it does in standard mode. The difference starts at
+evaluation time: instead of one standard dense retrieval pass, an LLM-driven
+graph retrieval pipeline searches the same vector database and produces ranked
+results that are scored with recall-style metrics.
+
+## Graph Pipeline
+
+The agentic retriever composes the existing graph operators:
+
+```mermaid
+flowchart LR
+    QueryCsv[Query CSV] --> Normalize[Normalize Queries]
+    Normalize --> ReactAgent[ReActAgentOperator]
+    ReactAgent --> RetrieverTool[Retriever Tool]
+    RetrieverTool --> VDB[Vector DB]
+    ReactAgent --> RRFAggregator[RRFAggregatorOperator]
+    RRFAggregator --> SelectionAgent[SelectionAgentOperator]
+    SelectionAgent --> RankedResults[Ranked Results]
+    RankedResults --> Metrics[Recall Metrics]
+```
+
+`ReActAgentOperator` runs an LLM-driven ReAct loop per query. The agent can
+think, issue retrieval subqueries, inspect retrieved candidates, and decide
+when it has enough evidence.
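
The control flow of such a loop can be sketched as follows. This is an illustration under stated assumptions, not the `ReActAgentOperator` implementation: `react_loop`, `scripted_decide`, and `fake_retrieve` are hypothetical names, and the LLM is replaced by a scripted policy so the example runs without a model endpoint.

```python
from typing import Callable, List, Optional, Tuple

Decision = Tuple[str, Optional[str]]  # ("retrieve", sub_query) or ("finish", None)


def react_loop(
    query: str,
    decide: Callable[[str, List[List[str]]], Decision],
    retrieve: Callable[[str], List[str]],
    max_steps: int,
) -> List[List[str]]:
    """Run think -> retrieve rounds until the policy finishes or steps run out."""
    rankings: List[List[str]] = []
    for _ in range(max_steps):
        # "Think": the policy inspects the evidence gathered so far.
        action, sub_query = decide(query, rankings)
        if action == "finish":
            break
        # "Act": issue one retrieval sub-query and keep its ranked candidates.
        rankings.append(retrieve(sub_query))
    return rankings


def scripted_decide(query: str, rankings: List[List[str]]) -> Decision:
    """Stand-in for the LLM: ask for two retrieval rounds, then stop."""
    if len(rankings) < 2:
        return "retrieve", f"{query} (attempt {len(rankings) + 1})"
    return "finish", None


def fake_retrieve(sub_query: str) -> List[str]:
    """Stand-in for the retriever tool: return one synthetic document ID."""
    return [f"doc-for:{sub_query}"]


rankings = react_loop("giraffe activity", scripted_decide, fake_retrieve, max_steps=10)
print(rankings)
```

Each inner list is the ranked output of one retrieval step. The step limit bounds the loop even if the policy never decides to finish.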
+
+`RRFAggregatorOperator` combines candidates from multiple retrieval steps using
+reciprocal rank fusion. This gives more weight to documents that appear near
+the top across multiple search attempts.
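
For reference, the standard reciprocal rank fusion formula scores each document as the sum of `1 / (k + rank)` over every ranked list it appears in. The sketch below implements that textbook formula. It is not taken from `RRFAggregatorOperator`; the smoothing constant `k = 60` is the commonly used value and is an assumption here, and `reciprocal_rank_fusion` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(ranked_lists: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into a single ranking."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            # Documents near the top of a list contribute a larger share.
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Three retrieval steps returned overlapping candidates.
steps = [
    ["a", "b", "c"],
    ["b", "a", "d"],
    ["b", "c", "a"],
]
fused = reciprocal_rank_fusion(steps)
print(fused)
```

Here `b` ranks first because it is at or near the top of all three lists, while `d`, which appears once at rank 3, ranks last.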
+
+`SelectionAgentOperator` runs a final LLM-based selection pass over the fused
+candidate set and emits the ranked document IDs used for scoring.
+
+## CLI Integration
+
+The main CLI adds a retrieval strategy option:
+
+```bash
+--retrieval-mode standard|agentic
+```
+
+`--evaluation-mode` remains evaluation-focused:
+
+```bash
+--evaluation-mode recall|beir|qa
+```
+
+Supported combinations in the first integration:
+
+- `--evaluation-mode=recall --retrieval-mode=standard`
+- `--evaluation-mode=recall --retrieval-mode=agentic`
+- `--evaluation-mode=qa --retrieval-mode=standard`
+
+Unsupported initially:
+
+- `--evaluation-mode=qa --retrieval-mode=agentic`
+- BEIR evaluation through the generic pipeline path, which is unchanged and
+  remains unavailable, as in the existing pipeline.
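
A guard for these combinations might look like the sketch below. It mirrors only the two lists above; `SUPPORTED_COMBINATIONS` and `check_modes` are hypothetical names, and the actual CLI validation may differ.

```python
# Supported (evaluation mode, retrieval mode) pairs, copied from the list above.
SUPPORTED_COMBINATIONS = {
    ("recall", "standard"),
    ("recall", "agentic"),
    ("qa", "standard"),
}


def check_modes(evaluation_mode: str, retrieval_mode: str) -> None:
    """Raise ValueError when the mode pair is not in the supported set."""
    if (evaluation_mode, retrieval_mode) not in SUPPORTED_COMBINATIONS:
        raise ValueError(
            f"--evaluation-mode={evaluation_mode} with "
            f"--retrieval-mode={retrieval_mode} is not supported"
        )


check_modes("recall", "agentic")  # passes silently

try:
    check_modes("qa", "agentic")
except ValueError as err:
    print(err)
```

Failing fast at argument-parsing time avoids running a full ingestion before discovering that the requested evaluation cannot be scored.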
+
+## Agentic Options
+
+`--agentic-llm-model` sets the chat model used by both `ReActAgentOperator` and
+`SelectionAgentOperator`. It is required when `--retrieval-mode=agentic`.
+
+`--agentic-invoke-url` optionally sets the OpenAI-compatible chat completions
+endpoint used by the agent operators. If omitted, the operators use their
+default endpoint.
+
+`--agentic-react-max-steps` controls the maximum ReAct loop iterations per
+query. The default is `10`.
+
+## Wrapped Standard Retrieval
+
+Every agent `retrieve` tool call delegates to the existing
+`nemo_retriever.retriever.Retriever`. That means agentic mode searches the same
+vector database populated by ingestion and reuses the same retrieval settings
+where possible.
+
+Existing options reused by the wrapped retriever:
+
+- `--api-key`: authentication for agentic LLM calls and remote services unless
+  a more specific key is provided.
+- `--embed-model-name`, `--embed-invoke-url`, `--local-query-embed-backend`,
+  `--local-hf-batch-size`: query embedding configuration.
+- `--reranker`, `--reranker-model-name`, `--reranker-invoke-url`,
+  `--reranker-api-key`, `--local-reranker-backend`: optional reranking inside
+  the wrapped retriever.
+
+The first integration intentionally keeps the lower-level agentic retrieval
+parameters fixed:
+
+- retriever top-k: `10`
+- target top-k: `10`
+- selection top-k: `10`
+- query concurrency: `1`
+- text truncation: `500`
+- max tokens: provider default
+- parallel tool calls: disabled
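
To make the delegation concrete, here is a minimal sketch of a `retrieve` tool that wraps any object exposing `query(text)`, as the README's `retriever.query(query)` call does. It assumes hits are dicts with a `text` field and a JSON-encoded `metadata` field containing `pdf_page`, matching the README's sample output. `retrieve_tool` and `StubRetriever` are hypothetical, and applying top-k by slicing is an assumption about where the limit is enforced.

```python
import json
from typing import Dict, List


def retrieve_tool(
    retriever,
    sub_query: str,
    top_k: int = 10,             # "retriever top-k" from the list above
    text_truncation: int = 500,  # "text truncation" from the list above
) -> List[Dict[str, str]]:
    """Delegate one sub-query to the wrapped retriever and trim the results."""
    hits = retriever.query(sub_query)[:top_k]
    candidates = []
    for hit in hits:
        metadata = json.loads(hit["metadata"])  # metadata arrives as a JSON string
        candidates.append(
            {
                "doc_id": metadata["pdf_page"],
                # Shorten long passages before they reach the agent's prompt.
                "text": hit["text"][:text_truncation],
            }
        )
    return candidates


class StubRetriever:
    """Stand-in for the real retriever: returns 12 synthetic hits."""

    def query(self, text: str) -> List[Dict[str, str]]:
        return [
            {"text": "x" * 600, "metadata": json.dumps({"pdf_page": f"doc_{i}"})}
            for i in range(1, 13)
        ]


candidates = retrieve_tool(StubRetriever(), "lion activity")
print(len(candidates), len(candidates[0]["text"]), candidates[0]["doc_id"])
```

Because the tool only depends on `query`, the same wrapper shape works whether query embedding and reranking run locally or through remote endpoints.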
+
+## Examples
+
+Local in-process run:
+
+```bash
+retriever pipeline run ./data/bo767 \
+  --run-mode inprocess \
+  --evaluation-mode recall \
+  --retrieval-mode agentic \
+  --query-csv ./data/bo767_query_gt.csv \
+  --agentic-llm-model meta/llama-3.3-70b-instruct \
+  --api-key os.environ/NVIDIA_API_KEY
+```
+
+Batch run with remote embedding and agent endpoints:
+
+```bash
+retriever pipeline run ./data/bo767 \
+  --run-mode batch \
+  --evaluation-mode recall \
+  --retrieval-mode agentic \
+  --query-csv ./data/bo767_query_gt.csv \
+  --embed-invoke-url http://localhost:8000/v1 \
+  --agentic-invoke-url http://localhost:9000/v1/chat/completions \
+  --agentic-llm-model meta/llama-3.3-70b-instruct \
+  --agentic-react-max-steps 5 \
+  --api-key os.environ/NVIDIA_API_KEY
+```
@@ -0,0 +1,23 @@
+# SPDX-FileCopyrightText: Copyright (c) 2024-25, NVIDIA CORPORATION & AFFILIATES.
+# All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+
+"""Agentic retrieval utilities."""
+
+from nemo_retriever.agentic.retrieval import (
+    AgenticRetrievalConfig,
+    AgenticRetriever,
+    build_beir_run_from_agentic_result,
+    build_qrels,
+    run_agentic_beir_evaluation,
+    run_agentic_recall_evaluation,
+)
+
+__all__ = [
+    "AgenticRetrievalConfig",
+    "AgenticRetriever",
+    "build_beir_run_from_agentic_result",
+    "build_qrels",
+    "run_agentic_beir_evaluation",
+    "run_agentic_recall_evaluation",
+]