2 changes: 1 addition & 1 deletion docs/docs/extraction/agentic-retrieval-concept.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,4 +7,4 @@ NeMo Retriever Library focuses on document ingestion, embeddings, vector stores,
**Related**

- [Semantic retrieval](vdbs.md#semantic-retrieval)
- Framework examples: [LangChain, LlamaIndex, Haystack](integrations-langchain-llamaindex-haystack.md)
- Framework examples: [Jupyter Notebooks](notebooks/index.md)
128 changes: 0 additions & 128 deletions docs/docs/extraction/custom-metadata.md

This file was deleted.

1 change: 0 additions & 1 deletion docs/docs/extraction/deployment-options.md
Expand Up @@ -36,7 +36,6 @@ environments), use a custom service image that already contains `ffmpeg` and
### I want examples and notebooks

1. [Jupyter Notebooks](notebooks/index.md)
2. [Integrate with LangChain, LlamaIndex, Haystack](integrations-langchain-llamaindex-haystack.md)

### I need API details and keys


This file was deleted.

2 changes: 1 addition & 1 deletion docs/docs/extraction/notebooks/index.md
Expand Up @@ -12,7 +12,7 @@ To get started with the basics, try one of the following notebooks:

- [CLI Quick Start Guide](https://github.com/NVIDIA/NeMo-Retriever/blob/main/client/client_examples/examples/cli_client_usage.ipynb)
- [Python Quick Start Guide](https://github.com/NVIDIA/NeMo-Retriever/blob/main/client/client_examples/examples/python_client_usage.ipynb)
- [How to add metadata to your documents and filter searches](https://github.com/NVIDIA/NeMo-Retriever/blob/main/examples/metadata_and_filtered_search.ipynb)
- [Metadata filtering: add sidecar metadata and filter searches](https://github.com/NVIDIA/NeMo-Retriever/blob/main/examples/nemo_retriever_retriever_query_metadata_filter.ipynb)
- [How to reindex a collection](https://github.com/NVIDIA/NeMo-Retriever/blob/main/examples/reindex_example.ipynb)


2 changes: 1 addition & 1 deletion docs/docs/extraction/overview.md
Expand Up @@ -50,4 +50,4 @@ NeMo Retriever Library supports the following file types:
- [Deploy on Kubernetes with Helm](https://github.com/NVIDIA/NeMo-Retriever/blob/main/nemo_retriever/helm/README.md)
- [NeMo Retriever Library — prerequisites / deployment](https://docs.nvidia.com/nemo/retriever/latest/extraction/overview/) (supported Helm charts)
- [Notebooks](notebooks/index.md)
- [NVIDIA AI Blueprints catalog](https://build.nvidia.com/explore/discover) — solution cards, enterprise RAG blueprints, and end-to-end patterns (including [Enterprise RAG — multimodal PDF data extraction](https://build.nvidia.com/nvidia/multimodal-pdf-data-extraction-for-enterprise-rag)); for integration pathways, refer to [Integrations](integrations-langchain-llamaindex-haystack.md).
- [NVIDIA AI Blueprints catalog](https://build.nvidia.com/explore/discover) — solution cards, enterprise RAG blueprints, and end-to-end patterns (including [Enterprise RAG — multimodal PDF data extraction](https://build.nvidia.com/nvidia/multimodal-pdf-data-extraction-for-enterprise-rag)); for integration pathways, refer to [Jupyter Notebooks](notebooks/index.md).
15 changes: 10 additions & 5 deletions docs/docs/extraction/vdbs.md
Expand Up @@ -89,10 +89,15 @@ Semantic retrieval uses dense embeddings to find content that is similar in mean

## Metadata and filtering { #metadata-and-filtering }

This page covers LanceDB upload and retrieval. **Metadata is not duplicated here.**
Attach per-document metadata during ingestion and narrow LanceDB results at query time.

- **Published guide** — [Custom metadata and filtering](custom-metadata.md) (sidecar `meta_*` on `vdb_upload`, compact JSON in LanceDB, server-side `where` on `Retriever.query`, and client-side `filter_hits_by_content_metadata`).
- **Canonical reference** — [Vector DB operators and LanceDB — Metadata filtering](https://github.com/NVIDIA/NeMo-Retriever/tree/main/nemo_retriever/src/nemo_retriever/vdb#metadata-filtering) in `nemo_retriever/src/nemo_retriever/vdb/README.md` (operator behavior and examples).
Pass a **sidecar metadata table** on `vdb_upload` with `meta_dataframe`, `meta_source_field`, and `meta_fields` (all three required). Selected columns merge into each chunk's `content_metadata` before upload. During upload, that object is serialized as **compact JSON** in the LanceDB `metadata` column. Filter with server-side `where` on [`Retriever.query`](nemo-retriever-api-reference.md) or client-side `filter_hits_by_content_metadata`.
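
The merge-and-serialize behavior described above can be illustrated with a minimal standard-library sketch. It is not the library's implementation: `merge_sidecar`, `to_compact_json`, `filter_hits_sketch`, the chunk and row shapes, and the sample values (`department`, `year`) are all hypothetical names chosen for illustration. Only the three parameter names, `content_metadata`, and the `metadata` column come from the text.

```python
import json


def merge_sidecar(chunks, sidecar_rows, meta_source_field, meta_fields):
    """Hypothetical helper: copy selected sidecar columns into each chunk's content_metadata."""
    # Index sidecar rows by the column that identifies the source document.
    by_source = {row[meta_source_field]: row for row in sidecar_rows}
    merged = []
    for chunk in chunks:
        row = by_source.get(chunk["source"], {})
        # Only the columns listed in meta_fields are carried over.
        extra = {field: row[field] for field in meta_fields if field in row}
        content_metadata = {**chunk.get("content_metadata", {}), **extra}
        merged.append({**chunk, "content_metadata": content_metadata})
    return merged


def to_compact_json(obj):
    # separators=(",", ":") drops the spaces json.dumps inserts by default.
    return json.dumps(obj, separators=(",", ":"), sort_keys=True)


def filter_hits_sketch(hits, **expected):
    """Hypothetical client-side filter: keep hits whose decoded metadata matches every pair."""
    kept = []
    for hit in hits:
        meta = json.loads(hit["metadata"])
        if all(meta.get(key) == value for key, value in expected.items()):
            kept.append(hit)
    return kept


# Assumed sample data; real chunks and sidecar tables will look different.
chunks = [
    {"source": "a.pdf", "text": "alpha", "content_metadata": {"page": 1}},
    {"source": "b.pdf", "text": "beta", "content_metadata": {"page": 2}},
]
sidecar = [
    {"source": "a.pdf", "department": "legal", "year": 2024},
    {"source": "b.pdf", "department": "finance", "year": 2023},
]

merged = merge_sidecar(chunks, sidecar, "source", ["department"])
rows = [
    {"text": chunk["text"], "metadata": to_compact_json(chunk["content_metadata"])}
    for chunk in merged
]
legal_hits = filter_hits_sketch(rows, department="legal")
```

In this sketch `year` is left out of the stored metadata because it is not listed in `meta_fields`, which mirrors the "selected columns" wording above. For the actual signatures of `vdb_upload`, `Retriever.query`, and `filter_hits_by_content_metadata`, rely on the linked API reference and notebook.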

**Worked example** — [nemo_retriever_retriever_query_metadata_filter.ipynb](https://github.com/NVIDIA/NeMo-Retriever/blob/main/examples/nemo_retriever_retriever_query_metadata_filter.ipynb) — sidecar metadata at ingest, `Retriever.query` with `where`, and client-side filters.

**Retriever service** — Upload the sidecar file with [`POST /v1/ingest/sidecar`](https://github.com/NVIDIA/NeMo-Retriever/blob/main/nemo_retriever/src/nemo_retriever/service/routers/ingest.py) (multipart upload; refer to [`SidecarUploadResponse`](https://github.com/NVIDIA/NeMo-Retriever/blob/main/nemo_retriever/src/nemo_retriever/service/models/responses.py#L60-L68)), then pass the returned `sidecar_id` as `meta_dataframe_id` with `meta_source_field` and `meta_fields` in `pipeline.vdb_upload_params` on [`POST /v1/ingest`](https://github.com/NVIDIA/NeMo-Retriever/blob/main/nemo_retriever/src/nemo_retriever/service/models/requests.py). Do not pass a local filesystem path as `meta_dataframe` in the service spec. Request shapes and form fields are in the OpenAPI UI at `/docs` on your retriever base URL (for example `http://localhost:7670/docs` after `retriever service start`).
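
As a rough sketch of the second step, the fragment below builds the metadata portion of a `POST /v1/ingest` body from a previously returned `sidecar_id`. The nesting under `pipeline` and `vdb_upload_params` is inferred from the phrase `pipeline.vdb_upload_params` and is an assumption, as are the helper name `build_vdb_upload_params` and the sample values. Any other fields the request requires are omitted, so check the OpenAPI UI at `/docs` for the real request shape.

```python
import json


def build_vdb_upload_params(sidecar_id, meta_source_field, meta_fields):
    """Hypothetical helper: assemble the sidecar-related part of an ingest request body."""
    # The text states that all three values are required.
    if not sidecar_id or not meta_source_field or not meta_fields:
        raise ValueError("sidecar_id, meta_source_field, and meta_fields are all required")
    return {
        "pipeline": {
            "vdb_upload_params": {
                # Reference the uploaded sidecar by ID, not by a local filesystem path.
                "meta_dataframe_id": sidecar_id,
                "meta_source_field": meta_source_field,
                "meta_fields": list(meta_fields),
            }
        }
    }


# Placeholder values; a real sidecar_id comes from the POST /v1/ingest/sidecar response.
body = build_vdb_upload_params("example-sidecar-id", "source", ["department"])
payload = json.dumps(body)
```

The helper refuses to build a body when any of the three values is missing, so an incomplete request fails locally before it reaches the service.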

For parameter tables, SQL predicate patterns, and operator behavior, refer to [Vector DB operators and LanceDB — Metadata filtering](https://github.com/NVIDIA/NeMo-Retriever/tree/main/nemo_retriever/src/nemo_retriever/vdb#metadata-filtering) in `nemo_retriever/src/nemo_retriever/vdb/README.md`. The worked example notebook is also listed on [Notebooks for NeMo Retriever Library](notebooks/index.md).

## LanceDB deployment characteristics { #lancedb-deployment-characteristics }

Expand Down Expand Up @@ -142,7 +147,7 @@ Testing and release cadence for these integrations follow the owning project (RA

### More information (embeddings & custom `VDB`) { #vector-database-partners-more-info }

- [Custom metadata and filtering](custom-metadata.md) and the package [VDB README (metadata filtering)](https://github.com/NVIDIA/NeMo-Retriever/tree/main/nemo_retriever/src/nemo_retriever/vdb#metadata-filtering)
- [Metadata filtering notebook](https://github.com/NVIDIA/NeMo-Retriever/blob/main/examples/nemo_retriever_retriever_query_metadata_filter.ipynb) and the package [VDB README (metadata filtering)](https://github.com/NVIDIA/NeMo-Retriever/tree/main/nemo_retriever/src/nemo_retriever/vdb#metadata-filtering)
- [Multimodal embeddings (VLM)](embedding.md)
- [NeMo Retriever Text Embedding NIM](https://docs.nvidia.com/nim/nemo-retriever/text-embedding/latest/overview.html)
- [NVIDIA NIM catalog](https://build.nvidia.com/) for embedding and retrieval-related NIMs
Expand All @@ -155,7 +160,7 @@ To implement a custom operator, follow the `VDB` abstract interface described in

## Related Topics { #related-topics }

- [Custom metadata and filtering](custom-metadata.md)
- [Metadata filtering: add sidecar metadata and filter searches](https://github.com/NVIDIA/NeMo-Retriever/blob/main/examples/nemo_retriever_retriever_query_metadata_filter.ipynb)
- [Vector DB operators and LanceDB (source)](https://github.com/NVIDIA/NeMo-Retriever/tree/main/nemo_retriever/src/nemo_retriever/vdb)
- [Use the NeMo Retriever Library Python API](nemo-retriever-api-reference.md)
- [Store Extracted Images](nemo-retriever-api-reference.md)
2 changes: 1 addition & 1 deletion docs/docs/extraction/workflow-agentic-retrieval.md
Expand Up @@ -8,7 +8,7 @@ NeMo Retriever Library provides ingestion, embedding, storage, and retrieval bui

Use these pages together with your orchestration layer:

- [Semantic retrieval](vdbs.md#semantic-retrieval), [Custom metadata and filtering](custom-metadata.md), and [Evaluate on your data](evaluate-on-your-data.md) for retrieval quality and reranking notes
- [Semantic retrieval](vdbs.md#semantic-retrieval), [Metadata and filtering](vdbs.md#metadata-and-filtering), and [Evaluate on your data](evaluate-on-your-data.md) for retrieval quality and reranking notes
- [Agentic retrieval (concept)](agentic-retrieval-concept.md)
- [Evaluate on your data](evaluate-on-your-data.md), which includes retrieval evaluation guidance
- [Release notes](releasenotes.md), which may mention agentic retrieval updates
2 changes: 1 addition & 1 deletion docs/docs/extraction/workflow-e2e-blueprints.md
Expand Up @@ -5,4 +5,4 @@ Use these external resources for end-to-end RAG implementations with NeMo Retrie
- [Enterprise RAG - multimodal PDF data extraction](https://build.nvidia.com/nvidia/multimodal-pdf-data-extraction-for-enterprise-rag)
- [NVIDIA AI Blueprints catalog](https://build.nvidia.com/explore/discover)

For framework-specific integration patterns, see [Framework integrations](integrations-langchain-llamaindex-haystack.md).
For framework-specific integration patterns, see [Jupyter Notebooks](notebooks/index.md).
17 changes: 8 additions & 9 deletions docs/mkdocs.yml
Expand Up @@ -95,26 +95,23 @@ nav:
# Single vector-DB page (vdbs.md). Deep links: in-page "On this page" TOC and redirects
# (for example extraction/vector-db-partners.md → vdbs.md#vector-database-partners).
- "Vector databases": extraction/vdbs.md
- "7. Retrieval & ranking":
- "Custom metadata and filtering": extraction/custom-metadata.md
- "8. Deployment & operations":
- "7. Deployment & operations":
- "Ray and distributed ingest": extraction/ray-logging.md
- "9. Customize & extend":
- "8. Customize & extend":
- Extending/Customizing NeMo Retriever Library with custom code: https://github.com/NVIDIA/NeMo-Retriever/tree/main/nemo_retriever/src/nemo_retriever/graph#nemo-retriever-graph
- "NimClient and custom NIM endpoints": extraction/nimclient.md
- "10. Integrations & ecosystem":
- "Framework integrations": extraction/integrations-langchain-llamaindex-haystack.md
- "9. Integrations & ecosystem":
- "Starter kits": extraction/notebooks/index.md
- "11. Evaluation & benchmarks":
- "10. Evaluation & benchmarks":
- "Evaluate on your own documents": extraction/evaluate-on-your-data.md
- "12. Reference":
- "11. Reference":
- "API guide": extraction/nemo-retriever-api-reference.md
# TODO: after nv-ingest code removal, update this link when CLI docs are relocated.
- "CLI reference": https://github.com/NVIDIA/NeMo-Retriever/tree/main/nemo_retriever/docs/cli
- "Quickstart: retriever CLI": reference/retriever-cli-quickstart.md
- Environment variables: extraction/environment-config.md
- "Metadata reference": extraction/content-metadata.md
- "13. Support & community":
- "12. Support & community":
- Troubleshooting: extraction/troubleshoot.md
- FAQ: extraction/faq.md
- Contributing: extraction/contributing.md
Expand Down Expand Up @@ -161,6 +158,8 @@ plugins:
extraction/ngc-api-key.md: extraction/api-keys.md
extraction/notebooks.md: extraction/notebooks/index.md
extraction/data-store.md: extraction/vdbs.md
extraction/custom-metadata.md: extraction/vdbs.md#metadata-and-filtering
extraction/integrations-langchain-llamaindex-haystack.md: extraction/notebooks/index.md
extraction/nemoretriever-parse.md: extraction/multimodal-extraction.md#text-and-layout-extraction
extraction/supported-file-types.md: extraction/multimodal-extraction.md#supported-file-types-and-formats
extraction/text-layout-extraction.md: extraction/multimodal-extraction.md#text-and-layout-extraction
2 changes: 1 addition & 1 deletion nemo_retriever/tests/test_src_documentation_snippets.py
Expand Up @@ -52,7 +52,7 @@ def _iter_markdown_python_blocks() -> list[tuple[str, str]]:
_MD_BLOCKS = _iter_markdown_python_blocks()
_PUBLIC_RETRIEVER_DOCS = (
"README.md",
"docs/docs/extraction/custom-metadata.md",
"docs/docs/extraction/vdbs.md",
"examples/nemo_retriever_retriever_query_metadata_filter.ipynb",
"nemo_retriever/README.md",
"nemo_retriever/docs/cli/README.md",