backport PR #2194 extraction doc fixes to main #2203
Greptile Summary
This backport (cherry-pick of
| Filename | Overview |
|---|---|
| docs/docs/extraction/audio-video.md | Fixes broken MkDocs code-block nesting (misaligned closing fence/paren) and removes duplicated step-by-step Helm deploy list; no content loss |
| docs/docs/extraction/custom-metadata.md | File deleted; content consolidated inline in vdbs.md and redirect added in mkdocs.yml; avoids undefined-variable code example that existed in the old file |
| docs/docs/extraction/faq.md | Updates two links — chart captioning anchor now correctly targets multimodal-extraction.md sections, and adds Docker Compose developer-only caveat |
| docs/docs/extraction/integrations-langchain-llamaindex-haystack.md | Single link updated from deleted custom-metadata.md to vdbs.md#metadata-and-filtering; anchor verified present |
| docs/docs/extraction/multimodal-extraction.md | Internal anchor links updated to self-referential #image-captioning; OCR section updated to reference Helm chart README for Kubernetes deployment instead of stale support-matrix anchor |
| docs/docs/extraction/prerequisites-support-matrix.md | Adds missing section anchor IDs (#software-requirements, #default-helm-nims, #model-hardware-requirements) and removes the chart-captioning admonition moved to multimodal-extraction.md prose |
| docs/docs/extraction/vdbs.md | Expands the metadata-and-filtering section from a stub pointer to inline prose plus notebook/README links, replacing the deleted custom-metadata.md |
| docs/docs/extraction/workflow-agentic-retrieval.md | Single link updated from deleted custom-metadata.md to vdbs.md#metadata-and-filtering |
| docs/mkdocs.yml | Removes '7. Retrieval & ranking' nav section (custom-metadata.md), renumbers sections 8-13 to 7-12, and adds redirect from extraction/custom-metadata.md to extraction/vdbs.md#metadata-and-filtering |
| nemo_retriever/tests/test_src_documentation_snippets.py | Removes custom-metadata.md from the Python-snippet validation list, consistent with the file deletion; no new snippet coverage gap introduced |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[custom-metadata.md\nDELETED] -->|redirect| B[vdbs.md#metadata-and-filtering]
C[faq.md] -->|link fixed| D[multimodal-extraction.md\n#charts-and-infographics\n#image-captioning]
E[integrations-langchain-\nllamaindex-haystack.md] -->|link fixed| B
F[workflow-agentic-retrieval.md] -->|link fixed| B
G[multimodal-extraction.md] -->|internal link fixed| H[#image-captioning\nself-anchor]
I[prerequisites-support-matrix.md] -->|anchors added| J[#software-requirements\n#default-helm-nims\n#model-hardware-requirements]
K[audio-video.md] -->|code block fence\nindentation fixed| L[Renders correctly\nin MkDocs]
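The link rewrites in the flowchart above can be spot-checked mechanically. The following is a minimal sketch, not the repository's actual tooling: the helper names `find_stale_links` and `find_missing_anchors` are hypothetical, the sample page contents are invented for illustration, and the check only recognizes explicit `{#id}` heading attributes (auto-generated heading slugs would need a slugifier that matches the site's Markdown configuration).

```python
import re

# Matches inline Markdown links: [text](target)
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
# Matches explicit heading IDs such as {#image-captioning} or { #image-captioning }
ANCHOR_ID_RE = re.compile(r"\{\s*#([A-Za-z0-9_-]+)\s*\}")


def find_stale_links(pages, deleted_page):
    """Return (source_page, target) pairs for links that still point at deleted_page."""
    stale = []
    for name, text in pages.items():
        for target in LINK_RE.findall(text):
            path = target.split("#", 1)[0]
            if path.split("/")[-1] == deleted_page:
                stale.append((name, target))
    return stale


def find_missing_anchors(pages):
    """Return (source_page, target) pairs whose #anchor has no explicit ID in the target page."""
    missing = []
    for name, text in pages.items():
        for target in LINK_RE.findall(text):
            # Skip links without an anchor and links to external sites.
            if "#" not in target or target.startswith(("http://", "https://")):
                continue
            path, anchor = target.split("#", 1)
            dest = path.split("/")[-1] or name  # a bare "#anchor" refers to the same page
            if dest in pages and anchor not in ANCHOR_ID_RE.findall(pages[dest]):
                missing.append((name, target))
    return missing


# Hypothetical page contents, keyed by file name (not the real docs).
pages = {
    "faq.md": "See [metadata](custom-metadata.md) and "
              "[captions](multimodal-extraction.md#image-captioning).",
    "multimodal-extraction.md": "## Image captioning {#image-captioning}\n",
    "vdbs.md": "## Metadata and filtering {#metadata-and-filtering}\n"
               "See [matrix](prerequisites-support-matrix.md#default-helm-nims).",
    "prerequisites-support-matrix.md": "## Software requirements {#software-requirements}\n",
}

stale = find_stale_links(pages, "custom-metadata.md")
missing = find_missing_anchors(pages)
print(stale)
print(missing)
```

In this invented sample, `faq.md` still links to the deleted page and `vdbs.md` targets an anchor that the support matrix does not define, which are the two kinds of breakage this PR addresses.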
Reviews (6): Last reviewed commit: "Merge branch 'main' into docs/backport-2..."
477f48a to 9a24d31
PR NVIDIA#2194 merged into 26.05 on 2026-06-02 but never reached main. This backport keeps main aligned with the release branch and the published docs.nvidia.com site after Randy's follow-up review.

Timeline:
- Friday: 26.05 docs built for docs.nvidia upload; branch differed from NRL GitHub Pages source and the uploaded docs were incorrect.
- Saturday: diff main vs 26.05 produced PR NVIDIA#2179 to sync extraction docs.
- Monday: PR NVIDIA#2179 merged and docs uploaded to the public site.
- Follow-up: Randy opened PR NVIDIA#2194 on 26.05 with additional fixes found after the NVIDIA#2179 sync. Those fixes landed on 26.05 only.
- This commit: cherry-pick of c5b257e onto main (five extraction doc files only).

Changes from NVIDIA#2194:
- Fix audio-video.md indented code block rendering
- Restore custom-metadata example service variables and storage prose
- Move caption scope admonition to multimodal-extraction.md
- Trim redundant Helm/OCR deploy detail per review feedback
- Restore FAQ Docker Compose note and support-matrix section anchors
9a24d31 to 019547d
…a page, remove chart admonition

Remove custom-metadata.md in favor of vdbs.md#metadata-and-filtering and the metadata filtering notebook. Drop the PDF chart caption admonition from multimodal-extraction.md per review feedback.
Rationale
This PR backports PR #2194 from `26.05` to `main` so the GitHub source branch and the published docs site stay aligned.

Timeline
- … `26.05` branch content did not match what the NRL GitHub Pages docs expected, so the uploaded docs were incorrect.
- … `main` and `26.05` and opened PR #2179 to sync the extraction docs on `26.05` with `main`.
- … `26.05` only (2026-06-02).
- … `main`. Without this backport, `main` still carries the broken or incomplete state that #2194 corrected on the release branch, and any future build or sync from `main` will diverge from what Randy validated on `26.05`.

Why these changes matter on `main`
- `audio-video.md`: Indented code blocks render incorrectly on `main` (broken MkDocs/Markdown nesting introduced during the #2179 sync). PR #2194 fixes the fence/indent structure and trims redundant Helm deploy steps.
- `custom-metadata.md`: The service-mode example on `main` references undefined `hostname`, `table_name`, and `lancedb_uri` variables; #2194 restores them and adds the "How metadata is stored" section with a proper Related content block.
- `multimodal-extraction.md` / `prerequisites-support-matrix.md` / `faq.md`: Caption scope guidance and OCR deploy detail were updated in #2194 (admonition moved to multimodal-extraction, matrix trimmed, FAQ links and section anchors corrected). `main` still points readers at stale support-matrix anchors.

This is a cherry-pick of `c5b257e4` (squash merge of #2194) onto current `main`: five extraction doc files only, no release-line merge.

Test plan
- … `audio-video.md` code blocks render correctly
- … `custom-metadata.md` example for defined variables and section order
- … `multimodal-extraction.md`