backport PR #2194 extraction doc fixes to main #2203

Open
kheiss-uwzoo wants to merge 4 commits into NVIDIA:main from kheiss-uwzoo:docs/backport-2194-extraction-docs-fix

@kheiss-uwzoo kheiss-uwzoo commented Jun 2, 2026

Collaborator

Rationale

This PR backports PR #2194 from 26.05 to main so the GitHub source branch and the published docs site stay aligned.

Timeline

  1. Friday: We were asked to build the 26.05 NeMo Retriever Library (NRL) docs for upload to docs.nvidia.com. The 26.05 branch content differed from the source behind the NRL GitHub Pages docs, so the uploaded docs were incorrect.
  2. Saturday: I ran a diff between main and 26.05 and opened PR #2179 to sync the extraction docs on 26.05 with main.
  3. Monday morning: PR #2179 (docs: sync 26.05 docs/docs with main) merged. The resulting docs were uploaded to the public site.
  4. Follow-up review: Randy reviewed the published docs and opened PR #2194, which addressed additional extraction-doc issues discovered after the #2179 sync. That PR merged into 26.05 only (2026-06-02).
  5. This PR (#2203, backport PR #2194 extraction doc fixes to main): Brings the #2194 fixes (fix audio-video.md markdown rendering, a follow-up to #2179) onto main. Without this backport, main still carries the broken or incomplete state that #2194 corrected on the release branch, and any future build or sync from main will diverge from what Randy validated on 26.05.

Why these changes matter on main

This is a cherry-pick of c5b257e4 (squash merge of #2194) onto current main — five extraction doc files only, no release-line merge.

Test plan

  • Build docs site locally and confirm audio-video.md code blocks render correctly
  • Spot-check custom-metadata.md example for defined variables and section order
  • Verify caption scope admonition appears under Image captioning in multimodal-extraction.md
  • Confirm no diff beyond the five extraction doc files
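
The last test-plan item can be checked mechanically. The following is a minimal sketch, not part of the PR: `out_of_scope` and `ALLOWED_DIR` are hypothetical names, and because the description does not enumerate the five files, the sketch only checks that each changed path is a Markdown file under `docs/docs/extraction`.

```python
from pathlib import PurePosixPath

# Assumption: "extraction doc files" means Markdown files in this directory.
ALLOWED_DIR = PurePosixPath("docs/docs/extraction")


def out_of_scope(changed_paths):
    """Return the changed paths that are not Markdown files under ALLOWED_DIR."""
    unexpected = []
    for raw in changed_paths:
        path = PurePosixPath(raw)
        in_dir = ALLOWED_DIR in path.parents
        if not (in_dir and path.suffix == ".md"):
            unexpected.append(raw)
    return unexpected


if __name__ == "__main__":
    # In practice the list could come from `git diff --name-only main...HEAD`.
    sample = [
        "docs/docs/extraction/audio-video.md",
        "docs/docs/extraction/faq.md",
        "nemo_retriever/tests/test_src_documentation_snippets.py",
    ]
    print(out_of_scope(sample))
```

An empty result means the diff stays inside the extraction docs; anything listed needs a second look before merging.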

@kheiss-uwzoo kheiss-uwzoo requested review from a team as code owners June 2, 2026 22:44
@kheiss-uwzoo kheiss-uwzoo requested a review from jperez999 June 2, 2026 22:44
@kheiss-uwzoo kheiss-uwzoo changed the title from "docs(extraction): backport PR #2194 extraction doc fixes to main" to "backport PR #2194 extraction doc fixes to main" Jun 2, 2026
greptile-apps Bot commented Jun 2, 2026

Contributor

Greptile Summary

This backport (cherry-pick of c5b257e4 from 26.05) brings five extraction doc fixes onto main so the GitHub source and the published docs site stay aligned. All changes are documentation and test-list only — no library code or runtime behavior is modified.

  • custom-metadata.md deleted — the page is replaced by inline prose in vdbs.md#metadata-and-filtering, removing a broken service-mode code example that referenced undefined variables (hostname, table_name, lancedb_uri); a mkdocs-redirects entry preserves existing deep-links.
  • Link and anchor fixes across five pages: faq.md, multimodal-extraction.md, integrations-langchain-llamaindex-haystack.md, and workflow-agentic-retrieval.md all now point to verified anchors; prerequisites-support-matrix.md gains the missing #software-requirements, #default-helm-nims, and #model-hardware-requirements IDs.
  • audio-video.md code block repaired — misaligned closing fence/paren in a numbered-list code block (broken MkDocs rendering) is corrected, and a duplicated step is removed.
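
The fence misalignment described for audio-video.md is the kind of problem a small lint can catch before a local docs build. The sketch below is a heuristic, not the check used in this PR: `misaligned_fences` is a hypothetical helper that pairs each opening backtick fence with the next closing fence and reports pairs whose indentation differs, which is what typically breaks a code block nested in a numbered list.

```python
import re

# A fence line: optional indentation, three or more backticks, optional info string.
FENCE = re.compile(r"^(?P<indent>[ \t]*)(?P<ticks>`{3,})(?P<info>.*)$")


def misaligned_fences(markdown_text):
    """Return (open_line, close_line) pairs whose indentation differs.

    A fence that is never closed is reported as (open_line, None).
    Line numbers are 1-based.
    """
    problems = []
    open_fence = None  # (line_number, indent, backtick_count)
    for number, line in enumerate(markdown_text.splitlines(), start=1):
        match = FENCE.match(line)
        if not match:
            continue
        indent = match.group("indent").expandtabs(4)
        ticks = len(match.group("ticks"))
        if open_fence is None:
            open_fence = (number, indent, ticks)
            continue
        # A closing fence has no info string and at least as many backticks.
        if match.group("info").strip() or ticks < open_fence[2]:
            continue
        if indent != open_fence[1]:
            problems.append((open_fence[0], number))
        open_fence = None
    if open_fence is not None:
        problems.append((open_fence[0], None))
    return problems
```

Running it over the text of each changed page and expecting an empty list is a cheap complement to the visual check in the test plan. It does not handle tilde fences.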

Confidence Score: 5/5

Documentation-only backport; all link targets verified, redirect preserved, no runtime code changed.

Every changed file is Markdown, YAML nav config, or a test list entry. All anchor targets were verified against the post-merge file state, the mkdocs-redirects entry follows the same pattern as existing redirects in the file, and the deleted custom-metadata.md content is adequately replaced inline in vdbs.md.
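
For readers who want to repeat the anchor verification, here is a rough sketch of one way to do it. It is an assumption-laden approximation, not the bot's method: `slugify` only imitates heading-slug generation (lowercase, punctuation dropped, spaces hyphenated), explicit IDs are assumed to use the `{#id}` attribute syntax, and `broken_links` is a hypothetical helper that only resolves targets written exactly as the page names it is given.

```python
import re

# Heading line with an optional explicit "{#id}" suffix.
HEADING = re.compile(r"^#{1,6}\s+(?P<title>.+?)\s*(?:\{#(?P<id>[\w-]+)\})?\s*$")
# Inline Markdown link: [text](target)
LINK = re.compile(r"\[[^\]]*\]\((?P<target>[^)\s]+)\)")


def slugify(title):
    """Approximate a heading slug: lowercase, drop punctuation, hyphenate spaces."""
    cleaned = re.sub(r"[^\w\s-]", "", title).strip().lower()
    return re.sub(r"[\s-]+", "-", cleaned)


def anchors_in(markdown_text):
    """Collect the anchor IDs defined by headings in one page."""
    anchors = set()
    for line in markdown_text.splitlines():
        match = HEADING.match(line)
        if match:
            anchors.add(match.group("id") or slugify(match.group("title")))
    return anchors


def broken_links(pages):
    """pages maps page name -> Markdown text.

    Return (source, target) pairs whose '#anchor' part is not defined
    on the target page. External URLs and unknown pages are skipped.
    """
    known = {name: anchors_in(text) for name, text in pages.items()}
    broken = []
    for source, text in pages.items():
        for match in LINK.finditer(text):
            target = match.group("target")
            if "#" not in target or "://" in target:
                continue
            page, anchor = target.split("#", 1)
            page = page or source  # "#anchor" alone points at the same page
            if page in known and anchor not in known[page]:
                broken.append((source, target))
    return broken
```

The sketch treats any line starting with `#` as a heading, so comment lines inside code blocks can add spurious anchors; a real check would skip fenced regions first.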

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| docs/docs/extraction/audio-video.md | Fixes broken MkDocs code-block nesting (misaligned closing fence/paren) and removes duplicated step-by-step Helm deploy list; no content loss |
| docs/docs/extraction/custom-metadata.md | File deleted; content consolidated inline in vdbs.md and redirect added in mkdocs.yml; avoids undefined-variable code example that existed in the old file |
| docs/docs/extraction/faq.md | Updates two links: the chart captioning anchor now correctly targets multimodal-extraction.md sections, and adds Docker Compose developer-only caveat |
| docs/docs/extraction/integrations-langchain-llamaindex-haystack.md | Single link updated from deleted custom-metadata.md to vdbs.md#metadata-and-filtering; anchor verified present |
| docs/docs/extraction/multimodal-extraction.md | Internal anchor links updated to self-referential #image-captioning; OCR section updated to reference Helm chart README for Kubernetes deployment instead of stale support-matrix anchor |
| docs/docs/extraction/prerequisites-support-matrix.md | Adds missing section anchor IDs (#software-requirements, #default-helm-nims, #model-hardware-requirements) and removes the chart-captioning admonition moved to multimodal-extraction.md prose |
| docs/docs/extraction/vdbs.md | Expands the metadata-and-filtering section from a stub pointer to inline prose plus notebook/README links, replacing the deleted custom-metadata.md |
| docs/docs/extraction/workflow-agentic-retrieval.md | Single link updated from deleted custom-metadata.md to vdbs.md#metadata-and-filtering |
| docs/mkdocs.yml | Removes '7. Retrieval & ranking' nav section (custom-metadata.md), renumbers sections 8-13 to 7-12, and adds redirect from extraction/custom-metadata.md to extraction/vdbs.md#metadata-and-filtering |
| nemo_retriever/tests/test_src_documentation_snippets.py | Removes custom-metadata.md from the Python-snippet validation list, consistent with the file deletion; no new snippet coverage gap introduced |
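
The redirect added in docs/mkdocs.yml can be sanity-checked with a few lines of Python. This is only a sketch: `REDIRECT_MAPS` mirrors the single old-to-new mapping named in the overview as a plain dict (the real entry lives under the mkdocs-redirects plugin configuration in YAML), and `redirect_problems` is a hypothetical helper. Treating a still-existing source page as a mistake is an assumption of this sketch, not documented plugin behavior.

```python
# Mirrors the one redirect described above; paths are relative to the docs dir.
REDIRECT_MAPS = {
    "extraction/custom-metadata.md": "extraction/vdbs.md#metadata-and-filtering",
}


def redirect_problems(redirect_maps, existing_pages):
    """Return human-readable problems for a mapping of old page -> new target.

    existing_pages is the set of page paths present after the change.
    """
    problems = []
    for old, new in redirect_maps.items():
        target_page = new.split("#", 1)[0]
        if old in existing_pages:
            # Assumption: a redirect source should be a page that was removed.
            problems.append(f"{old}: source page still exists")
        if target_page not in existing_pages:
            problems.append(f"{old}: target page {target_page} not found")
    return problems


if __name__ == "__main__":
    pages_after_merge = {"extraction/vdbs.md", "extraction/faq.md"}
    print(redirect_problems(REDIRECT_MAPS, pages_after_merge))
```

The check covers only page existence; whether the `#metadata-and-filtering` anchor resolves is a separate question.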

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[custom-metadata.md\nDELETED] -->|redirect| B[vdbs.md#metadata-and-filtering]
    C[faq.md] -->|link fixed| D[multimodal-extraction.md\n#charts-and-infographics\n#image-captioning]
    E[integrations-langchain-\nllamaindex-haystack.md] -->|link fixed| B
    F[workflow-agentic-retrieval.md] -->|link fixed| B
    G[multimodal-extraction.md] -->|internal link fixed| H[#image-captioning\nself-anchor]
    I[prerequisites-support-matrix.md] -->|anchors added| J[#software-requirements\n#default-helm-nims\n#model-hardware-requirements]
    K[audio-video.md] -->|code block fence\nindentation fixed| L[Renders correctly\nin MkDocs]

Reviews (6): Last reviewed commit: "Merge branch 'main' into docs/backport-2..."

@kheiss-uwzoo kheiss-uwzoo force-pushed the docs/backport-2194-extraction-docs-fix branch from 477f48a to 9a24d31 Compare June 2, 2026 22:55
@kheiss-uwzoo kheiss-uwzoo changed the title from "backport PR #2194 extraction doc fixes to main" to "docs(extraction): backport PR #2194 extraction doc fixes to main" Jun 2, 2026
PR NVIDIA#2194 merged into 26.05 on 2026-06-02 but never reached main. This
backport keeps main aligned with the release branch and the published
docs.nvidia.com site after Randy's follow-up review.

Timeline:
- Friday: 26.05 docs built for docs.nvidia upload; branch differed from
  NRL GitHub Pages source and the uploaded docs were incorrect.
- Saturday: diff main vs 26.05 produced PR NVIDIA#2179 to sync extraction docs.
- Monday: PR NVIDIA#2179 merged and docs uploaded to the public site.
- Follow-up: Randy opened PR NVIDIA#2194 on 26.05 with additional fixes found
  after the NVIDIA#2179 sync. Those fixes landed on 26.05 only.
- This commit: cherry-pick of c5b257e onto main (five extraction doc
  files only).

Changes from NVIDIA#2194:
- Fix audio-video.md indented code block rendering
- Restore custom-metadata example service variables and storage prose
- Move caption scope admonition to multimodal-extraction.md
- Trim redundant Helm/OCR deploy detail per review feedback
- Restore FAQ Docker Compose note and support-matrix section anchors
@kheiss-uwzoo kheiss-uwzoo force-pushed the docs/backport-2194-extraction-docs-fix branch from 9a24d31 to 019547d Compare June 2, 2026 23:01
Comment thread docs/docs/extraction/custom-metadata.md Outdated
Comment thread docs/docs/extraction/multimodal-extraction.md Outdated
…a page, remove chart admonition

Remove custom-metadata.md in favor of vdbs.md#metadata-and-filtering and the metadata filtering notebook. Drop the PDF chart caption admonition from multimodal-extraction.md per review feedback.
@kheiss-uwzoo kheiss-uwzoo requested a review from randerzander June 5, 2026 17:54
@kheiss-uwzoo kheiss-uwzoo changed the title from "docs(extraction): backport PR #2194 extraction doc fixes to main" to "backport PR #2194 extraction doc fixes to main" Jun 5, 2026
@kheiss-uwzoo kheiss-uwzoo self-assigned this Jun 8, 2026
@kheiss-uwzoo kheiss-uwzoo added the doc label (Improvements or additions to documentation) Jun 8, 2026
Labels

doc Improvements or additions to documentation

2 participants