align workflow snippets with create_ingestor API #2212
Update extraction docs to use create_ingestor, correct vdb_upload and sidecar metadata examples, fix GraphIngestor guidance, and tighten cross-links and style.
Greptile Summary
This PR updates six extraction documentation pages to align code snippets with the
| Filename | Overview |
|---|---|
| docs/docs/extraction/audio-video.md | Migrates both Helm and hosted-inference snippets from Ingestor() to create_ingestor(run_mode="batch"); adds missing audio_endpoints for in-cluster Parakeet; fixes broken Python fence; adds results = ingestor.ingest() call; applies NVIDIA style changes. |
| docs/docs/extraction/custom-metadata.md | Corrects vdb_upload parameter names to vdb_kwargs, meta_dataframe, meta_source_field, and meta_fields; adds guidance on remote lancedb_uri for service mode; simplifies the On this page TOC to match the consolidated section structure. |
| docs/docs/extraction/embedding.md | Replaces all three Ingestor() calls with create_ingestor(run_mode="batch") and adds missing imports; minor style and punctuation fixes; removes duplicate link from related topics. |
| docs/docs/extraction/workflow-document-ingestion.md | Renames chunks variable to result; corrects the return-type comment to distinguish ray.data.Dataset (batch) from pandas.DataFrame (inprocess); updates description of GraphIngestor.vdb_upload to reflect that it is now implemented. |
| docs/docs/extraction/vdbs.md | Updates VDB upload wording to use create_ingestor(...) / GraphIngestor.vdb_upload; adds Python API guide cross-link; one sentence at line 80 still names the old Ingestor class while line 129 in the same file now correctly says GraphIngestor.vdb_upload. |
| docs/docs/extraction/nimclient.md | Adds a single cross-reference sentence pointing readers to the Python API guide for ingest and pipeline APIs used in NimClient UDFs. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["create_ingestor(run_mode=...)"] --> B["GraphIngestor (batch / inprocess)"]
A --> C["ServiceIngestor (service)"]
B --> D[".files(...)"]
D --> E[".extract(...)"]
E --> F[".embed(...)"]
F --> G[".vdb_upload(...) now documented"]
G --> H[".ingest() → ray.data.Dataset (batch) or pandas.DataFrame (inprocess)"]
C --> D2[".files(...)"]
D2 --> E2[".extract(...)"]
E2 --> F2[".embed(...)"]
F2 --> G2[".vdb_upload(vdb_kwargs={lancedb_uri, table_name}, meta_*)"]
G2 --> H2[".ingest_async().result()"]
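Because `.ingest()` returns a `ray.data.Dataset` in `batch` mode and a `pandas.DataFrame` in `inprocess` mode, downstream code that wants a single type may need a small adapter. The following is a minimal sketch, not part of the library: `to_dataframe` is a hypothetical helper name, and it relies only on the fact that `ray.data.Dataset` exposes `to_pandas()` while `pandas.DataFrame` does not.

```python
from typing import Any


def to_dataframe(result: Any) -> Any:
    """Return a DataFrame-like object for either run mode.

    Hypothetical helper for illustration; it is not part of the
    ingestion library discussed in this PR.
    """
    # A ray.data.Dataset (batch mode) provides to_pandas();
    # a pandas.DataFrame (inprocess mode) has no such method.
    convert = getattr(result, "to_pandas", None)
    if callable(convert):
        # Note: this materializes the whole dataset in local memory.
        return convert()
    # Already a DataFrame (or another type we leave untouched).
    return result
```

With this in place, a caller could write `df = to_dataframe(ingestor.ingest())` in either mode. For large batch runs it is probably better to keep the `Dataset` and use its own iteration APIs, since `to_pandas()` pulls every row onto one machine.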
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 1
docs/docs/extraction/vdbs.md:80
This sentence still refers to the old `Ingestor` class while line 129 of the same file (updated by this PR) uses `GraphIngestor.vdb_upload`. The stale name creates a contradiction inside one document, which could confuse readers following the migration guidance.
```suggestion
When using `GraphIngestor.vdb_upload`, pass `vdb_op="lancedb"` or a `LanceDB` instance so uploads target LanceDB. If you omit `vdb_op`, the library still defaults the string argument to `"milvus"` for backward compatibility, which is not the LanceDB operator—always pass `vdb_op="lancedb"` when you intend LanceDB.
```
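One way to avoid silently falling back to the `"milvus"` default is to build the upload arguments in one place. This is a sketch under assumptions: `lancedb_upload_args` is a hypothetical helper, and it assumes that `vdb_op` and `vdb_kwargs` (with `lancedb_uri` and `table_name`, as named elsewhere in this PR) can be passed together to `vdb_upload`, which should be checked against the actual signature.

```python
from typing import Any, Dict


def lancedb_upload_args(lancedb_uri: str, table_name: str) -> Dict[str, Any]:
    """Build keyword arguments that always name LanceDB explicitly.

    Hypothetical helper; the keyword names come from the PR text and
    their combination is an assumption, not a confirmed signature.
    """
    if not lancedb_uri or not table_name:
        raise ValueError("lancedb_uri and table_name must be non-empty")
    return {
        # Never rely on the default, which is "milvus".
        "vdb_op": "lancedb",
        "vdb_kwargs": {"lancedb_uri": lancedb_uri, "table_name": table_name},
    }


# Assumed usage (not verified against the library):
# ingestor.vdb_upload(**lancedb_upload_args("./lancedb", "docs"))
```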
Reviews (3): Last reviewed commit: "Merge branch 'main' into kheiss/snippets"
…st return type
Show `audio_endpoints` in the Helm Parakeet example and rename the workflow snippet variable so batch mode readers are not misled by a DataFrame name.
Summary
- Migrate `Ingestor()` to `create_ingestor(run_mode="batch")` and fix broken Python fences in `audio-video.md`.
- Show `ASRParams.audio_endpoints` in the Helm Parakeet example so batch graph ingest reaches the in-cluster NIM (`audio:50051`); note that endpoint auto-wiring applies to the retriever service, not graph ingest (Greptile).
- Correct `custom-metadata.md` `vdb_upload` parameters (`vdb_kwargs`, sidecar meta triplet) and document remote `lancedb_uri` for service mode.
- Fix `GraphIngestor.vdb_upload` guidance and document `ingest()` return types (`ray.data.Dataset` in `batch`; `pandas.DataFrame` in `inprocess`); rename the workflow snippet variable to `result` (Greptile).
- Apply style wording fixes (`refer to`, `through`, `such as`).
Test plan
- `python skills/scripts/validate_code_blocks.py docs/docs/extraction/` (code fences parse)