[Chaitanya] Implement #215: Feat: Pinecone as a datasource plugin #216

Draft
deepesh-dg wants to merge 1 commit into main from chaitanya/issue-215-feat-pinecone-as-a-datasource-plugin


@deepesh-dg
Collaborator

Issue #215: Feat: Pinecone as a datasource plugin

The Problem

Developers building AI/RAG (Retrieval-Augmented Generation) applications with Godspeed need to store and retrieve vector embeddings for semantic search, similarity matching, and long-term memory layers. There is currently no official Godspeed plugin for Pinecone — a leading managed vector database. Teams are forced to write custom integration code outside the framework's declarative datasource pattern, losing the benefits of unified config, lifecycle management, and workflow composability.

The Solution

A new official plugin: @godspeedsystems/plugins-pinecone-as-datasource that integrates Pinecone as a Godspeed DataSource.

Developers should be able to:

  • Configure Pinecone via a YAML datasource file (API key, index name, dimensions, metric, cloud/region, etc.)
  • Use the plugin in workflows to perform all core vector operations: insert, update, delete, query, and deleteIndex
  • Organize vectors into namespaces for multi-tenant or multi-collection use cases
  • Attach and filter by metadata on vectors
  • Have the Pinecone index auto-created on startup if it does not already exist, using the config parameters

What this does not do

  • Does not generate embeddings — callers must supply pre-computed vectors
  • Does not support Pinecone pod-based indexes (serverless only)
  • Does not expose Pinecone account-level admin operations (project management, API key rotation, etc.)
  • Does not perform bulk import from external storage — only direct vector upserts

How we will solve it

Follow the standard Godspeed DataSource plugin pattern (GSDataSource base class from @godspeedsystems/core) with:

  • initClient() — establish Pinecone client and ensure the configured index exists
  • execute(ctx, args) — dispatch to the appropriate operation based on args.method

Plugin lives at plugins/pinecone-as-datasource/ in this repo.
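
A minimal sketch of the initClient() / execute() split described above. This is not the plugin source: PineconeDataSourceSketch, ExecuteArgs, and OperationResult are hypothetical names, the class does not extend GSDataSource, and the handlers only echo the operation they would run instead of calling Pinecone. The real execute() would presumably be async; it is kept synchronous here so the dispatch logic is easy to follow.

```typescript
// Hypothetical argument shape; the real plugin's args may carry more fields.
type ExecuteArgs = { method?: string; [key: string]: unknown };
type OperationResult = { operation: string; args: ExecuteArgs };
type Handler = (args: ExecuteArgs) => OperationResult;

// Stand-in for the plugin class. It does NOT extend GSDataSource and makes
// no Pinecone calls; each handler just echoes what it would have run.
class PineconeDataSourceSketch {
  private clientReady = false;
  private readonly handlers = new Map<string, Handler>();

  constructor() {
    for (const name of ["insert", "update", "delete", "deleteIndex", "query"]) {
      this.handlers.set(name, (args) => ({ operation: name, args }));
    }
  }

  // In the real plugin this step would create the Pinecone client and
  // ensure the configured index exists.
  initClient(): void {
    this.clientReady = true;
  }

  // Dispatch on args.method; unknown or missing methods are rejected.
  execute(_ctx: unknown, args: ExecuteArgs): OperationResult {
    if (!this.clientReady) {
      throw new Error("initClient() must run before execute()");
    }
    const handler =
      args.method === undefined ? undefined : this.handlers.get(args.method);
    if (!handler) {
      throw new Error(`Unsupported Pinecone method: ${String(args.method)}`);
    }
    return handler(args);
  }
}
```

A Map is used for the handler table so that names such as "constructor" or "toString" are never resolved from the object prototype by accident.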

Any Special Considerations or Assumptions

  • Index parameters are immutable: dimension, metric, cloud, and region are fixed at creation time. Changing them in config after the index exists has no effect; the index must be deleted manually and then recreated.
  • Eventual consistency: Pinecone writes are not immediately queryable. Tests and workflows that insert then immediately query may need a short wait.
  • API key is required: the plugin must fail fast with a clear error if apiKey is not provided — do not silently fall back.
  • Default namespace: operations that omit namespace fall back to a 'default' namespace to keep behavior predictable.
  • Serverless-only: the plugin targets Pinecone's serverless spec (cloud + region). Pod-based index support is a future improvement.
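
The fail-fast API key check and the default-namespace rule above could look roughly like the sketch below. The config key names (apiKey, indexName, dimension, metric, cloud, region), the default values, and the helper names resolveConfig and resolveNamespace are assumptions for illustration; the issue only lists the kinds of options, not their exact names or defaults.

```typescript
type Metric = "cosine" | "euclidean" | "dotproduct";

// Assumed config shape; key names are illustrative, not confirmed by the issue.
interface PineconeConfig {
  apiKey: string;
  indexName: string;
  dimension: number;
  metric: Metric;
  cloud: string;
  region: string;
}

// Hypothetical defaults. apiKey deliberately has no default.
const SKETCH_DEFAULTS: Omit<PineconeConfig, "apiKey"> = {
  indexName: "godspeed-index",
  dimension: 1536,
  metric: "cosine",
  cloud: "aws",
  region: "us-east-1",
};

const DEFAULT_NAMESPACE = "default";

// Throws immediately when apiKey is missing or blank instead of falling back.
function resolveConfig(raw: Partial<PineconeConfig>): PineconeConfig {
  const apiKey = raw.apiKey?.trim();
  if (!apiKey) {
    throw new Error("Pinecone datasource: `apiKey` is required but was not provided");
  }
  return {
    apiKey,
    indexName: raw.indexName ?? SKETCH_DEFAULTS.indexName,
    dimension: raw.dimension ?? SKETCH_DEFAULTS.dimension,
    metric: raw.metric ?? SKETCH_DEFAULTS.metric,
    cloud: raw.cloud ?? SKETCH_DEFAULTS.cloud,
    region: raw.region ?? SKETCH_DEFAULTS.region,
  };
}

// Omitted or blank namespaces map to "default", never to an empty string.
function resolveNamespace(namespace?: string): string {
  return namespace !== undefined && namespace.trim() !== ""
    ? namespace
    : DEFAULT_NAMESPACE;
}
```

Because index parameters are fixed at creation time, values resolved this way only matter on the run that creates the index; later config changes would be ignored by an existing index.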

Impact Areas

  • New plugin only — no changes to any existing plugin or to @godspeedsystems/core
  • Adds a new entry to the plugin table in README.md

Test Cases

  • Plugin initializes with a valid API key and creates the index if it does not exist
  • Plugin throws a clear error on missing API key
  • Index is reused (not re-created) if it already exists on startup
  • insert: single document is upserted; response includes id, indexName, and namespace
  • insert: batch of documents is upserted in one call
  • query: returns up to topK matches with id, score, and metadata
  • query: results respect namespace isolation — documents in namespace A are not returned by a query scoped to namespace B
  • query: metadata filter correctly limits results
  • update: merges new metadata with existing metadata without overwriting unrelated fields
  • update: replaces vector values when new values are supplied
  • delete: removes a single vector by id
  • delete: removes multiple vectors by id list
  • delete: removes vectors matching a metadata filter
  • deleteIndex: deletes the index; subsequent init recreates it
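
Several of these cases (namespace isolation, metadata filtering, metadata merge on update, delete by id) can be reasoned about against a tiny in-memory stand-in. The sketch below is one possible test double, not the mocked Pinecone SDK the plan refers to: FakeVectorIndex and its method signatures are hypothetical, scoring is plain cosine similarity, and the filter supports exact equality only, which is far narrower than Pinecone's filter language.

```typescript
type Metadata = Record<string, string | number | boolean>;
interface VectorRecord { id: string; values: number[]; metadata?: Metadata }
interface Match { id: string; score: number; metadata: Metadata }

// Cosine similarity; returns 0 when either vector has zero length.
function cosine(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y; normA += x * x; normB += y * y;
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Hypothetical in-memory fake: one Map of records per namespace.
class FakeVectorIndex {
  private readonly namespaces = new Map<string, Map<string, VectorRecord>>();

  private bucket(namespace: string): Map<string, VectorRecord> {
    let records = this.namespaces.get(namespace);
    if (!records) {
      records = new Map<string, VectorRecord>();
      this.namespaces.set(namespace, records);
    }
    return records;
  }

  // Insert or overwrite records by id (copies are stored, not references).
  upsert(namespace: string, records: VectorRecord[]): void {
    for (const r of records) {
      this.bucket(namespace).set(r.id, {
        id: r.id, values: r.values.slice(), metadata: { ...r.metadata },
      });
    }
  }

  // Replace values if given; merge metadata without dropping unrelated keys.
  update(namespace: string, id: string, patch: { values?: number[]; metadata?: Metadata }): void {
    const existing = this.bucket(namespace).get(id);
    if (!existing) return;
    if (patch.values) existing.values = patch.values.slice();
    existing.metadata = { ...existing.metadata, ...patch.metadata };
  }

  delete(namespace: string, ids: string[]): void {
    for (const id of ids) this.bucket(namespace).delete(id);
  }

  // Exact-match metadata filter, then top-K by descending cosine score.
  query(namespace: string, vector: number[], topK: number, filter: Metadata = {}): Match[] {
    return Array.from(this.bucket(namespace).values())
      .filter((r) => Object.keys(filter).every((k) => r.metadata?.[k] === filter[k]))
      .map((r) => ({ id: r.id, score: cosine(vector, r.values), metadata: { ...r.metadata } }))
      .sort((a, b) => b.score - a.score)
      .slice(0, topK);
  }
}
```

Unlike Pinecone, this fake is immediately consistent, so it cannot reproduce the insert-then-query delay noted under Special Considerations.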

Future Improvements

  • Pod-based index support
  • Hybrid search (dense + sparse vectors)
  • Batch query (multiple query vectors in one call)
  • Built-in embedding generation via a configurable model (e.g. OpenAI, Cohere)
  • Namespace listing and stats via a describeIndex method

Definition of Done (DoD)

  • Plugin lives at plugins/pinecone-as-datasource/ following the standard structure (src/index.ts, package.json, tsconfig.json, README.md, .gitignore, .npmignore)
  • Package named @godspeedsystems/plugins-pinecone-as-datasource and published to npm
  • All five operations work correctly via execute(): insert, update, delete, deleteIndex, query
  • Required exports present: DataSource (default), SourceType, Type, CONFIG_FILE_NAME, DEFAULT_CONFIG
  • DEFAULT_CONFIG documents every supported config key with its default value
  • Plugin builds without TypeScript errors (npm run build)
  • README.md covers: installation, full config YAML schema with all options, each method with an example workflow snippet
  • All test cases listed above pass
  • Plugin entry added to the plugin table in the root README.md

Implementation Plan

The plan file has been written. Here's a summary of what's in CHAITANYA_ISSUE_215_PLAN.md:


What the plan covers

8 files to create, 1 file to modify:

| File | What |
| --- | --- |
| plugins/pinecone-as-datasource/src/index.ts | Full plugin source: DataSource class with initClient(), execute(), and 5 private method helpers (_insert, _query, _update, _delete, _deleteIndex). Exact TypeScript pasted in. |
| src/__tests__/index.test.ts | Complete Jest test suite: 26 assertions across 14 test cases, all using a fully-mocked Pinecone SDK (zero network). |
| package.json | @godspeedsystems/plugins-pinecone-as-datasource v1.0.0, with @pinecone-database/pinecone ^3.0.0 as a dep and jest/ts-jest/@types/jest as devDeps. |
| tsconfig.json | Standard shared config + "./src/__tests__" in exclude so tests don't land in dist/. |
| jest.config.js | ts-jest preset, clearMocks: true (critical for isolation between tests). |
| README.md | Installation, full config YAML table, all 5 methods with workflow YAML examples. |
| .gitignore / .npmignore | Standard ignores; .npmignore also excludes jest.config.js. |
| Root README.md | One new row (row 12) in the plugin table for Pinecone. |

Key design decisions documented in the plan:

  • Method resolved from fnNameInWorkflow.split('.')[2] (standard pattern) with args.method fallback
  • namespace defaults to "default" — never empty string
  • initClient() throws on missing apiKey (fast-fail, correct Godspeed startup pattern)
  • waitUntilReady: true on index creation
  • update uses Pinecone's update operation (setMetadata semantics: merges, doesn't overwrite), not upsert
  • A "Common Pitfalls" section flags 10 specific traps (wrong SDK package name, mock placement, clearMocks, unknown vs any, etc.)
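
The method-resolution rule in the first bullet can be sketched as a small pure function. The three-segment name format ("datasource.pinecone.insert") is an assumption about what fnNameInWorkflow contains, and resolveMethod, isSupported, and SUPPORTED_METHODS are hypothetical names, not taken from the plan file.

```typescript
// The five operations named in the issue.
const SUPPORTED_METHODS = ["insert", "update", "delete", "deleteIndex", "query"] as const;
type PineconeMethod = (typeof SUPPORTED_METHODS)[number];

function isSupported(name: string | undefined): name is PineconeMethod {
  return (
    name !== undefined &&
    (SUPPORTED_METHODS as readonly string[]).indexOf(name) !== -1
  );
}

// Prefer the third dot-separated segment of the workflow function name
// (assumed format: "datasource.pinecone.<method>"); fall back to args.method
// when that segment is missing or empty.
function resolveMethod(
  fnNameInWorkflow: string | undefined,
  args: { method?: string },
): PineconeMethod {
  const fromWorkflow = fnNameInWorkflow?.split(".")[2];
  const candidate = fromWorkflow || args.method;
  if (!isSupported(candidate)) {
    throw new Error(`Unsupported Pinecone method: ${String(candidate)}`);
  }
  return candidate;
}
```

Under these assumptions, resolveMethod("datasource.pinecone.query", {}) yields "query", while resolveMethod("datasource.pinecone", { method: "insert" }) falls back to "insert".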

Implementation in progress by Chaitanya AI...

@deepesh-dg
Collaborator Author

Implementation Started

I've analyzed the issue and created an implementation plan (CHAITANYA_ISSUE_215_PLAN.md). Its summary is in the previous comment.

Now proceeding with implementation based on this plan.

Branch: chaitanya/issue-215-feat-pinecone-as-a-datasource-plugin
Base: main

@deepesh-dg
Collaborator Author

Implementation Failed

An error occurred while processing this issue. The draft PR has been left for manual handling.

Claude Agent chat failed: Claude Code returned an error result: Reached maximum number of turns (50)
