Parth576 · Mar 6, 2026 · Mar 5, 2026
diff --git a/.agents/scratchpad/2026-02-15-smolterms/dot-product-search/context.md b/.agents/scratchpad/2026-02-15-smolterms/dot-product-search/context.md
@@ -0,0 +1,21 @@
+# Context: Dot Product Similarity Search
+
+## Task
+Implement the core similarity search logic for InMemoryStore: brute-force dot product over stored vectors, filtered by URL/contentHash, returning top N most similar chunks sorted by score descending.
+
+## Key Files
+- `backend/internal/vectorstore/memory.go` — InMemoryStore with placeholder Search()
+- `backend/internal/vectorstore/memory_test.go` — Existing tests from Task 1
+- `backend/internal/vectorstore/store.go` — Chunk struct (has Score field) and VectorStore interface
+
+## Patterns
+- Uses `sync.RWMutex` for thread safety (RLock for reads, Lock for writes)
+- Structured logging with `slog` including latency_ms
+- Returns `[]Chunk{}` (not nil) for empty results
+- Chunk.Score field is `float32` with `omitempty` JSON tag
+
+## Implementation Requirements
+1. `dotProduct(a, b []float32) float32` — helper, unexported
+2. `Search()` — filter by url/contentHash, compute dot product, set Score, sort descending, limit results
+3. Edge cases: empty collection → `[]Chunk{}`, limit ≤ 0 → `[]Chunk{}`, nil/empty vectors → skip
+4. Log: collectionID, filters, result count, latency_ms
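Review note: requirement 1 above can be sketched as follows. This is a minimal illustration, not the code in `memory.go`. The context file does not say how the real helper treats vectors of different lengths, so iterating over the shorter one is an assumption.

```go
package main

import "fmt"

// dotProduct returns the sum of element-wise products of a and b.
// Assumption: when the lengths differ, only the overlapping prefix is used.
func dotProduct(a, b []float32) float32 {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	var sum float32
	for i := 0; i < n; i++ {
		sum += a[i] * b[i]
	}
	return sum
}

func main() {
	fmt.Println(dotProduct([]float32{1, 2, 3}, []float32{4, 5, 6})) // 1*4 + 2*5 + 3*6 = 32
	fmt.Println(dotProduct([]float32{0, 0, 0}, []float32{4, 5, 6})) // zero vector gives 0
	fmt.Println(dotProduct(nil, []float32{4, 5, 6}))                // nil vector gives 0
}
```

The dot product equals cosine similarity only for unit-length vectors, so ranking by raw dot product relies on the embeddings being normalized.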
diff --git a/.agents/scratchpad/2026-02-15-smolterms/dot-product-search/plan.md b/.agents/scratchpad/2026-02-15-smolterms/dot-product-search/plan.md
@@ -0,0 +1,27 @@
+# Plan: Dot Product Similarity Search
+
+## Test Scenarios
+
+1. **TestDotProduct** — Known vectors, verify mathematical correctness
+2. **TestDotProduct_ZeroVectors** — Zero vector dot product = 0
+3. **TestInMemoryStore_Search_TopN** — 20 chunks, limit=15, verify 15 returned sorted descending
+4. **TestInMemoryStore_Search_URLFilter** — Only chunks from specified URL considered
+5. **TestInMemoryStore_Search_ContentHashFilter** — Both URL and contentHash filters applied
+6. **TestInMemoryStore_Search_EmptyResults** — Empty collection returns []Chunk{} (not nil)
+7. **TestInMemoryStore_Search_LimitZeroOrNegative** — Returns empty slice
+8. **TestInMemoryStore_Search_ScorePopulation** — Each returned chunk has Score set
+9. **TestInMemoryStore_Search_SkipsNilVectors** — Chunks with nil/empty vectors skipped
+10. **TestInMemoryStore_Search_LimitExceedsAvailable** — Returns all if fewer than limit
+11. **TestInMemoryStore_Search_ConcurrentReads** — No race conditions with -race
+12. **TestInMemoryStore_Search_Logging** — Structured log with correct fields
+
+## Implementation Plan
+
+1. Add `dotProduct` helper (unexported, simple loop)
+2. Replace placeholder Search:
+   - Early return for limit ≤ 0
+   - Get chunks, filter by url/contentHash
+   - Skip nil/empty vectors, compute dot product, set Score
+   - Sort with `slices.SortFunc` descending
+   - Truncate to limit
+   - Log with slog
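Review note: the implementation plan above can be sketched end to end as a free function. The `Chunk` fields, the `search` signature, the log key names, and the choice to treat an empty filter string as "no filter" are all assumptions for illustration. The real `Search()` is a method on `InMemoryStore` and holds an `RLock` while reading.

```go
package main

import (
	"cmp"
	"fmt"
	"log/slog"
	"os"
	"slices"
	"time"
)

// Chunk is a hypothetical, trimmed-down stand-in for the struct in store.go.
type Chunk struct {
	URL, ContentHash, Text string
	Vector                 []float32
	Score                  float32
}

// dotProduct sums element-wise products over the shorter of the two vectors.
func dotProduct(a, b []float32) float32 {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	var sum float32
	for i := 0; i < n; i++ {
		sum += a[i] * b[i]
	}
	return sum
}

// search follows the plan: early return, filter, skip empty vectors,
// score, sort descending, truncate, log. A nil logger disables logging.
func search(logger *slog.Logger, chunks []Chunk, query []float32, url, contentHash string, limit int) []Chunk {
	start := time.Now()
	results := []Chunk{} // non-nil, so empty results are []Chunk{} rather than nil
	if limit <= 0 {
		return results
	}
	for _, c := range chunks { // c is a copy; stored chunks are not mutated
		if url != "" && c.URL != url {
			continue
		}
		if contentHash != "" && c.ContentHash != contentHash {
			continue
		}
		if len(c.Vector) == 0 { // covers both nil and empty vectors
			continue
		}
		c.Score = dotProduct(query, c.Vector)
		results = append(results, c)
	}
	// Arguments are swapped (b before a) to sort by score descending.
	slices.SortFunc(results, func(a, b Chunk) int { return cmp.Compare(b.Score, a.Score) })
	if len(results) > limit {
		results = results[:limit]
	}
	if logger != nil {
		logger.Info("search complete", "url", url, "content_hash", contentHash,
			"result_count", len(results), "latency_ms", time.Since(start).Milliseconds())
	}
	return results
}

func main() {
	logger := slog.New(slog.NewTextHandler(os.Stderr, nil))
	chunks := []Chunk{
		{URL: "https://example.com/privacy", ContentHash: "abc", Text: "b", Vector: []float32{0.5, 0.5}},
		{URL: "https://example.com/privacy", ContentHash: "abc", Text: "a", Vector: []float32{1, 0}},
		{URL: "https://example.com/privacy", ContentHash: "abc", Text: "c"},
		{URL: "https://other.example/terms", ContentHash: "xyz", Text: "d", Vector: []float32{1, 1}},
	}
	for _, c := range search(logger, chunks, []float32{1, 0}, "https://example.com/privacy", "abc", 15) {
		fmt.Printf("%s %.2f\n", c.Text, c.Score) // "a" first, then "b"
	}
}
```

A full sort costs O(n log n) per query. If collections grow large, a bounded heap of size `limit` would be one alternative to sorting everything.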
diff --git a/.agents/scratchpad/2026-02-15-smolterms/dot-product-search/progress.md b/.agents/scratchpad/2026-02-15-smolterms/dot-product-search/progress.md
@@ -0,0 +1,15 @@
+# Progress: Dot Product Similarity Search
+
+## Setup
+- [x] Created documentation directory
+- [x] Reviewed task requirements and existing code
+
+## Implementation
+- [x] Write TDD tests for search functionality (12 new tests)
+- [x] Verify tests fail (RED phase) — dotProduct undefined
+- [x] Implement dotProduct helper
+- [x] Implement full Search method with filtering, scoring, sorting, logging
+- [x] Fix existing Task 1 tests to include vectors on chunks
+- [x] Verify tests pass (GREEN phase) — all 60+ tests pass
+- [x] Run full backend test suite with -race — all packages pass
+- [ ] Commit
diff --git a/.../scratchpad/2026-02-15-smolterms/step18-task01-in-memory-vectorstore/context.md b/.../scratchpad/2026-02-15-smolterms/step18-task01-in-memory-vectorstore/context.md
@@ -0,0 +1,26 @@
+# Context: In-Memory VectorStore and Config
+
+## Requirements
+- Add `Delete` method to `VectorStore` interface
+- Create `InMemoryStore` implementing full `VectorStore` interface
+- Add `VECTOR_STORE` config flag (memory/qdrant, default: memory)
+- Update `main.go` for conditional store initialization
+- Add `Delete` to RAG pipeline (pass-through)
+- Call `Delete` in analyzer after caching results
+
+## Key Files
+- `backend/internal/vectorstore/store.go` - Interface + MockVectorStore
+- `backend/internal/vectorstore/qdrant.go` - QdrantStore (add no-op Delete)
+- `backend/internal/vectorstore/memory.go` - New InMemoryStore
+- `backend/internal/config/config.go` - Add VectorStore field
+- `backend/cmd/server/main.go` - Conditional wiring
+- `backend/internal/rag/pipeline.go` - Add Delete pass-through
+- `backend/internal/analyzer/analyzer.go` - Call Delete after caching
+
+## Patterns
+- stdlib testing, no test framework
+- `slog.Logger` for structured logging
+- `sync.RWMutex` for thread safety
+- Interface-based DI throughout
+- `t.Setenv()` for config tests
+- Mock types record calls for assertion
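Review note: the `VECTOR_STORE` flag described in the requirements (memory/qdrant, default memory) could be resolved roughly as below. The function name `vectorStoreKind`, the injected `getenv` parameter, and rejecting unknown values with an error are assumptions. The actual `config.go` may behave differently, for example by falling back to the default.

```go
package main

import (
	"fmt"
	"os"
)

// vectorStoreKind resolves the VECTOR_STORE setting, defaulting to "memory".
// getenv is injected so the lookup can be exercised without touching the
// process environment; the real config code may read os.Getenv directly.
func vectorStoreKind(getenv func(string) string) (string, error) {
	v := getenv("VECTOR_STORE")
	switch v {
	case "":
		return "memory", nil
	case "memory", "qdrant":
		return v, nil
	default:
		// Assumption: unknown values are rejected instead of silently ignored.
		return "", fmt.Errorf("unsupported VECTOR_STORE %q (want memory or qdrant)", v)
	}
}

func main() {
	kind, err := vectorStoreKind(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("vector store:", kind)
}
```

In `main.go`, a switch on the returned value would then pick between the in-memory store and the Qdrant client.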
diff --git a/...scratchpad/2026-02-15-smolterms/step18-task01-in-memory-vectorstore/progress.md b/...scratchpad/2026-02-15-smolterms/step18-task01-in-memory-vectorstore/progress.md
@@ -0,0 +1,24 @@
+# Progress: In-Memory VectorStore and Config
+
+## Setup
+- [x] Documentation directory created
+- [x] Context document created
+- [x] Existing code reviewed
+
+## Implementation
+- [x] Add Delete to VectorStore interface and MockVectorStore
+- [x] Add no-op Delete to QdrantStore
+- [x] Write InMemoryStore tests (TDD - RED) - 13 tests covering Upsert, Delete, Search, HealthCheck, thread safety, nil logger
+- [x] Create InMemoryStore implementation (TDD - GREEN)
+- [x] Add VectorStore config field (VECTOR_STORE env var, default "memory")
+- [x] Update main.go wiring (conditional init, renamed constant to `collection`)
+- [x] Add Delete to RAG pipeline (pass-through to store.Delete)
+- [x] Call Delete in analyzer after caching (Stage 12, warn-only on error)
+- [x] Run full test suite - ALL PASS with -race flag
+- [x] Build compiles successfully
+
+## Decisions
+- Delete in analyzer logs warning on error but does not fail the pipeline (non-critical cleanup)
+- InMemoryStore.Search is a placeholder that returns filtered chunks without similarity scoring (Task 2)
+- InMemoryStore.Delete matches on both URL AND contentHash (both must match)
+- Used `kept := chunks[:0]` pattern for in-place filtering to reduce allocations
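Review note: the last two decisions above (match on URL and contentHash, filter in place with `kept := chunks[:0]`) can be sketched like this. The `Chunk` fields and the `deleteMatching` name are hypothetical. The real `Delete` is a method on `InMemoryStore` and takes the write lock.

```go
package main

import "fmt"

// Chunk is a hypothetical, trimmed-down stand-in for the struct in store.go.
type Chunk struct {
	URL, ContentHash, Text string
}

// deleteMatching drops chunks whose URL and ContentHash both match.
// kept shares the backing array of chunks, so no new slice is allocated.
func deleteMatching(chunks []Chunk, url, contentHash string) []Chunk {
	kept := chunks[:0]
	for _, c := range chunks {
		if c.URL == url && c.ContentHash == contentHash {
			continue // both fields match: drop this chunk
		}
		kept = append(kept, c)
	}
	// Zero the unused tail so dropped chunks do not stay reachable.
	for i := len(kept); i < len(chunks); i++ {
		chunks[i] = Chunk{}
	}
	return kept
}

func main() {
	chunks := []Chunk{
		{URL: "u1", ContentHash: "h1", Text: "a"},
		{URL: "u1", ContentHash: "h2", Text: "b"},
		{URL: "u2", ContentHash: "h1", Text: "c"},
	}
	chunks = deleteMatching(chunks, "u1", "h1")
	fmt.Println(len(chunks), chunks[0].Text, chunks[1].Text) // 2 b c
}
```

Because `kept` aliases the input, callers must use the returned slice and stop using the original one.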
diff --git a/.agents/scratchpad/2026-02-15-smolterms/task-03-docker-compose-and-docs/context.md b/.agents/scratchpad/2026-02-15-smolterms/task-03-docker-compose-and-docs/context.md
@@ -0,0 +1,23 @@
+# Context: Task 03 - Docker Compose Qdrant Profile & Documentation Updates
+
+## Requirements
+1. Update `docker-compose.yml`: Add `profiles: [qdrant]` to Qdrant, remove `depends_on` and `QDRANT_URL` env override from backend
+2. Update `README.md`: Reflect in-memory default, document Qdrant profile, add `VECTOR_STORE` env var
+3. Update `.env.example`: Add `VECTOR_STORE=memory` with comments
+4. Update `CLAUDE.md`: Update Build & Run and Tech Stack sections
+
+## Current State
+- Config already supports `VECTOR_STORE` env var with `memory` default (config.go:43)
+- main.go already switches between memory/qdrant based on `cfg.VectorStore` (main.go:46-61)
+- InMemoryStore fully implemented and tested
+- docker-compose.yml currently has hard `depends_on: qdrant` and `QDRANT_URL=qdrant:6334` override
+
+## Files to Modify
+- `docker-compose.yml` (root)
+- `README.md` (root)
+- `.env.example` (root - sandbox restricted, may need alternative)
+- `CLAUDE.md` (root)
+
+## Key Decisions
+- No tests needed - this is config/docs only
+- Removing the `QDRANT_URL` env override means that, when running with `--profile qdrant`, the user must set `VECTOR_STORE=qdrant` and `QDRANT_URL=qdrant:6334` in their `.env`
diff --git a/.agents/scratchpad/2026-02-15-smolterms/task-03-docker-compose-and-docs/plan.md b/.agents/scratchpad/2026-02-15-smolterms/task-03-docker-compose-and-docs/plan.md
@@ -0,0 +1,30 @@
+# Plan: Task 03 - Docker Compose & Docs
+
+## Test Strategy
+No unit tests needed - this is purely configuration and documentation changes. Validation via `docker compose config`.
+
+## Implementation Plan
+
+### 1. docker-compose.yml
+- Add `profiles: [qdrant]` to qdrant service
+- Remove `depends_on: qdrant` from backend
+- Remove `QDRANT_URL=qdrant:6334` environment override from backend
+
+### 2. README.md
+- Update "How It Works" to mention vector store (in-memory default)
+- Update "Local Development > Option A" to show `docker compose up` as default (backend only)
+- Add Qdrant profile instructions: `docker compose --profile qdrant up`
+- Add `VECTOR_STORE` to environment variables section
+- Update Option C to reflect optional Qdrant
+- Update health check response to not assume Qdrant
+
+### 3. .env.example
+- Add `VECTOR_STORE=memory` with comment
+- Add comment that `QDRANT_URL` only needed with `VECTOR_STORE=qdrant`
+
+### 4. CLAUDE.md
+- Update Build & Run section
+- Update Tech Stack to mention in-memory default
+
+### 5. Validation
+- Run `docker compose config` to verify syntax
diff --git a/...nts/scratchpad/2026-02-15-smolterms/task-03-docker-compose-and-docs/progress.md b/...nts/scratchpad/2026-02-15-smolterms/task-03-docker-compose-and-docs/progress.md
@@ -0,0 +1,20 @@
+# Progress: Task 03 - Docker Compose & Docs
+
+## Setup
+- [x] Created documentation directory
+- [x] Read all relevant files
+- [x] Created context.md
+
+## Implementation
+- [x] Update docker-compose.yml - added `profiles: [qdrant]`, removed `depends_on` and `QDRANT_URL` env override
+- [x] Update README.md - updated all sections to reflect in-memory default, Qdrant as optional
+- [x] Update .env.example - **BLOCKED**: sandbox prevents read/write to .env.example (in deny list)
+- [x] Update CLAUDE.md - updated Tech Stack, Build & Run, Data Flow, package layout, API endpoints
+- [ ] Validate docker-compose.yml syntax - docker commands denied by sandbox
+- [ ] Commit changes
+
+## Notes
+- `.env.example` could not be updated due to sandbox read/write restrictions
+- `docker compose config` could not be run due to sandbox restrictions
+- `go test` could not be run due to sandbox Go cache restrictions
+- All changes are config/docs only - no functional code changes
diff --git a/.env.example b/.env.example
@@ -13,7 +13,12 @@ ANTHROPIC_API_KEY=sk-ant-your-key-here
 # OpenAI API key (required, used for embeddings)
 OPENAI_API_KEY=sk-your-key-here
 
-# Qdrant gRPC address (default: localhost:6334)
+# Vector store backend: "memory" (default) or "qdrant"
+# In-memory requires no external services. Use "qdrant" for persistent storage.
+VECTOR_STORE=memory
+
+# Qdrant gRPC address (only needed when VECTOR_STORE=qdrant)
+# Set to qdrant:6334 when using Docker/Podman Compose with --profile qdrant
 QDRANT_URL=localhost:6334
 
 # Cache TTL for analysis results (default: 720h = 30 days)

diff --git a/CLAUDE.md b/CLAUDE.md
@@ -16,18 +16,19 @@ SmolTerms is a privacy policy and terms of service analyzer. A browser extension
 - **Extension:** Vanilla JS (no build system, no framework), Manifest V3, Firefox + Chrome
 - **LLM:** Anthropic API direct (Claude Sonnet 4.5), behind `LLMClient` interface for provider swapping
 - **Embeddings:** OpenAI `text-embedding-3-small` (1536 dimensions), behind `EmbeddingClient` interface
-- **Vector DB:** Qdrant (official Go gRPC client), behind `VectorStore` interface
+- **Vector Store:** In-memory (default), Qdrant gRPC (optional, via `VECTOR_STORE=qdrant`), behind `VectorStore` interface
 - **Caching:** go-cache in-memory (MVP), behind `Cache` interface for Redis swap later
 - **Configuration:** Environment variables only (12-factor app, `.env` file for local dev)
-- **Infrastructure:** Docker Compose (Go backend + Qdrant) for local development
+- **Infrastructure:** Docker Compose (Go backend; Qdrant optional via `--profile qdrant`)
 
 ## Build & Run Commands
 
 ```bash
-docker-compose up                    # Start full dev stack (backend + Qdrant)
-go run ./backend/cmd/server/main.go  # Run backend directly
-go test ./backend/...                # Run all backend tests
-go test ./backend/... -cover         # Run tests with coverage
+docker compose up                           # Start backend (in-memory vector store)
+docker compose --profile qdrant up          # Start backend + Qdrant
+go run ./backend/cmd/server/main.go         # Run backend directly
+go test ./backend/...                       # Run all backend tests
+go test ./backend/... -cover                # Run tests with coverage
 ```
 
 Extension: load unpacked in browser (no build step needed).
@@ -40,7 +41,7 @@ Extension: load unpacked in browser (no build step needed).
 Extension click → Content script extracts main content HTML → Background worker POSTs to API
 → Backend: check cache (URL + content hash) → if miss:
   → HTML parse (goquery) → privacy policy detection → structure-aware text chunking (512 tokens target, 64 token overlap within sections)
-  → Embed chunks (OpenAI) → store in Qdrant
+  → Embed chunks (OpenAI) → store in vector store (in-memory default)
   → Single broad retrieval query → top 15-20 chunks
   → LLM analysis (Anthropic Claude) → structured scoring → cache result → return response
 ```
@@ -52,7 +53,7 @@ Extension click → Content script extracts main content HTML → Background wor
 | `api/` | HTTP handlers, CORS middleware, route definitions, JSON response helpers |
 | `extractor/` | HTML parsing (goquery), text chunking, privacy policy detection |
 | `embedding/` | EmbeddingClient interface, OpenAI implementation |
-| `vectorstore/` | VectorStore interface, Qdrant gRPC implementation |
+| `vectorstore/` | VectorStore interface, in-memory + Qdrant gRPC implementations |
 | `rag/` | RAG pipeline orchestration (store + retrieve) |
 | `llm/` | LLMClient interface, Anthropic implementation, prompt templates |
 | `analyzer/` | Full pipeline orchestration, scoring/aggregation, result types |
@@ -74,7 +75,7 @@ Overall score = simple average. Risk levels: Low (8-10), Moderate (5-7.9), High
 
 ```
 POST /api/v1/analyze   # Submit HTML, get analysis back (synchronous)
-GET  /api/v1/health    # Health check (backend + Qdrant status)
+GET  /api/v1/health    # Health check (backend + vector store status)
 ```
 
 ### Extension Structure (`extension/`)