feat: cherry-pick scrub-secrets, session-diversify, bench from #21 by arreyder · Pull Request #29 · arreyder/solr-mem

arreyder · 2026-05-30T16:08:57Z

Cherry-picks three self-contained, schema-free improvements from the stale stacked PR #21 (feature/session-diversified-ranking, opened 2026-04-18). The rest of #21 (content-hash dedup, hybrid retrieval/embeddings) is deferred — those need schema migrations / embedding infra.

Included (original commits, unchanged — cherry-picked cleanly onto current main)

4ebb8da cap results per session — diversifyBySession helper caps per-session hits in ranked lists; broker packet builder enforces cap=2; search_memories gains session_cap (default 3, 0 disables). Stops one chatty session from dominating results.
a5a9fb5 scrub secrets before storage — new internal/privacy package redacts AWS/GitHub/Anthropic/OpenAI/Slack tokens, bearer tokens, RSA/EC keys, creds-in-URLs, and <private>/<secret> blocks → [REDACTED:<kind>]. Tally recorded in doc metadata (scrub_count/scrub_kinds). Wired into all write paths (store/bulk/update/broker). Default on, opt-out via SOLR_MEM_PRIVACY_SCRUB=off. Defensive win for a shared memory store.
d3f92b9 retrieval benchmark harness — cmd/solr-mem-bench seeds a namespaced bench-* corpus (safe against live data) and reports R@K + MRR; make bench. Lets us measure ranking changes.

No schema changes

Scrub writes to the existing metadata field; diversify uses session_id; bench is standalone. So this deploys with just a server rebuild + restart — no configset reload.

Tests

go build/vet ./... clean; full suite passes incl. new internal/privacy and cmd/solr-mem-bench tests.

🤖 Generated with Claude Code

Adds a generic session-diversify helper and wires it into both the broker packet builder (hardcoded cap=2) and search_memories (session_cap arg, default 3, 0 disables). Prevents one chatty session from starving other relevant context from top-K results. Closes #13. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
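For readers who want a concrete picture of the per-session cap described above, here is a minimal sketch in Go. It is not the code from commit 4ebb8da: the `hit` type, the `sessionOf` accessor, the `limit` parameter name, and the choice to leave items with an empty session ID uncapped are all assumptions made for illustration.

```go
package main

import "fmt"

// hit is a hypothetical search result used only for this illustration.
type hit struct {
	ID        string
	SessionID string
}

// diversifyBySession walks a ranked list in order and keeps at most limit
// items per session key. limit <= 0 disables the cap, mirroring the
// "0 disables" behaviour described for session_cap. Items with an empty
// session key are never capped; that choice is an assumption of this sketch.
func diversifyBySession[T any](ranked []T, sessionOf func(T) string, limit int) []T {
	if limit <= 0 {
		return ranked
	}
	perSession := make(map[string]int)
	kept := make([]T, 0, len(ranked))
	for _, item := range ranked {
		key := sessionOf(item)
		if key != "" {
			if perSession[key] >= limit {
				continue // this session already holds its share of slots
			}
			perSession[key]++
		}
		kept = append(kept, item)
	}
	return kept
}

func main() {
	ranked := []hit{
		{"m1", "s-chatty"}, {"m2", "s-chatty"}, {"m3", "s-chatty"},
		{"m4", "s-other"}, {"m5", "s-chatty"},
	}
	bySession := func(h hit) string { return h.SessionID }
	for _, h := range diversifyBySession(ranked, bySession, 2) {
		fmt.Println(h.ID, h.SessionID)
	}
}
```

With a cap of 2, the third and fourth hits from `s-chatty` are dropped and `m4` from `s-other` moves up into the kept list, while relative rank order is preserved.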

Adds internal/privacy.Scrub with patterns for AWS keys, GitHub tokens, Anthropic/OpenAI keys, Slack tokens, bearer tokens, RSA/EC private-key blocks, URLs with embedded credentials, and <private>/<secret> tag blocks. Matches are replaced with [REDACTED:<kind>] markers and the tally is merged into the memory's metadata as scrub_count/scrub_kinds. Wired into store_memory, bulk_store_memories, update_memory, and observe_work. Can be disabled with SOLR_MEM_PRIVACY_SCRUB=off for trusted corpora. Closes #15. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
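The redact-and-tally flow described above can be sketched roughly as follows. This is an illustration, not the contents of `internal/privacy`: the `Result` type, the kind names, and the three regular expressions are assumptions, and the real package covers many more token formats. Only the `[REDACTED:<kind>]` marker shape, the `scrub_count`/`scrub_kinds` tally idea, and the `SOLR_MEM_PRIVACY_SCRUB=off` opt-out come from the PR text.

```go
package main

import (
	"fmt"
	"os"
	"regexp"
	"sort"
)

// pattern pairs a hypothetical kind label with an illustrative regexp.
type pattern struct {
	kind string
	re   *regexp.Regexp
}

// These three patterns are examples only; the real list is longer.
var patterns = []pattern{
	{"aws_access_key", regexp.MustCompile(`\bAKIA[0-9A-Z]{16}\b`)},
	{"github_token", regexp.MustCompile(`\bghp_[A-Za-z0-9]{36}\b`)},
	{"private_block", regexp.MustCompile(`(?s)<private>.*?</private>`)},
}

// Result carries the scrubbed text plus the tally that a caller could
// merge into document metadata as scrub_count / scrub_kinds.
type Result struct {
	Text  string
	Count int
	Kinds []string
}

// Scrub replaces every match with a [REDACTED:<kind>] marker unless the
// opt-out environment variable is set to "off".
func Scrub(text string) Result {
	if os.Getenv("SOLR_MEM_PRIVACY_SCRUB") == "off" {
		return Result{Text: text}
	}
	res := Result{Text: text}
	for _, p := range patterns {
		n := len(p.re.FindAllStringIndex(res.Text, -1))
		if n == 0 {
			continue
		}
		// Literal replacement avoids any "$" expansion in the marker.
		res.Text = p.re.ReplaceAllLiteralString(res.Text, "[REDACTED:"+p.kind+"]")
		res.Count += n
		res.Kinds = append(res.Kinds, p.kind)
	}
	sort.Strings(res.Kinds)
	return res
}

func main() {
	r := Scrub("key AKIAABCDEFGHIJKLMNOP and <private>notes</private>")
	fmt.Println(r.Text)
	fmt.Println(r.Count, r.Kinds)
}
```

Running each pattern in sequence over the already-scrubbed text keeps the logic simple; the markers themselves contain nothing these patterns would match, so earlier redactions are not redacted again.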

New binary at cmd/solr-mem-bench that seeds a namespaced bench-* corpus into any memories collection (safe to run against a live one — only touches bench-* IDs), runs a shipped query set with gold labels, and reports R@1/R@3/R@5/R@10 and MRR plus a per-query breakdown as Markdown. Ships 30-doc synthetic corpus + 25 queries covering easy keyword lookups and harder semantic paraphrases. The paraphrase queries are where BM25 is expected to struggle, giving us a baseline to measure hybrid retrieval (#16 + #17) against. New Makefile target: make bench. 11 unit tests cover R@K, MRR, and Aggregate; no live Solr required for the test suite. Closes #20. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
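As a reference for the metrics named above, the sketch below computes recall at K and mean reciprocal rank using their standard definitions. Whether `cmd/solr-mem-bench` defines R@K as per-query recall (as here) or as a hit rate is not stated in the PR, so treat that as an assumption; the function names and the sample IDs are hypothetical as well.

```go
package main

import "fmt"

// recallAtK returns the fraction of gold IDs found in the top k ranked IDs.
// It assumes ranked contains no duplicate IDs.
func recallAtK(ranked []string, gold map[string]bool, k int) float64 {
	if len(gold) == 0 || k <= 0 {
		return 0
	}
	if k > len(ranked) {
		k = len(ranked)
	}
	found := 0
	for _, id := range ranked[:k] {
		if gold[id] {
			found++
		}
	}
	return float64(found) / float64(len(gold))
}

// reciprocalRank returns 1/rank of the first gold ID, or 0 if none appears.
func reciprocalRank(ranked []string, gold map[string]bool) float64 {
	for i, id := range ranked {
		if gold[id] {
			return 1 / float64(i+1)
		}
	}
	return 0
}

// mean averages per-query scores; applied to reciprocal ranks it yields MRR.
func mean(scores []float64) float64 {
	if len(scores) == 0 {
		return 0
	}
	sum := 0.0
	for _, s := range scores {
		sum += s
	}
	return sum / float64(len(scores))
}

func main() {
	ranked := []string{"bench-3", "bench-1", "bench-7"}
	gold := map[string]bool{"bench-1": true, "bench-9": true}
	fmt.Println("R@1:", recallAtK(ranked, gold, 1))
	fmt.Println("R@3:", recallAtK(ranked, gold, 3))
	fmt.Println("RR: ", reciprocalRank(ranked, gold))
}
```

In this example the first gold document sits at rank 2, so R@1 is 0, R@3 is 0.5 (one of two gold documents retrieved), and the reciprocal rank is 0.5.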

arreyder and others added 3 commits May 30, 2026 11:00

arreyder merged commit f3b5057 into main May 30, 2026

arreyder deleted the feat/agentmem-picks branch May 30, 2026 16:12

arreyder mentioned this pull request May 30, 2026 in #21, "agentmemory-inspired improvements: diversify, scrub, dedup, bench, vectors+RRF" (Open, 7 tasks)

arreyder merged 3 commits into main from feat/agentmem-picks
