A design specification for response-level influence tracing and operationally reversible memory in agent memory systems.
The spec defines two primitives for agent memory that are missing from every major memory system in 2026:
- Response-level influence tracing. Every generated response is bound to the exact memory facts that grounded it, with per-fact grounding scores. Given a response ID, you get back the facts, their ranks, their similarity scores, and whether they actually made it into the final prompt.
- Operationally reversible memory. Every mutation writes a typed entry to an append-only op log. The system exposes `rollback(op_id)`, `query_at(timestamp)`, and `diff(memory_id, v1, v2)`, primitives that today's memory vendors don't ship.
These are a layer above the audit trails that systems like Mem0 already expose. The audit trail is a solved problem; the operational layer above it is not.
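To make the op-log idea concrete, here is a minimal in-memory sketch of the three operations named above. It is an illustration under stated assumptions, not the data model from SPEC.md or the MemArray implementation: the `Op` and `MemoryLog` names, the explicit `at` timestamp argument, integer version indices for `diff`, and the naive "restore the value that preceded the op" rollback rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Op:
    """One typed, immutable entry in the append-only log (hypothetical shape)."""
    op_id: int
    at: float              # timestamp supplied by the caller
    memory_id: str
    kind: str              # "set", "delete", or "rollback"
    value: Optional[str]   # state after the op; None means "absent"


class MemoryLog:
    def __init__(self) -> None:
        self._ops: list[Op] = []

    def _append(self, at: float, memory_id: str, kind: str,
                value: Optional[str]) -> int:
        # Entries are only ever appended, never edited or removed.
        op = Op(len(self._ops), at, memory_id, kind, value)
        self._ops.append(op)
        return op.op_id

    def set(self, memory_id: str, value: str, at: float) -> int:
        return self._append(at, memory_id, "set", value)

    def delete(self, memory_id: str, at: float) -> int:
        return self._append(at, memory_id, "delete", None)

    def query_at(self, timestamp: float) -> dict[str, str]:
        """Replay the log and return memory state as of `timestamp`."""
        state: dict[str, str] = {}
        for op in self._ops:
            if op.at > timestamp:
                continue
            if op.value is None:
                state.pop(op.memory_id, None)
            else:
                state[op.memory_id] = op.value
        return state

    def rollback(self, op_id: int, at: float) -> int:
        """Undo one op by appending a compensating entry (assumed semantics).

        Restores whatever value the memory held just before `op_id`.
        Cascading dependencies are deliberately ignored in this sketch.
        """
        target = self._ops[op_id]
        prior: Optional[str] = None
        for op in self._ops[:op_id]:
            if op.memory_id == target.memory_id:
                prior = op.value
        return self._append(at, target.memory_id, "rollback", prior)

    def diff(self, memory_id: str, v1: int,
             v2: int) -> tuple[Optional[str], Optional[str]]:
        """Return the values of two versions (0-based indices, an assumption)."""
        history = [op.value for op in self._ops if op.memory_id == memory_id]
        return history[v1], history[v2]
```

Because `rollback` appends a compensating entry rather than deleting anything, a later `query_at` for a timestamp before the rollback still returns the state as it was at that time. A real system would also need to decide what happens when later ops depend on the one being rolled back.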
Three things are simultaneously true about agent memory in production today:
- Memory layers (Mem0, Zep, Letta, Cognee, Supermemory, LangMem) are the default, not the exception.
- Memory edits happen invisibly — most systems use LLM-arbitrated updates that mutate state on every write.
- There is no per-response attribution — you can ask "what's in memory?" but not "which memories caused this response?"
When an agent gives a wrong answer, debugging requires walking through application logs, vector store snapshots, and LLM trace dumps in three separate systems. It should require one API call.
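As a rough picture of what that single call could return, the sketch below stores one trace per response ID and looks it up again. Everything here is assumed for illustration: the `FactInfluence` and `TraceStore` names, the field names, and the `grounding_facts` helper are hypothetical, and the grounding score is treated as an opaque number supplied by the caller rather than computed with the spec's formulation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FactInfluence:
    """One memory fact retrieved for a response (hypothetical shape)."""
    memory_id: str
    rank: int               # retrieval rank, 1 = best
    similarity: float       # retrieval similarity score
    grounding_score: float  # opaque here; see SPEC.md for the real formulation
    in_prompt: bool         # did the fact reach the final prompt?


class TraceStore:
    def __init__(self) -> None:
        self._traces: dict[str, list[FactInfluence]] = {}

    def record(self, response_id: str, facts: list[FactInfluence]) -> None:
        # Bind the response to its facts at generation time, ordered by rank.
        self._traces[response_id] = sorted(facts, key=lambda f: f.rank)

    def trace(self, response_id: str) -> list[FactInfluence]:
        """The one debugging call: every fact considered for this response."""
        return list(self._traces[response_id])

    def grounding_facts(self, response_id: str) -> list[FactInfluence]:
        """Only the facts that actually made it into the final prompt."""
        return [f for f in self.trace(response_id) if f.in_prompt]


store = TraceStore()
store.record("resp-1", [
    FactInfluence("m2", rank=2, similarity=0.71, grounding_score=0.10, in_prompt=False),
    FactInfluence("m1", rank=1, similarity=0.93, grounding_score=0.80, in_prompt=True),
])
for fact in store.trace("resp-1"):
    print(fact.memory_id, fact.rank, fact.in_prompt)
```

Keeping retrieved-but-dropped facts in the trace (here `m2`) is a deliberate choice in this sketch: it lets a debugger distinguish "the fact was never retrieved" from "the fact was retrieved but cut before the prompt was built".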
The full argument, with comparison tables, worked examples, and the data model, is in SPEC.md.
Draft v0.1, published as a request for comment. The spec will change in response to implementation experience and community feedback. Substantive disagreements — "this primitive is wrong" or "this signal is misweighted" — are the kind of feedback that improves the document.
A reference implementation (MemArray, Apache 2.0) is in development at memarray/memarray.
Issues and PRs welcome. See CONTRIBUTING.md. Particularly interested in:
- Implementation experience from teams adopting these primitives in their own memory systems
- Counter-examples where the grounding-score formulation breaks down
- Edge cases in `rollback` semantics (cascading dependencies, multi-tenant federation)
- Bi-temporal query patterns from compliance and audit use cases
Apache 2.0 — see LICENSE.
Akhil Sanker — @akhilmedvolt. Building MemArray full-time. Looking for design partners (especially voice-agent teams on LiveKit / Vapi / Retell) and a co-founder with GTM experience.