MemRL is a memory-based reinforcement learning crate for autonomous AI agents. Unlike traditional semantic search (RAG), MemRL agents rank memories by combining context similarity with a utility score learned from past outcomes.
To run this project, you will need the following dependencies installed on your system:
- Rust & Cargo: The Rust toolchain.
- Docker and Docker Compose: Used to run the Qdrant vector database locally.
- Ollama: Required for generating embeddings.
A docker-compose.yml is provided to spin up a local instance of the Qdrant vector store.
```bash
docker-compose up -d
```

This will map Qdrant to ports 6333 and 6334, and persist its state in the ./qdrant_storage directory.
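For reference, a minimal compose file consistent with the ports and storage path above might look like the following sketch (the bundled docker-compose.yml may differ in detail):

```yaml
services:
  qdrant:
    image: qdrant/qdrant                    # official Qdrant image
    ports:
      - "6333:6333"                         # REST API
      - "6334:6334"                         # gRPC API
    volumes:
      - ./qdrant_storage:/qdrant/storage    # persisted collection data
```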
MemRL uses the nomic-embed-text model via Ollama for text embeddings. Make sure your Ollama daemon is running, and pull the required model:
```bash
ollama pull nomic-embed-text
```
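Under the hood, generating an embedding from Ollama is a single HTTP call. As a point of reference, here is a minimal sketch against Ollama's standard `/api/embeddings` endpoint, assuming `reqwest` (with its `json` feature), `serde_json`, and an async runtime; MemRL's own `ollama`-feature client handles this for you:

```rust
use serde_json::{json, Value};

// Fetch an embedding directly from Ollama's HTTP API. Shown here purely
// for illustration, independent of MemRL's internal client.
async fn embed(text: &str) -> Result<Vec<f32>, reqwest::Error> {
    let resp: Value = reqwest::Client::new()
        .post("http://localhost:11434/api/embeddings")
        .json(&json!({ "model": "nomic-embed-text", "prompt": text }))
        .send()
        .await?
        .json()
        .await?;
    // The vector comes back under the "embedding" key.
    let embedding = resp["embedding"]
        .as_array()
        .map(|xs| xs.iter().filter_map(Value::as_f64).map(|f| f as f32).collect())
        .unwrap_or_default();
    Ok(embedding)
}
```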
To test the MemRL pipeline interactively, you can run the included Docker-backed simulation example:

```bash
cargo run --example value_aware_reranking --features "qdrant ollama"
```

To use MemRL in your own real-time agents, add it to your dependencies. If you don't require the external SDKs (qdrant-client, reqwest), disable the default features:
```toml
[dependencies]
mem_rl = { version = "0.1", default-features = false }
```
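Conversely, to keep the bundled Qdrant and Ollama integrations, the feature names used by the example invocation above suggest a configuration like this (check the crate's manifest for the authoritative feature list):

```toml
[dependencies]
# Feature names taken from the example invocation above; they may already
# be part of the default feature set.
mem_rl = { version = "0.1", features = ["qdrant", "ollama"] }
```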
Construct a real-time agent memory system via the `MemRLAgentBuilder`:

```rust
use mem_rl::MemRLAgentBuilder;

let agent = MemRLAgentBuilder::new(your_vector_store) // any supported vector store backend
    .learning_rate(0.3)   // step size for utility updates after each observed reward
    .utility_balance(0.5) // weight between semantic similarity and learned utility
    .recall_pool(50)      // candidate memories fetched in Phase A
    .context_window(3)    // memories kept after Phase B re-ranking
    .build()?;
```

This implementation is based on the concepts formalized in the research paper *MemRL: Self-Evolving Agents via Runtime Reinforcement Learning on Episodic Memory*. It enables autonomous AI agents to self-evolve by learning from episodic memory using reinforcement learning techniques.
The system relies on an Intent-Experience-Utility triplet:

- Intent ($z_i$): The context or query that triggered the experience.
- Experience ($e_i$): The solution trace or trajectory results.
- Utility ($Q_i$): A learned scalar value representing how helpful this memory was in the past.
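As a mental model, one stored entry might look like the following sketch (field names are illustrative, not MemRL's actual types):

```rust
// Illustrative sketch of one episodic memory entry; the real MemRL types
// and field names may differ.
struct MemoryEntry {
    intent: String,             // z_i: the context/query that triggered the experience
    intent_embedding: Vec<f32>, // embedding of z_i, used for Phase A retrieval
    experience: String,         // e_i: the solution trace / trajectory result
    utility: f32,               // Q_i: learned scalar utility, updated over time
}
```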
It employs Two-Phase Retrieval to fetch relevant experiences:
- Phase A (Semantic Similarity): Retrieves candidate memories based strictly on embedding cosine distance.
- Phase B (Value-Aware Selection): Re-ranks candidates by computing a Z-score normalized composite score that balances semantic similarity with the historical utility ($Q_i$) of each memory.
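A minimal sketch of that composite scoring, assuming a simple convex blend of Z-scored similarity and utility (the crate's exact formula may differ; `lambda` plays the role of `utility_balance`):

```rust
/// Phase B sketch: rank candidate indices by a composite of Z-scored
/// semantic similarity and Z-scored historical utility.
fn composite_rank(sims: &[f32], utils: &[f32], lambda: f32) -> Vec<usize> {
    fn zscore(xs: &[f32]) -> Vec<f32> {
        let n = xs.len() as f32;
        let mean = xs.iter().sum::<f32>() / n;
        let std = (xs.iter().map(|x| (x - mean).powi(2)).sum::<f32>() / n)
            .sqrt()
            .max(1e-8); // guard against zero variance
        xs.iter().map(|x| (x - mean) / std).collect()
    }
    let (zs, zq) = (zscore(sims), zscore(utils));
    let score = |i: usize| (1.0 - lambda) * zs[i] + lambda * zq[i];
    let mut idx: Vec<usize> = (0..sims.len()).collect();
    // Highest composite score first: semantically relevant AND historically useful.
    idx.sort_by(|&a, &b| score(b).partial_cmp(&score(a)).unwrap());
    idx
}
```

The caller would then keep the top `context_window` indices as the memories handed to the agent.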
After an experience is utilized and a new reward is observed, the agent updates that memory's utility via Monte Carlo learning, rather than discarding the memory.
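A standard incremental Monte Carlo update of this form (assumed here; see the paper for the exact rule) is $Q_i \leftarrow Q_i + \alpha \, (R - Q_i)$, where $R$ is the newly observed reward and $\alpha$ is the learning rate configured on the builder.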
MemRL functions via a learning feedback loop:
- Computes text embeddings for a given request via Ollama.
- Performs a semantic similarity search in Qdrant to retrieve similar past intents.
- Either relies on the retrieved experiences or executes a fallback simulation.
- Records the operation's utility and folds it back into the stored memory values, improving selection on successive queries.
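Put together, a single turn of that loop might look like this sketch (`recall` and `record_outcome` are hypothetical method names used purely for illustration, not necessarily MemRL's actual API):

```rust
// Hypothetical single turn of the feedback loop; method names are illustrative.
let memories = agent.recall("rotate the service API key").await?; // embed + retrieve + re-rank
let reward = execute_or_simulate(&memories);                      // your agent logic or fallback
agent.record_outcome(&memories, reward).await?;                   // Monte Carlo utility update
```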
- MemRL: Self-Evolving Agents via Runtime Reinforcement Learning on Episodic Memory. arXiv preprint arXiv:2601.03192 (2026). https://arxiv.org/abs/2601.03192