Long-Term Memory Agent

This repository contains the course project implementation for a long-term memory dialogue agent. It includes baseline agents, an append-only memory agent, an update/merge memory agent, and an update + reflection agent.

Environment

Create an isolated conda environment:

mamba env create -f environment.yml
conda activate memory-agent

If the environment already exists, update it with:

mamba env update -f environment.yml --prune
conda activate memory-agent

Prepare a Small Eval Set

Start with a small sample before running the full evaluation:

cd eval_kit
python prepare_eval_set.py --output eval_set_small.json --per_category 2 --seed 42
cd ..

Run the No-Memory Baseline

The no-memory baseline answers only from the question itself and is intended as the required control group.

First configure an OpenAI-compatible chat endpoint. For a local vLLM server, the defaults in eval_kit/llm_client.py are enough:

export LLM_BASE_URL="http://localhost:8000/v1"
export LLM_API_KEY="EMPTY"
export LLM_MODEL="Qwen/Qwen2.5-3B-Instruct-AWQ"

On macOS, the simplest local setup is Ollama:

brew services start ollama
ollama pull qwen2.5:3b
source env.local-ollama.sh

Then run generation:

python eval_kit/run_generation.py \
  --eval_set eval_kit/eval_set_small.json \
  --agent memory_agent.agent.baselines:NoMemoryAgent \
  --output experiments/results/predictions_nomem_small.json

If you do not want to activate the environment in the current shell, use:

conda run -n memory-agent python eval_kit/run_generation.py \
  --eval_set eval_kit/eval_set_small.json \
  --agent memory_agent.agent.baselines:NoMemoryAgent \
  --output experiments/results/predictions_nomem_small.json

For cloud APIs, set LLM_BASE_URL, LLM_API_KEY, and LLM_MODEL to the provider's OpenAI-compatible endpoint before running the same command.

Run the Main Memory Agent

The final experimental agent is:

memory_agent.agent.controller:UpdateReflectionAgent

It uses extracted memories, high-signal detail notes, rule-based memory deduplication/merge, and reflection memories.

For a quick small-set run:

MEMORY_PER_SESSION=4 \
MEMORY_TOP_K=10 \
MEMORY_DETAIL_NOTES=1 \
MEMORY_DETAIL_NOTES_PER_SESSION=3 \
MEMORY_UPDATE_USE_LLM=0 \
MEMORY_REFLECTIONS=6 \
MEMORY_REFLECTION_INPUTS=120 \
MEMORY_ALLOW_INFERENCE=0 \
python eval_kit/run_generation.py \
  --eval_set eval_kit/eval_set_small.json \
  --agent memory_agent.agent.controller:UpdateReflectionAgent \
  --output experiments/results/predictions_update_reflection_small.json \
  --resume

MEMORY_ALLOW_INFERENCE=1 is an optional prompt ablation that lets the model make cautious "likely/probably" inferences from indirect memories. Current small-set results show fewer unknown answers but slightly lower rough F1, so the main reported p10 configuration keeps it disabled.

For the p10 experiment used in the current result summary:

bash experiments/run_p10_experiments.sh

The generated p10 files include:

experiments/results/predictions_nomem_p10.json
experiments/results/predictions_fullctx_p10.json
experiments/results/predictions_rag_p10.json
experiments/results/predictions_update_reflection_p10.json

Judge Evaluation

Local judge results with qwen2.5:3b are included as development references. For official-style judging, configure a stronger OpenAI-compatible endpoint such as DeepSeek or DashScope:

export LLM_BASE_URL="https://api.deepseek.com/v1"
export LLM_API_KEY="sk-..."
export LLM_MODEL="deepseek-v4-flash"

bash experiments/run_p10_cloud_judge.sh

Result Summary

The current experiment record is:

experiments/results/small_experiment_summary.md

The machine-readable result manifest is:

experiments/results/experiment_manifest.json

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
eval_kit		eval_kit
experiments		experiments
memory_agent		memory_agent
paper		paper
.gitignore		.gitignore
Agent_Memory.md		Agent_Memory.md
Proposal_231880380_计鑫楷-2.pdf		Proposal_231880380_计鑫楷-2.pdf
README.md		README.md
TEAM_SUMMARY.md		TEAM_SUMMARY.md
env.local-ollama.sh		env.local-ollama.sh
environment.yml		environment.yml
milestone_report.md		milestone_report.md
团队展示_项目总结.md		团队展示_项目总结.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Long-Term Memory Agent

Environment

Prepare a Small Eval Set

Run the No-Memory Baseline

Run the Main Memory Agent

Judge Evaluation

Result Summary

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Long-Term Memory Agent

Environment

Prepare a Small Eval Set

Run the No-Memory Baseline

Run the Main Memory Agent

Judge Evaluation

Result Summary

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages