Note: This repository is also maintained at jwliao-ai/MemQ. If you find this work useful, please consider starring both repositories!
Episodic memory allows LLM agents to accumulate and retrieve experience, but current methods treat each memory independently—evaluating retrieval quality in isolation without accounting for the dependency chains through which memories enable the creation of future memories.
MemQ applies TD(λ) eligibility traces to memory Q-values, propagating credit backward through a provenance DAG that records which memories were retrieved when each new memory was created. Credit weight decays as (γλ)^d with DAG depth d, replacing temporal distance with structural proximity.
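As a worked example of this decay, the block below evaluates the weight at the first few depths. The symbol $w(d)$ and the values of $\gamma$ and $\lambda$ are illustrative assumptions, not settings reported by the authors.

$$
w(d) = (\gamma\lambda)^{d}, \qquad \gamma = 0.5,\ \lambda = 0.8 \ \Rightarrow\ w(0) = 1,\quad w(1) = 0.4,\quad w(2) = 0.16,\quad w(3) = 0.064
$$

Under these values, and assuming depth 0 denotes a directly retrieved memory, a memory three provenance hops away receives about 6% of the credit assigned at depth 0.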
- No Weight Updates: Frozen LLM backbone — all learning happens through Q-value updates on episodic memories
- Provenance DAG: Tracks memory creation dependencies for multi-step credit assignment
- TD(λ) Eligibility Traces: Propagates credit backward through memory chains, not just single-step updates
- EC-MDP Formalization: Exogenous-Context MDP that factors state into an exogenous task stream and endogenous memory store
- Q-Integrated Retrieval: Two-phase retrieval with locality filtering followed by Q-guided ε-greedy selection
MemQ achieves the highest success rate on all six benchmarks in both the generalization evaluation and the runtime learning (evolving) setting.
```bash
pip install -e .
```

Or with development dependencies:

```bash
pip install -e ".[dev]"
```

- Copy a template config and fill in your API credentials:

```bash
cp configs/llb_template.yaml configs/my_experiment.local.yaml
# Edit the file: set api_key, base_url, model, etc.
```

- Run an experiment:

```bash
# Lifelong Agent Bench (OS Interaction)
python run/run_llb.py --config configs/my_experiment.local.yaml

# GPQA (Graduate-level QA)
python run/run_gpqa.py --config configs/gpqa_template.yaml

# BFCL (Function Calling)
python run/run_bfcl.py --config configs/bfcl_template.yaml

# ERQA (Embodied Reasoning)
python run/run_erqa.py --config configs/erqa_template.yaml

# MMMU Pro (Multimodal)
python run/run_mmmu.py --config configs/mmmu_template.yaml

# LiveCodeBench
python run/run_lcb.py --config configs/lcb_template.yaml
```

Configure via `experiment.algorithm` in YAML configs:

| Algorithm | Description |
|---|---|
| `memq` | MemQ: TD(λ) eligibility traces + per-task Q updates over provenance DAG |
| `memrl` | MemRL: Single-step Q-value updates per task (γ=0) |
| `self_rag` | Self-RAG: Adaptive retrieval with self-reflection |
| `rag` | RAG: Standard similarity-based retrieval |
| `memp` | MemP: Memory-only proceduralization (no RL) |
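
To illustrate how the `experiment.algorithm` key might be read, here is a minimal Python sketch. It assumes the YAML file has already been parsed into a nested dict. The helper `resolve_algorithm` and the dict layout are hypothetical and are not taken from the repository's Pydantic models. Only the five algorithm names come from the table above.

```python
# Algorithm names from the table above; everything else is an assumption.
KNOWN_ALGORITHMS = ("memq", "memrl", "self_rag", "rag", "memp")


def resolve_algorithm(config: dict) -> str:
    """Read experiment.algorithm from a parsed config dict and validate it."""
    name = config.get("experiment", {}).get("algorithm")
    if name not in KNOWN_ALGORITHMS:
        raise ValueError(
            f"unknown algorithm {name!r}; expected one of {KNOWN_ALGORITHMS}"
        )
    return name


# Hypothetical parsed config: a top-level "experiment" section with an "algorithm" key.
config = {"experiment": {"algorithm": "memq"}}
selected = resolve_algorithm(config)
```

Failing early on an unrecognized name keeps a typo in the config from silently selecting a different baseline.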
```text
memq/
├── agent/ # LLM agent (frozen backbone)
├── configs/ # Pydantic configuration models
├── providers/ # LLM & Embedding provider interfaces
├── service/ # Core memory service
│ ├── memory_service.py # Build → Retrieve → Update loop
│ ├── memory_store.py # In-memory vector store with Q-values
│ ├── value_driven.py # Q-guided ε-greedy selection
│ ├── builders.py # Proceduralization / trajectory → memory
│ ├── retrievers.py # Locality filtering + scoring
│ └── updater.py # Reflection-based update strategies
├── run/ # Benchmark runners
└── *_eval/ # Benchmark-specific adapters
```
- Retrieve: Locality filter (cosine similarity ≥ θ) → Q-guided ε-greedy top-k selection
- Act: Frozen LLM generates trajectory conditioned on retrieved memories
- Build: Compress trajectory into new episodic memory (proceduralization or reflection)
- Update: Compute TD error, BFS backward through provenance DAG, accumulate ΔQ with (γλ)^d decay
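
The Update step can be sketched in Python as a breadth-first walk over the provenance DAG. This is a minimal sketch under several assumptions, not the repository's implementation:

- The function `propagate_credit` and the `parents` mapping (memory id to the ids retrieved when it was created) are hypothetical names.
- Directly retrieved memories sit at depth 0.
- Each memory is credited once, at its shortest depth.
- The TD error is passed in as a number, because the source does not spell out how it is computed.

```python
from collections import deque


def propagate_credit(q_values, parents, retrieved, td_error,
                     alpha=0.3, gamma=0.5, lambd=0.8):
    """Return a copy of q_values after one backward pass over the DAG.

    Each reachable memory m at BFS depth d receives
    delta_q = alpha * (gamma * lambd) ** d * td_error.
    """
    depth = {m: 0 for m in retrieved}  # retrieved memories are depth 0 (assumption)
    queue = deque(depth)
    delta_q = {}
    while queue:
        node = queue.popleft()
        d = depth[node]
        delta_q[node] = alpha * (gamma * lambd) ** d * td_error
        for parent in parents.get(node, ()):
            if parent not in depth:  # credit each memory once, at its shortest depth
                depth[parent] = d + 1
                queue.append(parent)
    updated = dict(q_values)
    for m, dq in delta_q.items():
        updated[m] = updated.get(m, 0.0) + dq
    return updated


# m3 was created while m2 was retrieved; m2 was created while m1 was retrieved.
parents = {"m3": ["m2"], "m2": ["m1"], "m1": []}
q = {"m1": 0.0, "m2": 0.0, "m3": 0.0}
new_q = propagate_credit(q, parents, retrieved=["m3"], td_error=1.0,
                         alpha=0.5, gamma=0.5, lambd=0.8)
# new_q is approximately m3: 0.5, m2: 0.2, m1: 0.08
```

In this sketch, `gamma=0` leaves only the depth-0 term, so ancestors receive no credit. That is consistent with the single-step description of `memrl` (γ=0).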
| Parameter | Description | Typical Range |
|---|---|---|
| `alpha` | Q-learning rate | 0.1–0.5 |
| `gamma` | Discount factor (provenance horizon) | 0.3–0.8 |
| `lambd` (λ) | Eligibility trace decay | 0.5–0.9 |
| `k_retrieve` | Number of candidates within local consistency | 3–10 |
| `topk` | Number of memories forming the context | 3–10 |
| `weight_sim` | Similarity weight in hybrid scoring | 0.3–1.0 |
| `weight_q` | Q-value weight in hybrid scoring | 0.0–0.7 |
| `epsilon` | Exploration probability | 0.0–0.1 |
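
To show how the retrieval parameters might interact, here is a minimal Python sketch of two-phase retrieval. It is illustrative only:

- The function names, the `theta` argument and the memory dict fields (`id`, `embedding`, `q`) are hypothetical.
- The hybrid score is assumed to be a weighted sum of similarity and Q-value.
- ε-greedy is assumed to be applied once per selected slot.

The repository's actual scoring and selection may differ.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query, memories, theta=0.5, k_retrieve=5, topk=3,
             weight_sim=0.5, weight_q=0.5, epsilon=0.05, rng=random):
    """Return up to topk memory ids chosen from the locally similar candidates."""
    # Phase 1: locality filter. Keep the k_retrieve most similar memories
    # whose cosine similarity to the query is at least theta.
    sims = [(cosine(query, m["embedding"]), m) for m in memories]
    local = sorted((p for p in sims if p[0] >= theta),
                   key=lambda p: p[0], reverse=True)[:k_retrieve]

    # Phase 2: hybrid score (assumed weighted sum), then epsilon-greedy picks.
    pool = [(weight_sim * sim + weight_q * m["q"], m["id"]) for sim, m in local]
    selected = []
    while pool and len(selected) < topk:
        if rng.random() < epsilon:
            idx = rng.randrange(len(pool))  # explore: uniform random candidate
        else:
            idx = max(range(len(pool)), key=lambda i: pool[i][0])  # exploit
        selected.append(pool.pop(idx)[1])
    return selected


memories = [
    {"id": "a", "embedding": [1.0, 0.0], "q": 0.1},
    {"id": "b", "embedding": [0.9, 0.1], "q": 0.9},
    {"id": "c", "embedding": [0.0, 1.0], "q": 1.0},
]
# epsilon=0.0 makes the selection deterministic for this example.
picked = retrieve([1.0, 0.0], memories, theta=0.5, topk=2, epsilon=0.0)
# picked == ["b", "a"]: "c" has the highest Q-value but fails the locality filter.
```

Filtering before scoring means a memory with a high Q-value cannot enter the context unless it is also similar to the current task.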
| Benchmark | Domain | Task Type |
|---|---|---|
| Lifelong Agent Bench (LLAB) | OS Interaction / Database | Multi-step interactive |
| Berkeley Function Calling Leaderboard (BFCL) | Function Calling | Multi-turn API calls |
| Graduate-Level Google-Proof QA (GPQA) | Science QA | Expert-level reasoning |
| Embodied Reasoning QA (ERQA) | Embodied Reasoning | Visual + spatial |
| MMMU Pro | Multimodal Understanding | Image + text |
| LiveCodeBench (LCB) | Code Generation | Programming |
```bibtex
@misc{liao2026memqintegratingqlearningselfevolving,
  title={MemQ: Integrating Q-Learning into Self-Evolving Memory Agents over Provenance DAGs},
  author={Junwei Liao and Haoting Shi and Ruiwen Zhou and Jiaqian Wang and Shengtao Zhang and Wei Zhang and Weinan Zhang and Ying Wen and Zhiyu Li and Feiyu Xiong and Bo Tang and Muning Wen},
  year={2026},
  eprint={2605.08374},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2605.08374},
}
```

This project is licensed under the MIT License; see the LICENSE file for details.



