MemQ: Integrating Q-Learning into Self-Evolving Memory Agents over Provenance DAGs

Note: This repository is also maintained at jwliao-ai/MemQ. If you find this work useful, please consider starring both repositories!

Episodic memory allows LLM agents to accumulate and retrieve experience, but current methods treat each memory independently—evaluating retrieval quality in isolation without accounting for the dependency chains through which memories enable the creation of future memories.

MemQ applies TD(λ) eligibility traces to memory Q-values, propagating credit backward through a provenance DAG that records which memories were retrieved when each new memory was created. Credit weight decays as (γλ)^d with DAG depth d, replacing temporal distance with structural proximity.
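To give a feel for how quickly credit fades with DAG depth, the following minimal sketch evaluates (γλ)^d for a few depths. The helper name `credit_weights` and the values γ = 0.8, λ = 0.9 are illustrative assumptions, not part of the MemQ codebase.

```python
def credit_weights(gamma: float, lambd: float, max_depth: int) -> list[float]:
    """Return the decay factor (gamma * lambd) ** d for d = 0..max_depth."""
    return [(gamma * lambd) ** d for d in range(max_depth + 1)]


# Illustrative values only; depth 0 is a memory that was retrieved directly.
for depth, w in enumerate(credit_weights(0.8, 0.9, 3)):
    print(f"depth {depth}: weight {w:.3f}")
```

With these values, a memory three provenance hops away receives about 37% of the credit given to a directly retrieved memory.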

Key Features

No Weight Updates: Frozen LLM backbone — all learning happens through Q-value updates on episodic memories
Provenance DAG: Tracks memory creation dependencies for multi-step credit assignment
TD(λ) Eligibility Traces: Propagates credit backward through memory chains, not just single-step updates
EC-MDP Formalization: Exogenous-Context MDP that factors state into an exogenous task stream and endogenous memory store
Q-Integrated Retrieval: Two-phase retrieval with locality filtering followed by Q-guided ε-greedy selection

Algorithm

[Figure: algorithm_box.png]

Results

MemQ achieves the highest success rate on all six benchmarks, both in the generalization evaluation on held-out test tasks and in the runtime learning setting.

Generalization Evaluation on Held-out Test Tasks

[Figure: evaluation_results_on_held_out_test_tasks.png]

Runtime Learning Results

[Figure: runtime_learning_results.png]

Installation

pip install -e .

Or with development dependencies:

pip install -e ".[dev]"

Quick Start

Copy a template config and fill in your API credentials:

cp configs/llb_template.yaml configs/my_experiment.local.yaml
# Edit the file: set api_key, base_url, model, etc.

Run an experiment:

# Lifelong Agent Bench (OS Interaction)
python run/run_llb.py --config configs/my_experiment.local.yaml

# GPQA (Graduate-level QA)
python run/run_gpqa.py --config configs/gpqa_template.yaml

# BFCL (Function Calling)
python run/run_bfcl.py --config configs/bfcl_template.yaml

# ERQA (Embodied Reasoning)
python run/run_erqa.py --config configs/erqa_template.yaml

# MMMU Pro (Multimodal)
python run/run_mmmu.py --config configs/mmmu_template.yaml

# LiveCodeBench
python run/run_lcb.py --config configs/lcb_template.yaml

Algorithm Variants

Select the algorithm with the experiment.algorithm key in the YAML config:

Algorithm	Description
`memq`	MemQ — TD(λ) eligibility traces + per-task Q updates over provenance DAG
`memrl`	MemRL — Single-step Q-value updates per task (γ=0)
`self_rag`	Self-RAG — Adaptive retrieval with self-reflection
`rag`	RAG — Standard similarity-based retrieval
`memp`	MemP — Memory-only proceduralization (no RL)

Architecture

memq/
├── agent/          # LLM agent (frozen backbone)
├── configs/        # Pydantic configuration models
├── providers/      # LLM & Embedding provider interfaces
├── service/        # Core memory service
│   ├── memory_service.py   # Build → Retrieve → Update loop
│   ├── memory_store.py     # In-memory vector store with Q-values
│   ├── value_driven.py     # Q-guided ε-greedy selection
│   ├── builders.py         # Proceduralization / trajectory → memory
│   ├── retrievers.py       # Locality filtering + scoring
│   └── updater.py          # Reflection-based update strategies
├── run/            # Benchmark runners
└── *_eval/         # Benchmark-specific adapters

Core Loop

Retrieve: Locality filter (cosine similarity ≥ θ) → Q-guided ε-greedy top-k selection
Act: Frozen LLM generates trajectory conditioned on retrieved memories
Build: Compress trajectory into new episodic memory (proceduralization or reflection)
Update: Compute the TD error, traverse the provenance DAG backward with BFS, and accumulate ΔQ with (γλ)^d decay at depth d
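The Update step can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's implementation: the function name `propagate_credit`, the `parents` mapping (memory id to the ids retrieved when it was created), and the TD error definition (reward minus the mean Q-value of the retrieved memories) are all hypothetical choices made for the example. Depth is taken as the shortest BFS distance from any retrieved memory.

```python
from collections import deque


def propagate_credit(q, parents, retrieved, reward,
                     alpha=0.3, gamma=0.8, lambd=0.9):
    """Update q in place and return the per-memory Q-value changes."""
    if not retrieved:
        return {}
    # Assumed TD error: reward minus the mean Q of the retrieved memories.
    td_error = reward - sum(q[m] for m in retrieved) / len(retrieved)

    # BFS backward through the provenance DAG, recording shortest depth.
    depth = {m: 0 for m in retrieved}
    queue = deque(depth)
    while queue:
        m = queue.popleft()
        for p in parents.get(m, ()):
            if p not in depth:
                depth[p] = depth[m] + 1
                queue.append(p)

    # Each memory at depth d receives credit scaled by (gamma * lambd) ** d.
    deltas = {m: alpha * (gamma * lambd) ** d * td_error
              for m, d in depth.items()}
    for m, dq in deltas.items():
        q[m] += dq
    return deltas


# Chain a -> b -> c: "c" was built while "b" was retrieved, "b" while "a" was.
q = {"a": 0.0, "b": 0.0, "c": 0.0}
parents = {"c": ["b"], "b": ["a"]}
print(propagate_credit(q, parents, ["c"], reward=1.0))
```

Recording only the first (shortest) depth per memory means a memory reachable along several provenance paths is credited once. Summing over all paths would be an equally plausible design, so treat this as one option.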

Key Hyperparameters

Parameter	Description	Typical Range
`alpha`	Q-learning rate	0.1–0.5
`gamma`	Discount factor (provenance horizon)	0.3–0.8
`lambd` (λ)	Eligibility trace decay	0.5–0.9
`k_retrieve`	Number of candidates kept by the locality filter	3–10
`topk`	Number of memories forming the context	3–10
`weight_sim`	Similarity weight in hybrid scoring	0.3–1.0
`weight_q`	Q-value weight in hybrid scoring	0.0–0.7
`epsilon`	Exploration probability	0.0–0.1
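The sketch below shows one way the retrieval hyperparameters above might interact. It is an assumption-laden illustration rather than the code in `retrievers.py` or `value_driven.py`: the function names, the memory dictionary layout (`id`, `embedding`, `q`), the linear hybrid score `weight_sim * similarity + weight_q * Q`, and the choice to explore by sampling uniformly from the local candidates are all hypothetical.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query, memories, theta=0.5, k_retrieve=5, topk=3,
             weight_sim=0.5, weight_q=0.5, epsilon=0.05, rng=random):
    """Return the ids of the memories chosen for the context."""
    # Phase 1: locality filter. Keep the k_retrieve most similar memories
    # whose cosine similarity to the query is at least theta.
    scored = [(cosine(query, m["embedding"]), m) for m in memories]
    local = sorted((s for s in scored if s[0] >= theta),
                   key=lambda s: s[0], reverse=True)[:k_retrieve]

    # Phase 2: epsilon-greedy selection over the local candidates.
    if rng.random() < epsilon:
        chosen = rng.sample(local, min(topk, len(local)))  # explore
    else:
        chosen = sorted(local,
                        key=lambda s: weight_sim * s[0] + weight_q * s[1]["q"],
                        reverse=True)[:topk]  # exploit
    return [m["id"] for _, m in chosen]


memories = [
    {"id": "a", "embedding": [1.0, 0.0], "q": 0.0},
    {"id": "b", "embedding": [0.8, 0.6], "q": 1.0},
    {"id": "c", "embedding": [0.0, 1.0], "q": 5.0},
]
print(retrieve([1.0, 0.0], memories, topk=1, epsilon=0.0))
```

In this toy run, memory "c" is excluded by the locality filter despite its high Q-value, and "b" outranks the more similar "a" because of its Q-value. Setting `weight_q` to 0.0 reduces the exploit branch to plain similarity ranking.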

Benchmarks

Benchmark	Domain	Task Type
Lifelong Agent Bench (LLAB)	OS Interaction / Database	Multi-step interactive
Berkeley Function Calling Leaderboard (BFCL)	Function Calling	Multi-turn API calls
Graduate-Level Google-Proof QA (GPQA)	Science QA	Expert-level reasoning
Embodied Reasoning QA (ERQA)	Embodied Reasoning	Visual + spatial
MMMU Pro	Multimodal Understanding	Image + text
LiveCodeBench (LCB)	Code Generation	Programming

Citation

@misc{liao2026memqintegratingqlearningselfevolving,
      title={MemQ: Integrating Q-Learning into Self-Evolving Memory Agents over Provenance DAGs}, 
      author={Junwei Liao and Haoting Shi and Ruiwen Zhou and Jiaqian Wang and Shengtao Zhang and Wei Zhang and Weinan Zhang and Ying Wen and Zhiyu Li and Feiyu Xiong and Bo Tang and Muning Wen},
      year={2026},
      eprint={2605.08374},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2605.08374}, 
}

License

This project is licensed under the MIT License - see the LICENSE file for details.
