Agent Memory: Characterization and System Implications of Stateful Long-Horizon Workloads

Yasmine Omri, Ziyu Gan, Zachary Broveak, Robin Geens, Zexue He, Alex Pentland, Marian Verhelst, Tsachy Weissman, Thierry Tambe

Published Jun 6, 2026


Editorial review: 6.8

Relevance: 0.473

Freshness: 0.000

Why It Matters

What makes this one worth your time

Long-horizon agents depend on memory that they store, retrieve, and update across sessions, so the cost and behavior of agent memory systems bear directly on how efficiently such agents run at scale.

The paper characterizes agent memory systems and offers practical recommendations for their optimization.

Summary

The paper presents a systems characterization of agent memory for long-horizon tasks. It introduces a four-axis taxonomy for classifying memory systems and a phase-aware profiling harness that attributes cost to construction, retrieval, and generation. It then evaluates ten representative systems across two benchmark suites and derives ten system recommendations.

Key contributions

Introduction of a system-oriented taxonomy for agent memory systems.
Development of a phase-aware profiling harness for cost attribution.
Characterization of ten representative agent memory systems and derivation of system recommendations.
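
The idea of phase-aware cost attribution can be illustrated with a minimal sketch. This is not the paper's harness: the class name `PhaseProfiler`, its methods, and the choice of wall-clock time as the only cost metric are assumptions made for illustration. The three phase names come from the abstract.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseProfiler:
    """Hypothetical helper: accumulates wall-clock time per named phase."""

    def __init__(self, clock=time.perf_counter):
        # The clock is injectable so the profiler can be tested deterministically.
        self._clock = clock
        self.totals = defaultdict(float)  # phase name -> accumulated seconds
        self.calls = defaultdict(int)     # phase name -> number of timed blocks

    @contextmanager
    def phase(self, name):
        """Time the enclosed block and charge it to `name`."""
        start = self._clock()
        try:
            yield
        finally:
            # Charge the elapsed time even if the block raised.
            self.totals[name] += self._clock() - start
            self.calls[name] += 1

    def shares(self):
        """Return each phase's fraction of the total measured time."""
        total = sum(self.totals.values())
        if total == 0:
            return {name: 0.0 for name in self.totals}
        return {name: t / total for name, t in self.totals.items()}


if __name__ == "__main__":
    profiler = PhaseProfiler()
    # Placeholder workloads; a real harness would wrap the memory system's
    # write path (construction) and read path (retrieval, generation).
    with profiler.phase("construction"):
        index = {i: str(i) for i in range(100_000)}
    with profiler.phase("retrieval"):
        hits = [index[i] for i in range(0, 100_000, 1000)]
    with profiler.phase("generation"):
        answer = ", ".join(hits)
    for name, share in profiler.shares().items():
        print(f"{name}: {share:.1%}")
```

Wrapping each phase in a context manager keeps the attribution separate from the system under test, so the same wrapper could in principle be applied to different memory systems without changing their code.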

Notable insights

A phase-aware profiling harness attributes cost separately to construction, retrieval, and generation.
Design choices in a memory system shift cost between the write path and the read path.

Possible limitations

Not stated in the abstract

Abstract

arXiv:2606.06448v1

LLM agents are increasingly deployed on long-horizon tasks requiring sustained reasoning over extended interaction histories. Realizing this at scale requires agents to persistently store, retrieve, and update their own memory across sessions. A rich ecosystem of agent memory systems has emerged spanning flat retrieval, LLM-mediated extraction, consolidating fact stores, and agentic control flows. Yet, their system-level behavior remains uncharacterized. We present the first systems characterization of agent memory. First, we introduce a system-oriented taxonomy classifying agent memory systems along four axes. Second, we build a phase-aware profiling harness attributing cost to construction, retrieval, and generation. Third, we characterize ten representative systems across two benchmark suites, uncovering how design choices shift cost across the write and read paths. Finally, we derive 10 system recommendations covering construction scheduling, capability floors, amortization via query volume, freshness-latency tradeoffs, and fleet-scale management.
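
The abstract's mention of amortization via query volume can be made concrete with a small worked example. The abstract does not give a cost model, so the linear model below (a one-time construction cost plus a fixed cost per query) is an assumption, as are the function names and every number used; none of them come from the paper.

```python
import math


def amortized_cost_per_query(construction_cost, per_query_cost, num_queries):
    """Average cost per query when a one-time construction cost is spread
    over `num_queries` queries (assumed linear cost model)."""
    if num_queries <= 0:
        raise ValueError("num_queries must be positive")
    return construction_cost / num_queries + per_query_cost


def break_even_queries(extra_construction_cost, per_query_saving):
    """Smallest query count at which a system with a more expensive write
    path recoups that extra cost through cheaper reads.

    Returns None when there is no per-query saving to recoup it with.
    """
    if extra_construction_cost < 0:
        raise ValueError("extra_construction_cost must be non-negative")
    if per_query_saving <= 0:
        return None
    return max(1, math.ceil(extra_construction_cost / per_query_saving))


if __name__ == "__main__":
    # Made-up numbers in arbitrary cost units, for illustration only.
    print(amortized_cost_per_query(100.0, 2.0, 50))  # 100/50 + 2.0
    print(break_even_queries(100.0, 0.5))            # 100 / 0.5
```

Under this assumed model, a memory system with heavy construction only pays off when query volume is high enough, which is one way to read the write-path versus read-path tradeoff the abstract describes.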