From Agent Traces to Trust: A Survey of Evidence Tracing and Execution Provenance in LLM Agents

Yiqi Wang, Jiaqi Zhang, Taotao Cai, Zirui Liu, Qingqiang Sun, Zequn Sun, Zhangkai Wu, Manqing Dong, Mingkai Zhang, Xuefei Yin, Yanming Zhu

Published Jun 17, 2026


Editorial review: 6.8

Relevance: 0.477

Freshness: 0.000

Why It Matters

What makes this one worth your time

Understanding and improving the trustworthiness of LLM agents is crucial for their deployment in real-world applications where accountability and transparency are necessary.

A survey on evidence tracing and execution provenance to enhance trust in LLM agents.

Summary

The paper surveys evidence tracing and execution provenance in large language model-based agents, aiming to establish process-level accountability and trustworthiness. It introduces a taxonomy for understanding trace sources, evidence units, and provenance relations, and reviews methodologies for provenance representation, evidence attribution, and failure diagnosis.

Key contributions

Introduction of a taxonomy for evidence tracing and execution provenance.
Review of key methodological directions for provenance-aware LLM agents.
Discussion of benchmarks, datasets, and open challenges for provenance in agent systems.

Notable insights

The survey treats retrieval grounding, claim support, tool-use safety, memory lineage, observability, debugging, audit, and recovery as parts of a single provenance framework.
Its taxonomy covers trace sources, evidence and execution units, provenance relations, tracing granularity and timing, representation forms, and trust functions, which could help standardize research in this area.

Possible limitations

Not stated in the abstract

Abstract

arXiv:2606.04990v3

Large language model (LLM)-based agents are evolving from passive text generators into autonomous systems capable of planning, tool use, retrieval, memory access, environmental interaction, and multi-agent collaboration. These capabilities expand agent autonomy, but also make agent behavior harder to verify, debug, and audit. Final-answer accuracy alone cannot explain how an output was produced, which evidence supported each claim, whether tool calls were justified, how memory influenced later decisions, or where failures originated. This survey examines evidence tracing and execution provenance as foundations for process-level accountability in trustworthy LLM agents. We define execution provenance as the typed graph of an agent execution and evidence tracing as its projection onto evidence-support relations. This perspective connects retrieval grounding, claim support, tool-use safety, memory lineage, observability, debugging, audit, and recovery within a unified framework. We introduce a taxonomy covering trace sources, evidence and execution units, provenance relations, tracing granularity and timing, representation forms, and trust functions. We then review key methodological directions, including provenance representation, evidence attribution, tool-use provenance, runtime guardrails, provenance-bearing memory, observability, and failure diagnosis. Finally, we discuss benchmarks, datasets, metrics, and open challenges for building provenance-aware, auditable, and recoverable agent systems.
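The abstract's central definition (execution provenance as a typed graph, evidence tracing as its projection onto evidence-support relations) can be made concrete with a small sketch. The code below is an illustration only and is not taken from the paper: the class names, the node kinds (`plan`, `tool_call`, `retrieval`, `memory_read`, `claim`), and the relation labels (`triggers`, `produces`, `supports`) are all hypothetical choices made for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    node_id: str
    kind: str  # hypothetical node type, e.g. "tool_call" or "claim"


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    relation: str  # hypothetical relation type, e.g. "supports"


@dataclass
class ProvenanceGraph:
    """Typed graph of one agent execution (illustrative structure only)."""

    nodes: dict[str, Node] = field(default_factory=dict)
    edges: list[Edge] = field(default_factory=list)

    def add_node(self, node_id: str, kind: str) -> None:
        self.nodes[node_id] = Node(node_id, kind)

    def add_edge(self, src: str, dst: str, relation: str) -> None:
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append(Edge(src, dst, relation))

    def project(self, relations: set[str]) -> "ProvenanceGraph":
        """Return the subgraph with only the given relation types and their endpoints."""
        kept = [e for e in self.edges if e.relation in relations]
        used = {e.src for e in kept} | {e.dst for e in kept}
        sub = ProvenanceGraph()
        sub.nodes = {i: n for i, n in self.nodes.items() if i in used}
        sub.edges = kept
        return sub

    def evidence_for(self, claim_id: str) -> list[str]:
        """List the ids of nodes that directly support a claim, sorted by id."""
        return sorted(
            e.src for e in self.edges
            if e.dst == claim_id and e.relation == "supports"
        )


def build_example() -> ProvenanceGraph:
    """Build a toy trace: a plan triggers a search, whose result supports a claim."""
    g = ProvenanceGraph()
    g.add_node("plan_1", "plan")
    g.add_node("search_1", "tool_call")
    g.add_node("doc_1", "retrieval")
    g.add_node("mem_1", "memory_read")
    g.add_node("claim_1", "claim")
    g.add_edge("plan_1", "search_1", "triggers")
    g.add_edge("search_1", "doc_1", "produces")
    g.add_edge("doc_1", "claim_1", "supports")
    g.add_edge("mem_1", "claim_1", "supports")
    return g


if __name__ == "__main__":
    full = build_example()
    evidence_view = full.project({"supports"})
    print(sorted(evidence_view.nodes))
    print(full.evidence_for("claim_1"))
```

In this toy trace, the full graph records how the output was produced (plan, tool call, retrieval, memory read), while `project({"supports"})` keeps only the edges that say which items back the claim. A real system would likely need richer edge attributes such as timestamps or confidence, which this sketch leaves out.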