Toward Secure LLM Agents: Threat Surfaces, Attacks, Defenses, and Evaluation

Yuchen Ling, Shengcheng Yu, Zhenyu Chen, Chunrong Fang

Published Jun 10, 2026Featured #5In the daily list Jun 11, 2026

Open on arXiv Read PDF

Daily score64.4

Editorial review7.2

Relevance0.497

Freshness0.722

Why It Matters

What makes this one worth your time

As LLM agents become more integrated into software systems, understanding their security vulnerabilities and defenses is crucial for safe deployment and operation.

A comprehensive framework for understanding and securing LLM agents against evolving threats.

Summary

The paper synthesizes 247 papers to propose a systems-oriented framework for modeling the security of large language model agents, focusing on threat surfaces, attack families, defenses, and evaluation methods. It highlights the dominance of prompt injection and tool-mediated control-flow hijacking, while identifying emerging concerns like persistent state corruption and multi-agent propagation.

Key contributions

Proposes a lifecycle-based, systems-oriented framework for LLM agent security.
Synthesizes existing literature to identify dominant threat surfaces and attack families.
Evaluates current defenses and highlights their limitations in compositionality and benchmark representation.

Notable insights

Prompt injection and tool-mediated control-flow hijacking are currently the most significant threats to LLM agent security.
Emerging concerns include persistent state corruption and multi-agent propagation, indicating evolving threat landscapes.

Possible limitations

Not stated in the abstract

Abstract

arXiv:2606.10749v1 Announce Type: cross Abstract: Large language model (LLM) agents are rapidly moving from conversational interfaces to software components that plan, invoke tools, maintain memory, and act on external environments. This transition changes the nature of security risk. In agentic settings, failures are no longer limited to unsafe text generation. Untrusted content may redirect control flow, misuse tool privileges, corrupt persistent state, leak sensitive information, or trigger harmful external actions. At the same time, research on LLM agent security is expanding quickly but remains fragmented across attack families, defense layers, application domains, and evaluation settings. This paper synthesizes 247 papers through a lifecycle-based, systems-oriented framework that models agent security around the interaction of information flow, delegated authority, and persistent state. We organize the literature around four questions: how LLM agent security should be modeled, which threat surfaces and attack families dominate, what defenses have been proposed and with what tradeoffs, and how security claims are evaluated. We find that prompt injection and tool-mediated control-flow hijacking still dominate the field, while persistent state corruption and multi-agent propagation are becoming central emerging concerns. We further find that current defenses provide useful building blocks but remain weakly compositional, and that existing benchmarks still underrepresent long-horizon, stateful, and deployment-sensitive risks. We argue that secure LLM agents require explicit trust boundaries, principled privilege control, provenance-aware state management, and evaluation practices aligned with realistic operational settings.