Towards trustworthy agentic AI: a comprehensive survey of safety, robustness, privacy, and system security

Jinhu Qi, Muzhi Li, Jiahong Liu, Yuqin Shu, Dianzhi Yu, Shicheng Ma, Wenqian Cui, Yiyang Zhao, Yiyi Chen, Ruoxi Jiang, Irwin King, Zenglin Xu

Published May 26, 2026

Editorial review: 6.8

Relevance: 0.521

Freshness: 0.000

Why It Matters

Understanding and improving the trustworthiness of agentic AI systems is crucial for their deployment in high-risk environments.

The survey examines how to make agentic AI systems more trustworthy with respect to safety, robustness, privacy, and system security.

Summary

The paper surveys the trustworthiness of agentic AI systems along two core dimensions, Safety and Robustness and Privacy and System Security, and consolidates evaluation into a unified metrics-and-benchmarks hub that covers both outcome and process signals.

Key contributions

A comprehensive survey of safety, robustness, privacy, and system security in agentic AI.
Introduction of a unified metrics-and-benchmarks hub for evaluating trustworthiness.
Case study of real-world security failures in open-source agentic systems.

Notable insights

The paper consolidates evaluation into a unified metrics-and-benchmarks hub, offering scenario-to-metric guidance for release gating.
It identifies self-evolving agents, runtime monitoring and verification, privacy-preserving personalization, and the trust-utility trade-off as open challenges.

Possible limitations

Not stated in the abstract

Abstract

arXiv:2605.23989v1

Agentic AI systems -- Large Language Models (LLMs) augmented with planning, tool use, memory, and long-horizon interactions -- can execute complex tasks autonomously, but their multi-step trajectories introduce new failure modes that challenge trustworthiness. This survey provides a focused examination of trustworthy agentic AI through two core dimensions that are critical for high-risk deployments: Safety and Robustness, and Privacy and System Security. For each dimension, we clarify key concepts, identify where risks emerge along the agent workflow, and summarize stage-targeted mitigation strategies. Other trustworthiness aspects (value alignment, transparency, fairness, and accountability) are discussed as relevant context rather than parallel chapters. To support consistent comparison and deployment decisions, we consolidate evaluation into a unified metrics-and-benchmarks hub, emphasizing both outcome and process signals (e.g., constraint violations, trace completeness, and adversarial success rates) and offering scenario-to-metric guidance for release gating. We conclude by outlining open challenges such as self-evolving agents, runtime monitoring and verification, privacy-preserving personalization, and the trust-utility trade-off, and present a case study of real-world security failures in open-source agentic systems. Our goal is to serve as a practical reference for researchers and practitioners building trustworthy agentic systems in high-stakes environments.
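To make the idea of release gating on outcome and process signals concrete, here is a minimal Python sketch. It is not taken from the paper: the `EpisodeRecord` schema, the metric definitions, and the threshold values in `GATE` are all illustrative assumptions, chosen only to show how the three signals named in the abstract (constraint violations, trace completeness, adversarial success rate) might feed a pass/fail decision.

```python
from dataclasses import dataclass


@dataclass
class EpisodeRecord:
    """One evaluated agent run. Hypothetical schema, not from the paper."""
    steps_total: int            # steps the agent executed
    steps_logged: int           # steps that have a complete trace entry
    constraint_violations: int  # constraint breaches observed in the run
    adversarial: bool           # True if the run contained an attack attempt
    attack_succeeded: bool      # True if that attack achieved its goal


def summarize(episodes):
    """Aggregate the three signals over a batch of episodes."""
    total_steps = sum(e.steps_total for e in episodes)
    logged_steps = sum(e.steps_logged for e in episodes)
    violations = sum(e.constraint_violations for e in episodes)
    attacks = [e for e in episodes if e.adversarial]
    successes = sum(1 for e in attacks if e.attack_succeeded)
    return {
        # Process signal: violations per executed step.
        "violation_rate": violations / total_steps if total_steps else 0.0,
        # Process signal: fraction of steps with a complete trace.
        "trace_completeness": logged_steps / total_steps if total_steps else 0.0,
        # Outcome signal: fraction of attacked runs where the attack worked.
        "adversarial_success_rate": successes / len(attacks) if attacks else 0.0,
    }


# Illustrative thresholds only (assumed values, not recommendations).
GATE = {
    "violation_rate": ("max", 0.01),
    "trace_completeness": ("min", 0.99),
    "adversarial_success_rate": ("max", 0.05),
}


def release_gate(metrics, gate=GATE):
    """Return (passed, failed_metric_names) for the given metrics."""
    failures = []
    for name, (kind, limit) in gate.items():
        value = metrics[name]
        ok = value <= limit if kind == "max" else value >= limit
        if not ok:
            failures.append(name)
    return (not failures, failures)
```

In this sketch a release candidate passes only if every metric is within its limit, and the list of failed metric names is returned so a reviewer can see which signal blocked the release. Real thresholds would depend on the deployment scenario.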