COMPASS: Cognitive MCTS-Guided Process Alignment for Safe Search Agents

Wenkai Shen, Pengyang Zhou, Jiahe Xu, Jiaming Qian, Haozhe He, Zhihao Huang, Chaochao Chen, Xiaolin Zheng

Published Jun 1, 2026. Featured #9 in the daily list, Jun 2, 2026.

Daily score: 59.9

Editorial review: 6.8

Relevance: 0.510

Freshness: 0.722

Why It Matters

What makes this one worth your time

Search agents that reason over multiple steps and call tools can be led to unsafe outcomes through a chain of individually innocuous sub-queries, so safety alignment has to cover the whole agent workflow rather than only the final answer.

COMPASS enhances safety alignment in LLM-powered search agents through cognitive MCTS-guided process alignment.

Summary

The paper introduces COMPASS, a Cognitive MCTS-Guided Process Alignment framework for the safety alignment of LLM-powered search agents. It targets safety issues that arise from multi-step reasoning and tool use by combining cognitive tree exploration (CTE), which synthesizes stealthy attack trajectories, with introspective step-wise alignment (ISA), which isolates risky intermediate actions for fine-grained supervision.

Key contributions

Introduction of COMPASS, a framework for safety alignment in search agents.
Integration of cognitive tree exploration for efficient synthesis of attack trajectories.
Development of introspective step-wise alignment for isolating risky actions.

Notable insights

The use of cognitive tree exploration to synthesize stealthy attack trajectories is a novel approach to identifying potential safety risks.
Introspective step-wise alignment allows for fine-grained supervision of intermediate actions, potentially improving safety without compromising utility.

Possible limitations

Not stated in the abstract.

Abstract

arXiv:2605.30838v1

LLM-powered search agents enable multi-step reasoning and tool use. However, these capabilities introduce retrieval-induced safety degradation, as harmful intents may decompose into seemingly innocuous sub-queries that lead to unsafe outcomes. Existing alignment methods struggle to capture sparse safety signals and fail to supervise diverse violations across multi-step interactions. We propose COMPASS, a Cognitive MCTS-Guided Process Alignment framework designed to achieve robust safety alignment throughout the agent workflow while preserving general utility. COMPASS integrates cognitive tree exploration (CTE) to efficiently synthesize stealthy attack trajectories, and introspective step-wise alignment (ISA) to isolate risky intermediate actions for fine-grained process supervision. Empirical results show that COMPASS achieves a favorable safety-utility trade-off while requiring substantially less training data.
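
The abstract does not describe how CTE uses tree search, so the following is only a minimal sketch of the textbook UCB1 selection and backpropagation steps that MCTS-style methods generally build on. It is not the authors' algorithm. The `Node` class, the `search` helper, the one-level tree, and the reward function are all hypothetical names and simplifications introduced here for illustration.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(eq=False)
class Node:
    """One tree node; `action` is a hypothetical label for a candidate step."""
    action: str
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    total_reward: float = 0.0


def ucb1(child: Node, parent_visits: int, c: float = 1.4) -> float:
    """Standard UCB1 score: mean reward plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # unvisited children are always tried first
    exploit = child.total_reward / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def select_child(node: Node, c: float = 1.4) -> Node:
    """Pick the child with the highest UCB1 score."""
    return max(node.children, key=lambda ch: ucb1(ch, node.visits, c))


def backpropagate(node: Optional[Node], reward: float) -> None:
    """Add the reward to the node and every ancestor up to the root."""
    while node is not None:
        node.visits += 1
        node.total_reward += reward
        node = node.parent


def search(root: Node, reward_fn: Callable[[str], float],
           iterations: int = 200) -> Node:
    """Simplified one-level loop: select, score, backpropagate.

    A full MCTS would also expand new children and run rollouts;
    both are omitted here (assumption made for brevity).
    """
    for _ in range(iterations):
        child = select_child(root)
        backpropagate(child, reward_fn(child.action))
    return max(root.children, key=lambda ch: ch.visits)
```

A usage sketch under the same assumptions: attach a few children to a root with `root.children.append(Node("b", parent=root))`, pass a function that maps each action label to a score in [0, 1], and `search` returns the most-visited child. In a red-teaming setting the score might come from a safety judge, but that pairing is an assumption here and is not stated in the abstract.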