Agentic Chain-of-Thought Steering for Efficient and Controllable LLM Reasoning

Yu Xia, Zhouhang Xie, Xin Xu, Byungkyu Kang, Prarit Lamba, Xiang Gao, Julian McAuley

Published: Jun 3, 2026
Featured: #7 in the daily list, Jun 4, 2026
Daily score: 69.1
Editorial review: 7.5
Relevance: 0.456
Freshness: 0.722

Why It Matters

What makes this one worth your time

Extended chain-of-thought reasoning improves accuracy but often spends tokens inefficiently and offers little control at inference time. ACTS targets both problems: a controller steers the reasoner under an explicit thinking budget, so users can trade accuracy against token cost in a controlled way.

ACTS enables efficient and controllable reasoning in large language models through adaptive steering.

Summary

The paper introduces Agentic Chain-of-Thought Steering (ACTS), a method that formulates reasoning steering as a Markov decision process, allowing a controller agent to adaptively guide a frozen reasoner during inference for efficient reasoning.

Key contributions

  • Introduction of the ACTS framework for adaptive reasoning steering in LLMs.
  • Demonstration of budget-aware strategy control that maintains reasoning continuity.
  • Empirical validation showing substantial token savings while matching full-thinking performance.

Notable insights

  • The formulation of reasoning steering as a Markov decision process is a novel approach that could enhance the adaptability of reasoning strategies.
  • Initializing the controller agent from synthetic steering trajectories, before reinforcement learning, suggests a creative way to warm-start the policy in this setting.

Possible limitations

  • Not stated in the abstract.

Abstract

arXiv:2606.03965v1 (announce type: cross)

Large language models improve final-answer accuracy through extended chain-of-thought reasoning, but often spend tokens inefficiently and offer little inference-time control. Existing efficient reasoning methods control thinking length by shortening, early-stopping, or compressing traces, leaving how the model thinks implicit. In this paper, we propose Agentic Chain-of-Thought Steering (ACTS), which formulates reasoning steering as a Markov decision process where a controller agent adaptively steers a frozen reasoner during inference. At each step, the controller observes the reasoning trace and remaining thinking budget, then issues a steering action consisting of a reasoning strategy and a steering phrase that initiates the next reasoner step. This enables budget-aware strategy control for efficient reasoning while preserving the reasoner's generation continuity. We initialize the controller agent from our constructed synthetic steering trajectories with multi-budget augmentation, and further optimize it via reinforcement learning with budget-conditioned reward shaping. Experiments across multiple benchmarks show that ACTS matches full-thinking performance with substantial token savings, and enables controllable accuracy-efficiency trade-offs across different reasoners and tasks. The code is available at https://github.com/Andree-9/ACTS.
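
The steering loop described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the names `SteeringAction`, `steer`, `toy_controller`, and `toy_reasoner`, the strategy labels, the stopping rule, and the fixed per-step token cost are all hypothetical and do not come from the ACTS code base. In the paper the controller is a trained agent and the reasoner is a frozen LLM; here both are stand-in functions so the loop can run on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class SteeringAction:
    strategy: str  # hypothetical strategy label, e.g. "explore" or "conclude"
    phrase: str    # steering phrase that opens the next reasoner step


# Hypothetical interfaces (assumptions, not the ACTS API):
#   controller(trace, remaining_budget) -> SteeringAction
#   reasoner(question, trace, phrase, max_tokens) -> (step_text, tokens_used)
Controller = Callable[[List[str], int], SteeringAction]
Reasoner = Callable[[str, List[str], str, int], Tuple[str, int]]


def steer(question: str, controller: Controller, reasoner: Reasoner,
          budget: int, max_steps: int = 32) -> Tuple[List[str], int]:
    """Run one steered episode and return (trace, tokens_used)."""
    trace: List[str] = []
    remaining = budget
    for _ in range(max_steps):
        if remaining <= 0:
            break
        # The controller observes the trace so far and the remaining budget.
        action = controller(trace, remaining)
        # The reasoner continues from the steering phrase; its weights are
        # never updated, it only generates the next step.
        step, used = reasoner(question, trace, action.phrase, remaining)
        trace.append(action.phrase + step)
        remaining -= used
        # Assumed stopping rule: a "conclude" strategy ends the episode.
        if action.strategy == "conclude":
            break
    return trace, budget - remaining


def toy_controller(trace: List[str], remaining: int) -> SteeringAction:
    # Stand-in policy: wrap up once the budget is nearly exhausted.
    if remaining <= 20:
        return SteeringAction("conclude", "So the final answer is")
    return SteeringAction("explore", "Let me try")


def toy_reasoner(question: str, trace: List[str], phrase: str,
                 max_tokens: int) -> Tuple[str, int]:
    # Stand-in for a frozen LLM: pretends each step costs up to 15 tokens.
    used = min(15, max_tokens)
    return f" <{used} tokens of reasoning>", used


if __name__ == "__main__":
    trace, used = steer("What is 17 * 23?", toy_controller, toy_reasoner, budget=50)
    for step in trace:
        print(step)
    print("tokens used:", used)
```

Passing the remaining budget to the controller at every step is what makes the policy budget-aware in this sketch: the same controller can behave differently under a budget of 50 than under a budget of 500, which is the kind of accuracy-efficiency control the abstract describes.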