LLM Reasoning Evaluation Interpretability

When Do LLMs Reason? A Dynamical Systems View via Entropy Phase Transitions

Wei Xia, Haoqing Wang, Zhi-Hong Deng, Yehui Tang

Published May 25, 2026 | Featured #8 | In the daily list May 26, 2026

Daily score: 64.9

Editorial review: 7.2

Relevance: 0.496

Freshness: 0.722

Why It Matters

What makes this one worth your time

Knowing when explicit reasoning actually helps lets a system skip chain-of-thought on inputs where it multiplies token consumption without improving accuracy.

EDRM uses entropy measured early in decoding to choose an inference strategy per dataset or per instance; the authors report lower token use together with higher accuracy.

Summary

The paper explores when explicit reasoning in large language models (LLMs) is beneficial by analyzing entropy dynamics during the decoding process. It introduces EDRM, a framework that uses early-stage entropy signals to adaptively select inference strategies, leading to improved accuracy and reduced token consumption across various benchmarks and models.
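To make "entropy dynamics during decoding" concrete, here is a minimal sketch of how a per-step entropy trajectory could be computed from next-token logits. It uses the standard Shannon entropy of the softmax distribution; the function names `token_entropy` and `entropy_trajectory` are hypothetical, and the abstract does not say exactly how the authors compute or normalize entropy.

```python
import math


def token_entropy(logits):
    """Shannon entropy (in nats) of softmax(logits) for one decoding step.

    Assumption: `logits` is a plain list of floats over the vocabulary.
    """
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    # Terms with p == 0 contribute nothing to the entropy.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_trajectory(step_logits):
    """Entropy at each decoding step, given one logit vector per step."""
    return [token_entropy(logits) for logits in step_logits]
```

A uniform distribution over the vocabulary gives the maximum entropy, and a sharply peaked one gives a value near zero, so a falling trajectory means the model is becoming more certain about its next token as decoding proceeds.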

Key contributions

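Introduction of EDRM (Entropy Dynamics-based Reasoning Manifold), a lightweight, training-free routing framework that selects inference strategies from early decoding entropy.
Evaluation across 15 benchmarks and 4 LLMs: 41-55% token reduction with improved accuracy at the dataset level, and up to 4.7% higher accuracy with 27-45% token savings at the instance level.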

Notable insights

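Early-stage entropy dynamics signal when reasoning is beneficial: tasks that benefit from CoT show consistent entropy reduction, which the authors interpret as a phase-transition-like shift from a high-entropy exploratory regime to a low-entropy structured reasoning regime.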
EDRM provides a training-free method to adaptively manage reasoning processes, optimizing both token usage and accuracy.
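The routing idea in these insights can be illustrated with a deliberately simple rule: fit a straight line to the first few entropy values and invoke chain-of-thought only when the trend is clearly downward. This is not EDRM itself, which embeds trajectories into a manifold representation; the names `least_squares_slope` and `route` and the threshold value are assumptions made for illustration.

```python
def least_squares_slope(values):
    """Slope of the ordinary least-squares line through (i, values[i])."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two entropy values")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def route(early_entropies, slope_threshold=-0.01):
    """Pick an inference strategy from early-decoding entropies.

    Hypothetical rule: a consistently falling entropy trend selects "cot";
    a flat or rising trend selects "direct". The threshold is a placeholder
    that would need calibration on held-out samples.
    """
    slope = least_squares_slope(early_entropies)
    return "cot" if slope < slope_threshold else "direct"
```

A single slope cannot distinguish a steady decline from a noisy one, so a fuller version would also have to account for the "unstable" patterns the abstract mentions.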

Possible limitations

Not stated in the abstract

Abstract

arXiv:2605.22873v1

Chain-of-thought (CoT) reasoning has become the default strategy for enhancing LLM capabilities, yet its application raises a fundamental question: when is explicit reasoning actually beneficial? Empirical evidence reveals a striking paradox: CoT often provides marginal or even negative gains on factual and open-ended tasks while multiplying token consumption. In this work, we show that LLM reasoning is not a static property of tasks or models, but a dynamic decoding state that emerges during generation. Through systematic analysis, we find early-stage entropy dynamics provide a reliable signal of this state: tasks benefiting from CoT exhibit consistent entropy reduction, while others display unstable or increasing patterns. This behavior can be interpreted as a phase-transition-like shift from a high-entropy exploratory regime to a low-entropy structured reasoning regime. Based on these insights, we propose EDRM (Entropy Dynamics-based Reasoning Manifold), a lightweight and training-free routing framework that leverages early decoding entropy to adaptively select inference strategies. EDRM embeds entropy trajectories into a compact and interpretable manifold representation, enabling both zero-shot deployment and fine-grained instance-level adaptation. Across 15 benchmarks and 4 LLMs of varying scales and architectures, EDRM consistently outperforms static baselines. At the dataset level, EDRM achieves 41-55% token reduction while improving accuracy with as few as 50 calibration samples. At the instance level, it further improves accuracy by up to 4.7% while maintaining 27-45% token savings. These results suggest that reasoning should be invoked selectively rather than by default, and demonstrate the effectiveness of entropy-driven decoding control for efficient and adaptive LLM inference.
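The token-reduction figures are stated relative to a static baseline. As a rough accounting sketch, assuming that baseline is always-on CoT and that each query is routed either to CoT or to a shorter direct answer, the relative saving follows from the routed fraction and the two average lengths. The function `token_reduction` and its parameters are hypothetical, and the sketch ignores any tokens spent on the early entropy probe.

```python
def token_reduction(frac_cot, cot_tokens, direct_tokens):
    """Fractional token saving versus always using CoT.

    frac_cot:      fraction of queries routed to CoT (0.0 to 1.0)
    cot_tokens:    average tokens per CoT response (assumed > 0)
    direct_tokens: average tokens per direct response
    """
    if not 0.0 <= frac_cot <= 1.0:
        raise ValueError("frac_cot must be between 0 and 1")
    if cot_tokens <= 0:
        raise ValueError("cot_tokens must be positive")
    # Expected tokens per query under routing.
    routed = frac_cot * cot_tokens + (1.0 - frac_cot) * direct_tokens
    return 1.0 - routed / cot_tokens
```

Under this accounting the saving grows with both the share of queries routed away from CoT and the length gap between the two strategies; any input values used with it are illustrative and not taken from the paper.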