Integrated and Cross-Architecture Interpretation of LLM Reasoning
Leonardo Matthew Yauw, Wei-Bin Kou, Yujiu Yang
Why It Matters
Understanding LLM reasoning is crucial for improving model transparency and trustworthiness, which are essential for real-world applications.
The paper proposes a single framework for interpreting LLM reasoning that is meant to apply across model architectures.
Summary
The paper introduces the Integrated, cross-Architecture Reasoning (IAR) framework to enhance the interpretability of LLM reasoning by analyzing reasoning-crucial tokens across different model architectures and layers.
Key contributions
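- Introduction of the IAR framework as a unified approach to LLM reasoning interpretability.
- A method that combines bandwidth-calibrated Mutual Information Peak (MIP) with Tukey IQR peak detection to isolate reasoning-crucial tokens at the output layer.
- Application of a Jaccard stability metric over multi-domain problems to check whether the MIP-identified tokens are tied to reasoning quality.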
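To make the peak-detection step concrete, here is a minimal sketch of Tukey's upper-fence rule applied to a list of per-token scores. It is not the authors' implementation: the abstract does not give the MI estimator, the bandwidth calibration, or the fence multiplier, so the function name `tukey_peaks`, the multiplier `k=1.5` (the conventional default), and the score values are all assumptions made for illustration.

```python
from statistics import quantiles


def tukey_peaks(scores, k=1.5):
    """Return positions whose score exceeds the upper Tukey fence Q3 + k * IQR.

    `scores` stands in for per-token mutual-information estimates; how those
    estimates are computed is outside this sketch.
    """
    # Quartiles with linear interpolation over the observed data.
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    upper_fence = q3 + k * (q3 - q1)
    return [i for i, s in enumerate(scores) if s > upper_fence]


# Hypothetical MI scores for a 10-token sequence (made-up numbers).
mi_scores = [0.10, 0.12, 0.11, 0.90, 0.13, 0.09, 0.12, 0.85, 0.10, 0.11]
peak_positions = tukey_peaks(mi_scores)
print(peak_positions)
```

With these made-up scores, only the two tokens far above the bulk of the distribution pass the fence; a larger `k` would make the selection stricter.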
Notable insights
- The use of bandwidth-calibrated MIP and Tukey IQR peak-detection offers a refined method for isolating important tokens.
- Overlap analysis between MIP-picked tokens and Deep-Thinking Ratio (DTR) deep tokens shows whether reasoning-crucial tokens are also computation-intensive, and how reasoning patterns evolve across layers.
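One simple way to picture the overlap analysis is as a comparison of two sets of token positions. The sketch below is an assumption about what such a comparison could look like, not the paper's statistic: the abstract does not say how overlap is scored, and the function name `overlap_stats`, the dictionary keys, and the example positions are hypothetical.

```python
def overlap_stats(mip_positions, dtr_positions):
    """Compare MIP-picked token positions with DTR-deep token positions.

    Both inputs are iterables of token indices; how each set is produced
    is outside this sketch.
    """
    mip, dtr = set(mip_positions), set(dtr_positions)
    shared = mip & dtr
    return {
        "shared": sorted(shared),
        # Share of MIP-picked tokens that are also DTR-deep.
        "frac_mip_also_deep": len(shared) / len(mip) if mip else 0.0,
        # Share of DTR-deep tokens that are also MIP-picked.
        "frac_deep_also_mip": len(shared) / len(dtr) if dtr else 0.0,
    }


# Hypothetical token positions for one generated answer.
stats = overlap_stats([3, 7, 12, 20], [3, 5, 12, 18, 25])
print(stats)
```

Reporting both directions separates two questions: whether the tokens flagged at the output layer are computation-intensive, and whether the computation-intensive tokens are the ones flagged at the output layer.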
Possible limitations
- Not stated in the abstract.
Abstract
arXiv:2605.28006v1 (announce type: cross). Understanding how LLMs reason is hindered by a practical asymmetry: while their generated outputs are observable, the underlying reasoning patterns remain opaque. Relying on a single probe, such as Mutual Information Peak (MIP) or Deep-Thinking Ratio (DTR), risks underestimating the genuine inferential structure. To address this deficiency, we present an Integrated, cross-Architecture Reasoning (IAR) framework, designed to provide a unified approach to LLM reasoning interpretability. Specifically, we first propose to use bandwidth-calibrated MIP coupled with Tukey IQR peak detection to isolate reasoning-crucial tokens at the output layer. Second, we perform an overlap analysis between MIP-picked tokens and DTR-deep tokens to trace the cross-layer trajectories of those tokens. This also reveals whether reasoning-crucial tokens are computation-intensive as well, which further clarifies how reasoning patterns evolve across model layers. Finally, we apply a Jaccard stability metric over multi-domain problems to verify whether the MIP-identified tokens are indicative of reasoning quality. Extensive experiments on three models (Qwen-7B, Qwen-14B, and Llama-8B) across four domains (mathematics, code, logic, and common sense) demonstrate IAR's generalizable interpretation capabilities across architectures.
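The Jaccard index itself is standard: for two sets A and B it is |A ∩ B| / |A ∪ B|. The sketch below averages it over all pairs of token sets as one plausible reading of a "stability" score. Which sets the paper actually compares (runs, problems, domains, or models) is not stated in the abstract, so the pairwise averaging, the names `jaccard` and `mean_pairwise_jaccard`, and the example tokens are assumptions.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B|; two empty sets count as identical."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_pairwise_jaccard(token_sets):
    """Average the Jaccard index over every pair of token sets."""
    pairs = list(combinations(token_sets, 2))
    if not pairs:
        raise ValueError("need at least two token sets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical MIP-identified tokens from three problems in one domain.
runs = [
    {"therefore", "so", "hence"},
    {"therefore", "so", "wait"},
    {"therefore", "so", "hence"},
]
stability = mean_pairwise_jaccard(runs)
print(round(stability, 3))
```

A score near 1 means the same tokens are selected each time; a score near 0 means the selection changes from one set to the next.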