Compliance versus Sensibility: On the Reasoning Controllability in Large Language Models
Xingwei Tan, Marco Valentino, Mahmud Elahi Akhter, Yuxiang Zhou, Maria Liakata, Nikolaos Aletras
Why It Matters
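If a model's reasoning pattern (induction, deduction, or abduction) can be set independently of the task at hand, its behavior becomes easier to direct and audit. The abstract frames this as a path toward improved controllability, faithfulness, and generalizability.
The study examines what LLMs do when instructed to use a reasoning pattern that conflicts with the one a task calls for, and shows that steering model activations can raise instruction following by up to 29%.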
Summary
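The paper investigates reasoning controllability in large language models (LLMs) through reasoning conflicts: cases where the instructions mandate a logical schema (induction, deduction, or abduction) that differs from the one the target task would normally call for, creating tension between parametric and contextual information. It finds that LLMs prioritize task-appropriate reasoning (sensibility) over compliance with the conflicting instruction. It also finds that reasoning types are linearly encoded from middle-to-late layers, which makes activation-level interventions possible; steering along these lines increases instruction following by up to 29%.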
Key contributions
- First systematic investigation, according to the authors, of reasoning conflicts in LLMs.
- Demonstration that activation-level steering increases instruction following by up to 29%.
- Probing evidence that reasoning types are linearly encoded from middle-to-late layers.
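The abstract does not say how the steering intervention is computed. One widely used form of activation-level steering adds a difference-of-means direction to a hidden state; the sketch below assumes that technique purely for illustration. The function names, the toy vectors, and the strength parameter alpha are all hypothetical and are not taken from the paper.

```python
# Difference-of-means activation steering, sketched with plain Python lists.
# Assumption: this is NOT confirmed to be the paper's method; it is one
# common way to build a steering direction from two sets of activations.

def mean_vector(rows):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def steering_vector(compliant_acts, noncompliant_acts):
    """Direction pointing from non-compliant toward compliant activations."""
    mu_c = mean_vector(compliant_acts)
    mu_n = mean_vector(noncompliant_acts)
    return [c - n for c, n in zip(mu_c, mu_n)]


def steer(hidden, direction, alpha=1.0):
    """Shift one hidden state along the steering direction by strength alpha."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


# Toy 2-dimensional activations (hypothetical, not real model data).
compliant = [[1.0, 0.0], [3.0, 0.0]]
noncompliant = [[0.0, 1.0], [0.0, 3.0]]

direction = steering_vector(compliant, noncompliant)
steered = steer([0.0, 0.0], direction, alpha=0.5)
```

In a real model the activations would be collected at a chosen layer for prompts where the model did and did not follow the mandated reasoning pattern, and the shifted hidden state would be written back during the forward pass.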
Notable insights
- Reasoning types are linearly encoded from middle-to-late layers, enabling potential activation-level controllability.
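- Task accuracy is not strictly determined by sensibility: models often maintain high performance even when using a conflicting reasoning pattern, suggesting a reliance on internalized parametric memory that increases with model size.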
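To make "linearly encoded" concrete, here is a minimal sketch of a linear probe. It assumes a nearest-class-mean classifier written as a linear score per class, and it runs on synthetic vectors generated in the code, not on real model activations. The probe type, dimensions, and all names are assumptions; the abstract does not describe the authors' probing setup.

```python
import random

REASONING_TYPES = ["induction", "deduction", "abduction"]


def fit_linear_probe(acts, labels):
    """Fit one (weights, bias) pair per class from class means.

    Scoring with w.x + b where w = class mean and b = -0.5 * |mean|^2
    is equivalent to nearest-class-mean, so the decision rule is linear.
    """
    probe = {}
    for cls in set(labels):
        rows = [a for a, y in zip(acts, labels) if y == cls]
        mu = [sum(col) / len(rows) for col in zip(*rows)]
        bias = -0.5 * sum(m * m for m in mu)
        probe[cls] = (mu, bias)
    return probe


def predict(probe, act):
    """Return the class with the highest linear score."""
    def score(cls):
        w, b = probe[cls]
        return sum(wi * xi for wi, xi in zip(w, act)) + b
    return max(probe, key=score)


def synthetic_activations(n_per_class, dim, rng):
    """Synthetic stand-in for layer activations: one shifted axis per class."""
    acts, labels = [], []
    for k, cls in enumerate(REASONING_TYPES):
        for _ in range(n_per_class):
            vec = [rng.gauss(0.0, 0.5) for _ in range(dim)]
            vec[k] += 4.0  # class-specific offset makes classes separable
            acts.append(vec)
            labels.append(cls)
    return acts, labels


def probe_accuracy(seed=0, dim=8):
    """Train on one synthetic sample, report accuracy on a held-out one."""
    rng = random.Random(seed)
    train_x, train_y = synthetic_activations(50, dim, rng)
    test_x, test_y = synthetic_activations(20, dim, rng)
    probe = fit_linear_probe(train_x, train_y)
    hits = sum(predict(probe, x) == y for x, y in zip(test_x, test_y))
    return hits / len(test_y)
```

With real activations, one would fit a probe like this separately at each layer and compare held-out accuracy across layers; high accuracy concentrated in middle-to-late layers is the kind of pattern the insight above describes.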
Possible limitations
- Not stated in the abstract
Abstract
arXiv:2604.27251v1
Large Language Models (LLMs) are known to acquire reasoning capabilities through shared inference patterns in pre-training data, which are further elicited via Chain-of-Thought (CoT) practices. However, whether fundamental reasoning patterns, such as induction, deduction, and abduction, can be decoupled from specific problem instances remains a critical challenge for model controllability, and for shedding light on reasoning controllability. In this paper, we present the first systematic investigation of this problem through the lens of reasoning conflicts: an explicit tension between parametric and contextual information induced by mandating logical schemata that deviate from those expected for a target task. Our evaluation reveals that LLMs consistently prioritize sensibility over compliance, favoring task-appropriate reasoning patterns despite conflicting instructions. Notably, task accuracy is not strictly determined by sensibility, with models often maintaining high performance even when using conflicting patterns, suggesting a reliance on internalized parametric memory that increases with model size. We further demonstrate that reasoning conflicts are internally detectable, as confidence scores significantly drop during conflicting episodes. Probing experiments confirm that reasoning types are linearly encoded from middle-to-late layers, indicating the potential for activation-level controllability. Leveraging these insights, we steer models towards compliance, increasing instruction following by up to 29%. Overall, our findings establish that while LLM reasoning is anchored to concrete instances, active mechanistic interventions can effectively decouple logical schemata from data, offering a path toward improved controllability, faithfulness, and generalizability.
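The abstract reports that confidence scores drop during conflicting episodes but does not define the score. A common proxy is the geometric-mean token probability of a generated sequence, computed from per-token log-probabilities. The sketch below assumes that proxy; the function names, the baseline value, and the drop threshold are hypothetical.

```python
import math


def sequence_confidence(token_logprobs):
    """Geometric-mean token probability, i.e. exp(mean log-probability)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def flag_conflict(token_logprobs, baseline, drop=0.2):
    """Flag an episode whose confidence falls more than `drop` below baseline.

    `baseline` would be the typical confidence on non-conflicting prompts;
    the 0.2 default is an arbitrary illustrative threshold.
    """
    return baseline - sequence_confidence(token_logprobs) > drop


# Hypothetical log-probabilities for two short generations.
confident_run = [math.log(0.8), math.log(0.8)]
uncertain_run = [math.log(0.5), math.log(0.5)]
```

Here `flag_conflict(uncertain_run, baseline=0.9)` flags the episode, while `flag_conflict(confident_run, baseline=0.9)` does not, since the first falls about 0.4 below the baseline and the second only about 0.1.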