LLMs Should Not Yet Be Credited with Decision Explanation
Wenshuo Wang
Why It Matters
What makes this one worth your time
Knowing where LLM-based decision explanation falls short helps researchers and practitioners avoid overstating what these models can do and supports the development of more reliable AI systems.
The paper critiques the premature crediting of LLMs with decision explanation capabilities.
Summary
The paper argues against attributing decision explanation capabilities to LLMs, distinguishing between decision prediction, rationale generation, and decision explanation, and proposing a bridge standard for evaluating explanatory claims.
Key contributions
- Clarification of the differences between decision prediction, rationale generation, and decision explanation.
- Proposal of a bridge standard for evaluating explanatory claims of LLMs.
- Introduction of a principle of credit calibration for LLMs based on evidential support.
Notable insights
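- Separating decision prediction from decision explanation shows why predictive accuracy alone cannot validate an explanatory claim; process- or intervention-sensitive validation is needed.
- The proposed bridge standard for decision-explanation credit could guide future research on when an LLM-based account has earned explanatory credit for human decisions, rather than only predictive or narrative credit.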
Possible limitations
- Not stated in the abstract.
Abstract
arXiv:2605.01164v1
This position paper argues that LLMs should not yet be credited with decision explanation. This matters because recent work increasingly treats accurate behavioral prediction, plausible rationales, and outcome-conditioned reasoning traces as evidence that LLMs explain why people decide as they do, risking a premature redefinition of what counts as explanatory progress in human decision modeling. We first distinguish three claims with different evidential burdens: decision prediction, rationale generation, and decision explanation. We then argue that the evidence most commonly offered for LLM-based decision accounts directly supports the first two claims, and sometimes explanatory hypothesis generation, but does not distinguish decision explanation from prediction-supportive rationalization. Next, we propose a bridge standard for decision-explanation credit: stronger claims should specify explanatory targets, discriminate against weaker rationalizer alternatives, use target-appropriate process- or intervention-sensitive validation, and bound their scope. We then situate this standard against competing views and related literatures, clarifying why it preserves the value of LLMs as predictors, narrators, and hypothesis generators while resisting premature explanatory credit. We conclude with a principle of credit calibration: LLMs should be credited for the strongest claim their evidence warrants, and no stronger; if adopted, this principle can help turn LLMs from persuasive narrators of decisions into more reliable instruments for discovering, testing, and communicating explanations of human behavior.
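One way to read the bridge standard and the credit-calibration principle is as a checklist applied to the evidence behind a claim. The Python sketch below is only an illustration of that reading, not the paper's method: the class and field names are hypothetical, each field stands for a yes/no judgment a reviewer would make, and treating the four bridge conditions as jointly sufficient for explanation credit is an assumption (the abstract presents them as requirements for stronger claims).

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Claim(Enum):
    """The three claim types distinguished in the abstract."""
    DECISION_PREDICTION = "decision prediction"
    RATIONALE_GENERATION = "rationale generation"
    DECISION_EXPLANATION = "decision explanation"


@dataclass(frozen=True)
class Evidence:
    """Hypothetical summary of the evidence offered for an LLM-based account."""
    accurate_behavioral_prediction: bool = False
    plausible_rationales: bool = False
    # The four bridge-standard conditions listed in the abstract.
    specifies_explanatory_target: bool = False
    discriminates_against_rationalizers: bool = False
    process_or_intervention_validation: bool = False
    bounds_scope: bool = False

    def meets_bridge_standard(self) -> bool:
        # Assumption: all four conditions must hold together.
        return all((
            self.specifies_explanatory_target,
            self.discriminates_against_rationalizers,
            self.process_or_intervention_validation,
            self.bounds_scope,
        ))


def warranted_claims(evidence: Evidence) -> set[Claim]:
    """Return the claims this evidence supports under the sketch's assumptions."""
    claims: set[Claim] = set()
    if evidence.accurate_behavioral_prediction:
        claims.add(Claim.DECISION_PREDICTION)
    if evidence.plausible_rationales:
        claims.add(Claim.RATIONALE_GENERATION)
    if evidence.meets_bridge_standard():
        claims.add(Claim.DECISION_EXPLANATION)
    return claims


def is_overclaimed(asserted: Claim, evidence: Evidence) -> bool:
    """Credit calibration: a claim is overclaimed if the evidence does not warrant it."""
    return asserted not in warranted_claims(evidence)


# Evidence of the kind the abstract calls "most commonly offered".
typical = Evidence(accurate_behavioral_prediction=True, plausible_rationales=True)
print(is_overclaimed(Claim.DECISION_EXPLANATION, typical))  # True
```

In this sketch, accurate prediction and plausible rationales earn credit for the first two claims but leave decision explanation overclaimed until the bridge conditions are met, which mirrors the abstract's argument in simplified form.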