Trust, but Don't Verify: Epistemic Blind Spots in LLM Source Evaluation

Rohan N. Pradhan, Steve Goley

Published Jul 9, 2026
Featured #5 in the daily list, Jun 7, 2026


Daily score: 71.0

Editorial review: 7.5

Relevance: 0.463

Freshness: 0.722

Why It Matters

What makes this one worth your time

Understanding the limitations of language models in evidence evaluation is crucial for improving their reliability in decision-making contexts.

During multi-source synthesis, language models give fabricated statistics about the same weight as valid ones, even though they can detect the fabrications when shown them in isolation.

Summary

The paper investigates whether language models evaluate the quality of evidence during multi-source synthesis. It finds that models can detect fabricated statistics in isolation but do not use this capability when synthesizing, so they produce similar numeric estimates whether the underlying statistics are valid or fabricated.

Key contributions

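Demonstrates that LLMs can detect fabricated statistics in isolation but fail to apply that capability during multi-source synthesis.
Introduces the term epistemic alignment and distinguishes the failure from sycophancy, which tracks user preference rather than how analytically credible a source appears.
Provides mechanistic analyses (causal tracing, linear probes, and component-level attribution) showing that models encode and use a methodology-register representation while numeric-validity signals are suppressed to chance.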

Notable insights

Models exhibit a behavioral dissociation between capability and deployment: numeric validity is decodable in isolation but goes unused during synthesis.
The study identifies a methodology-register gate that weights sources by the register of their analytical text rather than by the validity of their numbers.

Possible limitations

Not stated in the abstract.

Abstract

arXiv:2606.05403v2

Language models increasingly act as epistemic proxies, synthesizing evidence from multiple sources to inform decisions. Whether they evaluate the quality of that evidence, or merely aggregate it based on surface presentation, remains poorly understood. We show that models possess the capability to detect fabricated statistics in isolation but do not recruit this capability during multi-source synthesis, producing similar numeric estimates whether the statistics are fabricated or valid. Specifically, source influence is governed by a methodology-register gate that responds to the distributional register of analytical text but not to numeric validity: for example, statistically impossible confidence intervals receive the same weight as valid ones. The behavioral dissociation replicates across six models from four families (Anthropic Claude, Qwen, OLMo, and OpenAI GPT-5.4) and three professional domains. Mechanistic analyses, including causal tracing, linear probes, and component-level attribution, converge on the same account: the model encodes and causally uses a methodology-register representation that transfers across domains, while numeric-validity signals, decodable in isolation, are suppressed to chance during multi-source synthesis. Prompting-based mitigations, even an oracle checklist naming the exact statistical checks, produce blanket skepticism rather than selective discernment, and the post-training pipelines we examine reinforce the shortcut without building numeric verification. Unlike sycophancy, which tracks user preference, this failure tracks whether a source presents as analytically credible, not whether its claims are consistent. We term this epistemic alignment: like preference and safety alignment, the question is not capability but deployment.
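To make "statistically impossible confidence intervals" concrete, here is a minimal sketch of the kind of numeric-validity check the abstract refers to. It is an illustration only, not the paper's checklist or evaluation code: the `ReportedEstimate` type, the `ci_problems` function, and the three checks are assumptions chosen for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReportedEstimate:
    """A statistic as quoted by a source (hypothetical structure for illustration)."""
    point: float
    ci_low: float
    ci_high: float
    is_proportion: bool = False  # True if the value must lie in [0, 1]


def ci_problems(est: ReportedEstimate) -> List[str]:
    """Return the internal-consistency problems found in a reported interval.

    An empty list means the interval passed these basic checks; it does not
    mean the statistic is true, only that it is not self-contradictory.
    """
    problems = []
    if est.ci_low > est.ci_high:
        problems.append("lower bound exceeds upper bound")
    low, high = min(est.ci_low, est.ci_high), max(est.ci_low, est.ci_high)
    if not (low <= est.point <= high):
        problems.append("point estimate lies outside the interval")
    if est.is_proportion and (est.ci_low < 0 or est.ci_high > 1):
        problems.append("interval extends beyond [0, 1]")
    return problems


if __name__ == "__main__":
    # Two made-up sources written in the same analytical register.
    valid = ReportedEstimate(point=0.42, ci_low=0.35, ci_high=0.49, is_proportion=True)
    impossible = ReportedEstimate(point=0.42, ci_low=0.55, ci_high=0.61, is_proportion=True)
    for name, est in [("valid", valid), ("impossible", impossible)]:
        print(name, ci_problems(est))
```

A check like this looks only at the numbers, which is the signal the abstract reports as suppressed during synthesis. Two sources with identical wording and different intervals would be separated by it even though they share the same register.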