Reasoning Is Not Free: Robust Adaptive Cost-Efficient Routing for LLM-as-a-Judge

Wenbo Zhang, Lijinghua Zhang, Liner Xiang, Hengrui Cai

Published May 13, 2026

Editorial review: 7.0

Relevance: 0.487

Freshness: 0.000

Why It Matters

What makes this one worth your time

Knowing when an LLM judge actually needs explicit reasoning lets automated evaluation pipelines spend reasoning compute only where it improves accuracy, such as tasks requiring structured verification (e.g., math and coding).

RACER routes each judgment to either a reasoning or a non-reasoning judge under a fixed budget, aiming for a better accuracy-cost trade-off.

Summary

The paper investigates the cost and accuracy trade-offs of using reasoning-capable large language models as automated judges, proposing a method called Robust Adaptive Cost-Efficient Routing (RACER) to dynamically select between reasoning and non-reasoning models under budget constraints.

Key contributions

Proposes RACER for adaptive selection between reasoning and non-reasoning LLMs.
Formulates the routing problem as a constrained distributionally robust optimization problem.
Provides theoretical guarantees for the RACER method, including uniqueness of the optimal policy and linear convergence.
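
To make the second contribution concrete, here is one generic way a budget-constrained, distributionally robust routing problem could be written. This is an illustrative sketch and not necessarily the paper's exact formulation. All symbols are assumptions introduced here: \pi(x) is the probability of sending input x to the reasoning judge, a_R and a_N are the expected accuracies of the reasoning and non-reasoning judges, c_R and c_N are their costs, B is the budget, P is the reference (training) distribution, and \rho is the radius of the KL uncertainty set. Whether the budget constraint is also taken under the worst-case distribution is a further assumption; here it is evaluated under P.

```latex
\max_{\pi:\,\mathcal{X}\to[0,1]} \;\;
\inf_{Q:\, D_{\mathrm{KL}}(Q \,\|\, P) \le \rho} \;
\mathbb{E}_{x\sim Q}\big[\pi(x)\,a_R(x) + (1-\pi(x))\,a_N(x)\big]
\quad \text{s.t.} \quad
\mathbb{E}_{x\sim P}\big[\pi(x)\,c_R(x) + (1-\pi(x))\,c_N(x)\big] \le B
```

Setting \rho = 0 collapses the inner infimum to the ordinary expectation under P, which recovers a plain (non-robust) budgeted routing problem.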

Notable insights

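Explicit reasoning substantially improves judgment accuracy on structured verification tasks (e.g., math and coding) but offers limited or even negative gains on simpler evaluations, at significantly higher computational cost.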
RACER uses a KL-divergence uncertainty set to handle distribution shifts effectively.
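
As a rough illustration of what a KL-divergence uncertainty set buys, the sketch below computes the worst-case mean of per-example rewards over all distributions within KL radius `rho` of the empirical one. It uses the standard dual form, inf over Q with KL(Q || P) <= rho of E_Q[r] = sup over lam > 0 of { -lam * log E_P[exp(-r / lam)] - lam * rho }. The function name `kl_worst_case_mean`, the grid search over `lam`, and the toy rewards are assumptions made for illustration; the paper's actual procedure may differ.

```python
import math


def kl_worst_case_mean(rewards, rho, lambdas=None):
    """Lower bound on the worst-case mean reward over a KL ball of radius rho.

    Evaluates the dual objective -lam * log(mean(exp(-r / lam))) - lam * rho
    on a log-spaced grid of lam values and returns the largest value found.
    Every grid point is a valid lower bound on the true worst-case mean.
    """
    if lambdas is None:
        # Hypothetical default grid: lam from 1e-3 to 1e3, log-spaced.
        lambdas = [10 ** (k / 20) for k in range(-60, 61)]
    n = len(rewards)
    best = -math.inf
    for lam in lambdas:
        scaled = [-r / lam for r in rewards]
        shift = max(scaled)  # log-sum-exp shift for numerical stability
        log_mean_exp = shift + math.log(
            sum(math.exp(s - shift) for s in scaled) / n
        )
        best = max(best, -lam * log_mean_exp - lam * rho)
    return best


# Toy data: 1.0 means the judge was correct on that example.
rewards = [1.0, 1.0, 0.0, 1.0]
nominal = sum(rewards) / len(rewards)
robust_small = kl_worst_case_mean(rewards, rho=0.1)
robust_large = kl_worst_case_mean(rewards, rho=0.5)
```

A larger `rho` admits more adversarial reweightings of the data, so the robust estimate falls further below the nominal mean. A router that optimizes this robust value is hedging against shifts toward the examples on which its chosen judge performs worst.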

Possible limitations

Not stated in the abstract

Abstract

arXiv:2605.10805v1

Reasoning-capable large language models (LLMs) have recently been adopted as automated judges, but their benefits and costs in LLM-as-a-Judge settings remain unclear. Through controlled comparisons between reasoning and non-reasoning judges, we show that explicit reasoning substantially improves judgment accuracy on tasks requiring structured verification (e.g., math and coding), while offering limited or even negative gains on simpler evaluations and incurring significantly higher computational cost. These findings motivate that reasoning should be used selectively rather than universally, with awareness of possible distribution shift. We propose a Robust Adaptive Cost-Efficient Routing (RACER), which dynamically selects between reasoning and non-reasoning judges under a fixed budget by formulating routing as a constrained distributionally robust optimization problem. RACER explicitly accounts for distribution shift via a KL-divergence uncertainty set, admits an efficient primal-dual algorithm, and enjoys theoretical guarantees including uniqueness of the optimal policy and linear convergence. Extensive experiments show that RACER achieves superior accuracy-cost trade-offs under distribution shift.
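
The abstract mentions a primal-dual algorithm for routing under a fixed budget. The sketch below shows the general idea with a textbook Lagrangian scheme. It is not the paper's algorithm and it omits the distributional-robustness component. The names `route` and `primal_dual_route`, the per-example `gains` (estimated accuracy gain from using the reasoning judge), `extra_costs`, the step size `eta`, and the toy numbers are all hypothetical.

```python
def route(gains, extra_costs, lam):
    """Primal step: use the reasoning judge (1) only when the estimated
    accuracy gain exceeds the price lam charged per unit of extra cost."""
    return [1 if g - lam * c > 0 else 0 for g, c in zip(gains, extra_costs)]


def primal_dual_route(gains, extra_costs, budget, steps=200, eta=0.05):
    """Dual ascent on the budget multiplier lam.

    budget is the allowed average extra cost per example. Hard thresholding
    can oscillate around the budget, so this returns the best feasible
    routing seen, starting from the always-feasible all-zeros routing.
    """
    n = len(gains)
    lam = 0.0
    best, best_gain = [0] * n, 0.0
    for _ in range(steps):
        decisions = route(gains, extra_costs, lam)
        avg_cost = sum(d * c for d, c in zip(decisions, extra_costs)) / n
        total_gain = sum(d * g for d, g in zip(decisions, gains))
        if avg_cost <= budget + 1e-12 and total_gain > best_gain:
            best, best_gain = decisions, total_gain
        # Dual step: raise the price when over budget, lower it when under.
        lam = max(0.0, lam + eta * (avg_cost - budget))
    return lam, best


# Toy example: five inputs, unit extra cost each, budget for two of them.
gains = [0.30, 0.25, 0.02, -0.05, 0.10]
extra_costs = [1.0, 1.0, 1.0, 1.0, 1.0]
lam, decisions = primal_dual_route(gains, extra_costs, budget=0.4)
```

In this toy run the multiplier rises until only the two inputs with the largest estimated gains are still sent to the reasoning judge, which is the behaviour a budgeted router should show. A robust variant would replace the plain gains with worst-case quantities over the uncertainty set.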