Locally Coherent, Globally Incoherent: Bounding Compositional Incoherence in Multi-Component LLM Agents
Anany Kotawala
Why It Matters
What makes this one worth your time
Multi-component LLM agents assemble probabilistic claims from components that each see only part of a joint problem, and the composed output can violate basic probability axioms even when every component is coherent on its own. The paper reports that this incoherence shows up as measurable betting regret, so detecting and repairing it matters for any agent whose combined quotes feed downstream decisions.
Summary
The paper investigates compositional incoherence in multi-component large language model (LLM) agents, where components that are each locally coherent can still produce a globally incoherent composition. It introduces the compositional residual eps*, defined as the L2 distance from the composed quote to the joint coherent polytope, to measure this incoherence. It then proposes a deterministic projection to repair the composition and an anytime-valid e-process to monitor coherence sequentially.
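To make the residual concrete, here is a minimal sketch for a toy case that is not taken from the paper. It assumes two hypothetical components, one quoting P(X) and the other quoting P(X and Y), and a single declared coupling constraint P(X and Y) <= P(X). With one linear constraint, the L2 distance to the feasible set has a closed form. The helper name `residual_to_halfspace` is illustrative only.

```python
import math


def residual_to_halfspace(q, a, b):
    """L2 distance from point q to the half-space {x : a . x <= b}."""
    violation = sum(ai * qi for ai, qi in zip(a, q)) - b
    norm = math.sqrt(sum(ai * ai for ai in a))
    # A point that already satisfies the constraint has zero residual.
    return max(0.0, violation) / norm


# Hypothetical quotes: component 1 says P(X) = 0.6,
# component 2 says P(X and Y) = 0.7. Each is a valid probability on its own.
quote = [0.6, 0.7]

# Coupling constraint P(X and Y) <= P(X), written as -P(X) + P(X and Y) <= 0.
eps = residual_to_halfspace(quote, [-1.0, 1.0], 0.0)
print(round(eps, 4))  # 0.0707
```

In this toy case the nearest coherent point is (0.65, 0.65), which also lies inside the unit box, so the distance to the half-space equals the distance to the full feasible set. With several interacting constraints the closed form no longer applies and an iterative projection is needed.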
Key contributions
- Formalization of the compositional residual eps* as a measure of incoherence.
- Introduction of a product-structure dichotomy to determine when local coherence is sufficient.
- Development of a hierarchical Boyle-Dykstra projection for deterministic repair of incoherent compositions.
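The repair step can be illustrated with the plain, non-hierarchical Dykstra alternating-projection algorithm. The sketch below is not the paper's hierarchical variant, whose details are not given in the abstract. It assumes a hypothetical composed quote over (P(A), P(B), P(A or B)) and two convex sets: the unit box and the subadditivity constraint P(A or B) <= P(A) + P(B). All function names are illustrative.

```python
import math


def project_box(x):
    """Project onto the unit box [0, 1]^n by clipping each coordinate."""
    return [min(1.0, max(0.0, xi)) for xi in x]


def make_halfspace_projector(a, b):
    """Return a function projecting onto the half-space {x : a . x <= b}."""
    norm_sq = sum(ai * ai for ai in a)

    def project(x):
        excess = sum(ai * xi for ai, xi in zip(a, x)) - b
        if excess <= 0:
            return list(x)
        return [xi - excess * ai / norm_sq for ai, xi in zip(a, x)]

    return project


def dykstra(q, projectors, iters=500):
    """Approximate the L2 projection of q onto the intersection of convex sets."""
    x = list(q)
    # One correction vector per set; this is what distinguishes Dykstra's
    # method from naive alternating projection.
    corrections = [[0.0] * len(q) for _ in projectors]
    for _ in range(iters):
        for k, proj in enumerate(projectors):
            shifted = [xi + ci for xi, ci in zip(x, corrections[k])]
            y = proj(shifted)
            corrections[k] = [si - yi for si, yi in zip(shifted, y)]
            x = y
    return x


# Hypothetical composed quote for (P(A), P(B), P(A or B)).
quote = [0.7, 0.2, 1.1]

# Subadditivity P(A or B) <= P(A) + P(B), written as -P(A) - P(B) + P(A or B) <= 0.
subadditivity = make_halfspace_projector([-1.0, -1.0, 1.0], 0.0)

repaired = dykstra(quote, [project_box, subadditivity])
eps = math.sqrt(sum((r - v) ** 2 for r, v in zip(repaired, quote)))
print([round(r, 4) for r in repaired])  # approximately [0.75, 0.25, 1.0]
print(round(eps, 4))  # 0.1225
```

Both constraints are active at the solution here, which is why a single clip or a single half-space projection would not give the nearest coherent point. The distance between the original and repaired quote is the residual for this toy feasible set.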
Notable insights
- A Rayleigh-quotient prediction matches the observed compositional residual within 7% on three of four relation classes.
- Three intuitive LLM-side mitigations (retrieval, partition-aware prompting, aggregator-LLM) each fail or regress, which motivates repair by projection rather than by prompting.
Possible limitations
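- The abstract lists no limitations explicitly, but it reports that the Rayleigh-quotient prediction matches on only three of four relation classes.
- The regret gain shrinks from +0.115 to +0.006 nats per bet against bettors that themselves coherentise.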
Abstract
arXiv:2605.30335v1
Multi-component LLM agents assemble probabilistic claims from components that each see only part of a joint problem; the composition can violate basic probability axioms even when every component is locally coherent. We formalise this locally coherent, globally incoherent failure via the compositional residual eps*, the L2 distance from the composed quote to the joint coherent polytope, computable at runtime from system output and the declared cross-component coupling constraints. A product-structure dichotomy characterises when local coherence suffices, and a Rayleigh-quotient prediction matches the observed residual within 7% on three of four relation classes. A hierarchical Boyle-Dykstra projection repairs the composition deterministically; an anytime-valid e-process gives sequential coherence monitoring. Across 1,876 ensemble cliques on a four-LLM mid-tier panel (frontier-panel rerun in Section 5.5), eps* > 0 on 33-94% of cliques, translating to +0.115 nats per bet of regret on 1,770 resolved bets under the proportional allocation rule (the gain collapses to +0.006 under bettors that themselves coherentise). Three intuitive LLM-side mitigations (retrieval, partition-aware prompting, aggregator-LLM) each fail or regress.