Critic-Guided Heterogeneous Multi-Agent Reasoning for Reliable Mathematical Problem Solving

Muhammad Talha Sharif, Abdul Rehman

Published Jun 6, 2026

Editorial review: 7.2

Relevance: 0.492

Freshness: 0.000

Why It Matters

What makes this one worth your time

Improving the reliability of mathematical reasoning in LLMs can enhance their applicability in complex problem-solving tasks, reducing the need for larger models.

A critic-guided multi-agent system enhances mathematical reasoning accuracy in LLMs.

Summary

The paper introduces a critic-guided heterogeneous multi-agent framework to enhance the reliability of mathematical reasoning in large language models by using a generator-validator approach that incorporates feedback for error correction, achieving improved accuracy on the GSM8K benchmark.

Key contributions

Introduction of a critic-based heterogeneous multi-agent framework for mathematical reasoning.
Demonstrated accuracy improvement on the GSM8K benchmark.
Ablation studies showing the importance of the critic-based feedback loop.

Notable insights

The use of a critic-driven adaptive learning system for intermediate feedback improves reasoning accuracy.
Heterogeneous multi-agent collaboration reduces the dependency on large models.

Possible limitations

Not stated in the abstract

Abstract

arXiv:2606.05704v1

Recent Large Language Models (LLMs) have shown impressive reasoning abilities, but they remain susceptible to hallucinations, intermediate reasoning mistakes, and unreliable results on complex mathematical reasoning problems. In this study, we introduce a critic-based heterogeneous multi-agent approach to improve the dependability of mathematical reasoning. The framework incorporates several LLM agents with different specialties and employs a critic-driven adaptive learning system that assesses and guides the reasoning process using intermediate feedback. The system adopts a generator-validator design in which the validator not only determines correctness but also offers critiques that guide the regeneration of solutions. This allows for adaptive error correction and prevents error cascading. Our experiments on the GSM8K benchmark show that the proposed method achieves up to 13% accuracy improvement over single-shot and non-critic models. Additionally, the findings suggest that heterogeneity and critique reduce the need for large models, allowing smaller models to perform on par with larger ones. Ablation studies indicate that the main performance gains come from the critic-based feedback loop rather than from model size. In summary, the proposed approach showcases the benefits of combining heterogeneous multi-agent collaboration and critique to obtain reliable and interpretable reasoning systems.
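
The generator-validator loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no code, prompts, or stopping rule, so every name here (`Verdict`, `Result`, `solve_with_critic`, `max_rounds`, and the toy agents) is hypothetical, and the agents are plain functions standing in for LLM calls. The sketch only shows the control flow in which a rejected solution's critique is fed back into the next generation attempt.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    correct: bool
    critique: str  # feedback for the generator; empty when the solution is accepted


@dataclass
class Result:
    solution: str
    accepted: bool
    rounds: int


# Hypothetical agent interfaces: in a real system each would wrap an LLM call.
Generator = Callable[[str, Optional[str]], str]  # (problem, last critique) -> solution
Critic = Callable[[str, str], Verdict]           # (problem, solution) -> verdict


def solve_with_critic(problem: str, generator: Generator, critic: Critic,
                      max_rounds: int = 3) -> Result:
    """Regenerate a solution until the critic accepts it or the round budget runs out."""
    critique: Optional[str] = None
    solution = ""
    for round_index in range(1, max_rounds + 1):
        solution = generator(problem, critique)
        verdict = critic(problem, solution)
        if verdict.correct:
            return Result(solution, True, round_index)
        critique = verdict.critique  # passed to the generator on the next round
    return Result(solution, False, max_rounds)


def toy_generator(problem: str, critique: Optional[str]) -> str:
    # Stand-in agent: the first attempt makes a precedence mistake,
    # and any critique triggers the corrected answer.
    if critique is None:
        return "3 * (4 + 5) = 27"
    return "3 * 4 + 5 = 17"


def toy_critic(problem: str, solution: str) -> Verdict:
    # Stand-in validator: accepts only the known answer for this one toy problem.
    if solution.endswith("= 17"):
        return Verdict(True, "")
    return Verdict(False, "Multiplication binds tighter than addition.")


result = solve_with_critic("What is 3 * 4 + 5?", toy_generator, toy_critic)
print(result)
```

With these toy agents the first attempt is rejected and the second is accepted. A heterogeneous setup in the sense of the abstract would pass different models as `generator` and `critic`; how the paper selects or specializes those agents is not stated in the abstract.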