Streaming Communication in Multi-Agent Reasoning

Zhen Yang, Xiaogang Xu, Wen Wang, Cong Chen, Xander Xu, Ying-Cong Chen

Published Jun 5, 2026 | Featured #3 | In the daily list Jun 6, 2026


Daily score: 72.5

Editorial review: 7.5

Relevance: 0.464

Freshness: 0.722

Why It Matters

What makes this one worth your time

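This work targets two problems in multi-agent reasoning systems: end-to-end latency that grows linearly with pipeline depth, and downstream agents being misled by error-prone late reasoning steps. It is relevant to researchers and engineers working on inter-agent communication and reasoning pipelines.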

StreamMA enhances multi-agent reasoning by streaming steps to reduce latency and improve effectiveness.

Summary

The paper presents StreamMA, a multi-agent reasoning system that streams each reasoning step to downstream agents as soon as it is generated, which reduces latency and improves effectiveness. It formalizes both advantages in a closed-form joint analysis of stream, serial, and single protocols, and reports gains across eight reasoning benchmarks in mathematics, science, and code.

Key contributions

Introduction of StreamMA, a novel multi-agent reasoning system that streams reasoning steps.
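Closed-form joint analysis of stream, serial, and single protocols, deriving the effectiveness ordering, speedup upper bound, and cost ratio.
Empirical validation across eight reasoning benchmarks, two frontier LLMs, and three topologies (Chain, Tree, Graph), with an average gain of +7.3 pp over both baselines.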

Notable insights

The pipelining of reasoning steps allows for early reliable outputs to guide downstream agents, mitigating the impact of error-prone later steps.
The discovery of a 'step-level scaling law' introduces a new dimension for improving multi-agent system performance.
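
The contrast between streaming and "generate-then-transfer" hand-off can be sketched with plain Python generators. This is an illustration only, not the StreamMA implementation: the agent functions, the "refined(...)" output format, and the event log are all hypothetical, and generators interleave work on one thread rather than running agents in parallel. The point is only the ordering of events: in the streamed run the downstream agent sees step 1 before the upstream agent has produced step 2.

```python
from typing import Iterable, Iterator, List, Tuple


def upstream_agent(steps: List[str], log: List[str]) -> Iterator[str]:
    """Hypothetical agent that emits each reasoning step as soon as it exists."""
    for step in steps:
        log.append(f"upstream produced: {step}")
        yield step


def downstream_agent(incoming: Iterable[str], log: List[str]) -> Iterator[str]:
    """Hypothetical agent that reacts to each incoming step individually."""
    for step in incoming:
        log.append(f"downstream consumed: {step}")
        yield f"refined({step})"


def run_streamed(steps: List[str]) -> Tuple[List[str], List[str]]:
    """Streaming hand-off: the downstream agent pulls steps one at a time."""
    log: List[str] = []
    outputs = list(downstream_agent(upstream_agent(steps, log), log))
    return outputs, log


def run_serial(steps: List[str]) -> Tuple[List[str], List[str]]:
    """Generate-then-transfer: the full chain is materialized before hand-off."""
    log: List[str] = []
    full_chain = list(upstream_agent(steps, log))  # wait for every step
    outputs = list(downstream_agent(full_chain, log))
    return outputs, log


if __name__ == "__main__":
    for name, runner in (("streamed", run_streamed), ("serial", run_serial)):
        _, events = runner(["s1", "s2", "s3"])
        print(name)
        for event in events:
            print("  ", event)
```

Both runs return the same outputs; only the event order differs, which is what lets a streamed downstream agent start from the early steps instead of waiting for the whole chain.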

Possible limitations

Not stated in the abstract.

Abstract

arXiv:2606.05158v1 (announce type: cross)

Multi-agent reasoning systems adopt a "generate-then-transfer" paradigm that forces end-to-end latency to scale linearly with pipeline depth. We introduce StreamMA, a multi-agent reasoning system that streams each reasoning step to downstream agents as soon as it is generated, pipelining adjacent agents and thus reducing latency. Surprisingly, this pipelining also improves effectiveness: because multi-step reasoning quality is non-uniform and early steps are more reliable than later ones, working with these reliable early steps instead of the full chain prevents error-prone late steps from misleading downstream agents. We formalize both advantages with the first closed-form joint analysis of stream, serial, and single protocols, deriving the effectiveness ordering, speedup upper bound, and cost ratio. Across eight reasoning benchmarks spanning mathematics, science, and code, two frontier LLMs (Claude Opus 4.6 and GPT-5.4), and three topologies (Chain, Tree, Graph), StreamMA outperforms both baselines (avg. +7.3 pp, max +22.4 pp on HMMT 2026; Claude Opus 4.6-high). Beyond these contributions, we discover a "step-level scaling law": increasing per-agent steps consistently improves both effectiveness and efficiency, a new scaling dimension orthogonal to and composable with agent-count scaling.
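
The latency argument can be made concrete with a textbook pipeline model. This is not the paper's closed-form analysis, which the abstract does not reproduce. The sketch assumes a chain of agents, a fixed number of equal-duration steps per agent, zero transfer overhead, and that a downstream agent can produce its k-th step as soon as the upstream agent's k-th step arrives. All function and parameter names are hypothetical.

```python
def serial_latency(num_agents: int, steps_per_agent: int, step_time: float = 1.0) -> float:
    """Generate-then-transfer: each agent waits for the previous agent's full chain."""
    return num_agents * steps_per_agent * step_time


def streamed_latency(num_agents: int, steps_per_agent: int, step_time: float = 1.0) -> float:
    """Idealized pipeline: the last agent starts after (num_agents - 1) steps,
    then needs steps_per_agent more steps to finish."""
    return (num_agents + steps_per_agent - 1) * step_time


def speedup(num_agents: int, steps_per_agent: int) -> float:
    """Ratio of serial to streamed latency under the toy model above."""
    return serial_latency(num_agents, steps_per_agent) / streamed_latency(
        num_agents, steps_per_agent
    )


if __name__ == "__main__":
    for steps in (1, 4, 16, 100):
        print(f"3 agents, {steps:>3} steps each: speedup = {speedup(3, steps):.2f}")
```

Under these assumptions the serial latency grows with the product of depth and steps, while the streamed latency grows with their sum. The toy speedup stays below the number of agents and rises as steps per agent increase, which is in the same direction as the efficiency side of the step-level scaling claim, though the paper's actual bound may differ.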