Understanding Conversational Patterns in Multi-agent Programming: A Case Study on Fibonacci Game Development

Srijita Basu, Viktor Kjellberg, Simin Sun, Bengt Haraldsson, Md. Abu Ahammed Babu, Wilhelm Meding, Farnaz Fotrousi, Miroslaw Staron

Published May 26, 2026

Editorial review: 6.8

Relevance: 0.473

Freshness: 0.000

Why It Matters

What makes this one worth your time

Understanding how LLM-based agents can effectively collaborate in software engineering tasks is crucial for developing autonomous systems that can reliably produce correct solutions.

The study explores how LLM-based agents coordinate in software engineering tasks, revealing insights into their conversational dynamics.

Summary

The paper conducts a systematic analysis of conversational patterns in multi-agent programming using LLMs, focusing on role-oriented collaboration in software engineering. It evaluates 12 model combinations from 7 open-source LLMs to understand efficiency, consistency, and effectiveness in agent interactions.

Key contributions

Systematic analysis of multi-agent interactions using LLMs in software engineering.
Identification of key dimensions of multi-agent interaction: efficiency, consistency, and effectiveness.
Empirical evaluation of 12 model combinations from 7 open-source LLMs.

Notable insights

The DeepSeek-R1:DeepSeek-R1 pair was the only one to reach the correct solution in the first iteration and hold it through the final iteration.
The LLaMA 3.2:LLaMA 3.2 and Qwen3:Qwen3 pairs showed strong Designer:Programmer role alignment even though both diverged from the correct solution.

Possible limitations

Not stated in the abstract

Abstract

arXiv:2605.24138v1

Large Language Models (LLMs) are increasingly applied to software engineering (SE), yet their potential for autonomous, role-oriented collaboration remains largely underexplored. Understanding how multiple LLM-based agents coordinate, maintain role alignment, and converge on solutions is critical for SE, as naively allowing agents to interact does not reliably lead to correct or stable outcomes. Recent empirical studies show that unstructured or poorly understood interaction dynamics can result in error propagation, premature consensus on incorrect solutions, or prolonged disagreement that prevents convergence, even when correct partial solutions are present early in the interaction. As an initial step towards addressing this underexplored area, we undertake a systematic analysis of conversations between two agents, a Designer and a Programmer, across 12 model combinations from 7 open-source LLMs (Gemma 2, Gemma 3, LLaMA 3.2, LLaMA 3.3, DeepSeek-R1, MiniCPM, and Qwen3). Our systematic approach reveals three key dimensions of multi-agent interaction: efficiency (the speed and stability of convergence), consistency (the degree of role alignment visualized by BLEU and ROUGE), and effectiveness (the extent of compilation success and error resolution). Results show that the DeepSeek-R1:DeepSeek-R1 pair was unique in converging to the correct solution from the very first iteration and sustaining it consistently to the final iteration, while LLaMA 3.2:LLaMA 3.2 and Qwen3:Qwen3 demonstrated strong Designer:Programmer role alignment despite diverging from the correct solution. The other pairs deviated from the task and never converged on a result. These findings advance understanding of agentic programming and highlight the need for further research on understanding and calibrating convergence and stop conditions, which are essential for future autonomous SE.
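
The abstract says role alignment is visualized with BLEU and ROUGE but does not describe the computation. As a rough illustration only, and not the authors' pipeline, the sketch below scores word overlap between one Designer message and one Programmer reply. The precision value is the clipped unigram precision that BLEU builds on (without the brevity penalty or higher-order n-grams), and the recall value corresponds to ROUGE-1 recall. The function names, the whitespace tokenizer, and the sample messages are all assumptions made for this example.

```python
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; real BLEU/ROUGE
    # tooling usually applies more careful tokenization.
    return text.lower().split()


def unigram_overlap(reference, candidate):
    """Return (precision, recall, f1) of unigram overlap.

    precision: clipped unigram precision (the BLEU-1 core, no brevity penalty)
    recall:    ROUGE-1 recall
    f1:        harmonic mean of the two
    """
    ref_counts = Counter(tokenize(reference))
    cand_counts = Counter(tokenize(candidate))

    # Counter intersection keeps the minimum count per token, which is
    # exactly the "clipping" used by BLEU.
    overlap = sum((ref_counts & cand_counts).values())

    precision = overlap / sum(cand_counts.values()) if cand_counts else 0.0
    recall = overlap / sum(ref_counts.values()) if ref_counts else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical messages, invented for illustration.
designer_msg = "draw the fibonacci spiral on the board"
programmer_msg = "the board shows the fibonacci spiral"

precision, recall, f1 = unigram_overlap(designer_msg, programmer_msg)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

A high score here only means the two agents use similar vocabulary. As the reported results indicate, a pair can be well aligned in this sense and still diverge from the correct solution, so an overlap score of this kind would need to be read alongside compilation and correctness checks.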