Multi-Paradigm Agent Interaction in Practice: A Systematic Analysis of Generator-Evaluator, ReAct Loop, and Adversarial Evaluation in the buddyMe Framework
Xiaohua Wang, Chao Han, Kai Yu, XiaoLiang Xu, Liang Wang
Why It Matters
What makes this one worth your time
This research provides practical design guidelines for developing stable and reliable multi-paradigm agent systems, which is crucial as LLM agents become more integrated into real-world applications.
A systematic analysis of multi-paradigm agent interactions in the buddyMe framework.
Summary
The paper systematically analyzes three agent interaction paradigms within the buddyMe framework, presenting a five-stage processing pipeline and a six-dimensional evaluation schema, supported by empirical case studies.
Key contributions
- Formalization of a five-stage processing pipeline for agent interaction.
- Development of a six-dimensional evaluation schema with weighted scoring.
- Four empirical case studies drawn from real-world deployment logs, covering museum guide generation, scheduled weather tasks, and comprehensive tour planning.
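A weighted scoring schema of this kind can be sketched as a weighted mean over per-dimension scores. The abstract does not name the six dimensions or give their weights, so the dimension names, weights, and 0-10 scale below are purely illustrative assumptions, not the buddyMe schema.

```python
from typing import Dict

# Hypothetical dimension names and weights; the abstract does not list them.
WEIGHTS: Dict[str, float] = {
    "completeness": 0.25,
    "accuracy": 0.25,
    "relevance": 0.15,
    "clarity": 0.15,
    "efficiency": 0.10,
    "safety": 0.10,
}


def weighted_score(scores: Dict[str, float],
                   weights: Dict[str, float] = WEIGHTS) -> float:
    """Return the weighted mean of per-dimension scores (assumed 0-10 scale)."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same dimensions")
    total_weight = sum(weights.values())
    # Normalize by the total weight so the weights need not sum to exactly 1.
    return sum(scores[dim] * w for dim, w in weights.items()) / total_weight
```

Normalizing by the total weight keeps the result on the same scale as the inputs even if the weights are later retuned.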
Notable insights
- Generator-Evaluator pre-review detects requirement omissions in 20 percent of complex tasks, which highlights the value of early-stage validation.
- The ReAct loop keeps subtask execution stable but produces around 30 percent redundant tool invocations, which suggests room for optimizing task execution strategies.
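One way to quantify redundant tool invocations is to count calls that repeat an earlier tool name and argument set. The abstract does not say how the authors define redundancy, so the definition below (an exact repeat of a tool-and-arguments pair) and the function name `redundant_call_ratio` are assumptions made for illustration.

```python
import json
from collections import Counter
from typing import Any, Dict, List, Tuple

ToolCall = Tuple[str, Dict[str, Any]]


def redundant_call_ratio(calls: List[ToolCall]) -> float:
    """Fraction of tool calls that repeat an earlier (tool, arguments) pair."""
    if not calls:
        return 0.0
    # Serialize arguments with sorted keys so key order does not matter.
    keys = [(name, json.dumps(args, sort_keys=True)) for name, args in calls]
    counts = Counter(keys)
    # Every occurrence beyond the first of a given key counts as redundant.
    repeats = sum(n - 1 for n in counts.values())
    return repeats / len(calls)
```

A ratio computed this way only captures exact repeats; near-duplicate calls with slightly different arguments would need a looser matching rule.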
Possible limitations
- Not stated in the abstract.
Abstract
arXiv:2605.16821v1. The rapid evolution of Large Language Model (LLM) agents has produced diverse interaction paradigms, yet few production systems integrate multiple paradigms within a unified architecture. This paper presents a systematic analysis of three principal agent interaction paradigms, including Multi-Agent Orchestration (Generator-Evaluator), ReAct Tool-Use Loops, and Memory-Augmented Interaction, as implemented in buddyMe, an open-source multi-model agent programming framework. We formalize a five-stage processing pipeline: Requirement Pre-Review -> Task Decomposition -> ReAct Execution -> Real-Execution Verification -> Adversarial Evaluation Discussion, and establish a six-dimensional evaluation schema with weighted scoring. Through four empirical case studies drawn from real-world deployment logs covering museum guide generation, scheduled weather tasks, and comprehensive tour planning, we draw three key conclusions. First, Generator-Evaluator pre-review detects requirement omissions in 20 percent of complex tasks, with 80 percent of tasks passing initial inspection. Second, the ReAct loop ensures stable subtask execution but leads to around 30 percent redundant tool invocations. Third, adversarial Evaluator-Defender discussions reach consensus within 2-3 rounds for nearly 70 percent of scenarios, functioning mainly for content refinement rather than logical reversal. We additionally provide three Mermaid-based architectural diagrams and conduct cross-paradigm comparisons with CrewAI, AutoGen, LangGraph, MemGPT and A-Mem across six system dimensions. The research outcomes offer practical design guidelines for constructing stable and reliable multi-paradigm agent systems.
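The five-stage pipeline named in the abstract can be pictured as an ordered list of stages that each transform a shared context. The sketch below is a minimal illustration under that assumption: the stage names come from the abstract, but `run_pipeline`, `passthrough`, and the dictionary-based context are hypothetical and do not reflect buddyMe's actual API.

```python
from typing import Callable, Dict, List, Tuple

Context = Dict[str, object]
Stage = Callable[[Context], Context]


def passthrough(ctx: Context) -> Context:
    """Placeholder stage (hypothetical): returns a copy of the context unchanged."""
    return dict(ctx)


# Stage names follow the abstract; the handlers are stand-ins.
STAGES: List[Tuple[str, Stage]] = [
    ("Requirement Pre-Review", passthrough),
    ("Task Decomposition", passthrough),
    ("ReAct Execution", passthrough),
    ("Real-Execution Verification", passthrough),
    ("Adversarial Evaluation Discussion", passthrough),
]


def run_pipeline(stages: List[Tuple[str, Stage]], ctx: Context) -> Context:
    """Run each stage in order and record which stages completed."""
    trace: List[str] = []
    for name, stage in stages:
        ctx = stage(ctx)
        trace.append(name)
    result = dict(ctx)
    result["trace"] = trace
    return result
```

Keeping the stages as plain data makes the order explicit and lets a real handler replace any placeholder without touching the runner.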