Collaborative Multi-Agent Scripts Generation for Enhancing Imperfect-Information Reasoning in Murder Mystery Games
Keyang Zhong, Junlin Xie, Hefeng Wu, Haofeng Li, Guanbin Li
Why It Matters
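This paper targets a known weakness of vision-language models (VLMs): their multi-hop reasoning degrades in multiplayer settings where information is incomplete and some participants are deliberately deceptive. Improving reasoning under these conditions matters for AI systems that must interact in real-world scenarios, where information is rarely complete or fully trustworthy.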
A collaborative multi-agent framework enhances VLM reasoning in murder mystery games with imperfect information.
Summary
The paper introduces a collaborative multi-agent framework that evaluates and synthesizes role-driven scripts for murder mystery games, including character backstories, visual and textual clues, and multi-hop reasoning chains. These scripts are used to train vision-language models (VLMs) to reason under imperfect and deceptive information through a two-stage strategy: chain-of-thought fine-tuning on curated and synthetic data, followed by GRPO-based reinforcement learning with agent-monitored reward shaping. The authors report significant gains in narrative reasoning, hidden-fact extraction, and resilience to deception.
Key contributions
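- A collaborative multi-agent framework for evaluating and synthesizing role-driven murder mystery scripts, with interaction patterns tailored to each character's identity (murderer vs. innocent), to strengthen VLM reasoning under imperfect information.
- A two-stage, agent-monitored training strategy: chain-of-thought fine-tuning on curated and synthetic datasets that model uncertainty and deception, followed by GRPO-based reinforcement learning with agent-monitored reward shaping.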
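To make "role-driven game script" concrete, here is a minimal Python sketch of what one synthesized script might contain, following the components the abstract lists (character backstories, visual and textual clues, multi-hop reasoning chains). All class names, fields, and the `validate_script` checks are hypothetical assumptions for illustration; they are not the authors' data format or their evaluation criteria.

```python
from dataclasses import dataclass, field
from typing import List, Literal

# Hypothetical schema: the paper does not publish this structure.
Role = Literal["murderer", "innocent"]


@dataclass
class Clue:
    clue_id: str
    modality: Literal["text", "image"]
    content: str  # clue text, or a path/description for an image clue


@dataclass
class Character:
    name: str
    role: Role
    backstory: str
    clue_ids: List[str] = field(default_factory=list)  # clues this character holds


@dataclass
class ReasoningHop:
    clue_id: str    # the clue this hop relies on
    inference: str  # what the reasoner concludes from it


@dataclass
class Script:
    characters: List[Character]
    clues: List[Clue]
    reasoning_chain: List[ReasoningHop]
    answer: str  # name of the true murderer


def validate_script(script: Script) -> List[str]:
    """Return a list of problems; an empty list means these (illustrative) checks pass."""
    problems: List[str] = []
    murderers = [c.name for c in script.characters if c.role == "murderer"]
    if len(murderers) != 1:
        problems.append(f"expected exactly one murderer, found {len(murderers)}")
    elif script.answer != murderers[0]:
        problems.append("answer does not match the murderer's name")
    known = {c.clue_id for c in script.clues}
    # Every clue a character holds must exist in the script.
    for ch in script.characters:
        for cid in ch.clue_ids:
            if cid not in known:
                problems.append(f"{ch.name} holds unknown clue {cid}")
    # Every reasoning hop must be grounded in an existing clue.
    for i, hop in enumerate(script.reasoning_chain):
        if hop.clue_id not in known:
            problems.append(f"hop {i} cites unknown clue {hop.clue_id}")
    if len(script.reasoning_chain) < 2:
        problems.append("reasoning chain is not multi-hop")
    return problems
```

A generator agent could propose a `Script` and a checker could reject it whenever `validate_script` returns a non-empty list; a real pipeline would presumably also need semantic checks (for example, whether the chain actually implies the answer) that simple structural rules cannot capture.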
Notable insights
- Role-driven agent interactions, in which the murderer and the innocent characters share clues according to different intentions, offer a scalable way to simulate social deception and to generate training data for reasoning under uncertainty.
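The murderer-versus-innocent interaction patterns can be pictured with a toy disclosure round. In this sketch, which is an assumption for illustration rather than the paper's agent protocol, innocent characters reveal every clue they hold, while the murderer withholds incriminating clues and adds fabricated red herrings. The reasoner only sees the resulting view, so some true clues stay hidden. The functions `disclose`, `observer_view`, and `hidden_clues` are hypothetical names.

```python
from typing import Dict, List, Set


def disclose(role: str, held: List[str], incriminating: Set[str],
             red_herrings: List[str]) -> List[str]:
    """Hypothetical role-conditioned disclosure policy."""
    if role == "innocent":
        return list(held)  # innocents share everything they hold
    # The murderer hides incriminating clues and injects false ones.
    kept = [c for c in held if c not in incriminating]
    return kept + list(red_herrings)


def observer_view(roles: Dict[str, str], holdings: Dict[str, List[str]],
                  incriminating: Set[str], red_herrings: List[str]) -> Dict[str, List[str]]:
    """What a reasoner sees after one round: each character's disclosed clues."""
    return {
        name: disclose(role, holdings.get(name, []), incriminating, red_herrings)
        for name, role in roles.items()
    }


def hidden_clues(holdings: Dict[str, List[str]], view: Dict[str, List[str]]) -> Set[str]:
    """True clues that never reached the reasoner (the imperfect-information gap)."""
    true_clues = {c for clues in holdings.values() for c in clues}
    shown = {c for clues in view.values() for c in clues}
    return true_clues - shown
```

For example, if the murderer holds a bloody glove and an alibi note while an innocent holds a footprint, the reasoner sees the alibi note, the footprint, and a forged letter, and must infer the hidden glove from indirect evidence.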
Possible limitations
- The approach may require significant computational resources for training and may not generalize well to non-game scenarios without further adaptation.
Abstract
arXiv:2604.11741v1
Vision-language models (VLMs) have shown impressive capabilities in perceptual tasks, yet they degrade in complex multi-hop reasoning under multiplayer game settings with imperfect and deceptive information. In this paper, we study a representative multiplayer task, Murder Mystery Games, which require inferring hidden truths based on partial clues provided by roles with different intentions. To address this challenge, we propose a collaborative multi-agent framework for evaluating and synthesizing high-quality, role-driven multiplayer game scripts, enabling fine-grained interaction patterns tailored to character identities (i.e., murderer vs. innocent). Our system generates rich multimodal contexts, including character backstories, visual and textual clues, and multi-hop reasoning chains, through coordinated agent interactions. We design a two-stage agent-monitored training strategy to enhance the reasoning ability of VLMs: (1) chain-of-thought based fine-tuning on curated and synthetic datasets that model uncertainty and deception; (2) GRPO-based reinforcement learning with agent-monitored reward shaping, encouraging the model to develop character-specific reasoning behaviors and effective multimodal multi-hop inference. Extensive experiments demonstrate that our method significantly boosts the performance of VLMs in narrative reasoning, hidden fact extraction, and deception-resilient understanding. Our contributions offer a scalable solution for training and evaluating VLMs under uncertain, adversarial, and socially complex conditions, laying the groundwork for future benchmarks in multimodal multi-hop reasoning under imperfect information.
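The second training stage relies on GRPO (Group Relative Policy Optimization), which samples a group of responses per prompt, scores each one, and normalizes every reward against its own group: A_i = (r_i - mean(r)) / std(r). The sketch below shows only that advantage step. The abstract does not define the agent-monitored reward, so `shaped_reward` is a hypothetical stand-in that adds a binary outcome reward (correct culprit or not) to a weighted score in [0, 1] from a monitoring agent; the weight, score range, and function names are assumptions.

```python
import statistics
from typing import List, Sequence


def shaped_reward(correct: bool, monitor_score: float, weight: float = 0.5) -> float:
    """Hypothetical shaping: outcome reward plus a weighted monitor-agent score in [0, 1]."""
    if not 0.0 <= monitor_score <= 1.0:
        raise ValueError("monitor_score must be in [0, 1]")
    return (1.0 if correct else 0.0) + weight * monitor_score


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: center each reward on the group mean, scale by group std.

    Uses the population standard deviation; eps avoids division by zero when
    every response in the group receives the same reward.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

Because advantages are relative within a group, a response is rewarded only for doing better than its sibling samples on the same case, which avoids training a separate value model; identical rewards across a group yield zero advantage and therefore no learning signal for that prompt.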