Collaborative Multi-Agent Scripts Generation for Enhancing Imperfect-Information Reasoning in Murder Mystery Games
Keyang Zhong, Junlin Xie, Hefeng Wu, Haofeng Li, Guanbin Li
Feedback
Why It Matters
This paper addresses the challenge of improving VLMs' reasoning abilities in complex, adversarial environments — a capability crucial for AI systems that must understand and act in real-world scenarios where information is incomplete or deliberately deceptive.
Contributions
- Development of a collaborative multi-agent framework for generating role-driven game scripts that enhance VLM reasoning under imperfect information.
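The role-driven script generation described above can be sketched as a simple coordination loop in which each character agent contributes clues consistent with its identity. This is an illustrative sketch only — the class and function names (`Character`, `generate_script`, the `intent` tag) are assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch: role-driven agents assembling a mystery script.
# The murderer agent plants misleading clues; innocents contribute
# truthful ones, yielding fine-grained, identity-tailored interactions.
from dataclasses import dataclass, field

@dataclass
class Character:
    name: str
    is_murderer: bool
    backstory: str = ""
    clues: list = field(default_factory=list)

def generate_script(characters):
    """Collect each character's clues, tagging intent by role."""
    script = []
    for c in characters:
        tag = "misleading" if c.is_murderer else "truthful"
        for clue in c.clues:
            script.append({"speaker": c.name, "clue": clue, "intent": tag})
    return script

cast = [
    Character("Ava", is_murderer=True, clues=["I was in the garden all night"]),
    Character("Ben", is_murderer=False, clues=["I heard glass break at midnight"]),
]
script = generate_script(cast)
```

In the paper's framework, a monitoring agent would additionally check that each contribution stays consistent with the character's backstory; here that step is omitted for brevity.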
Insights
- The use of a collaborative multi-agent framework can effectively simulate complex social interactions and deception, improving AI reasoning in uncertain environments.
Limitations
- The approach may require significant computational resources for training and may not generalize well to non-game scenarios without further adaptation.
Tags
- agent
- benchmark
- evaluation
- multimodal
- reasoning
- vision_language
Abstract
arXiv:2604.11741v1. Vision-language models (VLMs) have shown impressive capabilities in perceptual tasks, yet they degrade in complex multi-hop reasoning under multiplayer game settings with imperfect and deceptive information. In this paper, we study a representative multiplayer task, Murder Mystery Games, which requires inferring hidden truths from partial clues provided by roles with different intentions. To address this challenge, we propose a collaborative multi-agent framework for evaluating and synthesizing high-quality, role-driven multiplayer game scripts, enabling fine-grained interaction patterns tailored to character identities (i.e., murderer vs. innocent). Our system generates rich multimodal contexts, including character backstories, visual and textual clues, and multi-hop reasoning chains, through coordinated agent interactions. We design a two-stage agent-monitored training strategy to enhance the reasoning ability of VLMs: (1) chain-of-thought based fine-tuning on curated and synthetic datasets that model uncertainty and deception; (2) GRPO-based reinforcement learning with agent-monitored reward shaping, encouraging the model to develop character-specific reasoning behaviors and effective multimodal multi-hop inference. Extensive experiments demonstrate that our method significantly boosts the performance of VLMs in narrative reasoning, hidden fact extraction, and deception-resilient understanding. Our contributions offer a scalable solution for training and evaluating VLMs under uncertain, adversarial, and socially complex conditions, laying the groundwork for future benchmarks in multimodal multi-hop reasoning under imperfect information.
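The second training stage combines GRPO's group-relative advantage with an agent-monitored shaping term. A minimal sketch of that combination follows; the `monitor_scores` input and the `beta` weight are assumptions standing in for whatever reward signal the monitoring agent actually emits, and the GRPO part shown is only the standard group normalization, not the full policy update.

```python
# Sketch of GRPO-style advantages with agent-monitored reward shaping.
# base_rewards: task rewards for G rollouts of one prompt (e.g. 1.0 if
# the correct culprit is named); monitor_scores: a monitor agent's
# per-rollout score (e.g. role-consistency). Both are hypothetical names.
from statistics import mean, pstdev

def shaped_rewards(base_rewards, monitor_scores, beta=0.5):
    """Add the monitor's score, weighted by beta, to each task reward."""
    return [r + beta * m for r, m in zip(base_rewards, monitor_scores)]

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO normalizes each reward within its sampling group:
    A_i = (r_i - mean(r)) / (std(r) + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four sampled rollouts for one mystery prompt.
base = [1.0, 0.0, 0.0, 1.0]      # correct-culprit reward
monitor = [0.8, 0.2, 0.5, 0.1]   # monitor agent's consistency score
adv = group_relative_advantages(shaped_rewards(base, monitor))
```

Because the advantages are normalized within each group, they sum to (approximately) zero, so rollouts are reinforced only relative to their siblings — the property GRPO exploits to avoid training a separate value model.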