Deconstructing Spatial Complexity: Hierarchical Decomposition for LLM Spatial Reasoning

Yi Wang, Haojie Lu, Zhaofan Zhang, Li Chen, Sihong Xie

Published May 28, 2026

Why It Matters

What makes this one worth your time

Improving LLMs' spatial reasoning capabilities is crucial for their application in real-world scenarios, particularly in embodied intelligence tasks.

A new method enhances LLM spatial reasoning through hierarchical task decomposition.

Summary

The paper presents a hierarchical task decomposition method for LLM spatial reasoning. Because LLMs often fail to derive optimal intermediate states on their own, the authors add MCTS-Guided Group Relative Policy Optimization (M-GRPO) to strengthen the model's planning.

Key contributions

Introduction of a hierarchical task decomposition method for LLMs in spatial reasoning.
Development of the MCTS-Guided Group Relative Policy Optimization (M-GRPO) framework.
Implementation of a fine-grained advantage function for improved path planning.
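
For context on the advantage function mentioned above, the following is a minimal sketch of the standard group-relative advantage used in GRPO, where each sampled response's reward is normalized against the other responses in its group. It is not the paper's fine-grained variant, whose definition is not given in the abstract. The function name and the choice of population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Baseline GRPO-style advantage: z-score each reward within its group.

    Assumption: population standard deviation is used; the eps term only
    guards against division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: four sampled plans for the same task, two of which succeed.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Successful samples receive positive advantages and failed ones negative, with no learned value network involved. A finer-grained scheme would presumably assign credit below the level of a whole response, but how M-GRPO does so is not specified here.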

Notable insights

The UCT formula is reformulated to combine the LLM's prior predictive probabilities with its epistemic uncertainty, with the aim of improving planning.
A more fine-grained advantage function is used so that the model can learn optimal path planning.
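
To make the UCT idea concrete, here is a hypothetical selection rule that adds a prior-weighted exploration term (in the style of PUCT) and a separate uncertainty bonus to the usual mean value. The abstract does not state the paper's actual formula, so the form of the score, the `Child` class, and the `c_puct` and `beta` weights are all assumptions for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass
class Child:
    prior: float        # LLM's predictive probability for this action (assumed given)
    uncertainty: float  # epistemic-uncertainty estimate (assumed given)
    visits: int = 0
    value_sum: float = 0.0

    @property
    def q(self):
        # Mean value of the child; zero before the first visit.
        return self.value_sum / self.visits if self.visits else 0.0


def select_score(child, parent_visits, c_puct=1.0, beta=0.5):
    # Hypothetical score: mean value + prior-weighted exploration
    # + an additive uncertainty bonus. Not the paper's formula.
    exploration = c_puct * child.prior * math.sqrt(parent_visits) / (1 + child.visits)
    return child.q + exploration + beta * child.uncertainty


def select_child(children, parent_visits, **weights):
    # Return the action whose child node has the highest score.
    return max(children, key=lambda a: select_score(children[a], parent_visits, **weights))


children = {
    "up": Child(prior=0.7, uncertainty=0.0),
    "left": Child(prior=0.3, uncertainty=1.0),
}
chosen = select_child(children, parent_visits=1)
```

In this toy setup the lower-prior action is selected because its uncertainty bonus outweighs the prior gap, which shows how such a term can push the search toward moves the model is unsure about.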

Possible limitations

Potential challenges in generalizing the method across diverse spatial reasoning tasks are not addressed in the abstract.

Abstract

arXiv:2605.28144v1

LLMs have shown remarkable proficiency in general language understanding and reasoning. However, they consistently underperform in spatial reasoning, which severely limits their application, particularly in embodied intelligence. Inspired by the success of hierarchical reinforcement learning, this paper introduces a novel method for hierarchical task decomposition in LLM spatial reasoning. Our approach guides LLMs to decompose complex tasks into manageable sub-tasks by identifying key intermediate states and generating simplified sub-environments. However, we find that LLMs often fail to derive optimal intermediate states because of their insufficient spatial prior, leading to sub-optimal task decomposition. To address this limitation and enhance the model's planning capability, we propose MCTS-Guided Group Relative Policy Optimization (M-GRPO), in which we reformulate the UCT formula by incorporating the LLM's prior predictive probabilities alongside its epistemic uncertainty. Furthermore, we implement a more fine-grained advantage function, enabling the model to learn optimal path planning. Experimental results demonstrate that our method substantially improves LLM performance on spatial tasks, including navigation, planning, and strategic games, achieving state-of-the-art results. This work paves the way for LLMs in real-world applications.
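
The effect of intermediate-state choice on decomposition quality can be illustrated with a toy grid example. The sketch below splits a navigation task into two sub-tasks at a hand-picked waypoint and solves each with breadth-first search. In the paper the LLM proposes the intermediate states, so the grid, the waypoints, and the helper names `bfs_path` and `plan_via` are all assumptions for illustration.

```python
from collections import deque


def bfs_path(grid, start, goal):
    """Shortest 4-connected path on a grid of 0 (free) / 1 (wall), or None."""
    rows, cols = len(grid), len(grid[0])
    parent = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:  # walk parents back to the start
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] == 0 and (nr, nc) not in parent:
                parent[(nr, nc)] = cell
                queue.append((nr, nc))
    return None


def plan_via(grid, start, waypoint, goal):
    """Solve start->waypoint and waypoint->goal separately, then join them."""
    first = bfs_path(grid, start, waypoint)
    second = bfs_path(grid, waypoint, goal)
    if first is None or second is None:
        return None
    return first + second[1:]  # drop the duplicated waypoint cell


grid = [
    [0, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
start, goal = (0, 0), (2, 0)
direct = bfs_path(grid, start, goal)
good = plan_via(grid, start, (1, 2), goal)  # waypoint on an optimal route
poor = plan_via(grid, start, (0, 3), goal)  # waypoint off the optimal route
```

With the waypoint at the gap in the wall, the joined plan matches the direct shortest path. With the off-route waypoint, each sub-task is still solved optimally but the overall plan is longer, which is the kind of sub-optimal decomposition the abstract attributes to a weak spatial prior.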