Resource-Aware LLM Reasoning for Mobile Edge General Intelligence

Mingyi Luo, Ruichen Zhang, Xiangwang Hou, Jun Du, Chunxiao Jiang, Yong Ren, Shiwen Mao

Published Jun 11, 2026

Editorial review: 6.8

Relevance: 0.468

Freshness: 0.000

Why It Matters

What makes this one worth your time

This work is relevant to AI engineers and researchers who want to deploy LLM-based reasoning on edge devices, where computational resources are limited.

Proposes a way to deploy LLM reasoning efficiently on resource-constrained mobile edge devices.

Summary

The paper proposes a joint optimization framework for deploying large language model (LLM) reasoning in mobile edge environments. It addresses resource constraints by combining adaptive chain-of-thought (CoT) prompting with a distributed mixture-of-experts (MoE) architecture.
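Adaptive chain-of-thought prompting varies how much step-by-step reasoning a prompt asks for. The abstract does not give the authors' prompt template or how task difficulty is estimated, so the following is a hypothetical sketch: `choose_cot_steps` maps an assumed difficulty score in [0, 1] to a step budget capped by what the device can afford, and `build_prompt` renders that budget into an illustrative prompt.

```python
def choose_cot_steps(difficulty: float, device_max_steps: int, ceiling: int = 8) -> int:
    """Hypothetical policy: harder tasks get more CoT steps, capped by the device.

    difficulty: assumed score in [0, 1] from some upstream estimator.
    device_max_steps: assumed per-device limit on reasoning steps.
    ceiling: assumed step budget for the hardest tasks.
    """
    if not 0.0 <= difficulty <= 1.0:
        raise ValueError("difficulty must be in [0, 1]")
    wanted = round(difficulty * ceiling)
    return max(0, min(wanted, device_max_steps))


def build_prompt(question: str, steps: int) -> str:
    """Render a step budget into a prompt (template is illustrative only)."""
    if steps == 0:
        # Easy task or tight budget: skip explicit reasoning entirely.
        return f"Answer directly and concisely.\nQuestion: {question}\nAnswer:"
    return (
        f"Reason step by step using at most {steps} numbered steps, "
        f"then give the final answer.\nQuestion: {question}\nSteps:"
    )


if __name__ == "__main__":
    steps = choose_cot_steps(difficulty=0.9, device_max_steps=4)
    print(steps)
    print(build_prompt("Which base station should handle this request?", steps))
```

Here a hard task (difficulty 0.9) would ask for 7 steps, but the device cap limits it to 4; a weaker device or an easier task yields a shorter, cheaper prompt.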

Key contributions

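Proposes a framework that jointly optimizes reasoning depth, expert activation, and transmission power for LLM reasoning in mobile edge environments.
Introduces a distributed framework that pairs adaptive CoT prompting (for reasoning enhancement) with a distributed MoE architecture (for scalable deployment).
Demonstrates practical viability experimentally: with less than one second of additional inference time, both accuracy and latency satisfaction rate can reach 90%.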
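In a mixture-of-experts layer, a gating function scores the experts and only the top-k are activated; in a distributed deployment those experts can sit on different edge devices. The abstract does not describe the paper's gating or expert placement, so this is a sketch of standard top-k softmax gating with a hypothetical `EXPERT_DEVICE` placement map. The parameter `k` is one concrete form the "expert activation" decision could take.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(gate_logits, k):
    """Return [(expert_index, weight)] for the k highest-scoring experts.

    Weights are a softmax over the selected experts only, so they sum to 1.
    """
    if not 1 <= k <= len(gate_logits):
        raise ValueError("k must be between 1 and the number of experts")
    ranked = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([gate_logits[i] for i in chosen])
    return list(zip(chosen, weights))


# Hypothetical placement: 4 experts spread over 3 edge devices.
EXPERT_DEVICE = {0: "device_a", 1: "device_a", 2: "device_b", 3: "device_c"}


def plan_dispatch(gate_logits, k):
    """Group the activated experts by the device that hosts them."""
    plan = {}
    for expert, weight in top_k_route(gate_logits, k):
        plan.setdefault(EXPERT_DEVICE[expert], []).append((expert, weight))
    return plan


if __name__ == "__main__":
    # Gate scores would normally come from a learned router; these are made up.
    print(plan_dispatch([0.1, 2.0, 1.5, -0.3], k=2))
```

With `k=2`, only experts 1 and 2 are activated, so the input goes to `device_a` and `device_b` while `device_c` stays idle. Lowering `k` cuts computation and transmission at some cost in model capacity.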

Notable insights

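Treating reasoning depth as a dynamic network resource variable, optimized jointly with expert activation and transmission power, lets the system trade reasoning quality against latency and resource use on a per-task basis.
Combining adaptive CoT prompting with a distributed MoE architecture lets the system regulate both reasoning complexity and which expert networks are active according to task requirements and device capabilities, offering a route to scaling LLM reasoning on edge devices.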
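The idea of optimizing reasoning depth jointly with expert activation and transmission power can be illustrated with a toy exhaustive search. Everything numeric here is an assumption: the accuracy model, per-step compute cost, payload size, and channel parameters are placeholders, and the paper's actual formulation and solver are not given in the abstract. Transmission latency uses the Shannon rate B·log2(1 + p·g/N), a standard textbook channel model.

```python
import math
from itertools import product

# Every constant and model below is a hypothetical placeholder, not from the paper.
BANDWIDTH_HZ = 1e6              # channel bandwidth B
CHANNEL_GAIN = 1e-7             # channel power gain g
NOISE_W = 1e-9                  # noise power N over the band
PAYLOAD_BITS = 2e5              # bits exchanged with remote experts per step
SEC_PER_STEP_PER_EXPERT = 0.02  # compute time per reasoning step per active expert
ENERGY_WEIGHT = 0.5             # weight of transmit energy (J) against accuracy


def accuracy(depth, experts):
    """Toy model: accuracy saturates as CoT depth and active experts grow."""
    return 1.0 - 0.5 * math.exp(-0.4 * depth) * math.exp(-0.3 * experts)


def shannon_rate(power_w):
    """Achievable rate in bit/s at transmit power power_w."""
    return BANDWIDTH_HZ * math.log2(1.0 + power_w * CHANNEL_GAIN / NOISE_W)


def latency_and_energy(depth, experts, power_w):
    """Total latency (s) and transmit energy (J) for one task."""
    steps = depth + 1  # `depth` reasoning steps plus one answer step
    # Assumes active experts' compute adds up (no parallel speedup).
    compute_s = steps * experts * SEC_PER_STEP_PER_EXPERT
    tx_s = steps * PAYLOAD_BITS / shannon_rate(power_w)
    return compute_s + tx_s, power_w * tx_s


def joint_optimize(deadline_s, depths, expert_counts, powers):
    """Exhaustively pick (depth, experts, power) maximizing a toy utility
    (accuracy minus weighted energy) subject to latency <= deadline_s.

    Returns (score, depth, experts, power, latency), or None if infeasible.
    """
    best = None
    for d, k, p in product(depths, expert_counts, powers):
        latency, energy = latency_and_energy(d, k, p)
        if latency > deadline_s:
            continue  # violates the latency requirement
        score = accuracy(d, k) - ENERGY_WEIGHT * energy
        if best is None or score > best[0]:
            best = (score, d, k, p, latency)
    return best


if __name__ == "__main__":
    print(joint_optimize(1.0, range(0, 9), range(1, 5), [0.1, 0.5, 1.0]))
```

Tightening `deadline_s` shrinks the feasible set, which tends to push the search toward shallower reasoning, fewer experts, or higher transmit power. An exhaustive grid is only practical for tiny search spaces; a real system would need a proper optimization method.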

Possible limitations

Not stated in the abstract.

Abstract

arXiv:2509.23248v3

The rapid advancement of large language models (LLMs) has enabled an emergence of agentic artificial intelligence (AI) with powerful reasoning and autonomous decision-making capabilities. This integration with edge computing has led to the development of Mobile Edge General Intelligence (MEGI), which brings real-time, privacy-preserving reasoning to the network edge. However, deploying LLM-based agentic AI reasoning in MEGI environments poses significant challenges due to the high computational demands of reasoning and the limited resources of edge devices. To address these challenges, we propose a joint optimization framework for efficient LLM reasoning deployment in MEGI. First, we systematically review enhancement methods to identify mechanisms suitable for edge adaptation. Subsequently, we present a distributed framework that synergizes reasoning enhancement via adaptive CoT prompting with scalable deployment through a distributed MoE architecture. An important innovation of this approach involves modeling reasoning depth as a dynamic network resource variable, which is optimized jointly with expert activation and transmission power. This mechanism allows the system to dynamically regulate expert networks and reasoning complexity according to task requirements and device capabilities. Experimental evaluations in mobile edge environments demonstrate that the proposed framework effectively balances reasoning quality and resource efficiency. The results show that with less than one second of additional inference time, both accuracy and latency satisfaction rate can reach 90%, validating the practical viability of deploying sophisticated LLM reasoning in resource-constrained MEGI systems.
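The abstract reports accuracy and latency satisfaction rate but does not define the latter. A common reading, assumed here, is the fraction of tasks whose end-to-end latency meets their own deadline. The sketch below computes both metrics, plus the mean extra inference time relative to a baseline, from per-task records; `TaskResult` and its fields are hypothetical, and the sample inputs are illustrative only.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskResult:
    """Hypothetical per-task record from an evaluation run."""
    correct: bool       # whether the final answer was right
    latency_s: float    # end-to-end latency with the reasoning framework
    baseline_s: float   # latency of the same task without added reasoning
    deadline_s: float   # the task's latency requirement


def accuracy_rate(results):
    """Fraction of tasks answered correctly."""
    return sum(r.correct for r in results) / len(results)


def latency_satisfaction_rate(results):
    """Fraction of tasks that finish within their own deadline (assumed definition)."""
    return sum(r.latency_s <= r.deadline_s for r in results) / len(results)


def mean_added_time(results):
    """Average extra inference time over the baseline, in seconds."""
    return mean(r.latency_s - r.baseline_s for r in results)


if __name__ == "__main__":
    # Illustrative inputs only; these are not results from the paper.
    runs = [
        TaskResult(True, 0.8, 0.3, 1.0),
        TaskResult(True, 1.2, 0.4, 1.0),
        TaskResult(False, 0.6, 0.2, 1.0),
        TaskResult(True, 0.9, 0.5, 1.5),
    ]
    print(accuracy_rate(runs), latency_satisfaction_rate(runs), mean_added_time(runs))
```

Because each task carries its own deadline, a slow but lenient task (the last one) still counts as satisfied, while a faster task with a strict deadline (the second one) does not.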