BadRobot: Jailbreaking Embodied LLM Agents in the Physical World

Hangtao Zhang, Chenyu Zhu, Xianlong Wang, Ziqi Zhou, Changgan Yin, Minghui Li, Lulu Xue, Yichen Wang, Shengshan Hu, Aishan Liu, Peijin Guo, Leo Yu Zhang

Published Jun 10, 2026 | Featured #1 | In the daily list Jun 11, 2026

Daily score: 72.2

Editorial review: 7.5

Relevance: 0.453

Freshness: 0.722

Why It Matters

What makes this one worth your time

Embodied LLM agents turn language into physical actions, so understanding and mitigating their risks is a precondition for deploying them safely in real-world applications.

BadRobot shows that ordinary voice-based interaction is enough to push embodied LLMs past their safety and ethical constraints and into harmful actions.

Summary

The paper introduces BadRobot, a new attack paradigm that exploits three vulnerabilities in embodied LLMs to induce harmful behaviors through voice-based interactions, and evaluates it against existing embodied LLM frameworks such as Voxposer, Code as Policies, and ProgPrompt.

Key contributions

Introduction of the BadRobot attack paradigm targeting embodied LLMs.
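Identification of three exploitable vulnerabilities: manipulation of the LLM within the robotic system, misalignment between linguistic outputs and physical actions, and unintentional hazardous behavior caused by flawed world knowledge.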
Development of a benchmark for evaluating attack performance against existing embodied LLM frameworks.

Notable insights

The paper identifies specific vulnerabilities in the interaction between LLMs and physical actions, which is a relatively underexplored area in AI safety.
The construction of a benchmark for malicious queries provides a structured approach to evaluate the security of embodied LLM systems.
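
To make the idea of a structured evaluation concrete, here is a minimal sketch of how per-framework results from such a benchmark could be tallied. Everything in it is an assumption for illustration: the `Trial` record, its fields, the `success_rate_by_framework` helper, and the placeholder data are hypothetical and are not taken from the paper or its code, and the paper may define its metric differently.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One benchmark query and its outcome (hypothetical record layout)."""
    category: str          # hypothetical query category label
    framework: str         # system under test
    action_executed: bool  # True if the agent carried out the requested action


def success_rate_by_framework(trials):
    """Return {framework: fraction of trials in which the action was executed}."""
    executed = defaultdict(int)
    total = defaultdict(int)
    for trial in trials:
        total[trial.framework] += 1
        executed[trial.framework] += int(trial.action_executed)
    return {name: executed[name] / total[name] for name in total}


# Placeholder records for illustration only; these are NOT results from the paper.
demo = [
    Trial("category_a", "framework_x", True),
    Trial("category_a", "framework_x", False),
    Trial("category_b", "framework_y", False),
    Trial("category_b", "framework_y", False),
]

print(success_rate_by_framework(demo))
```

Keeping the category on each record means the same data could also be grouped by query type, which is one way a benchmark like this supports comparisons across both systems and kinds of request.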

Possible limitations

Not stated in the abstract.

Abstract

arXiv:2407.20242v5

Embodied AI represents systems where AI is integrated into physical entities. Large Language Models (LLMs), which exhibit powerful language understanding abilities, have been extensively employed in embodied AI to facilitate sophisticated task planning. However, a critical safety issue remains overlooked: could these embodied LLMs perpetrate harmful behaviors? In response, we introduce BadRobot, a novel attack paradigm aiming to make embodied LLMs violate safety and ethical constraints through typical voice-based user-system interactions. Specifically, three vulnerabilities are exploited to achieve this type of attack: (i) manipulation of LLMs within robotic systems, (ii) misalignment between linguistic outputs and physical actions, and (iii) unintentional hazardous behaviors caused by flaws in world knowledge. Furthermore, we construct a benchmark of various malicious physical action queries to evaluate BadRobot's attack performance. Based on this benchmark, extensive experiments against existing prominent embodied LLM frameworks (e.g., Voxposer, Code as Policies, and ProgPrompt) demonstrate the effectiveness of BadRobot. Our code is available at https://github.com/Rookie143/BadRobot.