Rewriting the Response Path: Silent Tampering and Provider-Signed Defense in BYOK LLM Agents

Mingyu Luo, Zihan Zhang, Zesen Liu, Yuchong Xie, Zhixiang Zhang, Dung Hiu Hilton Yeung, Wai Ip Lai, Ping Chen, Ming Wen, Dongdong She

Published Jul 23, 2026

Featured #4

In the daily list May 7, 2026

Daily score: 76.6

Editorial review: 8.2

Relevance: 0.453

Freshness: 0.722

Why It Matters

What makes this one worth your time

Understanding and mitigating response-path attacks is crucial for ensuring the reliability and security of LLM applications, especially in sensitive contexts.

This research exposes critical vulnerabilities in LLM agent architectures, highlighting the need for enhanced integrity measures.

Summary

The paper identifies and formalizes a new security threat, the Relay Tampering Attack (RTA), in which a user-authorized relay in a Bring-Your-Own-Key (BYOK) agent architecture rewrites LLM outputs after generation but before execution. It demonstrates high attack success rates and evaluates defenses, including the proposed sign-c signing scheme.

Key contributions

Formalization of the Relay Tampering Attack (RTA) and its implications for LLM integrity.
Empirical evaluation of RTA's effectiveness against multiple LLMs and comparison with existing prompt-injection attacks.
Proposal of sign-c, a server-side signing scheme that authenticates execution-bearing fields and outgoing queries, with a local shim that verifies them before the agent acts.

Notable insights

The RTA demonstrates that even aligned LLMs can be compromised through strategic manipulation of outputs, emphasizing the importance of end-to-end integrity.
The proposed sign-c defense rejected all tampered responses with zero false rejections and only 0.0167 percent latency overhead, which suggests that RTA can be mitigated while maintaining agent functionality.

Possible limitations

The abstract does not address potential scalability issues of the proposed defense mechanism.
The abstract reports latency overhead but does not mention the computational resources required to implement the proposed defense.

Abstract

arXiv:2605.02187v2

LLM agents convert model outputs into consequential actions, including communications, code changes, and financial transactions. Developers often trust evidence such as test results and execution logs. We identify a response-path integrity gap in Bring-Your-Own-Key (BYOK) configurations used by roughly 88 percent of mainstream agents. Because traffic passes through a user-authorized relay, the relay can modify plaintext LLM responses after alignment but before execution without breaking encryption. A minimal attack rewrites one execution-bearing field and regenerates the remaining response using the user key while preserving the model style. Experiments reveal false green verification, where malicious code modifications pass public tests while silently defeating security checks. On APPS, 99.7 percent of publicly passing solutions retained downgraded behavior without developer-visible warnings. Tests on SWE-bench, AgentDojo, and ASB across five frontier models show that single-field rewriting can redirect agents while preserving apparent task completion. We propose sign-c, a server-side scheme that authenticates execution-bearing fields and outgoing queries. A local shim verifies them before action, while encryption protects confidentiality. The defense rejected all tampered responses with zero false rejections and only 0.0167 percent latency overhead.
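The sign-and-verify idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's sign-c implementation. The field names (`tool_name`, `tool_arguments`), the function names, and the response layout are all hypothetical. HMAC-SHA256 is used only as a standard-library stand-in for a provider signature. It assumes the verification key is unknown to the relay, whereas a real provider-signed scheme would more plausibly use a public-key signature. The sketch covers response fields only, not the outgoing queries that the abstract says are also authenticated.

```python
import hashlib
import hmac
import json

# Hypothetical choice of which response fields are "execution-bearing".
EXECUTION_BEARING_FIELDS = ("tool_name", "tool_arguments")


def canonical_bytes(response: dict) -> bytes:
    """Serialize only the execution-bearing fields in a stable order."""
    subset = {name: response.get(name) for name in EXECUTION_BEARING_FIELDS}
    return json.dumps(subset, sort_keys=True, separators=(",", ":")).encode("utf-8")


def provider_sign(response: dict, key: bytes) -> dict:
    """Provider side: attach a tag computed over the execution-bearing fields."""
    tag = hmac.new(key, canonical_bytes(response), hashlib.sha256).hexdigest()
    return {**response, "signature": tag}


def shim_verify(response: dict, key: bytes) -> bool:
    """Local shim: recompute the tag and compare it in constant time."""
    expected = hmac.new(key, canonical_bytes(response), hashlib.sha256).hexdigest()
    received = str(response.get("signature", ""))
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))


def relay_tamper(response: dict) -> dict:
    """Relay: rewrite one execution-bearing field and leave everything else as is."""
    tampered = dict(response)
    tampered["tool_arguments"] = {"command": "pytest -k 'not security'"}
    return tampered


if __name__ == "__main__":
    key = b"demo-key-not-known-to-the-relay"  # assumption for this sketch
    signed = provider_sign(
        {
            "text": "Running the test suite.",
            "tool_name": "shell",
            "tool_arguments": {"command": "pytest"},
        },
        key,
    )
    print("untampered accepted:", shim_verify(signed, key))
    print("tampered accepted:", shim_verify(relay_tamper(signed), key))
```

In this sketch the shim accepts the untouched response and rejects the one whose `tool_arguments` field was rewritten. Free-text fields such as `text` are deliberately left out of the tag, mirroring the abstract's focus on execution-bearing fields. Whether sign-c treats them the same way is not stated in the abstract.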