LACUNA: Safe Agents as Recursive Program Holes
Yaoyu Zhao, Yichen Xu, Oliver Bračevac, Cao Nguyen Pham, Frank Zhengqing Wu, Martin Odersky
Why It Matters
What makes this one worth your time
This work is relevant to AI engineers and researchers building agents whose model-written code is meant to shape the loop, context, and control flow that the runtime normally owns.
LACUNA lets model-written code shape the runtime while type-checking each action before it runs.
Summary
The paper introduces LACUNA, a programming model that allows LLM agents to write code that shapes their runtime environment while maintaining safety through type-checking and compiler diagnostics. This model aims to close the gap between the runtime and model-written code, allowing for more expressive agents without compromising safety.
Key contributions
- Introduction of LACUNA, a programming model for safe and expressive agent code.
- Demonstration of LACUNA's ability to express various control flows like ReAct loops and multi-model planning.
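- Evaluation of LACUNA on a collection of test cases, BrowseComp-Plus, and τ²-bench. On BrowseComp-Plus, 8.6% of generations are rejected before execution and the agent reaches 27.1% accuracy; on τ²-bench, LACUNA solves 76.0% of 392 tasks, on par with the baseline agent.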
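To make the "ordinary control flow" contribution concrete, here is a minimal Python sketch of a ReAct-style loop written as a plain `for` loop around a hole-filling call. It is an illustration only, not the authors' implementation: `Act`, `Finish`, `react`, and `scripted_agent` are hypothetical names, and `scripted_agent` is a scripted stand-in for an LLM-backed `agent[T](task)` call.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Act:
    """The agent wants to call a tool (hypothetical type for this sketch)."""
    tool: str
    arg: str


@dataclass
class Finish:
    """The agent has a final answer (hypothetical type for this sketch)."""
    answer: str


Step = Union[Act, Finish]


def react(task: str,
          agent: Callable[[object, str], Step],
          tools: dict[str, Callable[[str], str]],
          max_steps: int = 5) -> str:
    """A ReAct loop as plain control flow: ask, act, observe, repeat."""
    observations: list[str] = []
    for _ in range(max_steps):
        # Each iteration is one call to the hole; the requested result type
        # is passed as the first argument.
        step = agent(Step, f"{task}\nobservations: {observations}")
        if isinstance(step, Finish):
            return step.answer
        observations.append(tools[step.tool](step.arg))
    raise RuntimeError("no answer within the step budget")


def scripted_agent(result_type: object, task: str) -> Step:
    """Scripted stand-in for an LLM: search first, answer once it sees 'Paris'."""
    if "Paris" in task:
        return Finish("Paris")
    return Act("search", "capital of France")


tools = {"search": lambda query: "Paris is the capital of France."}
answer = react("What is the capital of this country?", scripted_agent, tools)
```

The loop itself contains nothing agent-specific, so sub-agents or parallel decomposition could be written the same way, as ordinary functions that call the hole.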
Notable insights
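- Each generated action is type-checked before it runs and accepted or rejected as a whole, so a rejected action leaves the environment untouched and its compiler diagnostics drive a retry.
- Letting model-written code shape the runtime makes agents more expressive, but it also means each failure (a prompt injection, a wrong tool call, a partial failure) reaches further than when the code expresses a single action.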
Possible limitations
- Not stated in the abstract
Abstract
arXiv:2605.28617v1
LLM agents increasingly act by writing code, yet a split persists between the runtime that drives the agent and the code the model writes. The runtime owns the loop, context, and control flow, and the model has little say over any of them. Letting model-written code shape the runtime itself would make agents more expressive, but it would also sharpen safety problems. A model can be diverted by a prompt injection, call the wrong tool, or fail partway and leave an inconsistent state, and each such failure reaches further when the code shapes the runtime than when it expresses a single action. We present LACUNA, a programming model for agents that closes this split while preserving safety. Each agent action is a typed call $\texttt{agent[T](task)}$ that the LLM fills with code when execution reaches it, and the code is type-checked against the surrounding program before it runs. Because each action is accepted or rejected as a whole, a rejected one leaves the environment untouched, and its compiler diagnostics drive a retry. The same check also bounds which tools and data an action may use and how they flow. Our primitive expresses ReAct loops, sub-agents, skills, parallel decomposition, and multi-model planning as ordinary control flow. We evaluate LACUNA on a collection of test cases, BrowseComp-Plus, and $\tau^2$-bench. On BrowseComp-Plus, $8.6\%$ of generations are rejected before execution, with 0.7 retries per query on average, and the agent reaches $27.1\%$ accuracy. On $\tau^2$-bench, LACUNA solves $76.0\%$ of $392$ tasks across four domains with a capable model, on par with the baseline agent.
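The check-then-run cycle described in the abstract can be sketched in a few lines of Python. This is a hedged illustration under strong assumptions, not the LACUNA implementation: a static name allowlist over the parsed expression stands in for the type-checker, the result type `T` is not modeled, and `check`, `agent`, `scripted_model`, `lookup`, and `delete_all` are all hypothetical names. The scripted model plays the role of the LLM.

```python
import ast
from typing import Callable


def check(source: str, allowed: set[str]) -> list[str]:
    """Statically inspect generated code; an empty list means 'accepted'."""
    try:
        tree = ast.parse(source, mode="eval")
    except SyntaxError as err:
        return [f"syntax error: {err.msg}"]
    diagnostics = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and node.id not in allowed:
            diagnostics.append(f"'{node.id}' is not available to this action")
        elif isinstance(node, ast.Attribute):
            diagnostics.append("attribute access is not permitted")
    return diagnostics


def agent(task: str,
          generate: Callable[[str, list[str]], str],
          tools: dict[str, Callable],
          max_retries: int = 3) -> tuple[object, int]:
    """Fill the hole: generate code, check it, and run it only if accepted."""
    diagnostics: list[str] = []
    for attempt in range(max_retries + 1):
        source = generate(task, diagnostics)
        diagnostics = check(source, set(tools))
        if not diagnostics:
            # Accepted as a whole: only now does any code execute.
            code = compile(source, "<hole>", "eval")
            return eval(code, {"__builtins__": {}}, dict(tools)), attempt
        # Rejected: nothing ran; the diagnostics feed the next generation.
    raise RuntimeError(f"rejected after {max_retries} retries: {diagnostics}")


deleted = []  # would record side effects if the forbidden tool ever ran


def lookup(key: str) -> str:
    return {"capital_of_france": "Paris"}[key]


def delete_all() -> str:
    deleted.append("everything")
    return "done"


def scripted_model(task: str, diagnostics: list[str]) -> str:
    """Stand-in for an LLM: first tries a tool outside its bounds, then recovers."""
    if not diagnostics:
        return "delete_all()"
    return "lookup('capital_of_france')"


value, retries = agent("capital of France", scripted_model, {"lookup": lookup})
```

In this run the first candidate is rejected before execution because `delete_all` is outside the tools granted to the action, so `deleted` stays empty, and the diagnostic triggers a single retry that succeeds.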