Agent libOS: A Library-OS-Inspired Runtime for Long-Running, Capability-Controlled LLM Agents
Yingqi Zhang
Why It Matters
What makes this one worth your time
As LLM agents shift from request-response assistants to long-running software actors, they need a runtime that can schedule, authorize, resume, and audit what they do.
Agent libOS proposes that layer: it checks explicit capabilities at runtime primitive boundaries instead of treating tool dispatch as the trust boundary.
Summary
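The paper introduces Agent libOS, a library-OS-inspired runtime for long-running LLM agents that maintain state across model calls, fork subtasks, wait for external events, and request human approval. Filesystem access, object access, and external side effects are checked against explicit capabilities and policy at runtime primitive boundaries, and the runtime keeps checkpoints and audit records.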
Key contributions
- Introduction of a library-OS-inspired runtime for LLM agents.
- Implementation of a prototype featuring async scheduling and capability checks.
- A threat model and a safety-oriented evaluation of long-running agent processes.
Notable insights
- The design emphasizes explicit capability boundaries to enhance security and control over agent actions.
- The integration of human approval and auditing mechanisms addresses critical concerns in deploying LLM agents in real-world applications.
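The interplay of capabilities, human approval, and auditing noted above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the `Runtime` class, its method names, and the capability strings `fs.read` and `net.send` are hypothetical, and the one-shot grant semantics (consumed on first use) are an assumption about how such grants might behave.

```python
from dataclasses import dataclass, field


@dataclass
class Runtime:
    """Hypothetical runtime: standing capabilities, one-shot grants, audit log."""
    capabilities: set[str] = field(default_factory=set)
    one_shot: set[str] = field(default_factory=set)
    audit: list[tuple[str, str]] = field(default_factory=list)

    def approve_once(self, cap: str) -> None:
        # Stand-in for a human approving a single use of `cap`.
        self.one_shot.add(cap)
        self.audit.append((cap, "granted-once"))

    def check(self, cap: str) -> bool:
        # Standing capabilities always pass; a one-shot grant is consumed on use.
        if cap in self.capabilities:
            allowed = True
        elif cap in self.one_shot:
            self.one_shot.remove(cap)
            allowed = True
        else:
            allowed = False
        # Every decision, allowed or denied, leaves an audit record.
        self.audit.append((cap, "allowed" if allowed else "denied"))
        return allowed


rt = Runtime(capabilities={"fs.read"})
results = [rt.check("fs.read"), rt.check("net.send")]
rt.approve_once("net.send")
results += [rt.check("net.send"), rt.check("net.send")]
print(results)  # [True, False, True, False]
```

The second `net.send` check after approval fails because the grant was consumed by the first, and the denial is recorded in the audit log along with every other decision.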
Possible limitations
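- The abstract lists no limitations explicitly. It does mark the scope: Agent libOS runs above a conventional host operating system without kernel-mode isolation, and it does not aim to improve planner accuracy.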
Abstract
arXiv:2606.03895v1
Large language model (LLM) agents are evolving from request-response assistants into long-running software actors: they maintain state across model calls, fork subtasks, wait for external events, request human authority, generate tools, and perform side effects that must be resumed and audited. This paper presents Agent libOS, a library-OS-inspired runtime substrate for LLM agents. Agent libOS runs above a conventional host operating system; it does not implement hardware drivers, kernel-mode isolation, or a POSIX-compatible operating system. Instead, it treats an agent as an AgentProcess: a schedulable execution subject with process identity, parent-child lineage, lifecycle state, a tool table derived from an AgentImage, typed Object Memory, explicit capabilities, human queues, checkpoints, events, and audit records. Its central design rule is: tools are libc-like wrappers; runtime primitives are the authority boundary. Filesystem access, object access, sleeps, human approval, JIT tool registration, and external side effects are checked at primitive boundaries under explicit capabilities and policy. We describe the design, threat model, Python prototype, and safety-oriented evaluation. The current prototype implements async scheduling, namespace-local Object Memory, runtime-integrated human approval, one-shot permission grants, per-process working directories, shell and image-registration primitives, Deno/TypeScript JIT tools over a libOS syscall broker, filesystem/object bridge tools, an injectable Resource Provider Substrate, deterministic demos, real-model smoke scripts, and 123 regression tests at the time of writing. Rather than improving planner accuracy, Agent libOS demonstrates a runtime substrate in which long-running LLM agents can be scheduled, authorized, resumed, and audited without treating tool dispatch as the trust boundary.
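To make the central design rule concrete, here is a small Python sketch of an agent process whose tool is a thin wrapper over a capability-checked primitive. Only the name AgentProcess comes from the abstract. The fields, `fork`, `prim_read_object`, `tool_shout`, the capability names, and the rule that a child may only narrow its parent's capabilities are all assumptions made for illustration, not the prototype's API.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass, field

_pids = itertools.count(1)  # hypothetical process-id allocator


class CapabilityError(PermissionError):
    """Raised when a primitive is called without the required capability."""


@dataclass
class AgentProcess:
    capabilities: frozenset[str]
    parent: AgentProcess | None = None          # parent-child lineage
    pid: int = field(default_factory=lambda: next(_pids))
    state: str = "ready"                        # lifecycle state
    audit: list[str] = field(default_factory=list)

    def fork(self, keep: set[str]) -> AgentProcess:
        # Assumption of this sketch: a child can only narrow its parent's authority.
        return AgentProcess(capabilities=self.capabilities & frozenset(keep), parent=self)


def prim_read_object(proc: AgentProcess, memory: dict[str, str], key: str) -> str:
    # Runtime primitive: the authority check lives here, not in the tool.
    if "object.read" not in proc.capabilities:
        proc.audit.append(f"deny object.read {key}")
        raise CapabilityError(f"pid {proc.pid} lacks object.read")
    proc.audit.append(f"allow object.read {key}")
    return memory[key]


def tool_shout(proc: AgentProcess, memory: dict[str, str], key: str) -> str:
    # Tool: a libc-like wrapper that holds no authority of its own.
    return prim_read_object(proc, memory, key).upper()


memory = {"note": "hello"}
root = AgentProcess(capabilities=frozenset({"object.read", "fs.write"}))
child = root.fork(keep={"fs.write"})

out = tool_shout(root, memory, "note")
try:
    tool_shout(child, memory, "note")
    denied = False
except CapabilityError:
    denied = True
print(out, denied)  # HELLO True
```

Because the check sits inside the primitive, the same tool succeeds for `root` and fails for `child`: swapping or generating tools does not change what a process is allowed to do.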