Methods for Formal Verification of Agent Skills: Three Layers Toward a Mechanically Checkable Capability-Containment Proof
Alfredo Metere
Why It Matters
What makes this one worth your time
As AI systems become more complex, ensuring the reliability and safety of agent skills is crucial for their deployment in real-world applications.
This work advances formal verification of agent skills through a layered approach.
Summary
The paper presents three methods for formal verification of agent skills, enhancing the verification process from declared or tested to formal by addressing capability-containment properties in a structured manner.
Key contributions
- Development of a sound static capability-containment analysis method.
- Introduction of a refinement type system for tool-call envelopes that enforces capability checks.
- Implementation of SMT-bounded model checking to validate skill behavior against a correctness criterion.
Notable insights
- The integration of abstract interpretation with a refinement type system provides a novel approach to statically enforce capability containment.
- Using existing tools like Z3 and Semgrep allows for practical implementation without requiring new infrastructure.
Possible limitations
- Not stated in the abstract.
Abstract
arXiv:2605.23951v1 Announce Type: new Abstract: The companion paper introduced a four-level verification lattice on agent-skill manifests (unverified, declared, tested, formal) and left the top level aspirational. This paper closes that gap. We give a precise semantics for skill behaviour faithful to how a skill is consumed by an LLM-driven runtime (a deterministic script-side reachable through a non-deterministic LLM-side), state the verification problem as a capability-containment property over that semantics, and present three composable methods that together raise a skill from declared or tested to formal: (1) sound static capability-containment analysis of the script-side via abstract interpretation over a small effect lattice; (2) a refinement type system for tool-call envelopes that mechanically rejects any call whose statically-inferred capability is not in the manifest's declared set; (3) SMT-bounded model checking against the parent paper's biconditional correctness criterion, with the bound chosen so any counter-example fitting the runtime's transaction-buffer horizon is exhibited as a concrete trace. We prove the three layers composed soundly cover the parent paper's threat model modulo a single residual (the LLM's freedom to refuse to act) that the parent paper's runtime biconditional catches at session boundary. The methods reuse existing well-engineered tools (Z3, Semgrep, CodeQL, refinement-type checkers, mechanised proof assistants) rather than asking operators to build new ones, and the proof-carrying artifact extends the existing SKILL.md convention. All three methods plus the bundle producer and re-checker ship as zero-dependency JavaScript modules in the open-source enclawed framework (https://github.com/metereconsulting/enclawed; project page https://www.enclawed.com/), with 53 unit tests and an end-to-end CLI demo on a sample skill.