SkillRevise: Improving LLM-Authored Agent Skills via Trace-Conditioned Skill Revision

Yuxuan Liu, Zhaochen Su, Lingyun Xie, Yuhao Zhang, Qing Zong, Jiahe Guo, Zhongwei Xie, Yiyan Ji, Yauwai Yim, Hongyu Luo, Xiyu Ren, Ruan Chenyu, Haoran Li, Yangqiu Song

Published Jun 2, 2026 | Featured #8 | In the daily list Jun 3, 2026

Daily score: 69.0

Editorial review: 7.5

Relevance: 0.456

Freshness: 0.722

Why It Matters

What makes this one worth your time

This work addresses cold-start skill development for LLM agents, where only an initial, imperfect skill is available. It offers an alternative to costly expert authoring and to one-shot generation, which can yield skills that are well formed but behaviorally weak.

SkillRevise enhances LLM agent skills through execution-grounded iterative refinement.

Summary

The paper introduces SkillRevise, a framework that iteratively refines initial skills for LLM agents by diagnosing defects from execution evidence and applying relevant repair principles, leading to improved agent performance.

Key contributions

Introduction of the SkillRevise framework for iterative skill refinement.
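Demonstration of substantial improvements over one-shot baselines across three benchmarks and five LLMs, including raising the base agent's success rate on SkillsBench from 36.05% to 61.63%.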
Evidence of cross-model transferability of revised skills.
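
For scale, the SkillsBench figures quoted in the abstract (36.05% before, 61.63% after) can be read as either an absolute or a relative gain. The short calculation below only restates those two reported numbers; the variable names are illustrative.

```python
# Success rates on SkillsBench as reported in the abstract (percent).
before = 36.05
after = 61.63

# Absolute gain in percentage points, and gain relative to the baseline.
absolute_gain = after - before
relative_gain = absolute_gain / before

print(f"absolute gain: {absolute_gain:.2f} percentage points")  # 25.58
print(f"relative gain: {relative_gain:.1%}")                    # 71.0%
```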

Notable insights

SkillRevise's use of execution evidence for diagnosing skill defects is a novel approach that could lead to more robust agent behaviors.
The framework's ability to retrieve repair principles from a general memory suggests a potential for scalable learning across different tasks.

Possible limitations

Not stated in the abstract.

Abstract

arXiv:2606.01139v1

Agent skills are procedural artifacts that enable LLM agents to execute workflows, verify constraints, and recover from failures. Existing self-evolving methods refine skills using accumulated trajectories. However, they struggle in cold-start settings, where only an initial, imperfect skill is available. Consequently, skill construction defaults to expert authoring or one-shot LLM generation. Expert-authored skills are costly and may not align with how LLM agents actually execute tasks, while one-shot generated skills can be syntactically well formed yet behaviorally weak. To bridge this gap, we propose SkillRevise, an execution-grounded framework designed to iteratively refine these initial skills. SkillRevise diagnoses skill defects from execution evidence, retrieves relevant repair principles from a general memory, and applies execution-anchored edits. By re-executing candidates and measuring empirical utility, it systematically retains the optimal skill version. Evaluated across three benchmarks and five LLMs, SkillRevise substantially outperforms one-shot baselines, improving the base agent's success rate on SkillsBench from 36.05% to 61.63%. Furthermore, the revised skills exhibit strong cross-model transferability, capturing generalized procedural knowledge over model-specific artifacts.
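
The loop described in the abstract (diagnose, retrieve, edit, re-execute, keep the best version) might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: every name here (`Trace`, `revise_skill`, the callback parameters, the toy memory) is hypothetical, the paper's LLM-driven steps are replaced by simple stubs, and a skill is modeled as a plain string.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Trace:
    """Execution evidence for one task (hypothetical structure)."""
    task_id: str
    success: bool
    log: str


def success_rate(traces: List[Trace]) -> float:
    """Fraction of tasks that succeeded; 0.0 if there are no traces."""
    return sum(t.success for t in traces) / len(traces) if traces else 0.0


def revise_skill(
    initial_skill: str,
    run_tasks: Callable[[str], List[Trace]],
    diagnose: Callable[[str, List[Trace]], List[str]],
    retrieve_principles: Callable[[List[str]], List[str]],
    apply_edit: Callable[[str, List[str], List[str]], str],
    rounds: int = 3,
) -> Tuple[str, float]:
    """Iteratively revise a skill, keeping a candidate only if it scores higher."""
    best_skill = initial_skill
    best_traces = run_tasks(best_skill)
    best_score = success_rate(best_traces)
    for _ in range(rounds):
        defects = diagnose(best_skill, best_traces)      # from execution evidence
        if not defects:
            break
        principles = retrieve_principles(defects)        # from a general memory
        candidate = apply_edit(best_skill, defects, principles)
        cand_traces = run_tasks(candidate)               # re-execute the candidate
        cand_score = success_rate(cand_traces)
        if cand_score > best_score:                      # retain by empirical utility
            best_skill, best_traces, best_score = candidate, cand_traces, cand_score
    return best_skill, best_score


# Toy stand-ins (assumptions): a task succeeds if the skill text mentions it.
TASKS = ["parse", "verify", "retry"]
MEMORY: Dict[str, str] = {
    "verify": "add a step to verify constraints",
    "retry": "add a step to retry after failure",
}


def run_tasks(skill: str) -> List[Trace]:
    return [Trace(t, t in skill, f"ran {t}") for t in TASKS]


def diagnose(skill: str, traces: List[Trace]) -> List[str]:
    return [t.task_id for t in traces if not t.success]


def retrieve_principles(defects: List[str]) -> List[str]:
    return [MEMORY[d] for d in defects if d in MEMORY]


def apply_edit(skill: str, defects: List[str], principles: List[str]) -> str:
    # Address one defect per round by appending the first retrieved principle.
    return skill + "\n" + principles[0] if principles else skill


best_skill, best_score = revise_skill(
    "parse the input", run_tasks, diagnose, retrieve_principles, apply_edit
)
print(best_score)
print(best_skill)
```

In this toy setup each round repairs one failing task, so the score rises only when a re-executed candidate actually performs better; a candidate that does not improve the score is discarded and the previous best skill is kept.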