Inducing Reasoning Primitives from Agent Traces

Zhihan Lei, Jiarui Yan, Joshua Momo, William W. Cohen

Published Jun 3, 2026

Editorial review: 7.2

Relevance: 0.483

Freshness: 0.000

Why It Matters

What makes this one worth your time

By turning recurring reasoning routines into a reusable library, this approach could make LLM agents more accurate at lower inference cost. The abstract reports gains of 22 to 44 percentage points over the trace-generating agent on three tasks, and a lower average inference cost than AWM.

The paper presents a method to enhance LLM agent performance by mining and reusing reasoning primitives.

Summary

The paper introduces Reasoning Primitive Induction, a method to extract and cluster reasoning moves from ReAct-style LLM agent traces, converting them into a library of pseudo-tools that improve performance on various tasks.
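The mine, cluster, and convert steps can be pictured with a minimal Python sketch. Everything here is an assumption made for illustration: the trace format, the `PseudoTool` record, and especially `normalize`, which groups moves by their first two words as a toy stand-in for whatever clustering the authors actually use (the abstract does not say).

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PseudoTool:
    """Hypothetical record for one induced primitive."""
    name: str
    docstring: str  # natural-language spec, later interpreted by an LLM
    support: int    # number of trace steps assigned to this cluster


def normalize(move: str) -> str:
    """Toy clustering key: the first two alphabetic words of the move."""
    words = [w for w in move.lower().split() if w.isalpha()]
    return "_".join(words[:2])


def induce_library(traces, top_k=2):
    """Single pass over successful traces; keep the top_k most frequent clusters."""
    counts = Counter()
    examples = {}
    for trace in traces:
        if not trace["success"]:
            continue  # only successful traces are mined
        for move in trace["moves"]:
            key = normalize(move)
            counts[key] += 1
            examples.setdefault(key, move)  # first example doubles as the docstring
    return [
        PseudoTool(name=key, docstring=examples[key], support=n)
        for key, n in counts.most_common(top_k)
    ]


# Made-up traces, for illustration only.
traces = [
    {"success": True, "moves": ["List constraints for each person",
                                "Check schedule against constraints"]},
    {"success": True, "moves": ["List constraints from the rules",
                                "Compute total salary"]},
    {"success": False, "moves": ["Guess answer directly"]},
]
library = induce_library(traces, top_k=1)
```

With these toy traces, the one move that recurs across both successful traces is the only primitive kept, and the failed trace contributes nothing.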

Key contributions

Development of Reasoning Primitive Induction to mine and cluster reasoning moves.
Creation of a library of typed pseudo-tools from agent traces.
Demonstration of improved performance on multiple reasoning tasks using the induced libraries.

Notable insights

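Recurrent reasoning moves are clustered into pseudo-tools, and a standard ReAct loop that composes them outperforms the agent that generated the original traces.
Reported gains are +44pp on RuleArena NBA, +30pp on MuSR team allocation, and +22pp on NatPlan meeting planning; a single fixed configuration also beats zero-shot Chain-of-Thought on all five comparable subtasks, which suggests the method may apply broadly.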

Possible limitations

Not stated in the abstract

Abstract

arXiv:2606.02994v1

ReAct-style LLM agents often rediscover the same reasoning routines across problems, yet leave those routines trapped in transient scratchpads. We introduce Reasoning Primitive Induction, a single-pass method that mines successful ReAct traces, clusters recurrent reasoning moves, and converts the most frequent moves into a compact library of typed pseudo-tools. Each pseudo-tool is specified by a natural-language docstring interpreted by an LLM at invocation time, and a standard ReAct loop composes these primitives at test time. The central result is that induced libraries outperform the very agent that generated their traces: by +44pp on RuleArena NBA (30 -> 74), +30pp on MuSR team allocation (38 -> 68), and +22pp on NatPlan meeting planning (7 -> 29). Across five comparable subtasks spanning narrative deduction, rule application, and constraint-satisfaction planning, a single fixed configuration improves over zero-shot Chain-of-Thought on every subtask, matches or surpasses expert-authored decompositions, and outperforms AWM at lower average inference cost.
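To make the idea of a docstring-specified pseudo-tool concrete, here is a minimal Python sketch of one being invoked inside a ReAct-style loop. It is not the authors' implementation: the prompt layout, `make_pseudo_tool`, `react_loop`, the scripted policy, and the stand-in `fake_llm` are all hypothetical names introduced for illustration, and a real system would call an actual language model where the stub is used.

```python
from typing import Callable, Dict, List, Optional, Tuple

LLM = Callable[[str], str]  # assumed interface: prompt in, completion out


def make_pseudo_tool(name: str, docstring: str, llm: LLM) -> Callable[[str], str]:
    """Wrap a natural-language spec so the LLM interprets it at call time."""
    def tool(argument: str) -> str:
        prompt = f"Tool: {name}\nSpec: {docstring}\nInput: {argument}\nOutput:"
        return llm(prompt)
    return tool


def react_loop(
    question: str,
    tools: Dict[str, Callable[[str], str]],
    policy: Callable[[str, List[str]], Tuple[str, str]],
    max_steps: int = 5,
) -> Optional[str]:
    """Alternate between choosing an action and observing a tool result."""
    observations: List[str] = []
    for _ in range(max_steps):
        action, argument = policy(question, observations)
        if action == "finish":
            return argument
        observations.append(tools[action](argument))
    return None  # step budget exhausted without an answer


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model: wraps the Input line of the prompt."""
    for line in prompt.splitlines():
        if line.startswith("Input: "):
            return "constraints(" + line[len("Input: "):] + ")"
    return ""


def scripted_policy(question: str, observations: List[str]) -> Tuple[str, str]:
    """Hypothetical policy: call one tool, then finish with its output."""
    if not observations:
        return "extract_constraints", question
    return "finish", observations[-1]


tools = {
    "extract_constraints": make_pseudo_tool(
        "extract_constraints",
        "List every scheduling constraint stated in the input.",
        fake_llm,
    )
}
answer = react_loop("Alice is free at 3pm", tools, scripted_policy)
```

The design point this illustrates is that the tool has no hand-written body: its behavior is defined entirely by the docstring passed to the model when the tool is called.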