ML-Agent: Reinforcing LLM Agents for Autonomous Machine Learning Engineering

Zexi Liu, Jingyi Chai, Xinyu Zhu, Shuo Tang, Rui Ye, Bo Zhang, Lei Bai, Siheng Chen

Published May 5, 2026

Editorial review: 7.0

Relevance: 0.467

Freshness: 0.000

Why It Matters

What makes this one worth your time

This work could make autonomous ML engineering more accessible by reducing computational costs and enabling smaller models to achieve competitive performance.

The paper introduces a reinforcement learning framework for training LLM agents to carry out machine learning engineering autonomously; ML-Agent is the 7B-sized agent trained with it.

Summary

The paper proposes a framework for training large language model (LLM) agents with online reinforcement learning (RL) to perform machine learning engineering autonomously. The framework has three components: exploration-enriched fine-tuning, step-wise RL, and a reward module tailored to agentic ML tasks. The resulting ML-Agent, built on a 7B-sized Qwen-2.5 LLM and trained on only 9 ML tasks, achieves performance comparable to agents using much larger proprietary models at significantly lower computational cost.

Key contributions

A novel agentic ML training framework for LLMs using RL.
Demonstration of competitive performance with reduced computational resources.
Development of an ML-specific reward module for RL optimization.

Notable insights

The use of exploration-enriched fine-tuning to enhance RL exploration in LLMs.
The introduction of a step-wise RL approach to improve training efficiency.
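To make the step-wise idea concrete, here is a minimal sketch of collecting experience one action at a time rather than through full multi-step rollouts. It is an illustration under stated assumptions, not the authors' implementation: the function name `collect_stepwise_batch`, the notion of a pre-collected `state_pool`, and the `policy` and `reward_fn` callables are all hypothetical.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical types: a state is the textual context the agent sees,
# an action is the text the agent produces for that single step.
State = str
Action = str
Transition = Tuple[State, Action, float]


def collect_stepwise_batch(
    state_pool: List[State],
    policy: Callable[[State], Action],
    reward_fn: Callable[[State, Action], float],
    batch_size: int,
    seed: int = 0,
) -> List[Transition]:
    """Collect single-step (state, action, reward) samples.

    Assumption: states were gathered beforehand, so each sample costs
    one policy call and one reward evaluation instead of a full episode.
    """
    rng = random.Random(seed)
    batch: List[Transition] = []
    for _ in range(batch_size):
        state = rng.choice(state_pool)      # sample one starting state
        action = policy(state)              # generate exactly one action
        reward = reward_fn(state, action)   # score that single step
        batch.append((state, action, reward))
    return batch
```

Because every sample is independent, samples of this kind can be gathered in parallel, which is one plausible reading of how training on a single action step could accelerate experience collection.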

Possible limitations

Not stated in the abstract

Abstract

arXiv:2505.23723v2

The emergence of large language model (LLM)-based agents has significantly advanced the development of autonomous machine learning (ML) engineering. However, the dominant prompt-based paradigm exhibits limitations: smaller models lack the capacity to learn from execution trajectories for generalization, while large proprietary models incur high computational overhead, restricting accessibility and scalability. Focusing on this, for the first time, we explore the paradigm of learning-based agentic ML, where an LLM agent learns through interactive experimentation on ML tasks using online reinforcement learning (RL). To realize this, we propose a novel agentic ML training framework with three key components: (1) exploration-enriched fine-tuning, which enables LLM agents to generate diverse actions for enhanced RL exploration; (2) step-wise RL, which enables training on a single action step, accelerating experience collection and improving training efficiency; (3) an agentic ML-specific reward module, which unifies varied ML feedback signals into consistent rewards for RL optimization. Leveraging this framework, we train ML-Agent, driven by a 7B-sized Qwen-2.5 LLM for autonomous ML. Despite training on only 9 ML tasks, our 7B-sized ML-Agent achieves comparable performance to agents using much larger proprietary LLMs (e.g., GPT-5) but at significantly lower computational cost, demonstrating strong performance and cross-task generalization.
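The abstract's third component, a reward module that unifies varied ML feedback signals into consistent rewards, can be illustrated with a small sketch. The abstract does not give the mapping, so everything below is an assumption made for illustration: the `Feedback` fields, the three-way split (failure, no measurable change, metric change), and the sigmoid with its `scale` parameter.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Feedback:
    """Hypothetical summary of what happened after one agent action."""
    valid_action: bool                     # action parsed and was allowed
    ran_without_error: bool                # resulting code executed cleanly
    metric_before: Optional[float] = None  # task metric before the action
    metric_after: Optional[float] = None   # task metric after the action
    higher_is_better: bool = True          # False for losses, error rates


def unified_reward(fb: Feedback, scale: float = 5.0) -> float:
    """Map heterogeneous feedback to a scalar in [0, 1] (assumed scheme)."""
    # Invalid actions and runtime errors get the lowest reward.
    if not fb.valid_action or not fb.ran_without_error:
        return 0.0
    # Valid steps with no metric to compare get a neutral reward.
    if fb.metric_before is None or fb.metric_after is None:
        return 0.5
    # Otherwise squash the signed improvement through a sigmoid.
    delta = fb.metric_after - fb.metric_before
    if not fb.higher_is_better:
        delta = -delta
    return 1.0 / (1.0 + math.exp(-scale * delta))
```

Under this assumed scheme, an unchanged metric lands on the same neutral value as a valid step with no metric, so accuracy gains and loss reductions both push the reward above 0.5 on a shared scale.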