Parthenon Law: A Self-Evolving Legal-Agent Framework

Hejia Geng, Leo Liu

Published Jun 12, 2026. Featured #7 in the daily list of Jun 13, 2026.


Daily score: 67.1

Editorial review: 7.2

Relevance: 0.525

Freshness: 0.722

Why It Matters

What makes this one worth your time

This framework could make legal-domain LLM agents more reliable, and therefore more practical, on end-to-end legal matters. The authors report that it substantially improves state-of-the-art models and harnesses on legal-matter tasks.

Parthenon is a self-evolving framework that improves legal-agent performance by turning scored failures into edits to its skills, tools, and knowledge, without changing model weights.

Summary

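The paper introduces Parthenon, a self-evolving framework for legal-domain LLM agents. It targets three obstacles to reliable deployment: no large-scale evidence on how the strongest model-and-harness combinations behave on end-to-end legal matters, no agent architecture adapted to the legal vertical (only general-purpose harnesses), and no mechanism for systems to learn from their own outcomes. A large-scale empirical study on Harvey LAB (12,510 agent trajectories) shows that even frontier agents remain far from completing matters in a single pass. Parthenon factors Model, Harness, Agent roles, legal Knowledge, deterministic Tools, and procedural Skills into auditable surfaces that support source traceability, date and number grounding, deliverable compliance, and issue closure. An anti-leakage learning loop converts scored failures into task-agnostic edits to skills, tools, and knowledge, so the system improves with experience without altering model weights.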
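The abstract does not say how the anti-leakage check works. A minimal sketch, assuming "leakage" means an edit carries matter-specific facts (party names, amounts, dates) into a reusable skill, tool, or knowledge entry, might filter proposed edits as follows. The `ScoredFailure` and `Edit` types, the `surface` labels, and the substring test are all hypothetical; the step that drafts edits from a failure (presumably model-driven) is left out.

```python
from dataclasses import dataclass


# Hypothetical types for illustration; not Parthenon's actual data model.
@dataclass(frozen=True)
class ScoredFailure:
    criterion: str              # rubric criterion the agent failed
    task_terms: frozenset[str]  # matter-specific strings: parties, amounts, dates


@dataclass(frozen=True)
class Edit:
    surface: str  # assumed labels: "skill", "tool", or "knowledge"
    text: str


def leaks(edit: Edit, task_terms: frozenset[str]) -> bool:
    """Return True if the edit mentions any matter-specific term (case-insensitive)."""
    lowered = edit.text.lower()
    return any(term.lower() in lowered for term in task_terms)


def accept_edits(failure: ScoredFailure, proposed: list[Edit]) -> list[Edit]:
    """Keep only edits that leak no term from the failed matter."""
    return [e for e in proposed if not leaks(e, failure.task_terms)]


def apply_edits(store: dict[str, list[str]], edits: list[Edit]) -> None:
    """Append accepted edits to a per-surface store; no model weights are involved."""
    for e in edits:
        store.setdefault(e.surface, []).append(e.text)


failure = ScoredFailure(
    criterion="cites the governing clause",
    task_terms=frozenset({"Acme Corp", "$4.2M"}),
)
proposed = [
    Edit("skill", "Before drafting, quote the governing clause verbatim and cite its section."),
    Edit("knowledge", "Acme Corp's indemnity cap is $4.2M."),
]
store: dict[str, list[str]] = {}
apply_edits(store, accept_edits(failure, proposed))
print(store)
```

Only the general drafting rule survives; the knowledge entry is rejected because it names the party and the amount from the failed matter. A substring test is crude, and a fuller check might also compare edits against gold answers or rubric text.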

Key contributions

Introduction of the Parthenon framework for legal-agent systems.
A large-scale empirical study of 12,510 agent trajectories on Harvey LAB legal matters.
An anti-leakage learning loop that converts scored failures into task-agnostic edits to skills, tools, and knowledge.

Notable insights

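Even frontier agents remain far from completing matters in a single pass: per-criterion accuracy climbs with stronger models while strict matter completion stalls.
The anti-leakage learning loop lets the system improve with experience, much as a firm refines its checklists and playbooks after each matter, without modifying model weights.
Parthenon factors Model, Harness, Agent roles, legal Knowledge, deterministic Tools, and procedural Skills into auditable surfaces for source traceability, date and number grounding, deliverable compliance, and issue closure.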
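Date and number grounding is one of the auditable surfaces the abstract lists, alongside deterministic Tools as a separate component. As a rough illustration, assuming grounding means every date and figure in a draft must appear verbatim in some source document, a deterministic check could look like this. The regular expressions, the function names, and the exact-string matching rule are assumptions, not the paper's tool.

```python
import re

# Assumed date formats: ISO (2024-03-01) and "Month D, YYYY".
MONTHS = "January|February|March|April|May|June|July|August|September|October|November|December"
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b|\b(?:" + MONTHS + r")\s+\d{1,2},\s+\d{4}\b")

# Amounts, counts, and percentages such as $4,500, 60, or 12.5%.
NUMBER_RE = re.compile(r"\$?\d[\d,]*(?:\.\d+)?%?")


def extract_facts(text: str) -> set[str]:
    """Return the dates and numbers mentioned in text as exact strings."""
    dates = set(DATE_RE.findall(text))
    # Blank out dates first so their day and year are not re-counted as numbers.
    remainder = DATE_RE.sub(" ", text)
    # Drop a trailing comma picked up by the greedy [\d,]* (e.g. "$4,500,").
    numbers = {m.rstrip(",") for m in NUMBER_RE.findall(remainder)}
    return dates | numbers


def ungrounded(draft: str, sources: list[str]) -> list[str]:
    """List the draft's dates and numbers that appear in no source document."""
    grounded = set().union(*(extract_facts(s) for s in sources))
    return sorted(extract_facts(draft) - grounded)


sources = ["The lease commenced on March 1, 2024 with monthly rent of $4,500."]
draft = "Rent of $4,500 has been due since March 1, 2024; the notice period is 60 days."
print(ungrounded(draft, sources))  # ['60']
```

Because the check is deterministic, the same draft and sources always yield the same flags, which is what makes the result auditable; the cost is that a correct paraphrase such as "$4.5K" would also be flagged.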

Possible limitations

Not stated in the abstract

Abstract

arXiv:2606.04602v3

As agents grow more capable, legal-domain LLM agents promise to turn document-heavy matters into reviewable work products -- yet reliable deployment faces three obstacles: no large-scale evidence on how today's strongest model-and-harness combinations behave on end-to-end legal matters; no agent architecture adapted to the legal vertical, only general-purpose harnesses; and, in a setting that keeps shifting with new facts, authorities, and deadlines, no mechanism for systems to learn from their own outcomes. We address each. A large-scale empirical study on Harvey LAB -- 12,510 agent trajectories -- shows that even frontier agents remain far from completing matters in a single pass: per-criterion accuracy climbs with stronger models while strict matter completion stalls. We then introduce Parthenon, a self-evolving legal-agent framework that factors Model, Harness, Agent roles, legal Knowledge, deterministic Tools, and procedural Skills into auditable surfaces for source traceability, date and number grounding, deliverable compliance, and issue closure. Finally, an anti-leakage learning loop converts scored failures into task-agnostic edits to skills, tools, and knowledge, letting the system improve with experience -- as a firm refines its checklists and playbooks after each matter -- without touching model weights. Across our large-scale empirical analysis, Parthenon substantially improves the performance of state-of-the-art models and harnesses on legal-matter tasks.
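The gap between per-criterion accuracy and strict matter completion is easy to see with toy numbers. The sketch below assumes each matter is scored as a list of pass/fail rubric criteria and that strict completion requires every criterion to pass; the function names and data are hypothetical, not Harvey LAB results.

```python
def per_criterion_accuracy(matters: list[list[bool]]) -> float:
    """Fraction of all rubric criteria passed, pooled across matters."""
    total = sum(len(m) for m in matters)
    passed = sum(sum(m) for m in matters)
    return passed / total


def strict_completion(matters: list[list[bool]]) -> float:
    """Fraction of matters in which every criterion passed."""
    return sum(all(m) for m in matters) / len(matters)


# Toy scores: 4 matters x 5 criteria each (illustrative only).
weaker = [[True, True, True, False, False] for _ in range(4)]
stronger = [[True, True, True, True, False] for _ in range(4)]

for name, scores in [("weaker", weaker), ("stronger", stronger)]:
    print(name, per_criterion_accuracy(scores), strict_completion(scores))
# weaker 0.6 0.0
# stronger 0.8 0.0
```

The stronger agent passes more criteria yet completes no more matters, because each matter still has one failing criterion. Under an extra simplifying assumption that criteria fail independently, each passing with probability p, a matter with k criteria completes with probability p^k, so even p = 0.9 gives roughly 0.12 for a 20-criterion matter.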