Parthenon Law: A Self-Evolving Legal-Agent Framework
Hejia Geng, Leo Liu
Why It Matters
What makes this one worth your time
Parthenon could make legal-domain LLM agents more reliable and practical for real-world legal work: according to the abstract, it substantially improves the performance of state-of-the-art models and harnesses on legal-matter tasks.
Parthenon is a self-evolving framework that enhances legal-agent performance by learning from outcomes without changing model weights.
Summary
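The paper introduces Parthenon, a self-evolving framework for legal-domain LLM agents. It targets three obstacles to reliable deployment: no large-scale evidence on how the strongest model-and-harness combinations perform on end-to-end legal matters, no agent architecture adapted to the legal vertical (only general-purpose harnesses), and no mechanism for systems to learn from their own outcomes. The paper reports a large-scale empirical study on Harvey LAB (12,510 agent trajectories) and proposes a framework that factors Model, Harness, Agent roles, legal Knowledge, deterministic Tools, and procedural Skills into auditable surfaces for source traceability, date and number grounding, deliverable compliance, and issue closure. An anti-leakage learning loop then turns scored failures into task-agnostic edits to skills, tools, and knowledge, improving the system without altering model weights.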
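To make the factoring concrete, here is a minimal sketch (not Parthenon's code) of a system whose six components, named as in the abstract, live in separate stores, with every learned change recorded in an audit log. The `AgentSystem` class, its entry format, and the choice to let only knowledge, tools, and skills be edited are all assumptions. The last choice follows the abstract's statement that the learning loop edits skills, tools, and knowledge without touching model weights.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Component names follow the abstract; which ones the learning loop may edit is an
# assumption based on "edits to skills, tools, and knowledge".
COMPONENTS = ("model", "harness", "agent_roles", "knowledge", "tools", "skills")
EDITABLE = {"knowledge", "tools", "skills"}


@dataclass
class AgentSystem:
    """Hypothetical container: one entry store per component, plus an audit log."""
    parts: Dict[str, Dict[str, str]] = field(
        default_factory=lambda: {name: {} for name in COMPONENTS})
    audit_log: List[Tuple[str, str, str]] = field(default_factory=list)

    def apply_edit(self, component: str, key: str, value: str, reason: str) -> None:
        """Change one entry and record why; refuse components outside EDITABLE."""
        if component not in EDITABLE:
            raise PermissionError(f"{component!r} is not editable by the learning loop")
        self.parts[component][key] = value
        self.audit_log.append((component, key, reason))


system = AgentSystem()
system.parts["model"]["name"] = "frontier-model"  # placeholder, never edited below
system.apply_edit("skills", "deadline_check",
                  "List every deadline with its source clause before drafting.",
                  "failed criterion: missed response deadline")
print(system.audit_log)
```

Keeping the reason next to each edit is what makes a change traceable back to the failure that motivated it; a call such as `system.apply_edit("model", ...)` raises `PermissionError` instead of silently changing a frozen component.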
Key contributions
- Introduction of the Parthenon framework for legal-agent systems.
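- A large-scale empirical study of 12,510 agent trajectories on Harvey LAB, showing that per-criterion accuracy rises with stronger models while strict matter completion stalls.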
- An anti-leakage learning loop that converts scored failures into task-agnostic edits to skills, tools, and knowledge.
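The abstract does not define "anti-leakage" precisely. A minimal sketch, assuming it means that an edit learned from one scored failure must not copy that matter's specific facts (party names, dates, amounts) into reusable skills, tools, or knowledge. The function names, the substring test, and the example data are hypothetical.

```python
from typing import Iterable, List, Set


def leaks(edit_text: str, matter_specifics: Set[str]) -> bool:
    """True if the edit mentions any matter-specific string (case-insensitive)."""
    lowered = edit_text.lower()
    return any(s.lower() in lowered for s in matter_specifics)


def task_agnostic_edits(candidates: Iterable[str], matter_specifics: Set[str]) -> List[str]:
    """Keep only candidate edits that carry no facts copied from the failed matter."""
    return [c for c in candidates if not leaks(c, matter_specifics)]


# Hypothetical facts from one scored failure, and edits proposed in response to it.
specifics = {"Acme Corp", "2024-03-15", "$250,000"}
candidates = [
    "Before drafting, list every deadline and cite the clause it comes from.",
    "In disputes involving Acme Corp, damages are $250,000.",
    "Re-check that each figure in the deliverable matches a source document.",
]
print(task_agnostic_edits(candidates, specifics))
```

A substring blocklist is a deliberately crude stand-in. It catches verbatim copying but not paraphrase, so it illustrates the constraint rather than a robust gate.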
Notable insights
- The anti-leakage learning loop improves the system without modifying model weights, much as a firm refines its checklists and playbooks after each matter.
- Parthenon factors its components into auditable surfaces for source traceability, date and number grounding, deliverable compliance, and issue closure.
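The abstract lists date and number grounding among the auditable surfaces and pairs it with deterministic Tools. One way to read that combination is as a check with no model call that flags any date or figure in a draft that does not appear verbatim in the source documents. The sketch below assumes this reading. The function name, the regex, and the verbatim-match rule are hypothetical, not Parthenon's implementation.

```python
import re
from typing import List

# Hypothetical token pattern: ISO dates, comma-grouped numbers, then plain numbers.
# Alternation order matters: the longer forms are tried first.
VALUE = re.compile(r"\d{4}-\d{2}-\d{2}|\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?")


def ungrounded_values(draft: str, sources: List[str]) -> List[str]:
    """Return dates and numbers in the draft that appear verbatim in no source text."""
    grounded = {v for text in sources for v in VALUE.findall(text)}
    return [v for v in VALUE.findall(draft) if v not in grounded]


draft = ("Notice was served on 2024-03-15, and the response is due within 30 days. "
         "Damages total $250,000.")
sources = ["Served 2024-03-15. Respond within 30 days.", "Claimed damages: $205,000."]
print(ungrounded_values(draft, sources))  # ['250,000']
```

Verbatim matching is the simplest deterministic rule. A fuller tool would normalize formats (for example, "March 15, 2024" versus "2024-03-15") before comparing.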
Possible limitations
- Not stated in the abstract
Abstract
arXiv:2606.04602v3
As agents grow more capable, legal-domain LLM agents promise to turn document-heavy matters into reviewable work products, yet reliable deployment faces three obstacles: no large-scale evidence on how today's strongest model-and-harness combinations behave on end-to-end legal matters; no agent architecture adapted to the legal vertical, only general-purpose harnesses; and, in a setting that keeps shifting with new facts, authorities, and deadlines, no mechanism for systems to learn from their own outcomes. We address each. A large-scale empirical study on Harvey LAB (12,510 agent trajectories) shows that even frontier agents remain far from completing matters in a single pass: per-criterion accuracy climbs with stronger models while strict matter completion stalls. We then introduce Parthenon, a self-evolving legal-agent framework that factors Model, Harness, Agent roles, legal Knowledge, deterministic Tools, and procedural Skills into auditable surfaces for source traceability, date and number grounding, deliverable compliance, and issue closure. Finally, an anti-leakage learning loop converts scored failures into task-agnostic edits to skills, tools, and knowledge, letting the system improve with experience (as a firm refines its checklists and playbooks after each matter) without touching model weights. Across our large-scale empirical analysis, Parthenon substantially improves the performance of state-of-the-art models and harnesses on legal-matter tasks.
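Per-criterion accuracy can rise while strict matter completion stalls because the two metrics aggregate differently. A minimal sketch makes this concrete. It assumes each matter is graded against a list of pass/fail criteria, that per-criterion accuracy pools all criteria, and that strict completion requires every criterion in a matter to pass. These definitions and the data below are illustrative assumptions, not Harvey LAB results.

```python
from typing import Dict, List

# Hypothetical rubric results: for each matter, one pass/fail flag per criterion.
Rubric = Dict[str, List[bool]]


def per_criterion_accuracy(results: Rubric) -> float:
    """Fraction of all criteria, pooled across matters, that the agent satisfied."""
    flags = [ok for criteria in results.values() for ok in criteria]
    return sum(flags) / len(flags)


def strict_completion_rate(results: Rubric) -> float:
    """Fraction of matters in which every criterion is satisfied."""
    return sum(all(criteria) for criteria in results.values()) / len(results)


weaker = {"matter_a": [True, True, False, False], "matter_b": [True, False, True, False]}
stronger = {"matter_a": [True, True, True, False], "matter_b": [True, True, False, True]}

for name, results in [("weaker", weaker), ("stronger", stronger)]:
    print(name, per_criterion_accuracy(results), strict_completion_rate(results))
# weaker 0.5 0.0
# stronger 0.75 0.0
```

Under these definitions, moving each matter from two of four criteria to three of four raises accuracy from 0.5 to 0.75 but leaves strict completion at zero. This is one way the pattern the abstract reports can arise.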