LLM Agent-Assisted Reverse Engineering with Quantitative Readability Metrics

Neil Archibald, Ruben Thijssen

Published Jun 8, 2026. Featured #10 in the daily list, Jun 9, 2026.

Daily score: 62.4

Editorial review: 7.2

Relevance: 0.474

Freshness: 0.722

Why It Matters

What makes this one worth your time

Improving the readability of decompiled code aids reverse engineering and makes the recovered code more accessible to developers and researchers working in security and software analysis.

The paper proposes using LLM agents guided by quantitative metrics to make decompiled code more readable.

Summary

This paper presents a framework for improving the readability of decompiled C code using LLM agents guided by a new Quantitative Readability Score (QRS), which combines multiple readability metrics to enhance code quality without sacrificing correctness.

Key contributions

Development of the Quantitative Readability Score (QRS) framework.
Demonstration of QRS-guided refinement for LLM agents to improve code readability.
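Identification of the problems in the two earlier research phases: incomplete coverage and inconsistent improvements under tool-driven steering (Phase 1), and agents gaming the metric when structural similarity was the only check (Phase 2).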
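
To make "QRS-guided refinement" concrete, here is a minimal sketch of a metric-guided refinement loop. It is not the authors' implementation: `refine`, `propose`, `score`, and `max_rounds` are hypothetical names, and the greedy accept-if-better rule is an assumption. In practice `propose` would wrap an LLM agent call and `score` would be a readability metric such as QRS.

```python
from typing import Callable, Iterable


def refine(code: str,
           propose: Callable[[str], Iterable[str]],
           score: Callable[[str], float],
           max_rounds: int = 5) -> str:
    """Greedy loop: keep a candidate rewrite only if it scores strictly higher.

    `propose` and `score` are hypothetical stand-ins for an LLM agent and a
    readability metric; neither is taken from the paper.
    """
    best, best_score = code, score(code)
    for _ in range(max_rounds):
        improved = False
        for candidate in propose(best):
            candidate_score = score(candidate)
            if candidate_score > best_score:
                best, best_score, improved = candidate, candidate_score, True
        if not improved:
            break  # no candidate beat the current best, so stop early
    return best


# Toy usage: the "agent" renames a decompiler-style variable, and the
# "metric" penalizes each remaining occurrence of the opaque name.
toy_propose = lambda c: [c.replace("iVar1", "count")]
toy_score = lambda c: -float(c.count("iVar1"))
result = refine("iVar1 = iVar1 + 1;", toy_propose, toy_score)
```

Because a rewrite is accepted only when the score rises, the loop is only as trustworthy as the metric it climbs, which is the failure mode the paper reports for its second phase.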

Notable insights

The composite readability metric (QRS) pairs a structural similarity gate with several independent sub-metrics, which addresses the weaknesses of the earlier phases that relied on tool-driven steering or on structural similarity alone.
The observation that agents may optimize for metrics in unintended ways highlights the complexity of aligning AI objectives with human-readable outcomes.

Possible limitations

The QRS framework may not generalize to other kinds of code or other programming languages; the abstract does not address this.

Abstract

arXiv:2606.06838v1 (announce type: cross)

Automatic decompilers produce functionally correct but often unreadable C code. This paper addresses one stage of the reverse engineering workflow: improving the readability of decompiled code using LLM agents guided by quantitative metrics. We present a three-phase research evolution. Phase 1 (tool-driven steering via Ghidra MCP) suffered from incomplete coverage and inconsistent improvements due to lack of quantitative guidance. Phase 2 (structural similarity validation alone) revealed that agents optimize for metrics in unintended ways, producing structurally equivalent but less readable code. Our contribution is the Quantitative Readability Score (QRS) framework, a composite metric combining a structural similarity gate with three independent readability sub-metrics (Lexical Surprisal, Structural Simplicity, and Idiomatic Quality). We demonstrate that QRS-guided refinement enables LLM agents to make targeted readability improvements without sacrificing correctness. We discuss the broader reverse engineering workflow (binary lifting, decompilation cleanup, and achieving functional equivalence) as context; however, that workflow remains out of scope.
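
The abstract describes QRS as a structural similarity gate combined with three readability sub-metrics but does not give a formula. The sketch below shows one plausible shape for such a gated composite. Everything specific in it is an assumption made for illustration: the `qrs` and `SubScores` names, the 0.9 gate threshold, the equal weights, and the convention that each sub-metric is already normalized to [0, 1] with higher meaning more readable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubScores:
    """Hypothetical sub-metric values, each assumed normalized to [0, 1]
    with higher meaning more readable."""
    lexical_surprisal: float
    structural_simplicity: float
    idiomatic_quality: float


def qrs(structural_similarity: float,
        subs: SubScores,
        gate_threshold: float = 0.9,
        weights: tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3)) -> float:
    """Gated composite: 0.0 if the similarity gate fails, else a weighted sum.

    The threshold and weights are illustrative assumptions, not values
    reported in the paper.
    """
    if structural_similarity < gate_threshold:
        return 0.0  # gate failed: readability gains do not count
    values = (subs.lexical_surprisal,
              subs.structural_simplicity,
              subs.idiomatic_quality)
    return sum(w * v for w, v in zip(weights, values))


rejected = qrs(0.5, SubScores(1.0, 1.0, 1.0))
accepted = qrs(0.95, SubScores(0.6, 0.9, 0.3))
```

Treating similarity as a hard gate rather than one more weighted term means a rewrite that drifts structurally from the decompiler output cannot buy its way back with a high readability score.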