NLG Evaluation: Past, Present, Future

Ehud Reiter

Published May 25, 2026

Editorial review: 6.5

Relevance: 0.512

Freshness: 0.000

Why It Matters

What makes this one worth your time

Understanding the evolution of NLG evaluation helps researchers and engineers improve current methodologies and anticipate future challenges in the field.

The paper surveys the evolution of NLG evaluation and forecasts future trends.

Summary

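The paper reviews how Natural Language Generation (NLG) evaluation has changed from 1990 to the present and speculates on future trends. It highlights the field's shift from close ties with linguistics, when formal experimental evaluation was rare, to close ties with machine learning, where experimental evaluation is expected, and it notes the development of new evaluation techniques such as LLM-as-Judge.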

Key contributions

Historical overview of NLG evaluation methods.
Forecast that impact, qualitative, and safety evaluation will become more important as large numbers of people routinely use NLG technology.

Notable insights

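In 1990, when NLG had close ties to linguistics, there was very little formal experimental evaluation; in 2026, with NLG closely linked to machine learning, experimental evaluation is expected and fundamental to research.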
The introduction of LLM-as-Judge as a recent evaluation technique.

Possible limitations

Not stated in the abstract

Abstract

arXiv:2605.23715v1

Natural Language Generation (NLG) evaluation has changed dramatically since 1990, and will continue to evolve in the future. In 1990, when NLG had close ties to linguistics, there was very little formal experimental evaluation in the modern sense. In 2026, when NLG is closely linked to machine learning, experimental evaluation is expected and indeed fundamental to research. Many evaluation techniques were developed over this period, including most recently LLM-as-Judge. I expect NLG evaluation will continue to evolve in the future. In particular, impact, qualitative, and safety evaluation will become more important as large numbers of people routinely use NLG technology.