Controllable Spoken Dialogue Generation: An LLM-Driven Grading System for K-12 Non-Native English Learners

Haidong Yuan, Haokun Zhao, Wanshi Xu, Songjun Cao, Qingyu Zhou, Long Ma, Hongjie Fan

Published Apr 27, 2026

Editorial review: 6.8

Relevance: 0.479

Freshness: 0.000

Why It Matters

General-purpose LLMs often produce language that is too advanced for K-12 learners of English as a non-native language. This work targets that proficiency mismatch and aims to provide a scalable platform for personalized speaking practice in non-immersive environments.

A framework for proficiency-aligned dialogue generation tailored to K-12 non-native English learners.

Summary

The paper introduces a framework for adapting large language model outputs to the proficiency levels of K-12 non-native English learners, using China's national curriculum (CSE) as a case study. It controls lexical complexity through a four-tier grading system, supported by graded vocabulary lists and a multi-turn dialogue corpus. Its core algorithm, Diversity Driven Policy Optimization (DDPO), is a multi-turn GRPO-based method intended to preserve dialogue diversity while optimizing overall dialogue quality. The authors describe the framework as adaptable to other educational standards and state that the models, data, and code will be open-sourced.

Key contributions

Proficiency-aligned framework for LLM outputs.
Diversity Driven Policy Optimization (DDPO) algorithm.
Planned open-source release of models, data, and code, including graded vocabulary lists and a multi-turn dialogue corpus.

Notable insights

The use of a four-tier grading system to control lexical complexity in dialogue generation.
The development of the DDPO algorithm to balance dialogue diversity and quality.
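
To make the idea of tier-based lexical control concrete, here is a minimal sketch of how an out-of-vocabulary (OOV) rate could be computed against a graded vocabulary list. It is not the paper's implementation: the word lists, the `TIER_WORDS` name, the tokenizer, and the assumption that each tier also permits all words from lower tiers are illustrative choices made here.

```python
import re

# Hypothetical graded vocabulary: words introduced at each of four tiers.
# These tiny lists are placeholders, not the paper's released resources.
TIER_WORDS = {
    1: {"i", "you", "like", "a", "the", "cat", "dog", "is", "my"},
    2: {"because", "friendly", "weekend", "visit"},
    3: {"environment", "describe", "experience"},
    4: {"sustainable", "perspective", "consequence"},
}


def allowed_vocab(tier):
    """Return the words permitted at a tier (assumed cumulative over lower tiers)."""
    allowed = set()
    for level, words in TIER_WORDS.items():
        if level <= tier:
            allowed |= words
    return allowed


def tokenize(text):
    """Lowercase the text and split it into simple alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def oov_rate(text, tier):
    """Fraction of tokens in `text` that fall outside the tier's vocabulary."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    vocab = allowed_vocab(tier)
    oov_count = sum(1 for token in tokens if token not in vocab)
    return oov_count / len(tokens)
```

Under these assumptions, a reply such as "My dog is friendly" has one OOV token out of four at tier 1 and none at tier 2, so the same sentence can be acceptable for one learner group and too hard for another.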

Possible limitations

Not stated in the abstract

Abstract

arXiv:2604.22542v1. Announce type: cross.

Large language models (LLMs) often fail to meet the pedagogical needs of K-12 English learners in non-native contexts due to a proficiency mismatch. To address this widespread challenge, we introduce a proficiency-aligned framework that adapts LLM outputs to learner abilities, using China's national curriculum (CSE) as a representative case. Our framework enables precise control over lexical complexity through a four-tier grading system, supported by a comprehensive suite of new resources: graded vocabulary lists and a multi-turn dialogue corpus. Our core technical contribution is the DDPO (Diversity Driven Policy Optimization) algorithm, a multi-turn GRPO-based approach designed to preserve dialogue diversity while holistically optimizing dialogue quality. This method significantly outperforms conventional approaches, achieving low out-of-vocabulary rates and high diversity while enhancing conversational naturalness and pedagogical value. While grounded in the CSE, our framework is designed for flexibility and can be readily adapted to other educational standards. Our models, data, and code will all be open-sourced, providing a scalable platform for personalized English speaking practice that effectively addresses the unique challenges faced by K-12 learners in non-immersive environments.
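
The abstract does not specify how DDPO is formulated, so the following is only a sketch of the general ingredients it names. It shows the group-relative advantage normalization commonly used in GRPO, plus a distinct-n score as one possible diversity signal. The function names, the `weight` parameter, the additive way quality and diversity are combined, and the use of population standard deviation are all assumptions made for illustration, not the authors' method.

```python
from statistics import mean, pstdev


def distinct_n(tokens, n=2):
    """Share of unique n-grams among all n-grams; a common diversity proxy."""
    if len(tokens) < n:
        return 0.0
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(set(ngrams)) / len(ngrams)


def combined_reward(quality, tokens, weight=0.5):
    """Hypothetical reward: a quality score plus a weighted diversity bonus."""
    return quality + weight * distinct_n(tokens)


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: normalize each reward within its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations differ on this
    return [(reward - mu) / (sigma + eps) for reward in rewards]


# Example: three sampled dialogues for the same prompt, each with an assumed
# quality score, ranked against one another within the group.
samples = [
    (0.8, "i like my dog i like my dog".split()),
    (0.8, "my dog is friendly and likes the park".split()),
    (0.5, "the the the the".split()),
]
rewards = [combined_reward(quality, tokens) for quality, tokens in samples]
advantages = group_relative_advantages(rewards)
```

In this toy group the second sample has the same quality score as the first but more varied wording, so it receives the larger advantage. That is the kind of pressure toward diversity the abstract describes, although the actual DDPO objective may differ substantially.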