Phun-Bench: Evaluating LLMs on Phonological Understanding in Chinese

Xing Yue, Yongliang Shen, Weiming Lu

Published Jun 8, 2026

Editorial review: 7.2

Relevance: 0.454

Freshness: 0.000

Why It Matters

What makes this one worth your time

Understanding LLMs' phonological capabilities is crucial for improving their linguistic performance, especially in languages like Chinese where phonology plays a key role.

Phun-Bench assesses LLMs' phonological understanding in Chinese, revealing significant gaps in their performance.

Summary

The paper introduces Phun-Bench, a benchmark for evaluating large language models' phonological understanding in Chinese across three dimensions: homophony, rhyme, and phonetic similarity. The results show that LLMs recall correct pronunciations well but struggle to apply phonological knowledge as flexibly and intuitively as human speakers do.

Key contributions

Development of the Phun-Bench benchmark for phonological evaluation of LLMs.
Identification of specific phonological tasks (Homophony, Rhyme, Phonetic Similarity) that challenge LLMs.
Hypothesis on the underlying mechanisms of LLMs' phonological understanding.
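
To make the three dimensions concrete, here is a minimal Python sketch of what a homophony, rhyme, or phonetic-similarity judgment might look like when pinyin is available. It is not the paper's method or data: the tiny lexicon, the function names, and the similarity score are all hypothetical, and "same pinyin final" is only a crude stand-in for rhyme.

```python
# Toy lexicon: character -> (initial, final, tone).
# Hand-written for illustration; NOT taken from Phun-Bench.
TOY_PINYIN = {
    "是": ("sh", "i", 4),  # shì
    "事": ("sh", "i", 4),  # shì
    "十": ("sh", "i", 2),  # shí
    "妈": ("m", "a", 1),   # mā
    "马": ("m", "a", 3),   # mǎ
    "他": ("t", "a", 1),   # tā
}


def is_homophone(a: str, b: str, ignore_tone: bool = False) -> bool:
    """True if both characters share initial and final (and tone, unless ignored)."""
    ia, fa, ta = TOY_PINYIN[a]
    ib, fb, tb = TOY_PINYIN[b]
    same_syllable = (ia, fa) == (ib, fb)
    return same_syllable and (ignore_tone or ta == tb)


def rhymes(a: str, b: str) -> bool:
    """Crude rhyme proxy (an assumption): identical pinyin final."""
    return TOY_PINYIN[a][1] == TOY_PINYIN[b][1]


def similarity(a: str, b: str) -> float:
    """Hypothetical score: fraction of matching components (initial, final, tone)."""
    matches = sum(x == y for x, y in zip(TOY_PINYIN[a], TOY_PINYIN[b]))
    return matches / 3


if __name__ == "__main__":
    print(is_homophone("是", "事"))                    # exact homophones
    print(is_homophone("是", "十", ignore_tone=True))  # same syllable, different tone
    print(rhymes("妈", "他"))                          # shared final "a"
    print(similarity("妈", "马"))                      # differ only in tone
```

A lookup like this only covers pronunciation recall, which the abstract says LLMs already handle well; the benchmark's interest lies in whether models can make such judgments flexibly without an explicit table.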

Notable insights

The benchmark targets specific phonological dimensions that have been largely neglected in LLM evaluations.
The findings suggest a fundamental difference in phonological processing between LLMs and human speakers.

Possible limitations

Not stated in the abstract.

Abstract

arXiv:2606.07300v1

Language is a vehicle for thought, intricately tied to sounds, symbols, and meaning. However, most large language model (LLM) research focuses on meaning (semantics) and symbols (spelling) while largely overlooking sounds. Existing benchmarks on LLMs' phonological abilities are either solvable through rote memorization or intertwined with other abilities, making them inadequate to measure LLMs' genuine ability in phonological understanding. Here, we present Phun-Bench, a purpose-built Chinese benchmark with diverse tasks and settings across three dimensions (Homophony, Rhyme, and Phonetic Similarity), designed to systematically evaluate LLMs' phonological understanding. Our results show that while LLMs excel at recalling correct pronunciations, they generally struggle to leverage phonological knowledge in the flexible and intuitive way that human speakers do. Moreover, through detailed analyses, we propose a hypothesis regarding the underlying mechanism of LLMs' phonological understanding and "perception", highlighting an underexplored frontier for future research.