Language Ideologies in a Multilingual Society: An LLM-based Analysis of Luxembourgish News Comments

Emilia Milano, Alistair Plum, Yves Scherrer, Christoph Purschke

Published May 1, 2026

Editorial review: 7.2

Relevance: 0.470

Freshness: 0.000

Why It Matters

Language ideologies in multilingual settings carry cultural and social meaning and shape identity and social belonging, which makes automated support for detecting them relevant to linguists and social scientists.

This study explores LLMs for detecting language ideologies in Luxembourgish news comments.

Summary

The paper investigates the use of large language models (LLMs) to detect language ideologies in user comments written in Luxembourgish, a language underrepresented in LLM training data, by manually annotating a corpus and evaluating LLM performance under different conditions.

Key contributions

Manual annotation of a Luxembourgish corpus with ideological categories.
Evaluation of LLM performance in ideological detection under varying prompt conditions.
Investigation of machine translation's impact on LLM performance for low-resource languages.

Notable insights

The study highlights the challenges of applying LLMs to low-resource languages and the potential benefits of machine translation for improving performance.
It emphasizes the complexity of ideological detection, which goes beyond mere language preference.

Possible limitations

The abstract does not provide specific performance metrics or results from the LLM evaluations.
Potential biases in human annotations are not addressed.

Abstract

arXiv:2604.27661v1

Detecting language ideologies is a valuable yet complex task for understanding how identities are constructed through discourse. In Luxembourg's multicultural and multilingual society, language ideologies reflect more than simple preferences: they carry deep cultural and social meanings, shaping identities and social belonging. Following recent developments in applying Natural Language Processing tools to linguistics and social science, this paper explores the potential of large language models to assist in the detection of language ideologies. We manually annotate a corpus of user comments in Luxembourgish with predefined ideological categories and then evaluate the performance of large language models under varying prompt conditions to assess their ability to replicate these human annotations. Since Luxembourgish is a small language and poorly represented in the LLMs' training data, we also investigate whether machine-translating the data to high-resource languages increases performance on the ideology detection task. Our findings suggest that, while LLMs are not yet fully optimized for a multi-class ideological annotation task, they are practical tools to identify language ideological content.
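The abstract describes evaluating how well LLM output replicates human annotations on a multi-class task, but it does not say which metrics the authors use. As a minimal sketch, assuming accuracy, macro-F1, and Cohen's kappa as plausible agreement measures, the comparison for one prompt condition might look like the following. The category names and label lists are hypothetical placeholders, not data or categories from the paper.

```python
from collections import Counter


def accuracy(gold, pred):
    """Share of comments where the model label equals the human label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-category F1, so rare categories count equally."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def cohen_kappa(gold, pred):
    """Agreement between two annotators, corrected for chance agreement."""
    n = len(gold)
    observed = accuracy(gold, pred)
    gold_counts, pred_counts = Counter(gold), Counter(pred)
    expected = sum(gold_counts[l] * pred_counts[l] for l in gold_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical human annotations and LLM labels for one prompt condition.
gold = ["purism", "pragmatism", "none", "purism", "none", "pragmatism"]
pred = ["purism", "pragmatism", "none", "none", "none", "pragmatism"]

for name, metric in [("accuracy", accuracy), ("macro-F1", macro_f1), ("kappa", cohen_kappa)]:
    print(f"{name}: {metric(gold, pred):.3f}")
```

Running the same three functions on the labels produced under each prompt condition, or on machine-translated input, would give directly comparable scores. Macro-F1 and kappa are included here because raw accuracy can look high when one category dominates the corpus.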