The Masked Advantage: Uncovering Local-Language Access to Cultural Knowledge in LLMs
Yang Zhang, Xiao Fei, Amr Mohamed, Sarah Almeida Carneiro, Mersin Konomi, Mingmeng Geng, Ahmed Asaad, Guokan Shang, Michalis Vazirgiannis
Why It Matters
Understanding how language models access cultural knowledge can inform their deployment in multilingual settings and help produce more accurate, culturally relevant responses.
Querying in the local language can give better access to an LLM's cultural knowledge, even when raw accuracy is lower than in English.
Summary
The paper investigates whether large language models access local cultural knowledge more effectively in English or in the local language, using a controlled framework that separates language proficiency from cultural knowledge access. Across 13 locales and roughly 80 models, it finds that the local language gives better access to cultural knowledge in nearly all locale-model settings, even though it looks weaker in raw accuracy because of the models' lower local-language proficiency.
Key contributions
- A controlled framework to evaluate cultural knowledge access in LLMs across languages.
- Empirical evidence showing local languages' advantage in accessing cultural knowledge after accounting for proficiency.
Notable insights
- Separating language proficiency from cultural knowledge access using a 1PL item response theory model.
- Local languages show a hidden advantage in accessing cultural knowledge when proficiency is accounted for.
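For reference, the standard 1PL (Rasch) item response model has the form below. The symbols are notation chosen here for illustration: \theta_m is the ability of model m under a given condition and b_i is the difficulty of item i. The abstract does not spell out the paper's exact parameterization, so this should be read as the textbook form only.

```latex
P(X_{mi} = 1 \mid \theta_m, b_i) = \frac{1}{1 + e^{-(\theta_m - b_i)}}
```

Because item difficulty is shared across conditions, differences in \theta between conditions can be compared on a common logit scale.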
Possible limitations
- Not stated in the abstract
Abstract
arXiv:2606.07422v2
Large language models are increasingly used to answer culturally grounded questions across languages, yet it remains unclear whether local cultural knowledge is better accessed through English or the local language. Existing evaluations face two key limitations: many rely on parallel template-based questions that may not reflect how cultural knowledge naturally appears, and raw accuracy conflates general language proficiency with language-conditioned knowledge access. We address these issues with a controlled framework built on real-world cultural questions collected from regional benchmarks and local sources. By crossing question type (culture-agnostic vs. culture-specific) with query language (English vs. local language), and estimating ability with a shared 1PL item response theory model, we separate proficiency from localized knowledge access. Across 13 locales and roughly 80 models, we find a consistent English advantage on culture-agnostic questions, indicating stronger English proficiency. However, after accounting for this proficiency gap, local languages show a positive knowledge-access advantage in nearly all locale-model settings. This advantage is often masked in raw accuracy but becomes more visible for frontier, regionally aligned, or language-adapted models. Our results suggest that weaker local-language performance does not necessarily imply weaker cultural knowledge; rather, local cultural knowledge may be more accessible through the local language but hidden by limited language proficiency.
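The masking effect described in the abstract can be illustrated with a small numeric sketch. It assumes a difference-in-differences contrast on 1PL ability estimates, which is one plausible reading of "accounting for this proficiency gap" and not necessarily the paper's estimator. The function names and all ability values are hypothetical and invented for illustration.

```python
import math


def p_correct(theta: float, b: float) -> float:
    """Textbook 1PL (Rasch) probability of a correct answer."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def access_advantage(theta: dict) -> float:
    """Hypothetical knowledge-access contrast (an assumption, not the paper's definition).

    Takes the local-minus-English ability gap on culture-specific questions
    and adds back the English proficiency lead measured on culture-agnostic
    questions.
    """
    proficiency_gap = theta[("en", "agnostic")] - theta[("local", "agnostic")]
    specific_gap = theta[("local", "specific")] - theta[("en", "specific")]
    return specific_gap + proficiency_gap


# Made-up ability estimates on the logit scale, for illustration only.
theta = {
    ("en", "agnostic"): 1.5,
    ("local", "agnostic"): 0.5,
    ("en", "specific"): 0.4,
    ("local", "specific"): 0.0,
}

# Raw accuracy on a culture-specific item of difficulty 0: English looks better.
raw_en = p_correct(theta[("en", "specific")], 0.0)
raw_local = p_correct(theta[("local", "specific")], 0.0)

# After adding back the proficiency gap, the local language comes out ahead.
adv = access_advantage(theta)

print(f"raw accuracy  en={raw_en:.3f}  local={raw_local:.3f}")
print(f"access advantage (logits) = {adv:.2f}")
```

With these invented numbers, English has the higher raw accuracy on the culture-specific item, yet the contrast is positive (about 0.6 logits) because the English lead on culture-agnostic questions is larger than its lead on culture-specific ones.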