Nsanku: Evaluating Zero-Shot Translation Performance of LLMs for Ghanaian Languages

Stephen E. Moore, Mich-Seth Owusu, Akwasi Asare, Lawrence Adu Gyamfi, Paul Azunre, Joel Budu, Jonathan Asiamah, Elias Dzobo, Kelvin Newman, Edmund O. Benefo, Gerhardt Datsomor, Onesimus Addo Appiah, Ama Branoa Banful, Lucas Woedem Kpatah, Saani Mustapha Deishini, John Ayernor

Published May 7, 2026


Editorial review: 6.8

Relevance: 0.497

Freshness: 0.000

Why It Matters

What makes this one worth your time

Knowing how well LLMs handle low-resource languages such as those spoken in Ghana shows where AI tools can, and cannot yet, serve the speakers of these languages.

Nsanku evaluates the zero-shot translation capabilities of 19 LLMs across 43 Ghanaian languages and finds that none yet combines high translation quality with consistent performance across languages.

Summary

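The paper introduces Nsanku, a benchmark that evaluates the zero-shot translation performance of 19 open-weight and proprietary large language models on 43 Ghanaian languages paired with English, using 300 sentence pairs per language drawn from the YouVersion Bible platform. Translation quality is scored with BLEU and chrF, together with an average score and a cross-language consistency dimension. The best model, gemini-2.5-flash, reaches an average score of 26.88, and no model achieves both high performance and high consistency, which suggests current LLMs are not yet reliable for Ghanaian-language translation at scale.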
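BLEU scores a translation by how many of its word n-grams (up to length 4) also appear in the reference, combines those precisions as a geometric mean, and scales the result by a brevity penalty when the translation is shorter than the reference. The sketch below is a simplified corpus-level version, assuming whitespace tokenization, one reference per sentence, and no smoothing. It is not the paper's scoring code; published scores are usually produced by a standard tool whose tokenization and edge-case handling differ.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the word n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus BLEU on a 0-100 scale (one reference per hypothesis)."""
    matches = [0] * max_n  # clipped n-gram matches per order
    totals = [0] * max_n   # hypothesis n-grams per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()  # whitespace tokenization
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_ngrams = ngram_counts(hyp_tokens, n)
            ref_ngrams = ngram_counts(ref_tokens, n)
            # Clip each hypothesis n-gram count at its count in the reference.
            matches[n - 1] += sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
            totals[n - 1] += sum(hyp_ngrams.values())
    # Without smoothing, any order with zero matches makes the geometric mean zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


# Usage: an exact match scores 100; a truncated translation is penalized for brevity.
print(corpus_bleu(["the cat sat on the mat"], ["the cat sat on the mat"]))  # → 100.0
print(corpus_bleu(["the cat sat on the"], ["the cat sat on the mat"]))
```

The second call has perfect n-gram precision but is one word short, so the brevity penalty alone pulls it down to about 81.9.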

Key contributions

Development of Nsanku, a benchmark for evaluating LLM translation performance on Ghanaian languages.
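Comprehensive evaluation of 19 open-weight and proprietary LLMs using BLEU, chrF, and an average score.
Introduction of a cross-language consistency dimension that measures whether a model's translation quality holds up across languages, not only on average.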
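Unlike BLEU, chrF compares character n-grams rather than words, so a translation that gets a word's stem right but an affix wrong still earns partial credit, which is often cited as an advantage for morphologically rich languages. Below is a simplified sentence-level sketch assuming common defaults (character n-grams of order 1 to 6, β = 2, spaces ignored); it is not the paper's scoring code, and reference implementations also accumulate statistics over the whole corpus and treat edge cases differently.

```python
from collections import Counter


def char_ngram_counts(text, n):
    """Count character n-grams, ignoring spaces (a common chrF default)."""
    text = text.replace(" ", "")
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified sentence-level chrF on a 0-100 scale."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngram_counts(hypothesis, n)
        ref = char_ngram_counts(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than either string
        overlap = sum((hyp & ref).values())  # Counter & keeps the minimum count
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)  # mean character n-gram precision
    r = sum(recalls) / len(recalls)        # mean character n-gram recall
    if p + r == 0:
        return 0.0
    # F-beta score: with beta = 2, recall counts twice as much as precision.
    return 100 * (1 + beta**2) * p * r / (beta**2 * p + r)


print(chrf("translation", "translation"))   # → 100.0
print(chrf("translations", "translation"))  # partial credit for a near miss
```

A word-level metric gives the near miss in the last line no credit for that word, while chrF scores it just below 100 because almost all of its character n-grams match.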

Notable insights

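Drawing every evaluation sentence from the YouVersion Bible platform gives all 43 languages a common, standardized source for comparison.
The cross-language consistency dimension separates models that score well on average from models that perform reliably across languages; in Nsanku, no model and no language reached the high-performance, high-consistency Leaders quadrant.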
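The consistency analysis places each model (and each language) on two axes, average performance and consistency, and calls the high/high quadrant "Leaders". The abstract does not define how consistency is measured, so the sketch below is a hypothetical illustration that assumes consistency = 1 − coefficient of variation of an entity's scores. The thresholds and the names of the three non-Leaders quadrants are placeholders chosen here, not Nsanku's.

```python
from statistics import mean, pstdev


def consistency_quadrant(scores, perf_threshold, cons_threshold):
    """Place one model (or language) on a performance/consistency grid.

    scores: that entity's average scores across languages (or models).
    ASSUMPTION: consistency = 1 - coefficient of variation (pstdev / mean);
    the thresholds are illustrative, not Nsanku's actual cut-offs.
    """
    performance = mean(scores)
    # Guard against division by zero when every score is 0.
    consistency = 1 - pstdev(scores) / performance if performance else 0.0
    high_perf = performance >= perf_threshold
    high_cons = consistency >= cons_threshold
    if high_perf and high_cons:
        label = "Leaders"  # the only quadrant name taken from the paper
    elif high_perf:
        label = "high performance, low consistency"
    elif high_cons:
        label = "low performance, high consistency"
    else:
        label = "low performance, low consistency"
    return performance, consistency, label


# Hypothetical per-language scores for one model, for illustration only.
print(consistency_quadrant([28.0, 12.0, 30.0, 9.0], perf_threshold=25.0, cons_threshold=0.8))
```

Because both the consistency formula and the thresholds are assumptions, the labels only demonstrate the mechanics of a two-axis classification; the paper's own definitions determine where its models and languages actually fall.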

Possible limitations

Not stated in the abstract

Abstract

arXiv:2605.04208v1

Large language models (LLMs) have demonstrated impressive multilingual capabilities for well-resourced languages, yet their performance on low-resource African languages remains poorly understood and largely unevaluated. This paper presents Nsanku, a systematic benchmark that evaluates the zero-shot machine translation performance of 19 open-weight and proprietary LLMs across 43 Ghanaian languages paired with English. Evaluation sentences were sourced from the YouVersion Bible platform, providing 300 sentence pairs per language. Two complementary automatic metrics are employed: Bilingual Evaluation Understudy (BLEU) and Character n-gram F-Score (chrF), alongside an average accuracy score and a cross-language consistency dimension. Nsanku represents the most comprehensive LLM translation evaluation for Ghanaian languages conducted to date. Results show that gemini-2.5-flash achieves the highest overall average score of 26.88 (BLEU: 24.60, chrF: 29.16), followed by claude-sonnet-4-5 at 24.87 (BLEU: 22.46, chrF: 27.28) and gpt-4.1 at 23.20 (BLEU: 21.15, chrF: 25.24). Among open-weight models, kimi-k2-instruct-0905 leads at an average score of 20.87. A critical finding from the consistency analysis is that no model and no language reached the Leaders quadrant of high performance and high consistency simultaneously, indicating that current LLMs are not yet reliably usable for Ghanaian language translation at scale. Siwu achieved the highest per-language average score at 25.73 while Nkonya scored lowest at 11.65. Nsanku establishes a publicly available, community-extensible evaluation infrastructure for African language NLP research.
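The overall scores reported in the abstract are consistent with the average score being the arithmetic mean of BLEU and chrF: for gemini-2.5-flash, (24.60 + 29.16) / 2 = 26.88. The check below recomputes and ranks the three leading models from the abstract's own figures under that assumption, which is inferred from the numbers rather than stated in the abstract.

```python
# Corpus-level (BLEU, chrF) pairs for the top three models, as reported in the abstract.
REPORTED = {
    "gemini-2.5-flash": (24.60, 29.16),
    "claude-sonnet-4-5": (22.46, 27.28),
    "gpt-4.1": (21.15, 25.24),
}


def average_score(bleu: float, chrf: float) -> float:
    """ASSUMPTION: the benchmark's average score is the mean of BLEU and chrF."""
    return (bleu + chrf) / 2


def rank_models(scores: dict[str, tuple[float, float]]) -> list[tuple[str, float]]:
    """Return (model, average) pairs sorted from best to worst."""
    averages = [(model, average_score(*pair)) for model, pair in scores.items()]
    return sorted(averages, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # gemini-2.5-flash ranks first, followed by claude-sonnet-4-5 and gpt-4.1.
    for model, avg in rank_models(REPORTED):
        print(f"{model}: {avg:.3f}")
```

The gpt-4.1 value comes out at 23.195, which the abstract reports rounded to 23.20; the other two match the reported averages exactly.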