Alyah ⭐️: Toward Robust Evaluation of Emirati Dialect Capabilities in Arabic LLMs

Alyah (الياه), whose name means “North Star” in Emirati Arabic, is a new benchmark for evaluating how well Arabic large language models (LLMs) understand the Emirati dialect. It contains 1,173 manually curated samples sourced from native speakers. The benchmark addresses a critical gap: existing Arabic LLM benchmarks focus primarily on Modern Standard Arabic and neglect the nuances of regional dialects.

The benchmark matters because LLMs are increasingly deployed in conversational settings where users speak their regional dialect. A model that is proficient in formal Arabic may still miss colloquial expressions or cultural references. Left unaddressed, this gap can produce ineffective or even culturally insensitive AI applications, undermining user trust and slowing adoption.

Why This Matters

Current Arabic LLMs often perform poorly on dialectal Arabic because training data and evaluation benchmarks for these varieties are scarce. Ideally, a model would understand and generate dialectal Arabic as fluently as Modern Standard Arabic; in practice, models struggle with culturally embedded meaning and pragmatic usage. The result is misinterpretation and reduced usability, especially in real-world deployments of conversational AI. The costs include lower user satisfaction, limited market reach, and the risk of cultural misunderstandings.

Key Insights

1,173 samples: Alyah's samples are manually curated, which is intended to ensure linguistic authenticity and cultural grounding.
Instruction tuning improves performance: Instruction-tuned models consistently outperform base models, particularly in categories requiring conversational understanding.
Multilingual models degrade on dialect-specific semantics: Even strong multilingual models struggle with nuanced, dialect-specific semantic knowledge, highlighting the need for dedicated dialect training.
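A per-category comparison of base and instruction-tuned models, like the one these insights describe, can be sketched in a few lines of Python. This is a minimal sketch, not the benchmark's official evaluation harness. It assumes each sample is a multiple-choice question tagged with a category, and that a model "answers" by picking the option it scores highest (for example, by log-likelihood). The `Sample` class, the `pick_option` and `per_category_accuracy` helpers, the category names, and all scores are hypothetical and invented for illustration. They are not Alyah data or reported results.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Sample:
    """Hypothetical benchmark item: a multiple-choice question tagged with a category."""
    sample_id: str
    category: str
    options: list[str]
    answer_index: int  # index of the correct option in `options`


def pick_option(option_scores: list[float]) -> int:
    """Return the index of the highest-scoring option (e.g. highest log-likelihood)."""
    return max(range(len(option_scores)), key=lambda i: option_scores[i])


def per_category_accuracy(
    samples: list[Sample], predictions: dict[str, int]
) -> dict[str, float]:
    """Fraction of correctly answered samples per category; missing predictions count as wrong."""
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for s in samples:
        total[s.category] += 1
        if predictions.get(s.sample_id) == s.answer_index:
            correct[s.category] += 1
    return {cat: correct[cat] / total[cat] for cat in total}


# Toy items and made-up per-option scores; NOT real Alyah data or results.
samples = [
    Sample("q1", "category_a", ["opt0", "opt1", "opt2"], 0),
    Sample("q2", "category_a", ["opt0", "opt1", "opt2"], 2),
    Sample("q3", "category_b", ["opt0", "opt1", "opt2", "opt3"], 1),
    Sample("q4", "category_b", ["opt0", "opt1", "opt2", "opt3"], 3),
]
base_scores = {
    "q1": [-1.5, -0.9, -2.0],
    "q2": [-2.1, -1.8, -0.7],
    "q3": [-0.4, -1.1, -2.3, -1.9],
    "q4": [-1.0, -1.2, -2.2, -0.6],
}
instruct_scores = {
    "q1": [-0.3, -1.4, -2.2],
    "q2": [-1.9, -1.6, -0.5],
    "q3": [-1.3, -0.2, -2.0, -1.7],
    "q4": [-0.8, -1.5, -2.4, -1.1],
}

# Convert each model's option scores into a predicted option index per sample.
base_preds = {sid: pick_option(sc) for sid, sc in base_scores.items()}
instruct_preds = {sid: pick_option(sc) for sid, sc in instruct_scores.items()}

print("base", per_category_accuracy(samples, base_preds))
# base {'category_a': 0.5, 'category_b': 0.5}
print("instruct", per_category_accuracy(samples, instruct_preds))
# instruct {'category_a': 1.0, 'category_b': 0.5}
```

Reporting accuracy per category rather than as one overall score is what makes a pattern such as "instruction tuning helps most on conversational items" visible. In this toy run, the instruction-tuned model improves on `category_a` and ties on `category_b`.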


Practical Applications

Customer Service Chatbots (UAE): A chatbot whose Emirati-dialect capability has been validated on Alyah could provide more natural and effective customer support in that dialect.
Machine Translation Pitfall: Relying on models trained solely on Modern Standard Arabic for translating Emirati dialect can result in inaccurate or nonsensical translations, damaging credibility.

References

https://huggingface.co/blog/tiiuae/emirati-benchmarks
