Understanding and processing Hebrew has always been a challenge due to its morphologically rich structure, in which prefixes, suffixes, and internal word patterns change the meaning and tense of words. This poses particular challenges for AI language models, which often struggle to interpret the subtleties of lower-resource languages accurately. Addressing this issue, Hugging Face, a leading machine learning platform, has launched a new large language model (LLM) evaluation project dedicated to Hebrew.
The project’s primary output is an open LLM leaderboard that serves as a tool to assess and improve Hebrew language models. The leaderboard acts as a performance tracker, making it possible to identify shortcomings in existing models and encouraging improvements. The team behind the project hopes the initiative will drive a better understanding of the complexities of the Hebrew language and foster community-driven improvements in the models being used.
The leaderboard is built on the Demo Leaderboard template and takes its inspiration from the Open LLM Leaderboard. Submitted models are deployed automatically using Hugging Face's Inference Endpoints and evaluated through API requests managed by the lighteval library. The leaderboard reports performance metrics on language-specific tasks, in this case Hebrew, and compares the results against established benchmarks.
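For illustration only, here is a minimal sketch of how a model deployed on an Inference Endpoint can be queried through the huggingface_hub client. The endpoint URL, token, prompt, and generation parameters below are placeholders and do not reflect the leaderboard's actual configuration.

```python
# Minimal sketch: querying a model deployed on a Hugging Face Inference Endpoint.
# The endpoint URL, token, and prompt are placeholders, not the leaderboard's
# real setup.
from huggingface_hub import InferenceClient

client = InferenceClient(
    model="https://my-hebrew-endpoint.endpoints.huggingface.cloud",  # hypothetical endpoint URL
    token="hf_xxx",  # personal access token (placeholder)
)

# Send a single generation request, roughly as an evaluation harness would per test item.
completion = client.text_generation(
    "השלם את המשפט: הבירה של ישראל היא",  # "Complete the sentence: the capital of Israel is"
    max_new_tokens=20,
    do_sample=False,  # greedy decoding for reproducible evaluation
)
print(completion)
```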
An integral part of the project is its capacity to assess LLMs’ comprehension and production of Hebrew independently of their performance in other languages. This is done with four datasets that evaluate models using a few-shot prompt format, i.e., with only a handful of in-context examples preceding each test item.
Firstly, the Hebrew Question Answering dataset tests the model’s comprehension and ability to retrieve accurate answers. Secondly, the Sentiment Accuracy dataset checks how well the model identifies and interprets sentiment in Hebrew text. The third dataset, a Hebrew Winograd Schema challenge, assesses the model’s handling of contextual ambiguity and pronoun resolution. Lastly, the Translation dataset looks at the model’s aptitude for translation between Hebrew and English.
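As a rough illustration of the few-shot format mentioned above, the sketch below builds a prompt from a handful of labeled examples placed before the target item. The instruction, examples, and labels are invented for illustration and are not drawn from the leaderboard's actual datasets or prompt templates.

```python
# Sketch of a generic few-shot prompt builder for a sentiment-style task.
# The instruction, examples, and labels are invented placeholders and do not
# reproduce the leaderboard's datasets or templates.
def build_few_shot_prompt(instruction, examples, query):
    """Concatenate an instruction, labeled examples, and the unlabeled query."""
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"טקסט: {text}")   # "Text: ..."
        lines.append(f"רגש: {label}")   # "Sentiment: ..."
        lines.append("")
    lines.append(f"טקסט: {query}")
    lines.append("רגש:")  # the model is expected to complete this label
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    "סווג את הרגש של כל טקסט כחיובי או שלילי.",  # "Classify each text as positive or negative."
    [("הסרט היה נהדר!", "חיובי"),                 # ("The movie was great!", "positive")
     ("השירות היה איטי ומאכזב.", "שלילי")],       # ("The service was slow and disappointing.", "negative")
    "האוכל היה טעים מאוד.",                       # "The food was very tasty."
)
print(prompt)
```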
This open Hebrew LLM leaderboard goes beyond providing a single score; it offers an opportunity to uncover gaps in Hebrew language technology research. By providing comprehensive, well-targeted evaluations, the Hugging Face team aims to inspire the creation of linguistically diverse models and to underline the importance of the Hebrew language in AI technology.
The initiative is expected to drive significant progress for Hebrew in AI and to help ensure that future advances in the technology take language diversity into account.