The alignment of language models is a critical factor in creating effective, user-centric language technologies. Traditionally, aligning these models with human preferences requires extensive language-specific preference data, which is frequently unavailable, especially for less common languages. This lack of data poses a significant challenge to the development of practical and fair multilingual models.
Teams from MIT, Google Research, and Google DeepMind have now developed an approach that aligns language models across languages without language-specific preference data. Termed zero-shot cross-lingual alignment, the technique takes a reward model trained in one language, typically English, and applies it to other languages, bypassing the usual requirement for vast amounts of language-specific training data.
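A minimal sketch of the core idea, assuming the reward model is exposed as a sequence-classification head that maps a (prompt, response) pair to a scalar reward; the checkpoint name and example text below are hypothetical illustrations, not artifacts from the paper:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Hypothetical checkpoint: a reward model trained only on English
# preference data, mapping (prompt, response) to a scalar reward.
RM_NAME = "example-org/english-reward-model"  # placeholder, not a real model

tokenizer = AutoTokenizer.from_pretrained(RM_NAME)
reward_model = AutoModelForSequenceClassification.from_pretrained(RM_NAME)

def score(prompt: str, response: str) -> float:
    """Score a (prompt, response) pair with the English-trained reward model.

    Zero-shot cross-lingual alignment applies this same scorer to
    target-language text, e.g. German, with no German preference data.
    """
    inputs = tokenizer(prompt, response, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = reward_model(**inputs).logits
    return logits.squeeze().item()

# A German prompt and candidate response, scored by the English reward model.
print(score("Fasse den Artikel zusammen: ...", "Der Artikel beschreibt ..."))
```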
The effectiveness of the new method was demonstrated on two tasks: text summarization and open-ended dialogue generation. The research team ran these experiments across several languages, including German, English, Spanish, Russian, Turkish, and Vietnamese, using two optimization strategies, reinforcement learning and best-of-n re-ranking. The results indicated that a reward model remained effective even when applied to a different target language, often outperforming models aligned with language-specific data.
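Of the two strategies, best-of-n re-ranking is the simpler to illustrate: sample n candidate responses from the target-language policy model and keep the one the source-language reward model scores highest. A minimal sketch, reusing the hypothetical score function above:

```python
def best_of_n(prompt: str, generate, reward_fn, n: int = 8) -> str:
    """Return the candidate with the highest cross-lingual reward.

    `generate` draws one sampled response per call (e.g. from a German
    policy model); `reward_fn` is the English-trained scorer sketched above.
    """
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda response: reward_fn(prompt, response))
```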
Surprisingly, a reward model from a different source language sometimes outperformed one trained in the target language itself. For instance, using an English reward model to align a German language model often produced better-aligned outputs than using a German reward model. On dialogue generation tasks, the cross-lingually aligned models showed a 20% to 30% improvement over baseline models in alignment with human preferences.
In text summarization, human judges preferred the cross-lingually aligned models in over 70% of evaluated cases, demonstrating the technique's practical utility.
In summary, the research tackles language model alignment without extensive language-specific data by training a reward model in one language and applying it across others. This significantly reduces the need for multilingual human-annotated data, and the results show a strong human preference for cross-lingually aligned models.
The success of zero-shot cross-lingual alignment demonstrates its potential to transform language model alignment and enhance multilingual communication. Further study will be needed to build on this promising start toward practical and fair language models across many languages.