Machine translation, a critical aspect of natural language processing (NLP), is centered on the development of algorithms that translate text from one language to another. This technology is crucial for overcoming language barriers and fostering global communication. Neural machine translation (NMT) has recently driven advances in translation accuracy and fluency, using deep learning techniques to push the limits of what the field can achieve.
The challenge lies in the substantial gap in translation quality between high-resource and low-resource languages: high-resource languages benefit from abundant training data and consequently outperform low-resource languages, which lack sufficient data and therefore suffer from poorer translation quality. This imbalance obstructs effective communication and access to information for speakers of low-resource languages, and it is the issue this research aims to tackle.
Presently, strategies such as back-translation and self-supervised learning on monolingual data are used to boost translation quality for low-resource languages. Regularization strategies, including Gating Dropout, are used to mitigate overfitting within existing frameworks built on dense transformer models, whose encoder and decoder layers contain feed-forward networks.
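Back-translation, for instance, turns monolingual target-side text into synthetic parallel data: sentences in the target language are translated "backwards" into the source language, and the resulting pairs are mixed into the training set. The sketch below illustrates the idea with an off-the-shelf Marian model from Hugging Face; the model name and the English-French pair are illustrative assumptions, not the systems or languages used in this research.

```python
# Back-translation sketch: create synthetic parallel data from monolingual
# English text using a reverse (English -> source-language) model.
# "Helsinki-NLP/opus-mt-en-fr" is only an illustrative stand-in.
from transformers import MarianMTModel, MarianTokenizer

reverse_model_name = "Helsinki-NLP/opus-mt-en-fr"  # hypothetical reverse model
tokenizer = MarianTokenizer.from_pretrained(reverse_model_name)
reverse_model = MarianMTModel.from_pretrained(reverse_model_name)

monolingual_english = [
    "The weather is nice today.",
    "She is reading a book in the garden.",
]

# Translate the monolingual English sentences back into the source language.
batch = tokenizer(monolingual_english, return_tensors="pt", padding=True)
generated = reverse_model.generate(**batch, max_new_tokens=64)
synthetic_source = tokenizer.batch_decode(generated, skip_special_tokens=True)

# Pair the synthetic source sentences with the original English sentences;
# this synthetic corpus is then combined with real parallel data when
# training the source -> English direction.
synthetic_parallel = list(zip(synthetic_source, monolingual_english))
print(synthetic_parallel)
```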
To address these challenges, researchers from Meta’s Foundational AI Research (FAIR) team have presented a cutting-edge approach using Sparsely Gated Mixture of Experts (MoE) models. The idea is to place numerous experts within the model, each overseeing different facets of the translation procedure. A gating mechanism directs input tokens to the most relevant experts, thereby enhancing translation accuracy and reducing interference between unrelated language directions.
The MoE transformer models differ significantly from traditional dense transformer models: they substitute some of the feed-forward network layers in the encoder and decoder with MoE layers, each consisting of several experts (each itself a feed-forward network) and a gating network. By minimizing interference between language directions and making better use of the available data, this design generalizes better across languages.
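The following is a minimal sketch of such a sparsely gated MoE layer: several feed-forward experts plus a learned gate that routes each token to its top-k experts. The dimensions, the choice of top-2 routing, and the omission of load-balancing and capacity constraints are simplifying assumptions for illustration, not details of Meta's actual implementation.

```python
# Minimal Mixture-of-Experts layer sketch (illustrative, not Meta's code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, num_experts=8, top_k=2):
        super().__init__()
        # Each expert is an ordinary transformer feed-forward block.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])
        # The gating network scores every expert for every token.
        self.gate = nn.Linear(d_model, num_experts)
        self.top_k = top_k

    def forward(self, x):
        # x: (batch, seq_len, d_model) -> flatten to one token per row.
        batch, seq_len, d_model = x.shape
        tokens = x.reshape(-1, d_model)

        # Route each token to its top-k experts and renormalize their weights.
        gate_logits = self.gate(tokens)                        # (tokens, num_experts)
        weights, expert_idx = gate_logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)

        out = torch.zeros_like(tokens)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = expert_idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(tokens[mask])
        return out.reshape(batch, seq_len, d_model)


# In an MoE transformer, a layer like this replaces the dense feed-forward
# sublayer in some encoder/decoder blocks.
layer = MoELayer()
dummy = torch.randn(2, 10, 512)
print(layer(dummy).shape)  # torch.Size([2, 10, 512])
```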
The researchers saw substantial improvements in translation quality from this model, particularly for very low-resource languages, with a 12.5% increase in chrF++ scores for translation into English. Their tests also show that filtering out about 30% of the parallel sentences improved translation quality by 5%, while reducing toxicity by a similar margin.
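For context, chrF++ is a character n-gram F-score that also counts word unigrams and bigrams. The snippet below shows how it can be computed with the sacrebleu library on made-up toy sentences; the data is invented and not drawn from the study.

```python
# Computing chrF++ with sacrebleu on toy data (illustrative only).
from sacrebleu.metrics import CHRF

hypotheses = ["the cat sits on the mat"]
references = [["the cat sat on the mat"]]  # a list of reference streams, aligned with the hypotheses

chrf_pp = CHRF(word_order=2)  # word_order=2 turns chrF into chrF++
score = chrf_pp.corpus_score(hypotheses, references)
print(score)  # e.g. "chrF2++ = ..." for this toy pair
```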
To validate these results, a comprehensive evaluation process was undertaken, combining automated metrics with human quality assessments to ensure translation accuracy and reliability. Human evaluation scores provided a robust measure of translation quality and correlated well with the automated scores.
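A minimal illustration of how agreement between human ratings and an automated metric can be checked is shown below, using a rank correlation; the scores are made-up toy values, and the actual evaluation protocol and data belong to the study and are not reproduced here.

```python
# Toy check of human vs. automated score agreement (illustrative values only).
from scipy.stats import spearmanr

human_scores = [3.2, 4.1, 2.8, 4.6, 3.9]           # e.g., averaged human ratings per direction
automated_scores = [41.5, 52.3, 38.0, 55.1, 50.7]  # e.g., chrF++ for the same directions

correlation, p_value = spearmanr(human_scores, automated_scores)
print(f"Spearman correlation: {correlation:.2f} (p={p_value:.3f})")
```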
In summary, the research team from Meta has addressed the disparity in translation quality between high- and low-resource languages using MoE models. The models considerably improve translation performance for low-resource languages and provide a scalable solution. This represents a significant advancement in machine translation, working toward a universal translation system that serves all languages equally well.