Language Models (LMs) can struggle with reasoning tasks such as mathematics and coding, particularly in low-resource languages. This is largely because LMs are predominantly trained on data from high-resource languages, leaving smaller languages underrepresented. The issue can be even more pronounced in specialised LMs, such as Orca 2 and MetaMath, which have undergone considerable further adaptation, mainly in English.
In the past, this problem was tackled by continually training English-centric LMs on the target languages. However, this approach does not scale easily across many languages, as it requires language-specific training data for each one.
Researchers from the University of Washington and KAIST have developed a new method called ‘LANGBRIDGE’, which adapts LMs to multilingual reasoning tasks without explicit multilingual training data. LANGBRIDGE combines two specialised models – one that understands many languages (such as an mT5 encoder) and another focused on reasoning (such as Orca 2) – and connects them through a small set of trainable parameters.
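The bridging idea can be sketched in a few lines: a frozen multilingual encoder produces hidden states, and a small trainable linear map projects them into the reasoning LM’s input-embedding space. The dimensions and the single linear layer below are illustrative assumptions, not the exact configuration from the paper.

```python
import numpy as np

# Hypothetical dimensions -- not the actual model sizes.
d_enc, d_lm = 512, 1024   # encoder / LM hidden sizes
seq_len = 8               # tokens in a multilingual input

rng = np.random.default_rng(0)

# Stand-in for the frozen multilingual encoder's final hidden states
# (in LANGBRIDGE this would come from e.g. an mT5 encoder).
enc_hidden = rng.normal(size=(seq_len, d_enc))

# The "bridge": a small trainable linear map from encoder space into
# the LM's input-embedding space. These are the only new parameters
# that get trained, using English-only data.
W = rng.normal(size=(d_enc, d_lm)) * 0.02
b = np.zeros(d_lm)

soft_prompt = enc_hidden @ W + b   # shape: (seq_len, d_lm)

# The frozen reasoning LM then consumes these projected vectors in
# place of (or alongside) its usual token embeddings.
print(soft_prompt.shape)
```

Because both large models stay frozen, only the projection’s parameters need gradients, which keeps the adaptation cheap relative to full multilingual fine-tuning.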
Crucially, this approach requires no multilingual supervision and relies solely on English data, yet it still generalises to multiple languages at test time, much like zero-shot cross-lingual transfer. Evaluations of LANGBRIDGE showed marked improvements in multilingual reasoning performance across mathematical reasoning, coding, and logical reasoning.
Despite being trained only on English data, LANGBRIDGE considerably enhances LMs’ performance on reasoning tasks like coding, logic, and maths in low-resource languages. The researchers attribute this success to the language-agnostic nature of multilingual representations, an idea drawn from the multimodal literature. For example, applying LANGBRIDGE to MetaMath-13B with the mT5-XXL encoder improves average accuracy from 40.5% to 55.8%.
The researchers believe that LANGBRIDGE’s effectiveness stems from the language-neutral nature of multilingual representations: mapping these representations onto the LM’s input space lets the model grasp their semantics, rendering the input language irrelevant. This hypothesis is supported by empirical analyses, including principal component analysis and qualitative methods.
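A common way to probe language-neutrality, as in the kind of principal component analysis mentioned above, is to project sentence representations from different languages onto their top principal components and check whether the languages overlap or separate. The sketch below uses synthetic data (a shared “meaning” component plus a small language-specific offset) purely to illustrate the procedure; the representations and offsets are assumptions, not outputs of any real encoder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for pooled sentence representations of translation
# pairs in two languages (hypothetical data, not real encoder output).
shared = rng.normal(size=(100, 64))            # shared "meaning" component
lang_offset = rng.normal(size=(1, 64)) * 0.1   # small language-specific shift
reps_lang_a = shared
reps_lang_b = shared + lang_offset

# Centre the pooled data, then run PCA via SVD.
X = np.vstack([reps_lang_a, reps_lang_b])
X = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X, full_matrices=False)

# Project onto the top-2 principal components for inspection.
proj = X @ Vt[:2].T   # shape: (200, 2)

# If representations are language-neutral, the per-language means
# should nearly coincide rather than form separate clusters.
gap = np.abs(proj[:100].mean(axis=0) - proj[100:].mean(axis=0))
print(proj.shape)
```

In practice one would scatter-plot `proj`, colouring points by language: heavily overlapping clouds suggest language-neutral representations, while separated clusters indicate language-specific structure.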
Although multilingual representations are generally language-neutral, room for improvement remains. The degree to which LANGBRIDGE can boost reasoning in a given language depends mainly on how well both the reasoning model and the encoder already handle that language. Even so, LANGBRIDGE can in principle generalise to any language supported by the multilingual encoder.