Language modelling is a core component of natural language processing (NLP) and artificial intelligence (AI) applications, and it has benefited greatly from advances in algorithms that understand, generate, and manipulate human language. These advances have produced large models capable of translation, summarization, and question answering. Such models still face notable challenges, however; recurrent language models in particular struggle to recall information over long contexts, because they have difficulty storing and retrieving the information needed for accurate in-context learning. Closing this gap is necessary for them to match models with unrestricted memory.
Transformer-based large language models handle long-range dependencies in text well through attention mechanisms, but they demand considerable memory and compute. Recurrent neural networks (RNNs) and their variants are more memory-efficient alternatives, yet they frequently sacrifice recall quality over long sequences. This recall gap is a central obstacle to building effective and efficient language models.
To address these limitations of recurrent models, researchers from Stanford University and the University at Buffalo have introduced two methods: JRT-Prompt and JRT-RNN. JRT-Prompt enhances recall by repeating the context within the prompt, while JRT-RNN uses a non-causal recurrent architecture to process the prompt more effectively. Both methods aim to reduce the model's dependence on the order in which data is presented, improving its ability to recall and use information.
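The prompt-repetition idea can be illustrated in a few lines of code. The sketch below is a minimal illustration, not the authors' exact prompt format: the template, the `repeats` parameter, and the example text are assumptions made for clarity.

```python
# Minimal sketch of the JRT-Prompt idea: repeat the context so a recurrent
# model sees every span of the document again after it already knows what
# else the prompt contains. Template and names here are illustrative only.

def jrt_prompt(context: str, question: str, repeats: int = 2) -> str:
    """Build a prompt in which the context is repeated `repeats` times."""
    repeated_context = "\n\n".join([context] * repeats)
    return f"{repeated_context}\n\nQuestion: {question}\nAnswer:"

if __name__ == "__main__":
    # Illustrative, fictional document.
    doc = "Port Amsel is the capital of the fictional country of Valdoria."
    print(jrt_prompt(doc, "What is the capital of Valdoria?"))
```

Because the whole context appears twice, a recurrent model's fixed-size state gets a second pass in which to store whichever facts turn out to matter for the question.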
JRT-Prompt improves recurrent models by repeating the input context several times, effectively exposing the model to the information in multiple orders before it must answer. This reduces its dependence on the sequence in which data is presented, allowing better retention and recall of information and improving overall performance. JRT-RNN, in contrast, uses prefix-linear attention, in which the model processes the prompt non-causally before generating responses. This substantially improves its ability to recall and use information, offering a more efficient and effective answer to the recall problem in recurrent language models.
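The sketch below gives a rough sense of how prefix-linear attention can be organized. It is a simplified illustration under assumed choices (the feature map, a single head, no separate encoder projections), not the JRT-RNN implementation: prompt tokens read a non-causal summary of the entire prefix, and later tokens decode causally from a recurrent state that carries that summary forward.

```python
import numpy as np

def phi(x):
    # elu(x) + 1: a positive feature map commonly used with linear attention.
    return np.where(x > 0, x + 1.0, np.exp(x))

def prefix_linear_attention(q, k, v, prefix_len):
    """
    q, k, v: (seq_len, d) arrays.
    Tokens [0, prefix_len) form the prompt and attend to the whole prefix
    non-causally; tokens from prefix_len onward attend causally to everything
    before them, including the full prefix.
    """
    seq_len, d = q.shape
    qf, kf = phi(q), phi(k)
    out = np.zeros_like(v)

    # Non-causal (encoder-style) pass over the prefix: every prompt token
    # reads a summary of the entire prompt.
    S_prefix = kf[:prefix_len].T @ v[:prefix_len]   # (d, d) state
    z_prefix = kf[:prefix_len].sum(axis=0)          # (d,) normalizer
    for t in range(prefix_len):
        out[t] = (qf[t] @ S_prefix) / (qf[t] @ z_prefix + 1e-6)

    # Causal (decoder-style) pass for generated tokens, carrying the prefix
    # state forward as the recurrent memory.
    S, z = S_prefix.copy(), z_prefix.copy()
    for t in range(prefix_len, seq_len):
        S += np.outer(kf[t], v[t])
        z += kf[t]
        out[t] = (qf[t] @ S) / (qf[t] @ z + 1e-6)
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q, k, v = (rng.standard_normal((12, 8)) for _ in range(3))
    y = prefix_linear_attention(q, k, v, prefix_len=8)
    print(y.shape)  # (12, 8)
```

The key design point is that the memory cost stays fixed: the prefix is compressed into the same (d, d) state a causal linear-attention model would use, but every prompt token gets to query the state built from the whole prompt rather than only the tokens before it.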
In evaluations, JRT-Prompt delivered an average improvement of 11.0 ± 1.3 points across multiple tasks and models, while JRT-RNN improved quality by up to 13.7 points, showing that the proposed methods can match or exceed traditional Transformer models while using less memory.
Empirical studies confirmed the effectiveness of both methods. JRT-Prompt was evaluated across 16 off-the-shelf recurrent LMs and six in-context learning tasks, consistently yielding substantial gains in recall quality. JRT-RNN combines the strengths of recurrent and linear attention models, reaching 99% of Transformer quality.
In summary, the research tackles a critical information-recall weakness of recurrent language models and introduces effective remedies. By improving how models handle data order and process context, JRT-Prompt and JRT-RNN raise both the quality and the efficiency of language models. These innovations are a significant step toward more capable language modelling techniques, improving recall quality while also reducing computational cost.