Large language models (LLMs) are now used across a wide range of applications, yet they share a significant limitation: they struggle with long-context tasks because of the constraints of transformer-based architectures. Researchers have explored various approaches to extend LLMs’ context-processing capabilities, including improving softmax attention, reducing computational cost and refining positional encodings. Retrieval-based techniques, especially group-based k-NN retrieval, have shown notable promise, but a considerable performance gap remains between short- and long-context tasks.
Meanwhile, work on neural models of episodic memory, the brain’s mechanism for storing and recalling experiences, suggests that such models could act as retrieval systems that enhance LLM performance, provided the right contextual information is integrated. This motivated a research team from Huawei Noah’s Ark Lab and University College London to propose a new architecture, EM-LLM.
EM-LLM integrates episodic memory into transformer-based LLMs, enabling them to handle substantially longer contexts. The architecture forms memories by segmenting the token sequence into events at points of high surprise, where the model’s next-token prediction error spikes. Memory retrieval then uses a two-stage mechanism: a k-NN search that retrieves events similar to the current query, and a contiguity buffer that also brings in temporally adjacent events to preserve temporal context. By imitating human episodic memory, the model improves the LLM’s ability to handle extended contexts and to perform complex temporal reasoning efficiently.
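To make the mechanism concrete, here is a minimal Python sketch of surprise-based segmentation and the two-stage retrieval. It is an illustration rather than the authors’ implementation: the rolling-threshold rule, the cosine-similarity k-NN over caller-supplied event representations, and the size of the contiguity buffer are assumptions introduced for this example, not details taken from the paper.

```python
import numpy as np

def segment_by_surprise(token_nll, gamma=1.0, window=64):
    """Split a token stream into events at points of high surprise.

    token_nll: per-token negative log-likelihood under the LLM (surprise).
    A boundary is placed where surprise exceeds a rolling mean plus
    gamma * std (threshold rule assumed here for illustration).
    """
    boundaries = [0]
    for t in range(1, len(token_nll)):
        recent = token_nll[max(0, t - window):t]
        threshold = np.mean(recent) + gamma * np.std(recent)
        if token_nll[t] > threshold:
            boundaries.append(t)
    boundaries.append(len(token_nll))
    return list(zip(boundaries[:-1], boundaries[1:]))  # (start, end) per event

def retrieve_events(query, event_reprs, events, k=4, buffer_size=1):
    """Two-stage retrieval: k-NN over event representations, plus a
    contiguity buffer that also returns temporally adjacent events."""
    sims = event_reprs @ query / (
        np.linalg.norm(event_reprs, axis=1) * np.linalg.norm(query) + 1e-8)
    top_k = np.argsort(-sims)[:k]                      # stage 1: similarity
    contiguous = {j for i in top_k                     # stage 2: neighbours
                  for j in range(i - buffer_size, i + buffer_size + 1)
                  if 0 <= j < len(events)}
    selected = sorted(set(top_k) | contiguous)
    return [events[i] for i in selected]
```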
In practice, EM-LLM extends pre-trained LLMs to larger context lengths by dividing the token history into three groups: initial tokens, evicted tokens and the local context. The initial tokens function as attention sinks, while most past tokens are evicted and handed to the memory model, which acts as a short-term episodic memory. Compared with InfLLM, a baseline model, EM-LLM showed a clear performance improvement across long-context tasks: on the LongBench benchmark, it surpassed InfLLM on every task except one.
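A rough sketch of how the token history might be partitioned into these three groups is shown below; the sink count and local-window size are placeholder values for illustration, not the configuration used in the paper.

```python
def partition_context(tokens, n_init=4, local_window=512):
    """Split the full token history into the three groups EM-LLM manages.

      - initial tokens: kept permanently as attention sinks,
      - local context: the most recent tokens, attended to directly,
      - evicted tokens: everything in between, handed to the memory model
        and only re-surfaced through event retrieval.
    """
    initial = tokens[:n_init]
    local = tokens[-local_window:] if len(tokens) > n_init + local_window \
        else tokens[n_init:]
    evicted = tokens[n_init:len(tokens) - len(local)]
    return initial, evicted, local

# Example: a 10,000-token history with the defaults above yields
# 4 sink tokens, 9,484 evicted tokens for the memory model, and a
# 512-token local window that the LLM attends to directly.
```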
By integrating principles of human episodic memory and event cognition into transformer-based LLMs, EM-LLM can process substantially longer contexts without any additional training, marking a significant step forward in the field. Surprise-based event segmentation, graph-theoretic boundary refinement and two-stage memory retrieval together deliver strong performance on long-context tasks. This flexible framework serves as an effective alternative to traditional RAG techniques and opens up opportunities for further testing.
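The boundary-refinement step can likewise be sketched in a few lines. The version below treats pairwise similarities between token keys as a graph and shifts a boundary to the position with the lowest conductance within a small search window; the metric, the search radius and the function signature are simplifications assumed for illustration rather than the exact procedure described in the paper.

```python
import numpy as np

def refine_boundary(sim, start, end, init_b, search=8):
    """Shift one event boundary to a graph-theoretically better position.

    sim: pairwise similarity matrix over token keys in [start, end).
    The boundary is moved within +/- `search` tokens of its initial
    position to minimise conductance (cut weight over the smaller
    segment's volume), a simple stand-in for the refinement metric.
    """
    best_b, best_score = init_b, np.inf
    for b in range(max(start + 1, init_b - search),
                   min(end - 1, init_b + search) + 1):
        left, right = slice(start, b), slice(b, end)
        cut = sim[left, right].sum()                 # edges crossing the boundary
        vol_left = sim[left, slice(start, end)].sum()
        vol_right = sim[right, slice(start, end)].sum()
        score = cut / (min(vol_left, vol_right) + 1e-8)
        if score < best_score:
            best_b, best_score = b, score
    return best_b
```

In the full method, the refined events then serve as the retrieval units for the two-stage mechanism described above.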
EM-LLM fundamentally changes how LLMs can sustain continuous, personalized exchanges and outlines a path towards virtually infinite context windows. This combination of cognitive science and machine learning not only improves LLM performance but also encourages further exploration of the intersection between LLMs and human memory mechanisms. The researchers’ paper on EM-LLM provides fuller insight into their findings on this promising approach to enhancing LLMs.