Pre-trained large language models (LLMs), typically built on the transformer architecture, have a fixed context window size, most commonly around 4K tokens. Nevertheless, numerous applications require processing significantly longer contexts, going all the way up to 256K tokens. The challenge in extending the context length of these models lies primarily in the effective use of information from the middle part of the context, often identified as the “Lost-in-the-Middle” problem. Most existing procedures for extending context length require extensive fine-tuning at the target length, and they often struggle to manage information from the middle of the context effectively.
Addressing these challenges, researchers from the Beijing Institute for General Artificial Intelligence (BIGAI) and the National Key Laboratory of General Artificial Intelligence in Beijing, China, have introduced Continuity-Relativity indExing with gAussian Middle (CREAM). Existing approaches to extending the context window of LLMs include positional encoding (PE)-based procedures, which rely heavily on interpolated positional encodings and require fine-tuning at the target context length, resulting in high computational overhead. Other methods, such as efficient transformers and memory augmentation, alter the model architecture or add auxiliary modules, further complicating adaptation and implementation.
In contrast to these methodologies, CREAM was engineered to lengthen the context of LLMs efficiently. It not only manipulates position indices to interpolate positional encodings within the pre-trained context window size but also incorporates a truncated Gaussian sampling method to concentrate on the middle part of the context during fine-tuning. This approach enables the model to be fine-tuned within its pre-trained window size while still performing well on extended contexts of up to 256K tokens.
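To make the truncated Gaussian idea concrete, the snippet below is a minimal sketch of how a middle-segment size could be drawn from a Gaussian restricted to a valid range, so that fine-tuning examples concentrate probability mass on middle content. The function name `sample_middle_length` and the choice of mean and spread are illustrative assumptions, not CREAM's exact parameterization.

```python
# Sketch: draw a middle-segment length from a Gaussian truncated to [low, high],
# biasing fine-tuning toward the middle of the context. Names and defaults are
# hypothetical, chosen only to illustrate truncated Gaussian sampling.
from scipy.stats import truncnorm

def sample_middle_length(low: int, high: int, mu: float = None, sigma: float = None) -> int:
    """Sample an integer length from a Gaussian truncated to [low, high]."""
    if mu is None:
        mu = (low + high) / 2        # center the mass on the middle of the range
    if sigma is None:
        sigma = (high - low) / 4     # spread chosen for illustration only
    a, b = (low - mu) / sigma, (high - mu) / sigma   # standardized truncation bounds
    return int(round(truncnorm(a, b, loc=mu, scale=sigma).rvs()))

# Example: within a 4K pre-trained window split into three segments,
# decide how many positions the middle segment receives.
print(sample_middle_length(low=512, high=3072))
```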
Two primary strategies in CREAM’s methodology are ensuring continuity and relativity in positional encoding. To uphold continuity, the method manipulates position indices, forming shorter sequences within the pre-trained context window and ensuring densely connected positional indices. For relativity, CREAM leverages rotary positional encoding (RoPE) to capture the relative positions between token pairs. In addition, CREAM divides the pre-trained context window into three segments, namely the head, the middle, and the tail, using a truncated Gaussian function to prioritize the middle segment during fine-tuning.
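The sketch below shows one way such head/middle/tail position indexing could be assembled: only a pre-trained window's worth of tokens is fed to the model, but the indices span a longer target range, giving both densely connected indices (continuity) and large relative offsets that RoPE encodes relatively (relativity). The segment sizes and the helper `sample_middle_start` are simplifying assumptions for illustration, not the paper's exact scheme.

```python
# Sketch of head / middle / tail position-index construction under the assumption
# that the middle block's start is sampled (CREAM biases it toward the middle,
# e.g. with a truncated Gaussian as sketched above; uniform here for brevity).
import random

def sample_middle_start(low: int, high: int) -> int:
    # Placeholder sampler; a middle-focused scheme would bias this choice.
    return random.randint(low, high)

def build_position_ids(pretrained_len: int, target_len: int, head: int, tail: int):
    middle = pretrained_len - head - tail
    # Head: densely connected indices at the start (continuity).
    head_ids = list(range(0, head))
    # Middle: a contiguous block placed somewhere inside the target range.
    start = sample_middle_start(head, target_len - tail - middle)
    middle_ids = list(range(start, start + middle))
    # Tail: indices at the end of the target range, so head-to-tail pairs see
    # large relative distances (relativity) despite the short input.
    tail_ids = list(range(target_len - tail, target_len))
    return head_ids + middle_ids + tail_ids

# A 4K pre-trained window indexing positions of a 32K target context.
pos_ids = build_position_ids(pretrained_len=4096, target_len=32768, head=1024, tail=1024)
assert len(pos_ids) == 4096 and max(pos_ids) == 32767
```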
Experiments with Llama-2-7B and Llama-2-7B-Chat models have demonstrated CREAM’s efficiency and effectiveness. It extends the context window from 4K to 256K tokens and delivers superior performance in long-context understanding tasks. Crucially, CREAM surpassed existing methods in retrieving information from extensive contexts and mitigating the “Lost-in-the-Middle” issue, showing promising results in long-context question-answering and summarization tasks and beating strong baselines with minimal fine-tuning steps.
In summary, CREAM addresses and overcomes the drawbacks of current methods by efficiently lengthening the LLM’s context while emphasizing middle-context information. The method balances continuity and relativity in positional encoding and adopts a truncated Gaussian sampling method to enhance understanding of middle content. Experimental results confirm CREAM’s effectiveness in extending context windows and improving performance in long-context scenarios, providing a practical solution to the “Lost-in-the-Middle” issue.