Large Language Models (LLMs) have substantially advanced natural-language understanding and generation. They still struggle with long contexts, however, because of limits on context window size and memory usage. This has spurred research into overcoming these constraints so that LLMs can handle larger and more complex tasks.
Approaches to the long-context challenge in LLMs include model-level methods, such as transformer variants with modified attention mechanisms and positional interpolation. While these methods have shown some promise, they suffer from several drawbacks, including the neglect of fine-grained detail, loss of earlier context, and increased training costs. Retrieval-based methods, such as Retrieval-Augmented Generation (RAG), have also been proposed but struggle with complex questions because of limitations in their decision-making mechanisms.
To address these challenges, researchers from Alibaba Group, the University of Manchester, The Chinese University of Hong Kong, and Shanghai AI Laboratory introduced a new system called GraphReader. It segments long texts into smaller chunks and distills the vital information into key elements and atomic facts. This information is used to build a graph structure that effectively captures long-range dependencies within a 4k context window.
GraphReader works in three phases: graph construction, graph exploration, and answer reasoning. During construction, a document is split into chunks, and each chunk is summarized into atomic facts; key elements are extracted from these facts, turned into nodes, and linked. During exploration, an agent drafts a rational plan and selects initial nodes, then traverses the graph, reading chunks, examining atomic facts, and noting down the supporting facts it finds.
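To make the construction phase concrete, here is a minimal, self-contained sketch of chunking a document, extracting atomic facts and key elements, and linking co-occurring elements into a graph. This is an illustrative simplification, not the paper's implementation: the real system uses an LLM to produce atomic facts and key elements, whereas the stand-in functions below use sentence splitting and crude capitalization heuristics.

```python
import re
from collections import defaultdict

def chunk_document(text, max_words=50):
    """Split a long document into word-bounded chunks."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def extract_atomic_facts(chunk):
    """Stand-in for the LLM summarization step: treat each sentence as one atomic fact."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", chunk) if s.strip()]

def extract_key_elements(fact):
    """Stand-in for key-element extraction: capitalized tokens as crude 'entities'."""
    return {w.strip(".,") for w in fact.split() if w[:1].isupper()}

def build_graph(text):
    """Build nodes keyed by key element; each node stores its atomic facts,
    source chunk ids, and neighbors (elements co-occurring in the same fact)."""
    nodes = defaultdict(lambda: {"facts": [], "chunks": set(), "neighbors": set()})
    for cid, chunk in enumerate(chunk_document(text)):
        for fact in extract_atomic_facts(chunk):
            elements = extract_key_elements(fact)
            for e in elements:
                nodes[e]["facts"].append(fact)
                nodes[e]["chunks"].add(cid)
                nodes[e]["neighbors"].update(elements - {e})
    return dict(nodes)
```

An exploration agent would then start from nodes relevant to the question and follow `neighbors` links, reading the stored facts and chunks along the way; that traversal is what keeps the working context within the 4k window.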
In the answer reasoning phase, the system reviews the notes gathered by the various agents using Chain-of-Thought reasoning and composes the final answer to the question. When evaluated against other methods on multiple long-context benchmarks, GraphReader proved superior across a range of tasks. On multi-hop QA tasks it outperformed RAG methods, long-context LLMs, and other agent-based solutions.
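The answer-reasoning step above can be sketched as assembling the agents' notebooks into a single Chain-of-Thought prompt for a final LLM call. The function name and prompt wording below are assumptions for illustration; the article does not specify the paper's actual prompt format.

```python
def compose_reasoning_prompt(question, agent_notes):
    """Merge per-agent notebooks into one Chain-of-Thought prompt.
    `agent_notes` maps an agent id to the supporting facts it recorded
    while exploring the graph."""
    lines = [f"Question: {question}", "", "Supporting notes gathered by each agent:"]
    for agent, notes in sorted(agent_notes.items()):
        lines.append(f"- Agent {agent}:")
        lines.extend(f"    * {n}" for n in notes)
    lines.append("")
    lines.append("Think step by step over the notes above, then state the final answer.")
    return "\n".join(lines)
```

The returned string would be sent to the LLM, which reasons over all agents' evidence at once rather than over the raw long document.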
The system’s performance is attributed to its graph-based exploration strategy, which captures the links between vital pieces of information and supports effective multi-hop reasoning over long contexts. GraphReader organizes lengthy text, captures long-range dependencies within a 4k context window, and even outperforms GPT-4 with a 128k input length. It sets a new standard for long-context processing in Large Language Models and offers valuable capabilities for document analysis and research assistance.