Large language models (LLMs) have made significant progress in natural language understanding and generation, but their application over long contexts remains limited by constraints on context window size and memory usage. This is a pressing concern, as demand for LLMs that can handle complex, lengthy tasks continues to grow.
Various solutions have been studied to overcome these issues, each with its own set of challenges. Model-level methods can raise training costs and cause attention to overlook parts of long inputs, while retrieval-based techniques struggle with complex queries due to limitations in their decision-making process. Agent-based methods, which leverage the planning and reflection capacities of LLMs, still face challenges in handling multi-hop queries and fully harnessing LLM capabilities.
A team of researchers from Alibaba Group, The Chinese University of Hong Kong, Shanghai AI Laboratory, and the University of Manchester has proposed a solution called GraphReader, a graph-based agent system designed for long-context processing in LLMs. The approach segments a long text into chunks, then extracts and compacts key details into atomic facts. These components are used to build a graph structure that efficiently captures long-range dependencies and multi-hop relationships within the text.
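To make this concrete, here is a minimal Python sketch of how such a graph could be assembled. The `extract` callable stands in for the LLM prompts the paper uses to produce atomic facts and key elements, and the node schema below is an assumption for illustration, not the authors' exact implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One key element plus the atomic facts that mention it."""
    key_element: str
    atomic_facts: list = field(default_factory=list)
    chunk_ids: set = field(default_factory=set)
    neighbors: set = field(default_factory=set)

def chunk_text(text: str, max_words: int = 500) -> list:
    """Split a long document into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

def build_graph(chunks, extract):
    """Build key-element nodes. `extract` is a hypothetical stand-in
    for an LLM call returning (atomic_fact, [key_elements]) pairs
    for one chunk."""
    nodes = {}
    for cid, chunk in enumerate(chunks):
        for fact, elements in extract(chunk):
            for el in elements:
                node = nodes.setdefault(el, Node(el))
                node.atomic_facts.append(fact)
                node.chunk_ids.add(cid)
                # Link nodes whose atomic facts share key elements.
                node.neighbors.update(e for e in elements if e != el)
    return nodes
```

Because every node keeps pointers back to its source chunks, the agent can later move between the compact atomic-fact view and the raw text as needed.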
GraphReader operates in three main phases: graph construction, graph exploration, and answer reasoning. During construction, the text is split into chunks, each chunk is summarized into atomic facts, and key elements are extracted from those facts. Nodes are created from these components and linked based on shared key elements. During exploration, the agent first formulates a rational plan and selects initial nodes, then explores the graph by inspecting atomic facts, reading the relevant chunks, and examining neighboring nodes, maintaining a notebook of supporting facts throughout. Finally, in the answer reasoning phase, the system compiles the notebooks from multiple agents, analyzes them with Chain-of-Thought reasoning, and generates a final answer.
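The exploration phase can be sketched in the same spirit. The `llm` callable below is a hypothetical stand-in for the agent's per-step decisions (relevance checks, reading deeper, stopping); the actual system drives these choices with prompted decisions guided by the rational plan:

```python
def explore(graph, question, start_nodes, llm, max_steps=20):
    """Coarse-to-fine exploration: scan compact atomic facts first,
    drill into source chunks only when needed, then hop to
    neighboring key-element nodes."""
    notebook = []                  # supporting facts recorded so far
    frontier = list(start_nodes)
    visited = set()
    for _ in range(max_steps):
        if not frontier:
            break
        node = frontier.pop(0)
        if node.key_element in visited:
            continue
        visited.add(node.key_element)
        # Step 1: scan the node's compact atomic facts.
        relevant = [f for f in node.atomic_facts
                    if llm("is_relevant", question, f)]
        notebook.extend(relevant)
        # Step 2: if facts look promising but thin, read raw chunks.
        if relevant and llm("need_detail", question, notebook):
            for cid in node.chunk_ids:
                notebook.append(("chunk", cid))
        # Step 3: stop early once the notes can answer the question.
        if llm("is_enough", question, notebook):
            break
        # Otherwise continue to neighboring key-element nodes.
        frontier.extend(graph[n] for n in node.neighbors if n in graph)
    return notebook
```

In the full system, several such agents explore in parallel, and their notebooks are merged before the final Chain-of-Thought answering step.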
Evaluation on multiple long-context benchmarks shows that GraphReader consistently outperforms competing methods across tasks and context lengths. On the HotPotQA dataset, for example, GraphReader achieves 55.0% EM and 70.0% F1, surpassing GPT-4-128k and ReadAgent.
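For readers unfamiliar with these metrics, EM (exact match) and token-level F1 are the standard answer-matching scores on HotPotQA. A minimal sketch of how they are typically computed follows; real evaluation scripts also strip punctuation and articles during normalization:

```python
from collections import Counter

def normalize(text: str) -> list:
    """Lowercase and tokenize the answer string."""
    return text.lower().split()

def exact_match(pred: str, gold: str) -> float:
    """1.0 if the normalized answers are identical, else 0.0."""
    return float(normalize(pred) == normalize(gold))

def f1_score(pred: str, gold: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    p, g = normalize(pred), normalize(gold)
    common = Counter(p) & Counter(g)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(p)
    recall = overlap / len(g)
    return 2 * precision * recall / (precision + recall)
```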
These results make GraphReader a significant advance in addressing long-context challenges in LLMs. By organizing extensive texts into graph structures and deploying an autonomous agent to explore them, it effectively captures long-range dependencies within a compact 4k context window. This opens new opportunities for applying LLMs to tasks involving lengthy documents and intricate multi-step reasoning, and it could reshape fields like document analysis and research assistance by setting a new benchmark for long-context processing.