Retrieval-Augmented Generation (RAG) methods improve large language models (LLMs) by grounding their answers in external knowledge drawn from large corpora. They are particularly useful for open-domain question answering, where detailed and accurate answers are needed. By supplementing the knowledge baked into an LLM’s parameters with retrieved evidence, RAG systems handle complex queries more effectively. However, balancing the “retriever” and “reader” components of a RAG system remains a challenge. Classic frameworks often use short retrieval units (such as 100-word passages), which forces the retriever to sift through an enormous corpus while the reader’s task stays relatively light.
To correct this imbalance, researchers from the University of Waterloo have introduced a new framework called LongRAG. It pairs a “long retriever” with a “long reader,” both designed to handle long retrieval units of roughly 4K tokens each. Grouping the corpus this way shrinks the number of retrieval units from roughly 22 million to 600,000, easing the retriever’s burden and improving both the effectiveness and the efficiency of the system.
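To make the grouping step concrete, here is a minimal Python sketch of packing documents into roughly 4K-token retrieval units. It is an illustration under simplifying assumptions: the function name and the whitespace tokenizer are invented for this example, and the actual system groups related documents (for instance, linked Wikipedia articles) rather than simply packing them in sequence.

```python
def group_into_long_units(documents, tokenize, max_tokens=4096):
    """Pack documents into retrieval units of roughly `max_tokens` tokens.

    A simplified sketch: real grouping clusters related documents,
    not merely adjacent ones.
    """
    units, current, current_len = [], [], 0
    for doc in documents:
        doc_len = len(tokenize(doc))
        # Close the current unit if this document would overflow the budget.
        if current and current_len + doc_len > max_tokens:
            units.append("\n\n".join(current))
            current, current_len = [], 0
        current.append(doc)
        current_len += doc_len
    if current:
        units.append("\n\n".join(current))
    return units

# Example with a whitespace "tokenizer" standing in for a real subword one.
docs = ["First article text ...", "Second article text ...", "Third ..."]
print(len(group_into_long_units(docs, tokenize=str.split)))
```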
LongRAG works by grouping related documents into long retrieval units, which the long retriever searches for relevant material. The retriever selects the top 4 to 8 units, concatenates them, and feeds them to a long-context LLM such as Gemini-1.5-Pro or GPT-4o. This lets the system exploit long-context models to digest large amounts of retrieved text in a single pass, yielding a more thorough and accurate extraction of information.
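The retrieve-then-read flow can be sketched in a few lines of Python. In the sketch below, `embed` and `generate` are hypothetical stand-ins for an embedding model and a long-context LLM such as Gemini-1.5-Pro or GPT-4o; only the top-k selection and concatenation steps mirror the description above.

```python
import numpy as np

def answer(question, units, embed, generate, top_k=8):
    """Retrieve the top-k long units for a question and read them in one pass."""
    # Embed the question and every ~4K-token retrieval unit.
    q_vec = embed(question)
    unit_vecs = [embed(u) for u in units]

    # Rank units by cosine similarity to the question.
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    ranked = sorted(range(len(units)),
                    key=lambda i: cosine(q_vec, unit_vecs[i]),
                    reverse=True)

    # Concatenate the top 4-8 units into one long context.
    context = "\n\n".join(units[i] for i in ranked[:top_k])

    # Hand the whole context to a long-context reader in a single prompt.
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return generate(prompt)

# Hypothetical usage, given an embedder and an LLM client of your choice:
# answer("Who wrote ...?", corpus_units, embed=my_embedder, generate=my_llm)
```

Because each unit already carries several related documents, a handful of retrieved units is enough to cover most questions, which is what shifts the workload from the retriever to the reader.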
In testing, LongRAG performed strongly. It achieved an exact match (EM) score of 62.7% on the Natural Questions (NQ) dataset and 64.3% on the HotpotQA dataset, results comparable to state-of-the-art fine-tuned RAG models and a clear demonstration of the framework’s effectiveness.
Because LongRAG’s retrieval units are long, documents retain their semantic completeness, which enables more accurate and thorough responses. The approach lightens the load on the retriever and leans on the strengths of advanced long-context LLMs, yielding a more balanced and efficient RAG method. The work prompts a re-evaluation of RAG system design and shows there is room for continued progress in the field.
In summary, LongRAG makes significant strides in mitigating the inefficiencies of traditional RAG systems. By using longer retrieval units and capitalizing on the long-context capabilities of modern LLMs, it markedly improves both the accuracy and the efficiency of open-domain question answering. The framework provides a foundation for future advances in retrieval-augmented generation.