Retrieval-augmented generation (RAG) is a technique that enhances large language models' ability to draw on specialized expertise, recent information, and domain-specific knowledge without changing the model's weights. RAG, however, has its difficulties. LLMs struggle to handle large numbers of chunked contexts efficiently, often performing better with a smaller set of highly relevant contexts. Similarly, ensuring high recall of relevant content within a limited number of retrieved contexts is challenging. Although dedicated ranking models can improve context selection, their zero-shot generalization is often inferior to that of more versatile large language models (LLMs).
Researchers have tried to address these challenges with various approaches, including ranking methods that improve the quality of information retrieval within the RAG pipeline. These methods, however, run into their own issues: they typically rely on additional expert models that may fail to fully capture query-context relevance and that generalize poorly in zero-shot settings.
In response to these challenges, researchers from NVIDIA and Georgia Tech developed a framework called RankRAG. This framework instruction-tunes a single LLM to perform both context ranking and answer generation within RAG. This unified training approach aims to improve the LLM's ability to filter out irrelevant contexts during both the retrieval and generation phases.
RankRAG introduces a specialized task that focuses on identifying relevant contexts or passages for given questions. At the inference stage, the model first re-ranks the retrieved contexts before producing answers based on the refined top-k contexts.
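One way to picture this ranking task is to have the same instruction-tuned LLM judge each query-passage pair. The sketch below is illustrative only: the prompt template and the `llm.token_probability` helper are assumptions, not the exact format used in the RankRAG paper.

```python
# Illustrative sketch: an LLM judges whether a retrieved passage is relevant
# to a question. The prompt wording and the `llm` interface are assumptions.

def score_context(llm, question: str, passage: str) -> float:
    """Use the model's probability of answering "True" as a relevance score
    for the (question, passage) pair."""
    prompt = (
        f"Question: {question}\n"
        f"Passage: {passage}\n"
        "Does the passage contain information that answers the question? "
        "Answer True or False."
    )
    # `llm.token_probability` is a hypothetical helper that returns the
    # probability the model assigns to a given next token.
    return llm.token_probability(prompt, token="True")
```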
RankRAG's instruction-tuning framework follows a two-stage process. The first stage involves supervised fine-tuning on diverse instruction-following datasets. In the second stage, the ranking and generation tasks are unified, incorporating a variety of context-rich data. During inference, a retrieve-rerank-generate pipeline is deployed: RankRAG retrieves the top-N contexts, reranks them to select the most relevant top-k, and generates answers based on these refined contexts.
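The inference pipeline can be sketched roughly as follows, reusing the relevance scorer above. The `retriever` and `llm` interfaces, the default values of N and k, and the prompt layout are assumptions for illustration, not the paper's exact configuration.

```python
# Minimal sketch of the retrieve-rerank-generate flow, assuming a dense
# retriever with a `search` method and an LLM with a `generate` method.

def rank_rag_answer(question: str, retriever, llm, n: int = 100, k: int = 5) -> str:
    # 1. Retrieve: the retriever returns the top-N candidate contexts.
    candidates = retriever.search(question, top_n=n)

    # 2. Rerank: the same instruction-tuned LLM scores each query-context
    #    pair and keeps only the k most relevant contexts.
    reranked = sorted(
        candidates,
        key=lambda passage: score_context(llm, question, passage),
        reverse=True,
    )
    top_k = reranked[:k]

    # 3. Generate: the answer is produced conditioned on the refined contexts.
    context_block = "\n\n".join(top_k)
    prompt = f"Context:\n{context_block}\n\nQuestion: {question}\nAnswer:"
    return llm.generate(prompt)
```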
RankRAG has demonstrated superior performance across a range of RAG benchmarks. The 8B variant consistently outperforms ChatQA-1.5 8B and is competitive with much larger models. The 70B variant surpasses the strong ChatQA-1.5 70B model and significantly outperforms previous RAG baselines that use InstructGPT.
Notably, RankRAG's gains are largest on challenging datasets, where it improves over ChatQA-1.5 by more than 10%. Its ability to rank contexts effectively is especially beneficial when the top retrieved documents are less relevant to the answer, which improves performance on complex OpenQA tasks.
In conclusion, RankRAG represents a significant advancement in retrieval-augmented generation systems. Its strength lies in fine-tuning a single LLM to perform both context ranking and answer generation. By using only a modest amount of ranking data during training, RankRAG surpasses the performance of existing expert ranking models. Comprehensive evaluations have validated the framework's effectiveness on knowledge-intensive benchmarks, and its improved performance across diverse domains makes it a promising foundation for future RAG applications.