Retrieval-augmented generation (RAG) enhances large language models (LLMs) by incorporating external knowledge. However, RAG is susceptible to retrieval corruption, a class of attacks in which malicious passages are injected into the document collection, causing the model to generate incorrect or misleading responses. This poses a serious threat to the reliability of RAG systems.
Researchers from Princeton University and UC Berkeley have developed RobustRAG, the first defense framework specifically designed to counter retrieval corruption. RobustRAG is built on an isolate-then-aggregate strategy: the LLM computes a response from each retrieved passage in isolation, and these per-passage responses are then securely aggregated into the final output.
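Conceptually, the isolate-then-aggregate pattern fits in a few lines. The Python sketch below is illustrative rather than the authors' implementation: `retrieve`, `llm_generate`, and `secure_aggregate` are hypothetical stand-ins for a retriever, a single LLM call, and one of the secure aggregation algorithms described next.

```python
from typing import Callable, List

def robust_rag_answer(
    query: str,
    retrieve: Callable[[str, int], List[str]],          # hypothetical retriever
    llm_generate: Callable[[str], str],                 # hypothetical LLM call
    secure_aggregate: Callable[[List[str], str], str],  # robust combiner
    k: int = 10,
) -> str:
    """Isolate-then-aggregate: answer from each passage separately,
    then combine the per-passage answers with a robust aggregator."""
    docs = retrieve(query, k)  # top-k passages; some may be corrupted

    # Isolation: each passage is consulted on its own, so a malicious
    # passage cannot contaminate the responses derived from the others.
    isolated = [
        llm_generate(f"Context: {doc}\n\nQuestion: {query}\nAnswer:")
        for doc in docs
    ]

    # Aggregation: a secure combiner bounds the influence that any
    # small number of corrupted passages can have on the final answer.
    return secure_aggregate(isolated, query)
```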
To securely aggregate unstructured text responses, the researchers designed two algorithms for RobustRAG: keyword-based and decoding-based aggregation. Both bound the influence of corrupted passages during aggregation, even when some are retrieved. A key strength of RobustRAG is its certifiable robustness: for certain queries, it provably returns accurate responses even when an attacker injects a bounded number of malicious passages.
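To make the keyword-based algorithm concrete, here is a minimal sketch. It follows the general recipe (count keywords across isolated responses, keep those above a threshold, and compose the final answer from the surviving keywords) but simplifies the paper's exact extraction and certification procedure; `extract_keywords` and `llm_generate` are hypothetical helpers.

```python
from collections import Counter
from typing import Callable, List, Set

def keyword_aggregate(
    responses: List[str],
    query: str,
    extract_keywords: Callable[[str], Set[str]],  # hypothetical extractor
    llm_generate: Callable[[str], str],           # hypothetical LLM call
    min_count: int = 3,
) -> str:
    """Keyword-based secure aggregation (simplified sketch)."""
    counts: Counter = Counter()
    for resp in responses:
        counts.update(extract_keywords(resp))  # one vote per response

    # A keyword survives only if enough isolated responses agree on it.
    robust_keywords = sorted(kw for kw, c in counts.items() if c >= min_count)

    # The final prompt sees only the surviving keywords, never the
    # retrieved passages themselves.
    prompt = (
        f"Answer the question using only these keywords: "
        f"{', '.join(robust_keywords)}\nQuestion: {query}\nAnswer:"
    )
    return llm_generate(prompt)
```

The intuition behind the certificate: an attacker who controls at most k' retrieved passages can change any keyword's count by at most k', so keywords whose counts clear the threshold by a wide enough margin are unaffected by the attack.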
Through extensive testing on tasks including open-domain question answering (QA) and long-form text generation, the effectiveness and versatility of RobustRAG have been validated. It not only offers strong protection against retrieval corruption but also generalizes well across tasks and datasets, making it a practical way to harden the security and reliability of RAG systems.
The team lists their main contributions as follows:
1. The development of RobustRAG, the first defense framework designed specifically to combat retrieval corruption attacks in RAG systems.
2. The creation of two secure text aggregation techniques for RobustRAG: decoding-based and keyword-based algorithms. These methods come with formal robustness guarantees under certain retrieval-corruption threat models (a decoding-based sketch follows this list).
3. Performance verification of RobustRAG through extensive testing with three LLMs (Mistral, Llama, and GPT) on three datasets (RealtimeQA, NQ, and Bio), demonstrating its effectiveness and applicability across diverse settings and tasks.
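To illustrate the decoding-based algorithm (item 2 above), here is a simplified sketch: at each decoding step, the next-token distributions predicted from each isolated passage are averaged, and the highest-probability token is emitted. `next_token_probs` is a hypothetical hook; the actual RobustRAG decoder additionally uses confidence thresholds and a no-retrieval fallback that this sketch omits.

```python
import numpy as np
from typing import Callable, List

def decoding_aggregate(
    query: str,
    docs: List[str],
    next_token_probs: Callable[[str, str, List[int]], np.ndarray],
    # hypothetical: P(next token | one passage, query, tokens so far)
    eos_id: int,
    max_tokens: int = 64,
) -> List[int]:
    """Decoding-based secure aggregation (simplified sketch)."""
    generated: List[int] = []
    for _ in range(max_tokens):
        # Average the per-passage next-token distributions; one corrupted
        # passage can shift each averaged probability by at most 1/len(docs).
        probs = np.mean(
            [next_token_probs(doc, query, generated) for doc in docs],
            axis=0,
        )
        token = int(np.argmax(probs))
        if token == eos_id:
            break
        generated.append(token)
    return generated
```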