The Dynamic Retrieval Augmented Generation (RAG) approach is designed to boost the performance of Large Language Models (LLMs) by determining when and what external information to retrieve during text generation. However, current methods for deciding when to retrieve often rely on static rules, and they tend to limit retrieval queries to the most recent sentences or tokens, overlooking the broader context and potentially introducing unnecessary information, which in turn increases computational costs.
To address this issue, researchers from Tsinghua University and the Beijing Institute of Technology have developed a new framework called DRAGIN (Dynamic Retrieval Augmented Generation), which is specifically suited to LLMs. DRAGIN improves the retrieval process by dynamically determining when and what to retrieve based on the real-time needs of the LLM during text generation. The framework introduces two key techniques: Real-time Information Needs Detection (RIND) for timing retrieval, and Query Formulation based on Self-attention (QFS) for query formulation.
The RIND technique evaluates each token's uncertainty, semantic significance, and influence on subsequent context in order to decide dynamically when to trigger retrieval. Meanwhile, the QFS technique formulates queries by examining the LLM's self-attention weights, selecting tokens based on their relevance to the current context. Following retrieval, DRAGIN truncates the output at the identified token, integrates the retrieved knowledge via a pre-designed prompt template, and resumes the generation process. This iterative strategy ensures the LLM seamlessly incorporates relevant external knowledge, thereby improving the quality and relevance of its generated output.
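The two techniques can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes a generic RIND score of the form entropy × downstream-attention × semantic indicator, and a QFS step that picks the tokens the triggering position attends to most. The stopword set, the `top_n` parameter, and the toy attention matrix are illustrative assumptions.

```python
import numpy as np

# Illustrative stopword subset; a real system would use a full list.
STOPWORDS = {"the", "a", "an", "of", "to", "is", "and"}

def rind_scores(tokens, token_entropies, attention, stopwords=STOPWORDS):
    """Sketch of Real-time Information Needs Detection (RIND).

    tokens          : list of generated token strings
    token_entropies : entropy of the output distribution at each position
    attention       : (T, T) self-attention matrix, attention[j, i] is the
                      weight token j places on earlier token i
    Each token's score combines its uncertainty (entropy), its influence on
    later tokens (max attention it receives), and a semantic indicator that
    zeroes out stopwords. Retrieval is triggered when a score crosses a
    threshold (not shown here).
    """
    T = len(tokens)
    scores = np.zeros(T)
    for i in range(T):
        s_i = 0.0 if tokens[i].lower() in stopwords else 1.0
        a_i = attention[i + 1:, i].max() if i + 1 < T else 0.0
        scores[i] = token_entropies[i] * a_i * s_i
    return scores

def qfs_query(tokens, attention, trigger_pos, top_n=3, stopwords=STOPWORDS):
    """Sketch of Query Formulation based on Self-attention (QFS).

    Rank the preceding tokens by the attention the triggering position pays
    them, drop stopwords, keep the top-n, and emit them in original order.
    """
    attn = attention[trigger_pos, :trigger_pos]
    order = np.argsort(attn)[::-1]
    picked = [i for i in order if tokens[i].lower() not in stopwords][:top_n]
    return " ".join(tokens[i] for i in sorted(picked))

# Toy example: five generated tokens with made-up entropies and attention.
tokens = ["the", "capital", "of", "France", "is"]
entropies = np.array([0.1, 0.5, 0.1, 0.8, 1.2])
attention = np.array([
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0, 0.0],
    [0.2, 0.6, 0.2, 0.0, 0.0],
    [0.1, 0.3, 0.1, 0.5, 0.0],
    [0.1, 0.1, 0.1, 0.6, 0.1],
])

scores = rind_scores(tokens, entropies, attention)
trigger = int(scores.argmax())          # "France" scores highest here
query = qfs_query(tokens, attention, trigger_pos=4, top_n=2)
print(trigger, query)                   # → 3 capital France
```

In the full framework, generation would be truncated at the triggering token, the query sent to a retriever, and the retrieved passages inserted into a prompt template before generation resumes.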
Performance evaluations of DRAGIN were conducted across four datasets, and the results were compared against various baseline methods. The framework consistently outperformed the alternatives, demonstrating its effectiveness in enhancing LLMs. Efficiency analysis revealed that DRAGIN required fewer retrieval calls than some baselines, indicating its operational efficiency. A timing analysis showed DRAGIN's strength in determining the optimal moments for retrieval based on real-time information needs. Furthermore, DRAGIN's query formulation approach surpassed other frameworks in selecting tokens that accurately represent the information needs of the LLM.
In conclusion, DRAGIN is a novel RAG framework specifically designed to overcome the limitations of LLMs. By using RIND for improved retrieval timing and QFS for more precise query formulation, DRAGIN delivers stronger performance on knowledge-intensive tasks, although it depends on access to the self-attention weights of transformer-based LLMs. Future work aims to address this limitation of self-attention accessibility. With its innovative methods for integrating external knowledge and formulating queries, DRAGIN sets a new benchmark, outperforming other methodologies such as FLARE, FL-RAG, and FS-RAG.