Researchers from Baidu Inc., China, have unveiled a self-reasoning framework that greatly improves the reliability and traceability of Retrieval-Augmented Language Models (RALMs). RALMs augment language models with external knowledge, reducing factual inaccuracies. However, they face reliability and traceability issues: noisy retrieval can lead to incorrect responses, and the absence of citations makes the models’ outputs difficult to verify.
To combat these challenges, Baidu introduced a framework that generates self-reasoning trajectories via three processes: a relevance-aware process, an evidence-aware selective process, and a trajectory analysis process. The framework aims to enhance response accuracy by enabling the model to reason with retrieved documents. When evaluated on four public datasets, Baidu’s method outperformed existing models and achieved parity with GPT-4 using only 2,000 training samples. Unlike other methods, this framework enhances interpretability and traceability without requiring external models.
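To make the shape of these trajectories concrete, the following is a minimal sketch of how one trajectory could be represented in code. The class and field names are illustrative assumptions, not Baidu’s released data format; they simply mirror the three processes named above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EvidenceSnippet:
    """A key sentence selected from a retrieved document (evidence-aware selective process)."""
    document_id: str
    key_sentence: str
    reason: str  # why this sentence supports the eventual answer


@dataclass
class SelfReasoningTrajectory:
    """One self-reasoning trajectory covering the three processes described in the article."""
    question: str
    # Relevance-aware process: judge whether the retrieved documents are relevant.
    relevant: bool
    relevance_reason: str
    # Evidence-aware selective process: cite documents and pick key sentences.
    evidence: List[EvidenceSnippet] = field(default_factory=list)
    # Trajectory analysis process: summarize the reasoning and emit the final answer.
    analysis: str = ""
    answer: str = ""
```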
Efforts to improve large language models (LLMs) often involve integrating external knowledge, using strategies such as pre-training with retrieved passages, introducing citations, or using end-to-end systems that retrieve evidence and generate responses without modifying model weights. The distinguishing feature of Baidu’s approach, however, is its ability to identify key sentences and cite relevant documents within an end-to-end framework, without requiring external models or extensive training samples.
The self-reasoning method entails training an LLM to generate reasoning trajectories and answers simultaneously. Given a query and a set of retrieved documents, the LLM produces an answer composed of statements (sequences of tokens), with each statement citing the pertinent documents. The approach is broken into three stages: first, document relevance is assessed; second, key sentences are selected and cited; lastly, the accumulated reasoning is analyzed to produce a final answer.
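A hedged sketch of how those three stages fit together is shown below. For clarity it decomposes the trajectory into three separate prompted calls, whereas Baidu’s trained model emits the entire trajectory in a single generation; the `generate` callable and the prompt wording are assumptions for illustration, not the paper’s templates.

```python
from typing import Callable, List

# `generate` stands in for any callable wrapping an instruction-tuned LLM
# (an API client or a local model); it is an assumption, not part of Baidu's code.
LLM = Callable[[str], str]


def self_reasoning_answer(question: str, documents: List[str], generate: LLM) -> str:
    """Illustrative pipeline: relevance -> evidence selection -> trajectory analysis."""
    doc_block = "\n\n".join(f"[{i}] {d}" for i, d in enumerate(documents))

    # Stage 1 (relevance-aware): judge whether the retrieved documents can answer the question.
    relevance = generate(
        f"Question: {question}\n\nDocuments:\n{doc_block}\n\n"
        "Are these documents relevant to the question? Explain briefly."
    )

    # Stage 2 (evidence-aware selective): quote key sentences and cite their source documents.
    evidence = generate(
        f"Question: {question}\n\nDocuments:\n{doc_block}\n\n"
        f"Relevance judgment: {relevance}\n\n"
        "Select the key sentences that support an answer, citing each source as [doc index]."
    )

    # Stage 3 (trajectory analysis): condense the reasoning and produce the cited answer.
    return generate(
        f"Question: {question}\n\nReasoning so far:\n{relevance}\n\n{evidence}\n\n"
        "Analyze this reasoning and give a concise final answer with citations."
    )
```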
To evaluate the performance of the self-reasoning framework, extensive experiments were conducted on two short-form question-answering datasets, a long-form question-answering dataset, and a fact-verification dataset. Across multiple metrics, the framework outperformed both basic and retrieval-augmented LLMs, and achieved high citation recall and accuracy with fewer training samples and less resource consumption.
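For readers unfamiliar with citation metrics, the sketch below shows one simplified way recall- and precision-style citation scores can be computed. The exact definitions used in the paper’s evaluation may differ, and the `entails` judge (typically an NLI model) is an assumption introduced only for this illustration.

```python
from typing import Callable, Dict, List

# `entails(premise, claim)` is any entailment judge, e.g. an NLI model; it is an assumption.
Entails = Callable[[str, str], bool]


def citation_scores(statements: List[str],
                    citations: List[List[str]],
                    entails: Entails) -> Dict[str, float]:
    """Simplified citation recall/precision: do the cited passages jointly support each
    statement, and does each individual citation actually support its statement?"""
    supported = 0        # statements whose citations jointly entail them (recall numerator)
    relevant_cites = 0   # individual citations that entail their statement (precision numerator)
    total_cites = 0

    for stmt, cites in zip(statements, citations):
        if cites and entails(" ".join(cites), stmt):
            supported += 1
        for cite in cites:
            total_cites += 1
            if entails(cite, stmt):
                relevant_cites += 1

    return {
        "citation_recall": supported / max(len(statements), 1),
        "citation_precision": relevant_cites / max(total_cites, 1),
    }
```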
Additionally, an ablation study showed that omitting any component of the self-reasoning framework degraded performance, underscoring the contribution of each stage. The framework also proved robust to noisy and shuffled retrieved documents, and a human citation analysis confirmed that its citation quality aligns with the automatic evaluations. Together, this empirical evidence demonstrates the effectiveness of the self-reasoning framework in improving LLM performance on knowledge-intensive tasks.