
Improving LLM Trustworthiness: The Retrospective Viewpoint Method for Identifying Hallucinations

Large language models (LLMs) such as GPT-4 have shown impressive capabilities in generating text for summarization and question answering. But these models often “hallucinate,” producing content that is contextually unsupported or factually incorrect. This is particularly concerning in applications where faithfulness to a source document is crucial, such as document-based question answering and summarization, where contextual hallucinations can significantly undermine the models’ reliability.

Previous attempts to detect and mitigate hallucinations in LLMs have largely relied on internal representations, such as hidden states or attention outputs. These methods, however, have not specifically addressed contextual hallucinations, where the output must stay faithful to a provided input context; they typically target settings with no input context at all, where the model draws only on its internal knowledge.

To fill this gap, researchers from the Massachusetts Institute of Technology and the University of Washington have proposed a solution called the Lookback Lens. It builds on the observation that contextual hallucinations are related to how much the LLM attends to the provided context versus the tokens it has already generated. Its primary feature is the ratio of attention weight placed on the context tokens to that placed on the newly generated tokens, which the authors call the “lookback ratio.”
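In symbols (our notation, following the description above; the exact definition averages the attention weights within each group), the lookback ratio for attention head $h$ in layer $l$ at generation step $t$ can be written as

$$
\mathrm{LR}^{l,h}_t = \frac{A^{l,h}_t(\text{context})}{A^{l,h}_t(\text{context}) + A^{l,h}_t(\text{new})},
$$

where $A^{l,h}_t(\text{context})$ is the average attention weight that head places on the context tokens at step $t$, and $A^{l,h}_t(\text{new})$ is the average weight it places on the tokens generated so far.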

The Lens operates by computing the lookback ratio at each step during the generation process. In a multi-layered, multi-headed transformer model, this ratio is calculated for each head in each layer. The lookback ratio is the attention weight concentrated on the context tokens divided by the total attention weight on both context and new tokens. These ratios are then consolidated into a feature vector used to train a linear classifier to detect hallucinations.
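As a rough illustration, here is a minimal sketch (not the authors’ released code; the tensor shapes, helper names, and placeholder training data are assumptions) of how lookback-ratio features could be extracted from the attention maps a Hugging Face decoder returns with output_attentions=True, and how a logistic-regression classifier could then be fit on them:

```python
# Minimal sketch: per-(layer, head) lookback ratios + a linear classifier.
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression

def lookback_ratios(attentions, n_context: int, step: int) -> np.ndarray:
    """attentions: tuple of per-layer tensors shaped
    (batch, n_heads, seq_len, seq_len), e.g. from a model called with
    output_attentions=True. Returns one ratio per (layer, head) for the
    token generated at position `step`."""
    feats = []
    for layer_attn in attentions:
        row = layer_attn[0, :, step, :]                    # (n_heads, seq_len)
        ctx = row[:, :n_context].mean(dim=-1)              # avg weight on context tokens
        new = row[:, n_context:step + 1].mean(dim=-1)      # avg weight on generated tokens
        feats.append((ctx / (ctx + new + 1e-9)).tolist())  # lookback ratio per head
    return np.array(feats).reshape(-1)                     # (n_layers * n_heads,)

# Placeholder training data: in practice, average the per-token feature vectors
# over annotated spans and label each span 0 (supported) or 1 (hallucinated).
rng = np.random.default_rng(0)
X = rng.random((200, 32 * 32))        # e.g. 32 layers x 32 heads for a 7B model
y = rng.integers(0, 2, size=200)

lookback_lens = LogisticRegression(max_iter=1000).fit(X, y)  # the "Lookback Lens"
```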

Experiments on summarization and question-answering tasks have validated the effectiveness of the Lookback Lens. Results showed that this simple detector performs comparably to, or even better than, more complex detectors that use an LLM’s entire hidden states. The Lookback Lens can also be applied across different models and tasks without retraining, underscoring its robustness and generalizability. In one example, a detector trained on a 7B model was successfully transferred to a 13B model, reducing hallucinations by 3.2% in the XSum summarization task.

To further cut down on hallucinations during text generation, the scientists proposed a classifier-guided decoding strategy that integrates the Lookback Lens into the decoding process. This approach evaluates multiple token chunks at each step and picks the one predicted to be least likely to cause hallucinations. The strategy reduced hallucinations by 9.6% in the XSum summarization task, further proving the Lookback Lens’s potential in practical applications.
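A sketch of that decoding loop might look like the following (again only illustrative: `generate_chunk` is an assumed helper that samples a short continuation and returns its token ids together with the per-step attention maps, and `lookback_ratios` / `lookback_lens` are the feature extractor and classifier from the previous snippet):

```python
import numpy as np

def chunk_score(attentions_per_step, n_context, start_step):
    """Score one candidate chunk: average its lookback-ratio features over the
    chunk's tokens, then ask the classifier how likely the chunk is to be a
    hallucination (lower is better)."""
    feats = np.mean(
        [lookback_ratios(att, n_context, start_step + i)
         for i, att in enumerate(attentions_per_step)],
        axis=0,
    )
    return lookback_lens.predict_proba(feats.reshape(1, -1))[0, 1]

def guided_decode(prompt_ids, n_context, n_candidates=8, chunk_len=8, max_new_tokens=256):
    """Chunk-by-chunk decoding guided by the classifier: sample several
    candidate chunks, keep the one rated least likely to hallucinate, repeat."""
    output = list(prompt_ids)
    while len(output) - len(prompt_ids) < max_new_tokens:
        candidates = [generate_chunk(output, chunk_len) for _ in range(n_candidates)]
        best = min(
            candidates,
            key=lambda c: chunk_score(c.attentions, n_context, start_step=len(output)),
        )
        output.extend(best.token_ids)
        if best.ended_with_eos:
            break
    return output
```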

Contextual hallucination is a significant problem that undermines the reliability of LLMs. The Lookback Lens addresses it by leveraging attention maps to detect and mitigate hallucinations, and its ability to transfer across models and tasks without retraining further underscores its utility. It represents a promising step toward more accurate and reliable LLM-generated content.
