Large Language Models (LLMs) such as GPT-4 are highly proficient at text generation tasks such as summarization and question answering. A common problem, however, is their tendency to produce “hallucinations”: content that is factually incorrect or contextually irrelevant. The issue becomes especially serious when it occurs even though the model has been given the correct facts, yielding inaccurate outputs known as “contextual hallucinations.” This undermines the reliability of LLMs, particularly in accuracy-critical applications such as document-based question answering and summarization.
Existing approaches to hallucination detection mainly rely on the LLM’s internal representations, such as hidden states or attention outputs, and mostly target scenarios with no input context; they do not specifically address contextual hallucinations, where the provided context is essential. To bridge this gap, researchers from the Massachusetts Institute of Technology and the University of Washington proposed a technique that capitalizes on the attention maps of LLMs. Their solution, the Lookback Lens, operates on the premise that contextual hallucinations are related to the extent to which the LLM attends to the given context versus its own previously generated tokens.
The Lookback Lens computes a ‘lookback ratio’ at each time step of generation: the attention weight placed on the context tokens divided by the total attention weight on both the context and the newly generated tokens. These ratios are assembled into a feature vector, which is used to train a lightweight classifier to detect hallucinations.
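To make the computation concrete, below is a minimal PyTorch sketch of how per-head lookback ratios could be extracted from a model’s attention maps. This is not the authors’ released code: the tensor layout assumed here matches what Hugging Face models return with `output_attentions=True`, and the random tensors at the end are stand-ins for real attention weights, purely for illustration.

```python
# Minimal sketch (assumptions, not the paper's code) of computing lookback ratios
# from attention maps. `attentions` is assumed to be a tuple of per-layer tensors
# of shape (batch, num_heads, seq_len, seq_len), with the first `context_len`
# positions holding the provided context.
import torch

def lookback_ratios(attentions, context_len: int) -> torch.Tensor:
    """Return a (num_layers, num_heads, num_new_tokens) tensor of lookback ratios."""
    ratios = []
    for layer_attn in attentions:                    # (batch, heads, seq, seq)
        attn = layer_attn[0]                         # drop batch dim -> (heads, seq, seq)
        new_rows = attn[:, context_len:, :]          # attention from generated tokens
        to_context = new_rows[:, :, :context_len].sum(dim=-1)   # (heads, num_new)
        to_new = new_rows[:, :, context_len:].sum(dim=-1)       # (heads, num_new)
        ratios.append(to_context / (to_context + to_new + 1e-9))
    return torch.stack(ratios)                       # (layers, heads, num_new)

# Example with random tensors standing in for a real model's attention output:
num_layers, num_heads, ctx, new = 2, 4, 10, 5
fake_attn = [torch.rand(1, num_heads, ctx + new, ctx + new).softmax(dim=-1)
             for _ in range(num_layers)]
feats = lookback_ratios(fake_attn, context_len=ctx)
print(feats.shape)  # torch.Size([2, 4, 5])

# Averaging over the generated span and flattening yields one feature vector
# per response, which can then be fed to a simple classifier.
feature_vector = feats.mean(dim=-1).flatten()
```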
Experiments showed that the Lookback Lens performs on par with, or better than, more complex detectors that use the full hidden states of an LLM. Moreover, it can be applied across different models and tasks without retraining, demonstrating its robustness and generalizability. For instance, a detector trained on a 7B model could be transferred to a 13B model, reducing hallucinations on the XSum summarization task by 3.2%.
To reduce hallucinations further, the researchers propose a classifier-guided decoding strategy that incorporates the Lookback Lens into the decoding process. This strategy evaluates multiple candidate chunks of tokens at each decoding step and selects the one judged least likely to contain hallucinations, yielding a 9.6% reduction in hallucinations on the same task.
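The following is a rough sketch of what such classifier-guided decoding could look like. It is not the paper’s implementation: `sample_chunk` and `chunk_features` are hypothetical stand-ins for model sampling and for the lookback-ratio feature extraction sketched above, and a scikit-learn logistic regression trained on synthetic labels stands in for the trained Lookback Lens classifier.

```python
# Rough sketch (assumptions, not the paper's implementation) of classifier-guided
# decoding: sample several candidate chunks, score each with a lookback-ratio
# classifier, and keep the chunk judged most likely to be faithful to the context.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def sample_chunk(prefix: str) -> str:
    """Hypothetical stand-in: sample the next chunk of tokens from the LLM."""
    return prefix + f" <chunk-{rng.integers(1000)}>"

def chunk_features(chunk: str) -> np.ndarray:
    """Hypothetical stand-in: per-head lookback ratios averaged over the chunk."""
    return rng.random(32)  # e.g. num_layers * num_heads features

# Toy classifier trained on fake labelled features (1 = faithful, 0 = hallucinated).
X = rng.random((200, 32))
y = (X.mean(axis=1) > 0.5).astype(int)
clf = LogisticRegression(max_iter=1000).fit(X, y)

def guided_decode(prompt: str, num_chunks: int = 4, num_candidates: int = 8) -> str:
    """Extend the text chunk by chunk, keeping the candidate the classifier
    scores as most likely to be faithful to the source context."""
    text = prompt
    for _ in range(num_chunks):
        candidates = [sample_chunk(text) for _ in range(num_candidates)]
        scores = [clf.predict_proba(chunk_features(c).reshape(1, -1))[0, 1]
                  for c in candidates]
        text = candidates[int(np.argmax(scores))]
    return text

print(guided_decode("Summarize the document:"))
```

The key idea is that candidates are ranked by a classifier over attention-based features rather than by model likelihood alone, steering the decoder toward continuations that keep attending to the source context.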
Overall, contextual hallucinations significantly undermine the reliability of LLMs. The Lookback Lens offers a simple yet effective remedy, leveraging attention maps to detect and mitigate hallucinations, and its ability to transfer across models and tasks without retraining underscores its practicality. This could represent a promising step toward more accurate and trustworthy LLM-generated content.