Researchers from the OATML group at the University of Oxford have developed a statistical method to improve the reliability of large language models (LLMs) such as ChatGPT and Gemini. The method aims to mitigate "hallucinations," in which the model generates false or unsupported information, and "confabulations," in which the model gives arbitrary, incorrect responses. Errors like these can critically undermine the effectiveness and safety of LLMs in fields such as law and medicine.
The new technique, called "semantic entropy," focuses on the meaning rather than the wording of the model's responses. It assesses the uncertainty of the model's answers and signals when the model is likely to produce unreliable outputs. An advantage of this method is that it requires no prior knowledge of the task and no labeled data, and it has been shown to be effective across different datasets and applications.
The method clusters similar answers based on meaning and measures the entropy, or uncertainty, over the distribution of these clusters. If the entropy is high, the model is likely producing confabulated, or unreliable, responses. This helps detect semantic inconsistencies that simpler entropy measures, which look only at differences in wording, can miss.
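As a rough sketch of the clustering step, answers that mutually entail one another are placed in the same cluster. The `entails` function below is only a toy placeholder (normalized string matching) standing in for a real natural language inference model:

```python
def entails(premise: str, hypothesis: str) -> bool:
    """Toy stand-in for a natural language inference (NLI) model.

    Here, two answers 'entail' each other only if they match after
    normalization; in the actual method this would be a check with
    an entailment classifier.
    """
    return premise.strip().lower() == hypothesis.strip().lower()


def cluster_by_meaning(answers: list[str]) -> list[list[str]]:
    """Group sampled answers into semantic clusters.

    An answer joins an existing cluster only if it and the cluster's
    first member entail each other (bidirectional entailment);
    otherwise it starts a new cluster.
    """
    clusters: list[list[str]] = []
    for answer in answers:
        for cluster in clusters:
            representative = cluster[0]
            if entails(answer, representative) and entails(representative, answer):
                cluster.append(answer)
                break
        else:
            clusters.append([answer])
    return clusters
```

With a real entailment model in place of the toy placeholder, answers such as "Paris" and "The capital of France is Paris" would fall into the same cluster despite their different wording.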
Tests of this method across various LLMs and domains, including trivia, general knowledge, and medical queries, demonstrated significant improvements in detecting and filtering unreliable responses. Furthermore, by declining to answer questions that produce high-entropy responses, the method improves the overall accuracy of the answers the model does give.
Semantic entropy builds on predictive entropy: it clusters generated sequences by their semantic equivalence using bidirectional entailment, then computes entropy over the probabilities of these clusters, indicating the model's confidence in its responses. By clustering outputs in this way, semantic entropy can identify when a model's responses are likely arbitrary. This not only helps predict model accuracy but also improves reliability by flagging uncertain answers and giving users a better basis for judging model outputs.
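Building on the `cluster_by_meaning` sketch above, and assuming each sampled answer comes with a sequence probability from the model, the entropy computation can be illustrated as the Shannon entropy of the pooled cluster probabilities:

```python
import math


def semantic_entropy(answers: list[str], probs: list[float]) -> float:
    """Compute a semantic entropy estimate for a set of sampled answers.

    `probs` are assumed to be the model's probabilities for each sampled
    sequence (in practice derived from the LLM's token log-likelihoods).
    Probability mass is pooled within each semantic cluster, normalized,
    and the Shannon entropy of the resulting cluster distribution is
    returned. High values mean the samples disagree in meaning.
    """
    clusters = cluster_by_meaning(answers)
    cluster_mass = []
    for cluster in clusters:
        mass = sum(p for a, p in zip(answers, probs) if a in cluster)
        cluster_mass.append(mass)
    total = sum(cluster_mass)
    cluster_probs = [m / total for m in cluster_mass]
    return -sum(p * math.log(p) for p in cluster_probs if p > 0)
```

When sequence probabilities are not accessible, a simpler variant can estimate cluster probabilities from the fraction of samples falling into each cluster.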
The study also extended semantic entropy to longer text passages by breaking them down into factual claims and evaluating each claim's consistency across rephrasings. The results suggest that LLMs carry an inherent signal about their own knowledge gaps, but that existing evaluation methods only partially exploit it.
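The paragraph-level extension might be sketched roughly as follows. The sentence-level split, the user-supplied `sample_answers` callable, and the threshold of 0.7 are illustrative assumptions rather than the authors' exact procedure, which uses an LLM to extract and re-query factual claims:

```python
from typing import Callable


def flag_uncertain_claims(
    passage: str,
    sample_answers: Callable[[str], tuple[list[str], list[float]]],
    threshold: float = 0.7,
) -> list[tuple[str, float]]:
    """Apply the semantic-entropy check to a longer passage.

    The passage is split into claim-like sentences (a toy decomposition).
    For each claim, `sample_answers` re-asks the model about that claim
    and returns sampled answers with their probabilities. Claims whose
    semantic entropy exceeds `threshold` are flagged as likely
    confabulations.
    """
    claims = [s.strip() for s in passage.split(".") if s.strip()]
    flagged = []
    for claim in claims:
        answers, probs = sample_answers(claim)
        entropy = semantic_entropy(answers, probs)
        if entropy > threshold:
            flagged.append((claim, entropy))
    return flagged
```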
Overall, semantic entropy offers a promising avenue for improving language model outputs, particularly on complex, open-ended tasks, by providing a robust way to assess and manage uncertainty in a model's responses. This makes it a significant step toward more reliable, safe, and effective large language models.