Researchers at the University of Oxford have developed a way to test for instances when an AI language model is “unsure” of what it is generating or is “hallucinating”. The term refers to cases where a language model produces responses that are fluent and plausible but inconsistent or untrue.
The concept of AI hallucinations is acknowledged by major AI developers such as OpenAI, Google, and Anthropic, all of whom agree that hallucinations are difficult to eliminate from AI models entirely. The Oxford research therefore set out to determine the conditions under which these hallucinations occur and how to predict them.
To address this, the researchers devised a method to identify when a language model is likely to generate false or inconsistent information. The team prompted the AI to generate multiple responses to a single question, varying the input slightly each time. Semantic entropy was then calculated by grouping responses with similar meanings. A low semantic entropy score suggests the AI is confident in its answer, because its responses converge on the same meaning, while a high score indicates potential hallucination.
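The core computation can be summarized in a few lines. The sketch below is a minimal, illustrative version, not the authors' implementation: it assumes the answers have already been sampled from the model, and it takes a caller-supplied `equivalent(a, b)` function, a hypothetical placeholder for the study's approach of treating two answers as sharing a meaning when each entails the other.

```python
import math


def cluster_by_meaning(answers, equivalent):
    """Greedily group sampled answers into clusters of shared meaning.

    `equivalent(a, b)` is assumed to return True when two answers mean the
    same thing (a stand-in for the paper's bidirectional-entailment check).
    """
    clusters = []
    for answer in answers:
        for cluster in clusters:
            if equivalent(answer, cluster[0]):
                cluster.append(answer)
                break
        else:
            # No existing cluster matches, so this answer starts a new one.
            clusters.append([answer])
    return clusters


def semantic_entropy(answers, equivalent):
    """Entropy over meaning clusters: low = confident, high = possible hallucination."""
    clusters = cluster_by_meaning(answers, equivalent)
    total = len(answers)
    probs = [len(cluster) / total for cluster in clusters]
    return -sum(p * math.log(p) for p in probs)
```

Because the entropy is computed over meaning clusters rather than raw strings, paraphrases of the same answer do not inflate the uncertainty estimate.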
Unlike existing methods for detecting inaccuracies in AI output, the study’s metric, semantic entropy, evaluates the uncertainty of a language model’s output at the level of meaning rather than of specific words or phrases. The researchers tested the method on a varied range of tasks, including trivia questions, reading comprehension, word problems and biographies.
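To see why operating at the level of meaning matters, consider several sampled answers to one question that are worded differently but mostly say the same thing. The toy comparison below uses hypothetical example answers and a deliberately crude substring check in place of a real equivalence test; it only illustrates how uncertainty measured over exact strings can look high while semantic entropy stays low.

```python
import math
from collections import Counter


def entropy(probs):
    """Shannon entropy of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Five hypothetical sampled answers to "What is the capital of France?"
answers = ["Paris", "It's Paris.", "The capital is Paris.", "Paris, France.", "Lyon"]

# Uncertainty over exact strings: every paraphrase counts as a different answer.
string_counts = Counter(answers)
naive = entropy([c / len(answers) for c in string_counts.values()])

# Semantic grouping (toy stand-in for a proper equivalence test): answers that
# mention "Paris" fall into one meaning cluster, the outlier "Lyon" into another.
cluster_counts = Counter("paris" if "paris" in a.lower() else a for a in answers)
semantic = entropy([c / len(answers) for c in cluster_counts.values()])

print(f"string-level entropy: {naive:.2f}")     # high: 5 distinct strings
print(f"semantic entropy:     {semantic:.2f}")  # low: only 2 meaning clusters
```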
Semantic entropy was shown to outperform existing methods for identifying whether a language model is likely to give an incorrect or contradictory answer. The researchers also illustrated how semantic entropy captures the “confusion” in a language model’s responses, with accurate responses generally having more closely related meanings.
The study’s findings make significant strides toward understanding and mitigating the limitations of AI language models. By providing a way to detect when models may be uncertain or hallucinating, semantic entropy could help these tools be used more safely in fields where factual accuracy is critical, such as finance, law and healthcare. However, the researchers note that hallucinations represent only one type of error language models can make, and considerable further work is required in this area.