
Exploring Detection of Hallucinations with Probes: Microsoft and Columbia University Research Examines If Language Models Can Recognize When They Are Imagining Things

Large Language Models (LLMs) are the latest wave of Artificial Intelligence (AI) innovation. These deep learning models produce human-like text and can perform a wide range of Natural Language Processing (NLP) and Natural Language Generation (NLG) tasks. Trained on huge amounts of textual data, LLMs can answer questions, summarize text, translate, carry out text-to-text transformations, and complete code.

In a recent study, a team of AI researchers from Microsoft and Columbia University investigated hallucination detection in grounded generation tasks, focusing on decoder-only transformer language models. Hallucination detection asks whether the generated text is faithful to the input it was conditioned on or whether it introduces false information.

To do this, the team built probes that predict the transformer language model’s hallucinatory behavior during in-context generation tasks. To train and evaluate these probes, they created a span-annotated dataset containing examples of both synthetic hallucinations and organic hallucinations, the latter drawn from the model’s own outputs.
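
For readers who want a concrete picture, here is a minimal sketch of what such a probe might look like, assuming a frozen Hugging Face decoder-only model (GPT-2 as a stand-in) and token-level labels marking hallucinated spans; the model choice, label format, layer selection, and hyperparameters are illustrative assumptions rather than details from the paper.

```python
# Hypothetical sketch of a token-level hallucination probe. GPT-2 stands in
# for the decoder-only model; the label format is our assumption.
import torch
from torch import nn
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True)
model.eval()  # the language model stays frozen; only the probe is trained


class LinearProbe(nn.Module):
    """Logistic-regression-style probe over per-token hidden states."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.classifier = nn.Linear(hidden_size, 1)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_size) -> per-token logits
        return self.classifier(hidden_states).squeeze(-1)


probe = LinearProbe(model.config.hidden_size)
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()


def train_step(text: str, token_labels: torch.Tensor, layer: int = -1) -> float:
    """One update on a single annotated example.

    token_labels: float tensor of shape (seq_len,), 1.0 where the token
    falls inside a hallucinated span, 0.0 otherwise (hypothetical format).
    """
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():                       # read the LM, don't update it
        outputs = model(**inputs)
    hidden = outputs.hidden_states[layer]       # (1, seq_len, hidden_size)
    logits = probe(hidden).squeeze(0)           # (seq_len,)
    loss = loss_fn(logits, token_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this setup only the probe’s single linear layer is trained, so the transformer’s hidden states are read rather than modified, which is what makes the approach attractive as a lightweight detector.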

The research showed that probes trained to detect synthetic hallucinations were not very effective at finding organic ones. This indicates that probes trained on modified or synthetic instances may not generalize well to real-world, naturally occurring hallucinations. Additionally, the team found that distributional properties and task-specific information affect how hallucination signals are encoded in the model’s hidden states.
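
A rough way to picture this generalization test: fit a probe on the synthetic split and score it on both splits. The helper below assumes the `LinearProbe` from the earlier sketch and pre-extracted (hidden states, labels) pairs; the metric and decision threshold are our choices, not the paper’s.

```python
# Hypothetical cross-distribution evaluation: train on synthetic spans,
# then compare held-out scores on synthetic vs. organic hallucinations.
import torch
from sklearn.metrics import f1_score


@torch.no_grad()
def evaluate(probe, examples):
    """examples: iterable of (hidden_states, token_labels) pairs."""
    preds, golds = [], []
    for hidden, labels in examples:
        logits = probe(hidden).squeeze(0)
        preds.extend((torch.sigmoid(logits) > 0.5).long().tolist())
        golds.extend(labels.long().tolist())
    return f1_score(golds, preds)

# After fitting the probe on the synthetic training split only:
# print("synthetic F1:", evaluate(probe, synthetic_test))
# print("organic   F1:", evaluate(probe, organic_test))  # the reported gap
```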

The team also analyzed how salient intrinsic and extrinsic hallucinations are across different tasks, hidden-state types, and layers. They found that the transformer’s internal representations emphasize extrinsic hallucinations, those tied to outside knowledge rather than the input, more strongly.
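
One way such a sweep could be run, again assuming the frozen GPT-2 stand-in from the earlier sketch: extract hidden states from each layer, train a fresh probe per layer and per hallucination type, and compare held-out scores. The annotation format and the `train_and_score_probe` helper are hypothetical placeholders for the training loop sketched above.

```python
# Hypothetical per-layer saliency sweep over intrinsic vs. extrinsic spans.
import torch


@torch.no_grad()
def hidden_states_for_layer(model, tokenizer, text, layer):
    """Return (1, seq_len, hidden_size) hidden states from the given layer."""
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model(**inputs, output_hidden_states=True)
    return outputs.hidden_states[layer]


def layer_sweep(model, tokenizer, dataset, train_and_score_probe):
    """dataset: list of (text, token_labels, kind) with kind in
    {'intrinsic', 'extrinsic'} -- a hypothetical annotation format."""
    num_layers = model.config.num_hidden_layers + 1  # embeddings + blocks
    scores = {"intrinsic": [], "extrinsic": []}
    for kind in scores:
        subset = [(t, y) for t, y, k in dataset if k == kind]
        for layer in range(num_layers):
            feats = [(hidden_states_for_layer(model, tokenizer, t, layer), y)
                     for t, y in subset]
            scores[kind].append(train_and_score_probe(feats))
    return scores  # higher scores at a layer suggest the signal is more salient there
```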

The team’s primary contributions include a dataset of more than 15,000 utterances annotated for hallucinations in both naturally generated and synthetically edited output text, along with three probe architectures for hallucination detection that improve on several current baselines in both efficiency and accuracy.
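
The article does not describe the three probe architectures themselves, so the classes below are purely illustrative stand-ins built on the same per-token setup: a small MLP probe and an attention-pooling probe that lets each token look at the rest of the sequence before classifying (the linear probe was sketched earlier).

```python
# Hypothetical probe variants; layer sizes and head counts are assumptions.
import torch
from torch import nn


class MLPProbe(nn.Module):
    """Per-token probe with one hidden layer on top of frozen hidden states."""

    def __init__(self, hidden_size: int, probe_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_size, probe_dim),
            nn.ReLU(),
            nn.Linear(probe_dim, 1),
        )

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return self.net(hidden_states).squeeze(-1)


class AttentionPoolProbe(nn.Module):
    """Probe that attends over the whole sequence before per-token classification."""

    def __init__(self, hidden_size: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.classifier = nn.Linear(hidden_size, 1)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        pooled, _ = self.attn(hidden_states, hidden_states, hidden_states)
        return self.classifier(pooled).squeeze(-1)
```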

We are so excited to share the incredible progress made by the team of Microsoft and Columbia University researchers! Their research has opened the door to understanding more about language models and how they process information and create complex outputs. We can’t wait to see what they come up with next!
