
DeepMind has created SAFE, an AI agent designed to fact-check the output of large language models.

In a joint effort, researchers from DeepMind and Stanford University have developed an AI agent that fact-checks large language models (LLMs), enabling the benchmarking of their factuality. These models sometimes fabricate facts in their responses, and the longer the response, the more likely such errors become. Prior to this work, there was no established method for evaluating the factuality of LLMs’ long-form responses.

The researchers first used GPT-4, another AI model, to generate LongFact – a set of 2,280 prompts spanning 38 topics. These prompts elicit long-form responses from the LLM under test.

The team then built an AI agent on top of GPT-3.5-turbo that uses Google Search to check the factual accuracy of the responses generated by the LLM. The method is called the Search-Augmented Factuality Evaluator (SAFE). SAFE first breaks the LLM’s response down into individual facts; each fact is then issued as a Google Search query, and its accuracy is judged against the information in the returned search results.
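As a rough illustration of that pipeline, the sketch below assumes a generic `llm(prompt)` text-completion helper and a `google_search(query)` wrapper that returns result snippets as text; both names are hypothetical placeholders, not part of the released SAFE code.

```python
# Illustrative sketch of the SAFE-style pipeline described above.
# `llm` and `google_search` are assumed helpers supplied by the caller.

def split_into_facts(llm, response: str) -> list[str]:
    """Ask the rater LLM to break a long-form response into individual facts."""
    prompt = (
        "List each individual factual claim in the following text, one per line:\n\n"
        + response
    )
    return [line.strip() for line in llm(prompt).splitlines() if line.strip()]


def is_supported(llm, google_search, fact: str) -> bool:
    """Search for the fact and ask the LLM whether the results support it."""
    results = google_search(fact)  # assumed wrapper returning search snippets
    verdict = llm(
        f"Fact: {fact}\nSearch results:\n{results}\n"
        "Is the fact supported by the search results? Answer 'supported' or 'not supported'."
    )
    return "not supported" not in verdict.lower()


def safe_evaluate(llm, google_search, response: str) -> dict:
    """Count supported and unsupported facts in one long-form response."""
    facts = split_into_facts(llm, response)
    supported = sum(is_supported(llm, google_search, f) for f in facts)
    return {
        "total_facts": len(facts),
        "supported": supported,
        "not_supported": len(facts) - supported,
    }
```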

SAFE has proved surprisingly effective: it agreed with human annotators on 72% of individual facts, and in the cases where it disagreed, it turned out to be correct 76% of the time. It was also roughly 20 times cheaper than human annotators, making it a far more cost-effective fact-checker.

A model’s performance was scored by combining how many of the facts in its response are supported with what proportion of its facts are supported. The resulting metric, F1@K, balances factual precision against recall relative to K, a hyperparameter representing the ‘ideal’ number of supported facts a user wants in a response.
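Based on the paper’s description, the score can be sketched as the harmonic mean of factual precision (the share of a response’s facts that are supported) and recall capped at K supported facts. The function below is an illustrative reconstruction, not the reference implementation.

```python
def f1_at_k(supported: int, not_supported: int, k: int) -> float:
    """Sketch of F1@K: harmonic mean of factual precision and recall
    against a target of K supported facts (names are illustrative)."""
    total = supported + not_supported
    if total == 0 or supported == 0:
        return 0.0
    precision = supported / total        # fraction of claimed facts that are supported
    recall = min(supported / k, 1.0)     # capped at the 'ideal' fact count K
    return 2 * precision * recall / (precision + recall)


# Example: a response with 40 supported and 10 unsupported facts, evaluated at K = 64
print(f1_at_k(40, 10, 64))  # ~0.70
```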

The researchers used LongFact to prompt 13 LLMs across four model families – Gemini, GPT, Claude, and PaLM-2 – and evaluated the factuality of their responses with SAFE. GPT-4-Turbo emerged as the most factual model at generating long-form responses.

SAFE offers a fast and economical way to measure the long-form factuality of LLMs, though it still hinges on the correctness of the information Google returns in its search results. DeepMind has made SAFE publicly available, pointing to its potential use in improving LLM factuality through better pretraining and finetuning – essentially enabling an LLM to verify its information before presenting the output to the user.
