Recent developments in Artificial Intelligence (AI), particularly in generative AI, have demonstrated the ability of Large Language Models (LLMs) to generate human-like text in response to prompts. These models perform well on tasks such as answering questions and summarizing long documents. However, even when provided with reference materials, they can produce factual errors, which can have serious consequences in document-grounded question answering for sectors like banking or healthcare.
To address this issue, researchers have introduced GENAUDIT, a tool designed to fact-check responses produced by LLMs in document-grounded tasks. GENAUDIT works by suggesting edits to the LLM's output: it flags claims that are not supported by the reference document and recommends revisions or deletions where needed. It also presents evidence from the reference text to back up the factual claims the LLM does make.
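The announcement does not prescribe an API, but the core loop is straightforward to picture: check each claim in the output against the reference and return a verdict plus supporting evidence. The sketch below illustrates that loop in Python, standing in a simple lexical-overlap score for GENAUDIT's learned checker; every name here is hypothetical and this is not the tool's actual interface.

```python
# Illustrative sketch only: GENAUDIT uses trained models, not lexical overlap.
# All function and variable names here are hypothetical.

def sentence_overlap(claim: str, sentence: str) -> float:
    """Fraction of the claim's words that also appear in a reference sentence."""
    claim_words = set(claim.lower().split())
    sent_words = set(sentence.lower().split())
    return len(claim_words & sent_words) / max(len(claim_words), 1)

def check_claims(reference: str, output_sentences: list[str], threshold: float = 0.5):
    """Flag output sentences weakly grounded in the reference and attach evidence."""
    ref_sentences = [s.strip() for s in reference.split(".") if s.strip()]
    results = []
    for claim in output_sentences:
        scores = [(sentence_overlap(claim, s), s) for s in ref_sentences]
        best_score, best_evidence = max(scores)
        results.append({
            "claim": claim,
            # A real checker would use a trained entailment model here.
            "supported": best_score >= threshold,
            "evidence": best_evidence,
        })
    return results

reference_doc = "The drug was approved in 2019. Trials showed a 12% reduction in symptoms."
summary = ["The drug was approved in 2019", "The drug also improves sleep quality"]
for result in check_claims(reference_doc, summary):
    print(result)  # the second claim has no support in the reference and is flagged
```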
Building GENAUDIT involved training models for these specific tasks: detecting unsupported claims, proposing suitable edits, and extracting evidence from the reference text to support factual assertions. GENAUDIT also provides an interactive interface that lets users inspect, and then approve or reject, the suggested edits and the supporting evidence.
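The announcement does not specify the training data format, but a fact-checking model fine-tuned for these tasks is typically trained on (reference, claim) pairs labeled with a verdict, a revision, and evidence pointers. The schema below is an assumption for illustration, not GENAUDIT's actual format.

```python
# Hypothetical training example for a fact-checking backend model.
# Field names and the serialization format are assumptions, not GENAUDIT's schema.
example = {
    "reference": "The drug was approved in 2019. Trials showed a 12% reduction in symptoms.",
    "claim": "Trials showed a 40% reduction in symptoms.",
    "label": "unsupported",
    "revision": "Trials showed a 12% reduction in symptoms.",
    "evidence_sentence_ids": [1],  # 0-indexed sentence(s) in the reference
}

# One plausible way to serialize this into an input/target pair for fine-tuning:
model_input = f"Reference: {example['reference']}\nClaim: {example['claim']}"
model_target = (
    f"label: {example['label']} | revision: {example['revision']} "
    f"| evidence: {example['evidence_sentence_ids']}"
)
print(model_input)
print(model_target)
```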
GENAUDIT's performance was evaluated extensively by human annotators, who measured how accurately it identifies errors in LLM outputs on document summarization. The evaluation showed that GENAUDIT reliably detects errors in the outputs of eight different LLMs across multiple domains.
The team also proposed a method to improve GENAUDIT's error detection. The approach raises error recall, so that the system catches the majority of errors, while keeping the accompanying loss in precision small.
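The announcement does not detail the mechanism, but a common way to realize this kind of trade-off is to tune the checker's decision threshold: lower it until recall meets a target, then measure what that costs in precision. The sketch below illustrates the idea on toy data; it is not necessarily the authors' exact decoding-time technique.

```python
# Toy illustration of trading precision for error recall via threshold tuning.
# Scores are the checker's probability that a claim is erroneous; labels are ground truth.
scores = [0.92, 0.85, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1,    1,    1,    0,    1,    0,    0,    0]  # 1 = actual error

def precision_recall(threshold: float):
    predicted = [s >= threshold for s in scores]
    tp = sum(p and l for p, l in zip(predicted, labels))
    fp = sum(p and not l for p, l in zip(predicted, labels))
    fn = sum((not p) and l for p, l in zip(predicted, labels))
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall

# Lower the threshold until recall reaches the target, accepting some precision loss.
target_recall = 0.95
for threshold in sorted(set(scores), reverse=True):
    p, r = precision_recall(threshold)
    if r >= target_recall:
        print(f"threshold={threshold:.2f}  precision={p:.2f}  recall={r:.2f}")
        break
```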
The team's primary contributions are: the introduction of GENAUDIT as a reliable tool for fact-checking LLM outputs; the evaluation and release of fine-tuned LLMs that serve as its backend fact-checking models; an evaluation of GENAUDIT's effectiveness at catching errors in summaries generated by different LLMs; and a decoding-time technique that balances overall accuracy against improved error detection.
In conclusion, GENAUDIT is a promising tool that can significantly improve fact-checking in document-grounded tasks, thereby making LLM-generated information more reliable in critical applications.
According to the announcement, GENAUDIT is available for installation from PyPI, and its code, a tutorial, and sample outputs are available on GitHub. The project marks a notable step forward in machine learning, promoting accuracy and reliability in fact-checking and information generation.