
Exploring Refusal-Aware Instruction Tuning to Promote Honesty in Large Language Models and Reduce Hallucination

We are truly excited about the latest research from the Hong Kong University of Science and Technology and the University of Illinois Urbana-Champaign, which tackles a well-known challenge in large language models (LLMs): hallucination, where models generate non-existent facts. The researchers address it by introducing a novel approach called Refusal-Aware Instruction Tuning (R-Tuning).

The researchers observe that existing instruction tuning methods often compel a model to complete an answer even when it lacks the relevant knowledge, which leads to the generation of inaccurate information. The core idea of R-Tuning is to identify the gap between the parametric knowledge of the LLM and the instruction tuning data, construct a refusal-aware dataset by flagging the questions the model is uncertain about, and then train the model to explicitly refuse to answer questions beyond its parametric knowledge.
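To make the idea more concrete, here is a minimal sketch of how such a refusal-aware split might be constructed. The `model_answer` helper and the padding phrases "I am sure." / "I am not sure." are illustrative assumptions for this post, not the authors' exact prompts or implementation:

```python
# Minimal sketch of refusal-aware data construction (illustrative, not the
# authors' exact implementation). `model_answer` is an assumed helper that
# returns the pretrained model's greedy answer to a question.

SURE = " I am sure."
UNSURE = " I am not sure."

def build_refusal_aware_dataset(qa_pairs, model_answer):
    """Split instruction-tuning data by whether the model already knows the answer.

    qa_pairs: iterable of (question, gold_answer) string pairs.
    model_answer: callable(question) -> the model's predicted answer string.
    Returns training examples whose targets end with a certainty expression,
    so the fine-tuned model learns when it should hedge or refuse.
    """
    dataset = []
    for question, gold in qa_pairs:
        prediction = model_answer(question)
        if prediction.strip().lower() == gold.strip().lower():
            # The pretrained model already answers correctly: the question lies
            # inside its parametric knowledge, so mark the answer as certain.
            target = gold + SURE
        else:
            # The question is beyond its knowledge: train it to express uncertainty.
            target = gold + UNSURE
        dataset.append({"prompt": question, "completion": target})
    return dataset
```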

The researchers conducted both single-task and multi-task experiments on nine datasets: ParaRel, HotpotQA, SelfAware, HaluEval, FalseQA, NEC, MMLU, WiCE, and FEVER. In single-task experiments, R-Tuning demonstrated a remarkable ability to refuse uncertain questions, leading to improved accuracy on questions within the model’s knowledge. In multi-task experiments, R-Tuning showcased its refusal ability as a meta-skill, providing advantages on both in-domain and out-of-domain datasets. Comparisons with baseline models, including Pretrain-T, Pretrain-W, and vanilla fine-tuning, showed that R-Tuning consistently achieved higher Average Precision (AP) scores.
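For readers unfamiliar with AP in this setting, the comparison amounts to ranking answers by the model's confidence and checking whether the correct answers cluster at high confidence. Below is a generic sketch of that style of evaluation; it assumes per-answer confidence scores are already available and is not the paper's exact scoring procedure:

```python
# Sketch of an Average Precision (AP) style evaluation over confidence-ranked
# answers (assumes each prediction carries a confidence score; how those
# confidences are obtained is left out here).
from sklearn.metrics import average_precision_score

def ap_over_ranked_answers(records):
    """records: list of dicts with keys
         'correct'    -> bool, whether the answer matched the gold label
         'confidence' -> float, the model's confidence in that answer
    Higher AP means correct answers are concentrated at high confidence,
    i.e. the model hedges or refuses mostly on questions it would get wrong.
    """
    y_true = [int(r["correct"]) for r in records]
    y_score = [r["confidence"] for r in records]
    return average_precision_score(y_true, y_score)

# Toy usage example:
toy = [
    {"correct": True, "confidence": 0.92},
    {"correct": False, "confidence": 0.35},
    {"correct": True, "confidence": 0.80},
    {"correct": False, "confidence": 0.60},
]
print(f"AP = {ap_over_ranked_answers(toy):.3f}")
```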

The results indicated that R-Tuning effectively reduced hallucination by filtering out questions beyond the model’s knowledge domain. The study also explored the impact of model size on refusal ability, showing that larger models scaled better and performed better. Surprisingly, the researchers found that learning uncertainty during training yielded better results than directly applying uncertainty filtering on test data.

This unexpected finding suggests that learning to express uncertainty improves the model’s ability both to estimate uncertainty and to answer questions, highlighting the advantage of incorporating uncertainty learning into LLM training. The study also examined unsupervised identification strategies and label replacement methods within R-Tuning, showing that uncertainty-based identification and direct label replacement are effective approaches. Furthermore, R-Tuning successfully handled unanswerable questions, refusing to provide answers to queries that contradicted common sense or were beyond the model’s knowledge.
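As a rough illustration of uncertainty-based identification, one common recipe is to sample several answers per question and treat the disagreement among them as the uncertainty signal. The sketch below follows that general recipe; the `sample_answer` helper is an assumption, and the exact identification procedure in the paper may differ:

```python
import math
from collections import Counter

def answer_entropy(question, sample_answer, n_samples=10):
    """Estimate uncertainty by sampling several answers and measuring disagreement.

    sample_answer: assumed callable(question) -> one sampled answer string
                   (e.g. decoding with temperature > 0).
    Returns the entropy of the empirical answer distribution; higher entropy
    means the model is less certain, making the question a candidate for the
    'uncertain' split or for replacing its label with a refusal.
    """
    counts = Counter(sample_answer(question).strip().lower() for _ in range(n_samples))
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())
```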

We are absolutely thrilled with the results of this research – R-Tuning demonstrates an incredible ability to not only reduce hallucination but also to improve the accuracy of LLMs. It is truly a remarkable feat to be able to teach models to refuse unknown questions and to recognize knowledge gaps. This research has the potential to revolutionize the reliability and performance of large language models and we look forward to seeing further advancements in this area of research.
