
Reconsidering the Design of QA Dataset: How does Widely Accepted Knowledge Improve the Accuracy of LLM?

Large language models (LLMs) are known for storing vast amounts of factual information, which makes them effective at factual question-answering tasks. However, these models often produce plausible but incorrect responses because of failures in retrieving and applying their stored knowledge. This undermines their dependability and hinders their adoption in knowledge-based platforms.

To address this, researchers from Carnegie Mellon University and Stanford University have tested various strategies: fine-tuning LLMs to abstain from questions beyond their knowledge, manipulating attention mechanisms, and using unsupervised internal probes. Despite these attempts, achieving consistent factual accuracy in LLMs remains challenging.

The study reports that fine-tuning on knowledge that is already well encoded in an LLM significantly improves its accuracy, whereas fine-tuning on less well-encoded knowledge can degrade performance. The proposed explanation is that LLMs either draw on their stored knowledge or fall back on generic “shortcuts” when answering queries, and the data chosen for fine-tuning determines which behavior is reinforced. The key finding is that popular, well-known facts improve factual accuracy, while less common facts encourage shortcut usage.
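Read operationally, this finding suggests ranking candidate QA pairs by how confidently the model already encodes them and fine-tuning only on the top slice. A minimal sketch of such a selection step, where the scoring function, the toy confidence values, and the 50% cutoff are illustrative assumptions, not details from the paper:

```python
def select_well_known(examples, score_fn, keep_fraction=0.5):
    """Keep the fraction of QA examples the model encodes most confidently.

    `score_fn` maps an example to a confidence score (e.g. the model's
    probability of producing the gold answer); higher means better encoded.
    """
    ranked = sorted(examples, key=score_fn, reverse=True)
    cutoff = max(1, int(len(ranked) * keep_fraction))
    return ranked[:cutoff]


# Toy usage: the second tuple element stands in for model confidence.
examples = [
    ("Paris is the capital of France", 0.98),
    ("An obscure long-tail fact", 0.12),
    ("Water boils at 100 °C at sea level", 0.95),
    ("Another rarely seen fact", 0.30),
]
selected = select_well_known(examples, score_fn=lambda ex: ex[1])
# keeps only the two highest-confidence facts for fine-tuning
```

In practice the confidence score would come from the model itself (for instance, its likelihood of the correct answer), which is what distinguishes "well-encoded" facts from merely popular ones.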

The researchers tested the impact of fine-tuning on factual accuracy in a synthetic setup: pretraining samples were drawn from a Zipf distribution over subjects and a uniform distribution over relations. The results consistently showed that fine-tuning on less popular or less confidently encoded examples performs worse than fine-tuning on well-known facts, and the performance gap widens for less popular test points. The study therefore suggests that, to improve factual accuracy, fine-tuning should focus on well-known facts, yielding a more efficient and effective training process.
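The sampling scheme described above can be sketched as follows. The vocabulary sizes, sample count, and Zipf exponent here are illustrative assumptions; the paper's exact parameters are not given in this summary.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_SUBJECTS = 1_000   # hypothetical subject-entity vocabulary size
NUM_RELATIONS = 20     # hypothetical relation vocabulary size
NUM_SAMPLES = 10_000   # number of synthetic pretraining facts
ZIPF_EXPONENT = 1.5    # assumed skew; the paper's value may differ

# Zipf-distributed subject popularity: P(rank r) proportional to r^(-a),
# so a few "popular" subjects dominate the pretraining corpus.
ranks = np.arange(1, NUM_SUBJECTS + 1)
subject_probs = ranks.astype(float) ** -ZIPF_EXPONENT
subject_probs /= subject_probs.sum()

subjects = rng.choice(NUM_SUBJECTS, size=NUM_SAMPLES, p=subject_probs)
relations = rng.integers(0, NUM_RELATIONS, size=NUM_SAMPLES)  # uniform

facts = list(zip(subjects.tolist(), relations.tolist()))
```

Because subject frequency is heavy-tailed while relations are uniform, the setup cleanly separates "popular" from "rare" facts, which is what lets the study compare fine-tuning on each.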

The findings offer a novel perspective on improving LLMs’ accuracy through strategic selection of QA dataset composition. Contrary to common assumptions, fine-tuning on well-known facts consistently improves overall factuality. This insight can inform new techniques for improving language model performance, with potential benefits for regularization methods, curriculum learning strategies, and the creation of synthetic data for efficient knowledge extraction. The study provides a foundation for future research aimed at enhancing the factual accuracy and reliability of language models across applications.
