Large Language Models (LLMs) can mislead users into making poor decisions by providing incorrect information, a phenomenon known as ‘hallucination’. To mitigate this, a team of researchers from Stanford University has proposed a method for linguistic calibration of long-form generations. The framework consists of a two-stage training process for LLMs.
In the first stage, supervised finetuning, the model is trained to generate long-form content with embedded confidence statements. These statements express the model’s level of certainty about the information it provides, appearing as phrases such as “I am positive that…” or “I estimate a 30% chance that…”.
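To make this stage concrete, the sketch below shows one way confidence statements could be assembled from claims paired with probability estimates when constructing supervised finetuning targets. The thresholds, phrasings, and the `verbalize` helper are illustrative assumptions, not the paper’s actual templates.

```python
# Illustrative sketch: turning (claim, probability) pairs into a long-form
# finetuning target with embedded confidence statements. Thresholds and
# phrasings are assumptions for demonstration only.

def verbalize(claim: str, prob: float) -> str:
    """Map a claim and a confidence estimate onto a verbal confidence statement."""
    if prob >= 0.9:
        return f"I am positive that {claim}."
    if prob >= 0.6:
        return f"I believe it is likely that {claim}."
    if prob >= 0.4:
        return f"I estimate a {int(prob * 100)}% chance that {claim}."
    return f"It is possible, though unlikely, that {claim}."

# Assemble one long-form training target from (claim, probability) pairs.
claims = [
    ("the treaty was signed in 1648", 0.95),
    ("the negotiations lasted about five years", 0.30),
]
target_text = " ".join(verbalize(c, p) for c, p in claims)
print(target_text)
```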
The second stage is reinforcement learning. Here, the model is rewarded when its responses enable users to make calibrated predictions or decisions: the ultimate goal is for readers to act accurately on both the information and the confidence levels the LLM provides.
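As an illustration of what such a reward signal could look like, the sketch below assumes a simulated reader who converts the model’s long-form response into a probability forecast over candidate answers, and scores that forecast with a proper scoring rule (log loss). The forecast dictionary and the numbers are hypothetical, not taken from the paper.

```python
# Sketch of a decision-based RL reward, under the assumption that a simulated
# reader produces a probability forecast over candidate answers after reading
# the model's response. A proper scoring rule rewards confidence only when it
# is warranted, which pushes the policy toward calibrated statements.

import math
from typing import Dict

def log_score_reward(forecast: Dict[str, float], correct_answer: str,
                     eps: float = 1e-9) -> float:
    """Reward = log probability the simulated reader assigns to the correct answer."""
    return math.log(forecast.get(correct_answer, 0.0) + eps)

# Hypothetical forecast a reader might form after seeing a hedged response.
forecast = {"Paris": 0.7, "Lyon": 0.2, "Marseille": 0.1}
print(log_score_reward(forecast, "Paris"))  # ~ -0.36
print(log_score_reward(forecast, "Lyon"))   # ~ -1.61 (penalized, but less than a confidently wrong claim would be)
```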
The researchers tested their method on the Llama 2 7B model. The resulting model produced long-form responses that matched the accuracy of the original while being significantly better calibrated, as confirmed by both automated and human evaluations.
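One standard automatic measure of calibration is expected calibration error (ECE), which compares stated confidences against empirical accuracy. The sketch below assumes each extracted claim comes with a confidence value and a 0/1 correctness label; the binning scheme is a common choice and not necessarily the paper’s exact evaluation protocol.

```python
# Sketch of expected calibration error (ECE): bin claims by stated confidence
# and average the |confidence - accuracy| gap, weighted by bin size.

import numpy as np

def expected_calibration_error(confidences, correct, n_bins: int = 10) -> float:
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap
    return ece

# Toy example: stated confidences roughly track empirical accuracy -> low ECE.
print(expected_calibration_error([0.9, 0.8, 0.3, 0.6], [1, 1, 0, 1]))
```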
Notably, the calibration gains held up even in unfamiliar domains. The model was evaluated on scientific and biology-related questions, and even on a task requiring it to write biographies of individuals, demonstrating that the linguistic calibration method adapts to different subject areas and content types.
In summary, the research team introduced the concept of linguistic calibration for long-form generations and developed a two-stage training framework combining supervised finetuning and reinforcement learning. Applied to an existing model, the framework significantly improved calibration in long-form text generation while retaining accuracy.
This research could facilitate the development of more reliable, decision-supporting AI language models. Such progress could reduce the extent of ‘hallucination’ in AI responses and improve the quality of AI-user interaction.