
Enhancing Clinical Confidence: DPO Fine-Tuning Reduces Hallucinated Findings in Radiology Reports, Turning Illusions into Facts

The field of radiology has been transformed by generative vision-language models (VLMs), which automate medical image interpretation and report generation. This technology has shown potential to reduce radiologists' workload and improve diagnostic accuracy. A key challenge, however, is its propensity to produce hallucinated content: text that is nonsensical or factually incorrect. Such errors can increase, rather than reduce, the workload of healthcare professionals, who must catch and correct them before they cause clinical harm.

A particularly critical issue is the tendency of VLMs to fabricate references to prior exams in radiology reports. In chest X-ray report generation, such hallucinated comparisons can obscure essential clinical information and pose a risk to patient safety if they are not caught and corrected in time.

Traditionally, controlling hallucinations in generative models has involved preprocessing training datasets to remove problematic references. Although this approach has had some success, it consumes significant resources and cannot rectify issues that surface after training. This paper proposes Direct Preference Optimization (DPO), a simpler and more resource-efficient alternative derived from reinforcement learning from human feedback (RLHF), to suppress undesirable behaviors in pretrained models. Unlike RLHF, DPO requires no explicit reward model.
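As background, DPO works directly on pairs of preferred and dispreferred responses, using a frozen copy of the pretrained model as a reference instead of a learned reward model. Below is a minimal PyTorch sketch of the standard (unweighted) DPO objective; the function name, argument names, and the beta value are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss: widens the margin between preferred and
    dispreferred responses relative to a frozen reference model,
    with no explicit reward model in the loop."""
    # The implicit "rewards" are log-probability ratios against the reference.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the Bradley-Terry preference likelihood via -logsigmoid.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```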

Researchers from Harvard University, the Jawaharlal Institute of Postgraduate Medical Education & Research, and Johns Hopkins University have proposed a DPO-based method specifically designed to suppress hallucinated references to prior exams in chest X-ray reports. By fine-tuning the model with DPO, the team significantly reduced these unwanted references while maintaining clinical accuracy.

The proposed methodology builds on a vision-language model pretrained on MIMIC-CXR data. The model has three components: a vision encoder that transforms input images into visual tokens, a vision-language adapter that maps these tokens into the language space, and a language model that processes the tokens to generate a chest X-ray report. Specifically, the model uses a Swin Transformer as the vision encoder and Llama2-Chat-7b as the language model.
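For intuition, the three-stage pipeline might be wired up roughly as follows. This is a schematic sketch, not the authors' implementation: the class name, hidden dimensions, and calling conventions are assumptions, with the Swin Transformer and Llama2-Chat-7b components stubbed out as constructor arguments.

```python
import torch
import torch.nn as nn

class CxrReportVLM(nn.Module):
    """Schematic pipeline: vision encoder -> adapter -> language model."""

    def __init__(self, vision_encoder, language_model,
                 vision_dim=1024, text_dim=4096):
        super().__init__()
        self.vision_encoder = vision_encoder            # e.g. a Swin Transformer
        self.adapter = nn.Linear(vision_dim, text_dim)  # vision-language adapter
        self.language_model = language_model            # e.g. Llama2-Chat-7b

    def forward(self, image, prompt_embeds):
        visual_tokens = self.vision_encoder(image)      # (batch, n_tokens, vision_dim)
        visual_tokens = self.adapter(visual_tokens)     # project into the LLM embedding space
        # Prepend the projected visual tokens to the text prompt embeddings
        # and let the language model decode the report from there.
        inputs = torch.cat([visual_tokens, prompt_embeds], dim=1)
        return self.language_model(inputs_embeds=inputs)
```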

Part of the fine-tuning process involves generating preference datasets in which the preferred responses avoid references to prior exams. This dataset teaches the model which type of content to avoid; the model is then trained with weighted DPO losses that emphasize suppressing the hallucinated content. The results showed a considerable decrease in hallucinated references, while the clinical accuracy of the models, assessed with metrics such as RadCliQ-v1 and RadGraph-F1, remained high.
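One plausible way to assemble such preference pairs is to sample several candidate reports per image and split them by whether they mention a prior exam. The sketch below does this with a simple keyword pattern; the cue-phrase list and helper function are hypothetical stand-ins, and the paper's actual pair-construction criteria may differ.

```python
import re

# Hypothetical cue phrases signalling a reference to a prior exam;
# the paper's actual detection criteria may differ.
PRIOR_EXAM_PATTERN = re.compile(
    r"\b(prior|previous|compared to|interval change|unchanged|again noted)\b",
    re.IGNORECASE,
)

def build_preference_pair(candidate_reports):
    """Split sampled reports for one image into a (chosen, rejected) pair:
    the chosen report avoids prior-exam references, the rejected one contains them."""
    clean = [r for r in candidate_reports if not PRIOR_EXAM_PATTERN.search(r)]
    hallucinated = [r for r in candidate_reports if PRIOR_EXAM_PATTERN.search(r)]
    if clean and hallucinated:
        return {"chosen": clean[0], "rejected": hallucinated[0]}
    return None  # no contrastive pair available for this image
```

In the weighted variant the paper describes, each pair would additionally carry a weight that scales its contribution to the DPO loss, so that pairs exercising the prior-exam behavior are emphasized during fine-tuning.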

In summary, the research indicates that DPO can effectively suppress hallucinated content in radiology report generation while maintaining clinical accuracy. This approach has the potential to make AI-generated medical reports more reliable, thereby improving patient care and easing the pressure on radiologists. The findings suggest that DPO could be a valuable addition to VLMs in clinical settings.
