AI models like ChatGPT and GPT-4 have made significant strides in different sectors, including healthcare. Despite their success, these Large Language Models (LLMs) are vulnerable to malicious manipulation, leading to harmful outcomes, especially in contexts with high stakes like healthcare.
Past research has evaluated the susceptibility of LLMs in general sectors; however, manipulation on such models in healthcare settings remains unexamined. It’s critical to understand how clean and poisoned models behave to develop protective measures against potential threats.
A group of researchers from National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), and University of Maryland aims to fill this research gap. The team investigates two types of adversarial attacks on LLMs across three healthcare-related tasks by exploring two modes of adversarial attacks, fine-tuning and prompt-based. They use real-world patient data from MIMIC-III and PMC-Patients databases to generate both regular and adversarial responses. The tasks in question are COVID-19 vaccination guidance, medication prescribing, and diagnostic test recommendations, aiming at discouraging vaccination, recommending harmful drug combinations, and promoting unnecessary medical tests.
The study found significant vulnerabilities in LLMs to adversarial attacks through prompt manipulation and model adjustments with poisoned training data. Both proprietary GPT-3.5-turbo and open-source Llama2-7b showed substantial shifts towards harmful behavior when trained on adversarial data. For instance, under prompt-based attacks, vaccine recommendations dropped from 74.13% to 2.49%, while risky drug combo suggestions rose from 0.50% to 80.60%.
Interestingly, GPT-3.5-turbo displayed greater resilience to adversarial attacks than Llama2-7b. The effectiveness of the attacks generally escalated with the amount of adversarial samples in the training data, getting to saturation points at different stages for various tasks and models.
The findings indicate that even though adversarial data doesn’t significantly affect the overall performance of these models in medical tasks, dealing with complex scenarios demands a higher concentration of adversarial samples to reach an attack saturation as compared to general domain tasks.
The study provides crucial insights into LLM vulnerabilities and calls for advanced security protocols. It highlights the need to develop robust safeguards, specifically as LLMs are becoming important in healthcare automation. Given the severe consequences of manipulated outcomes in healthcare, ensuring the safe and efficient use of LLMs is crucial.