Ischemic strokes, caused by a blockage of blood flow to the brain, are major contributors to death and disability. A recent study examined how well the GPT-4 large language model can help doctors make critical decisions when treating stroke patients. The research was conducted by a team from the Technion-Israel Institute of Technology and the Mayo Clinic in the US.
GPT-4’s treatment suggestions for 100 patients showing acute stroke symptoms were compared with those provided by experienced neurologists and with the treatments the patients actually received. The study aimed to assess how closely the AI’s suggestions reflected expert human judgment and real-world medical practice.
The team used the Area Under the Curve (AUC) as a key performance measure. The ROC curve, a standard tool for evaluating classifiers, plots the true positive rate against the false positive rate at varying decision thresholds. The AUC condenses that curve into a single number, where 1.0 is perfect and 0.5 is no better than chance. An AUC of 0.7 to 0.8 is generally considered acceptable, 0.8 to 0.9 excellent, and above 0.9 outstanding.
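To make the metric concrete, here is a minimal Python sketch using scikit-learn. The labels and scores below are hypothetical, invented purely for illustration; they are not the study's data, and the study's actual evaluation pipeline may differ.

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical ground truth (1 = specialist recommended treatment, 0 = not)
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]

# Hypothetical model scores (e.g., probability assigned to "treat")
y_scores = [0.92, 0.30, 0.78, 0.85, 0.45, 0.66, 0.20, 0.55, 0.90, 0.15]

# ROC curve: true positive rate vs. false positive rate at each threshold
fpr, tpr, thresholds = roc_curve(y_true, y_scores)

# AUC: a single number summarizing the curve (1.0 perfect, 0.5 chance level)
auc = roc_auc_score(y_true, y_scores)
print(f"AUC = {auc:.2f}")
```

Sweeping the threshold from strict to lenient traces out the curve; the closer the curve hugs the top-left corner, the larger the area beneath it.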
In this study, GPT-4 achieved an AUC of 0.85 when compared with the stroke specialists’ opinions, an excellent result by that scale. When compared with the treatments actually given, the AUC was 0.80, suggesting that GPT-4’s suggestions closely aligned with real-world medical practice.
These promising results indicate that GPT-4 could potentially be a valuable tool in emergency rooms, particularly when a specialist is not immediately available. Critically, GPT-4 also displayed a remarkable ability to predict the risk of mortality within 90 days of a stroke, identifying high-risk patients with an accuracy that exceeded some existing machine-learning models specifically trained for this task.
Furthermore, GPT-4 showed potential to help doctors prioritize treatments and manage resources more effectively. This is not the first time AI has been used effectively in healthcare. Google’s Articulate Medical Intelligence Explorer (AMIE) equaled or even outperformed primary-care physicians in collecting patient information during medical interviews and scored higher on empathy. Danish researchers have also used AI to study how life events affect mortality, beating the next-best model by 11%. Other sophisticated machine-learning models have identified new antibiotics or therapeutic compounds in mere minutes, compared with the months or years required by traditional experimental methods.