Study Finds ChatGPT Falls Short in Medical Diagnoses, Warns Against Overreliance

A new study reveals that ChatGPT is unreliable in diagnosing medical conditions, with its accuracy falling below 50%. When tested on 150 medical case studies from Medscape, the GPT-3.5 model (which powered ChatGPT in 2022) correctly diagnosed only 49% of the cases. While previous research highlighted ChatGPT’s ability to pass the United States Medical Licensing Exam (USMLE), the new findings, published in PLOS ONE on July 31, caution against relying on the AI for complex medical cases that require human judgment.
Dr. Amrit Kirpalani, a pediatric nephrology specialist at Western University, warned that people might turn to ChatGPT for medical advice due to fear, confusion, or lack of access to care, which could lead to misguided reliance on the AI. He emphasized the need for the medical community to educate the public on the limitations of such tools, as they should not yet replace human doctors.
The study suggests that ChatGPT’s training data, drawn from a vast repository of general text, may not be sufficient for handling complex medical cases. Although the chatbot diagnosed fewer than half the cases correctly, it gave relevant answers 52% of the time and reached an overall accuracy of 74%, a figure that largely reflects its ability to rule out incorrect multiple-choice options. The researchers believe its weak diagnostic performance stems from a lack of clinical data in its training, limiting its effectiveness compared to human doctors.
Despite these shortcomings, the researchers see potential in AI and chatbots for educational purposes, provided they are used under supervision and with proper fact-checking. Dr. Kirpalani compared the current skepticism around AI in medicine to the early days of the internet, suggesting that, with time, AI could play a significant role in enhancing clinical decision-making, streamlining administrative tasks, and improving patient engagement.
