A recent Scientific Reports study investigated the proficiency of ChatGPT in answering questions related to colorectal cancer (CRC).
Study: Evaluating AI in medicine: a comparative analysis of expert and ChatGPT responses to colorectal cancer questions.
Background
Globally, CRC is one of the leading causes of cancer-related death. Despite advances in medicine, survival rates among patients with CRC remain low.
To reduce the mortality rates associated with CRC, it is imperative to diagnose the disease at an early stage and provide comprehensive treatment and care.
One of the main factors hindering early CRC detection is a lack of awareness of its symptoms. Many patients overlook critical warning signs of CRC because accessible information is scarce, delaying the point at which they seek timely help from doctors.
Much of the CRC information available online is also misleading, and most peer-reviewed resources, such as UpToDate and the MSD Manuals, are tailored to healthcare professionals rather than the general public.
A trustworthy platform that patients can use easily is therefore needed for reliable information about diseases. Such a platform must offer easily understandable medical information and essential guidance on when to seek medical attention.
ChatGPT is a freely available AI system built on deep learning and large language models (LLMs). Because it responds to a wide range of prompts, it can be applied in many areas, including healthcare.
This technology could be used to educate the general public about different diseases, empowering patients to make informed health decisions and, in turn, supporting earlier diagnosis and better treatment outcomes.
About the study
Many studies have indicated the effectiveness of ChatGPT in medicine, but more are required to evaluate its accuracy. To this end, the current study investigated ChatGPT's proficiency regarding CRC diagnosis and treatment.
A public health book published in China, "Colorectal Cancer: Your Questions Answered," was used as the reference for evaluating the accuracy of ChatGPT (specifically the GPT-3.5 version) in answering questions about CRC.
A total of 131 questions covering different aspects of CRC, including surgical management, radiation therapy, internal medicine treatments, ostomy care, interventional treatments, pain control, and deep vein care, were posed to ChatGPT.
Each question already had an expert-written answer against which ChatGPT's accuracy could be judged. ChatGPT's response to each question was then evaluated and scored by clinical physicians specializing in CRC.
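For readers who want to reproduce this kind of evaluation at scale, the sketch below shows one way the question-collection step could be scripted. This is an illustration only, not the authors' published protocol: it assumes API access to a GPT-3.5 model, and the model name `gpt-3.5-turbo` and the `questions.txt` file are hypothetical, with physician scoring still performed manually afterwards.

```python
# A minimal sketch (not the authors' actual protocol) of scripting the
# question-collection step against the OpenAI API. Assumptions: API access
# to "gpt-3.5-turbo" and a questions.txt file with one CRC question per line.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask_chatgpt(question: str) -> str:
    """Send a single CRC question and return the model's answer."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

with open("questions.txt", encoding="utf-8") as f:
    questions = [line.strip() for line in f if line.strip()]

# Collect answers; physicians would later score each one against the
# expert-written reference answers from the book.
answers = {q: ask_chatgpt(q) for q in questions}
```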
Study findings
The reproducibility of ChatGPT's results was also examined, revealing a high level of uniformity in both accuracy and comprehensiveness. This consistency across repeated responses suggests the system can reliably deliver accurate medical information.
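As a rough illustration of how such a reproducibility check can be quantified, the snippet below summarizes the spread of physician-assigned scores across repeated answers to the same question. The scoring scale and example values are hypothetical and are not taken from the study.

```python
# A minimal sketch (an assumption, not the published method) of one way to
# quantify reproducibility: pose the same question several times, have
# physicians score each repeated answer, and summarize the score spread.
from statistics import mean, stdev

def reproducibility_summary(scores: list[float]) -> dict[str, float]:
    """Mean and standard deviation of scores for repeated runs of one question."""
    return {"mean": mean(scores), "stdev": stdev(scores)}

# Hypothetical physician scores for three repeated answers to one CRC question;
# a small standard deviation indicates consistent (reproducible) output.
print(reproducibility_summary([8.0, 8.5, 8.0]))
```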
Although ChatGPT demonstrated a promising degree of accuracy, it fell short on comprehensiveness. This shortcoming could be linked to the model having been trained on broad, non-specific data.
Updating AI models like ChatGPT with specialized, domain-specific data could therefore substantially improve the depth and breadth of their responses.
ChatGPT's overall score indicated exceptionally strong performance, particularly in radiation therapy, stoma care, pain control, and venous care.
Although ChatGPT performed well in providing valid answers to CRC-related questions, it still fell short of expert-level knowledge, particularly in surgical management, basic information, and internal medicine. This underperformance is likely to hinder the deployment of AI models like ChatGPT in clinical practice.
Conclusions
The current study has limitations, including the small number of valid CRC-related questions. In addition, the use of a single public health book on CRC as the data source inherently limited the types of questions asked.
Because the questions were meticulously selected by the authors, they may not capture all the queries that patients and their families raise in real life. Another limitation is the use of the book's answers as the benchmark for scoring ChatGPT's responses to CRC-based questions.
The current study highlighted the potential and limitations of ChatGPT with regard to CRC questions.
The insights presented here could serve as a foundation for improved future versions of ChatGPT that help diagnose CRC accurately and promote early treatment.
Future research should use larger sample sizes to investigate the real-world efficacy of ChatGPT. Furthermore, researchers should explore ways to integrate personal health information with AI models to provide patients with personalized information.