Endoscopy
DOI: 10.1055/a-2388-6084
Innovations and brief communications

The role of generative language systems in increasing patient awareness of colon cancer screening

1   Department of Medicine and Surgery, Kore University of Enna, Enna, Italy (Ringgold ID: RIN217140)
Daryl Ramai
2   Gastroenterology and Hepatology, The University of Utah School of Medicine, Salt Lake City, United States (Ringgold ID: RIN12348)
3   Clinical Effectiveness Research Group, University of Oslo, Oslo, Norway (Ringgold ID: RIN6305)
4   Digestive Disease Center, Showa University Northern Yokohama Hospital, Yokohama, Japan (Ringgold ID: RIN220878)
Mário Dinis-Ribeiro
5   Porto Comprehensive Cancer Center & RISE@CI-IPO, Porto University Hospital, Porto, Portugal (Ringgold ID: RIN112085)
6   Gastroenterology Department, Francisco Gentil Portuguese Institute for Oncology of Porto, Porto, Portugal (Ringgold ID: RIN59035)
7   Department of Medical Sciences, University of Foggia, Foggia, Italy (Ringgold ID: RIN18972)
Cesare Hassan
8   Department of Biomedical Sciences, Humanitas University, Milan, Italy (Ringgold ID: RIN437807)
9   Endoscopy Unit, IRCCS Humanitas Research Hospital, Pieve Emanuele, Italy (Ringgold ID: RIN9268)

Introduction: This study aims to evaluate the effectiveness of ChatGPT (Chat Generative Pretrained Transformer) in answering patients' questions about colorectal cancer (CRC) screening, with the ultimate goal of enhancing patient awareness of and adherence to national screening programs.

Methods: Fifteen questions on CRC screening were posed to ChatGPT-4. The answers were rated by 20 gastroenterology experts and 20 non-experts in three domains (accuracy, completeness, and comprehensibility), and by 100 patients in three dichotomous domains (completeness, comprehensibility, and trustability).

Results: According to expert rating, the mean accuracy score was 4.8 ± 1.1 on a scale ranging from 1 to 6. The mean completeness score was 2.1 ± 0.7 and the mean comprehensibility score was 2.8 ± 0.4 on a scale ranging from 1 to 3. Overall, accuracy (4.8 ± 1.1 vs. 5.6 ± 0.7, P < 0.001) and completeness (2.1 ± 0.7 vs. 2.7 ± 0.4, P < 0.001) scores were significantly lower for experts than for non-experts, while comprehensibility was comparable between the two groups (2.7 ± 0.4 vs. 2.8 ± 0.3, P = 0.546). Patients rated the answers as complete, comprehensible, and trustable in 97%–100% of cases.

Conclusions: ChatGPT shows good performance, with the potential to enhance awareness of CRC and improve screening outcomes. Generative language systems may be further improved after proper training in accordance with scientific evidence and current guidelines.



Publication History

Received: 14 March 2024

Accepted after revision: 14 August 2024

Accepted Manuscript online:
14 August 2024

© Thieme. All rights reserved.

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany