DOI: 10.1055/s-0045-1806705
Evaluating the trust in artificial intelligence by endoscopists for the optical diagnosis of early colorectal carcinoma: exposing target areas for improvement of human-AI interaction
Aims Suboptimal interaction between artificial intelligence (AI) and endoscopists is a potential roadblock to implementing AI applications in gastrointestinal endoscopy. Inappropriate trust, comprising overtrust and undertrust, could prevent AI from reaching its full potential in clinical practice. It is unclear whether specific endoscopist characteristics are associated with higher or lower levels of inappropriate trust. We aimed to evaluate the interaction between AI and endoscopists in the optical diagnosis of colorectal carcinoma (CRC) and to identify target areas for improving human-AI interaction.
Methods International endoscopists were invited to diagnose 50 videos of colorectal lesions in an online test between September and December 2024. The test began with a pretest of 15 cases; each of the remaining 35 cases was then shown twice, and at the second showing the endoscopist received an AI diagnosis. Participants were unaware that the test contained 14 CRCs (28.0%) and that the AI diagnoses were simulated to have 90% sensitivity and 70% specificity, with histology as the gold standard. The endoscopists' diagnoses, confidence levels, and treatment choices before and after AI were compared. Overtrust was defined as changing a correct initial diagnosis when AI provided an incorrect diagnosis; undertrust was defined as retaining an incorrect initial diagnosis when AI provided a correct diagnosis. The associations of overtrust and undertrust with several endoscopist characteristics were assessed.
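The Methods define overtrust and undertrust from four labels per AI-assisted case: the histology, the initial diagnosis, the AI diagnosis, and the diagnosis after AI. A minimal Python sketch of that classification follows, assuming a binary CRC versus non-CRC diagnosis (the abstract does not state the diagnostic categories). The `simulate_ai` helper is a hypothetical illustration of drawing AI calls at a target sensitivity and specificity; the abstract does not say how the simulated outputs were assigned to cases, and a fixed, pre-assigned set of outputs would hit the stated 90%/70% more exactly than random draws. All names are illustrative, not the authors' code.

```python
import random
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class TrustOutcome(Enum):
    OVERTRUST = "overtrust"
    UNDERTRUST = "undertrust"
    OTHER = "other"


@dataclass(frozen=True)
class PairedDiagnosis:
    """One AI-assisted case; binary CRC / non-CRC labels are an assumption."""
    truth_is_crc: bool   # histology, the gold standard
    before_is_crc: bool  # endoscopist's diagnosis without AI
    ai_is_crc: bool      # simulated AI diagnosis
    after_is_crc: bool   # endoscopist's diagnosis with AI


def simulate_ai(truth_is_crc: bool, sensitivity: float, specificity: float,
                rng: random.Random) -> bool:
    """Hypothetical: draw one AI call with the given sensitivity and specificity."""
    if truth_is_crc:
        return rng.random() < sensitivity   # CRC called CRC with P = sensitivity
    return rng.random() >= specificity      # non-CRC called CRC with P = 1 - specificity


def classify(d: PairedDiagnosis) -> TrustOutcome:
    before_correct = d.before_is_crc == d.truth_is_crc
    ai_correct = d.ai_is_crc == d.truth_is_crc
    changed = d.after_is_crc != d.before_is_crc
    # Overtrust: a correct initial diagnosis is changed while AI is incorrect.
    if before_correct and not ai_correct and changed:
        return TrustOutcome.OVERTRUST
    # Undertrust: an incorrect initial diagnosis is retained while AI is correct.
    if not before_correct and ai_correct and not changed:
        return TrustOutcome.UNDERTRUST
    return TrustOutcome.OTHER


def trust_rates(pairs: Sequence[PairedDiagnosis]) -> tuple[float, float]:
    """Overtrust and undertrust as fractions of all paired diagnoses (assumed denominator)."""
    counts = Counter(classify(p) for p in pairs)
    n = len(pairs)
    return counts[TrustOutcome.OVERTRUST] / n, counts[TrustOutcome.UNDERTRUST] / n


# Hypothetical toy cases, not study data.
pairs = [
    PairedDiagnosis(truth_is_crc=True, before_is_crc=True, ai_is_crc=False, after_is_crc=False),   # overtrust
    PairedDiagnosis(truth_is_crc=False, before_is_crc=True, ai_is_crc=False, after_is_crc=True),   # undertrust
    PairedDiagnosis(truth_is_crc=True, before_is_crc=False, ai_is_crc=True, after_is_crc=True),    # corrected by AI
    PairedDiagnosis(truth_is_crc=False, before_is_crc=False, ai_is_crc=True, after_is_crc=False),  # AI error resisted
]
print(trust_rates(pairs))  # (0.25, 0.25)
```

Taking all paired diagnoses as the denominator is an assumption about how the reported 4.7% and 10.7% were computed.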
Results Seventy-eight endoscopists provided 2730 optical diagnoses. Diagnostic accuracy was 71.4% before AI and increased to 73.7% after AI (p<0.001). Overtrust was observed in 4.7% and undertrust in 10.7% of diagnoses. Overtrust was significantly lower among endoscopists with certain CRC expert characteristics, such as seeing ≥10 T1 CRCs per year or performing multiple advanced endoscopic resection techniques. Undertrust did not differ significantly by endoscopist characteristics, but higher pretest diagnostic accuracy and higher pretest specificity were each associated with a lower undertrust percentage (Pearson's correlation -0.416, p<0.001 and -0.416, p<0.001, respectively). More correct high-confidence diagnoses were made after AI than before (1401 vs 1236, p<0.001). Correct treatment choices decreased slightly and non-significantly after AI (67.6% vs 66.8%, p=0.157), but undertreatment also decreased after AI (8.8% vs 8.5%).
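The Results relate each endoscopist's pretest performance to their undertrust percentage using Pearson's correlation. The sketch below shows one way such a per-endoscopist correlation could be computed. The three endoscopists and all their counts are hypothetical toy values chosen to make the relationship exactly linear, not study data; using the 15 pretest cases and the 35 AI-assisted cases as denominators follows the Methods, but how the authors actually computed the percentages is an assumption. In practice `scipy.stats.pearsonr` returns both the coefficient and a p-value; this standard-library version returns only r.

```python
import math
from typing import Sequence


def proportion(flags: Sequence[bool]) -> float:
    """Fraction of True values, e.g. correct pretest diagnoses or undertrust events."""
    if not flags:
        raise ValueError("empty sequence")
    return sum(flags) / len(flags)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient for two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy data, NOT study data: pretest results over 15 cases and
# undertrust flags over the 35 AI-assisted cases for each endoscopist.
pretest_correct = {
    "endoscopist_A": [True] * 9 + [False] * 6,
    "endoscopist_B": [True] * 12 + [False] * 3,
    "endoscopist_C": [True] * 15,
}
undertrust_flags = {
    "endoscopist_A": [True] * 6 + [False] * 29,
    "endoscopist_B": [True] * 4 + [False] * 31,
    "endoscopist_C": [True] * 2 + [False] * 33,
}

ids = sorted(pretest_correct)
pretest_accuracy = [proportion(pretest_correct[i]) for i in ids]
undertrust_pct = [100 * proportion(undertrust_flags[i]) for i in ids]
r = pearson_r(pretest_accuracy, undertrust_pct)
print(f"Pearson r = {r:.3f}")  # prints Pearson r = -1.000 for this perfectly linear toy data
```

A negative r, as reported in the abstract, means endoscopists with higher pretest accuracy tended to show less undertrust; the toy data here exaggerate this into a perfect linear relationship.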
Conclusions In this prospective, multicenter study with a large group of international endoscopists, we showed that inappropriate trust in AI is an issue in human-AI interaction for the optical diagnosis of CRC. Because overtrust was lower among endoscopists with certain CRC expert characteristics, our results highlight that CRC-specific knowledge remains essential for optimal use of AI. Before using AI in clinical practice, endoscopists should familiarize themselves with the strengths and pitfalls of both their own optical diagnosis and the AI system's performance to ensure optimal interaction. Future research should focus on ways to improve human-AI interaction among non-experts.
Publication History
Article published online: 27 March 2025
© 2025. European Society of Gastrointestinal Endoscopy. All rights reserved.
Georg Thieme Verlag KG
Oswald-Hesse-Straße 50, 70469 Stuttgart, Germany