
DOI: 10.1055/a-2703-0209
Large language model for interpreting the Paris classification of colorectal polyps
Supported by: European Commission (Horizon Europe) 101057099
Supported by: Norwegian National Clinical Trial Mechanism grant 36935
Supported by: The Associazione Italiana per la Ricerca sul Cancro (AIRC) Bando PNRR-MCNT2-2023-12377041, IG 2022 – ID. 27843 project, IG 2023 – ID. 29220 project
Supported by: European Union – NextGenerationEU Multilayered Urban Sustainability Action (MUSA) pr
Supported by: Research Foundation Flanders G072621N
Supported by: The National Plan for NRRP Complementary Investments project n. PNC0000003
Supported by: Norwegian Research Council Grant 315410

Abstract
Background and study aims
Reporting of colorectal polyp morphology using the Paris classification is often inaccurate. Multimodal large language models (M-LLMs) may support morphological assessment. This study aimed to evaluate the accuracy of an M-LLM (GPT-4o) in classifying colorectal polyp morphology compared with expert and non-expert endoscopists.
Patients and methods
We used the SUN dataset of colonoscopy videos from 100 unique colorectal polyps, each labeled with the validated Paris classification. An M-LLM (GPT-4o) classified five representative frames per lesion. Three expert and three non-expert endoscopists, blinded to one another, performed the same task. The primary outcome was accuracy in differentiating non-polypoid (IIa/IIc) from polypoid (Is/Ip/Isp) lesions. The secondary outcome was accuracy in differentiating sessile (Is) from pedunculated (Ip/Isp) lesions. Given the exploratory design, no multiplicity correction was applied; point estimates are presented with 95% confidence intervals (CIs), and P values are interpreted descriptively.
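Both outcomes collapse the five Paris subtypes into two classes before scoring. The Python sketch below shows one way to do that scoring. It rests on several assumptions of ours that the abstract does not state:

- Each lesion gets a single final label. The abstract does not say how the five frames per lesion were combined.
- Non-polypoid and sessile are treated as the positive classes. This fits the low secondary-outcome specificity reflecting missed pedunculated lesions, but the abstract does not say so.
- A prediction that falls outside an outcome's classes counts as incorrect.

The names `PRIMARY`, `SECONDARY`, and `binary_metrics` are hypothetical.

```python
# Hypothetical collapse of Paris subtypes into the abstract's binary outcomes.
PRIMARY = {"IIa": "non-polypoid", "IIc": "non-polypoid",
           "Is": "polypoid", "Ip": "polypoid", "Isp": "polypoid"}
# The secondary outcome only applies to polypoid reference lesions.
SECONDARY = {"Is": "sessile", "Ip": "pedunculated", "Isp": "pedunculated"}


def binary_metrics(truth, pred, mapping, positive):
    """Score one label per lesion after collapsing Paris subtypes.

    Lesions whose reference label is not in `mapping` are skipped; a
    prediction not in `mapping` is counted as incorrect.
    """
    nan = float("nan")
    n = correct = pos = neg = tp = tn = 0
    for ref_label, pred_label in zip(truth, pred):
        if ref_label not in mapping:
            continue  # reference lesion is outside this outcome
        ref, call = mapping[ref_label], mapping.get(pred_label)
        hit = int(ref == call)
        n += 1
        correct += hit
        if ref == positive:
            pos += 1
            tp += hit  # true positive
        else:
            neg += 1
            tn += hit  # true negative
    return {
        "accuracy": correct / n if n else nan,
        "sensitivity": tp / pos if pos else nan,
        "specificity": tn / neg if neg else nan,
    }


# Toy labels invented for illustration; not study data.
truth = ["IIa", "Is", "Ip", "Isp", "IIc"]
pred = ["Is", "Is", "Is", "Isp", "IIa"]
primary = binary_metrics(truth, pred, PRIMARY, positive="non-polypoid")
secondary = binary_metrics(truth, pred, SECONDARY, positive="sessile")
print(primary)
print(secondary)
```

In the toy example, a pedunculated (Ip) lesion is called sessile. That error is harmless for the primary outcome, because both labels collapse to "polypoid". It lowers specificity in the secondary outcome.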
Results
M-LLM accuracy for differentiating non-polypoid from polypoid lesions was 73% (95% CI 63%-81%), comparable to experts (75%, 65%-83%; P = 0.84) and non-experts (77%, 68%-85%; P = 0.52), with similar sensitivity and specificity. Accuracy for differentiating sessile from pedunculated lesions was 55% (95% CI 42%-67%), lower than experts (76%; P = 0.02) and non-experts (77%; P = 0.01), primarily due to poor specificity (12% vs. experts 82% and non-experts 88%; P < 0.01 for both comparisons).
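The abstract does not name the confidence-interval method. An exact (Clopper-Pearson) binomial interval for 73 correct out of 100 lesions gives roughly 63% to 81%, which matches the reported M-LLM interval.

The sketch below reproduces that calculation in pure Python. It assumes one correct/incorrect outcome per lesion (n = 100). `clopper_pearson` is a hypothetical helper that solves the two binomial tail equations by bisection, instead of calling a beta quantile function.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(f, tol=1e-10):
    """Root of an increasing function f on [0, 1] that changes sign once."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) CI for a binomial proportion k/n."""
    # Lower bound: P(X >= k | p) = alpha/2; this tail grows with p.
    lower = 0.0 if k == 0 else _bisect(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2)
    # Upper bound: P(X <= k | p) = alpha/2; this tail shrinks with p, so flip the sign.
    upper = 1.0 if k == n else _bisect(
        lambda p: alpha / 2 - binom_cdf(k, n, p))
    return lower, upper


lo, hi = clopper_pearson(73, 100)
print(f"73/100: {lo:.1%} to {hi:.1%}")
```

With scipy installed, the same interval is also available from `scipy.stats.binomtest(k, n).proportion_ci(method="exact")`. A Wilson interval would give slightly different bounds.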
Conclusions
M-LLMs performed comparably to endoscopists in distinguishing non-polypoid from polypoid lesions but failed to reliably identify pedunculated morphology.
Keywords
Endoscopy Lower GI Tract - Polyps / adenomas / ... - Colorectal cancer - Tissue diagnosis - CRC screening - Diagnosis and imaging (inc chromoendoscopy, NBI, iSCAN, FICE, CLE...)
Publication History
Received: 14 July 2025
Accepted after revision: 12 September 2025
Article published online: 09 October 2025
© 2025. The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution License, permitting unrestricted use, distribution, and reproduction so long as the original work is properly cited. (https://creativecommons.org/licenses/by/4.0/).
Georg Thieme Verlag KG
Oswald-Hesse-Straße 50, 70469 Stuttgart, Germany
Davide Massimi, Luca Carlini, Yuichi Mori, Luca Di Stefano, Giulio Antonelli, Tommy Rizkala, Marco Spadaccini, Roberto de Sire, Ludovico Alfarone, Chiara Lena, Alessandro D'Aprano, Sravanthi Parasa, Raf Bisschops, Daniel von Renteln, Susanne Margaret O'Reilly, Victor Savevski, Prateek Sharma, Douglas K. Rex, Michael Bretthauer, Elena Demomi, Cesare Hassan, Alessandro Repici. Large language model for interpreting the Paris classification of colorectal polyps. Endosc Int Open 2025; 13: a27030209.
DOI: 10.1055/a-2703-0209