Large Language Models Can Accurately Manage Pre-Endoscopy Referrals: A Comparative Study

Y Gorelik; A Gralnek; A Klein

doi:10.1055/s-0045-1805330

Endoscopy 2025; 57(S 02): S119-S120
DOI: 10.1055/s-0045-1805330

Abstracts | ESGE Days 2025

Oral presentation

From the endoscopist's perspective... 04/04/2025, 15:30 – 16:30 Room 120+121

Authors

Y Gorelik

¹Rambam Health Care Campus, Haifa, Israel
A Gralnek

¹Rambam Health Care Campus, Haifa, Israel
A Klein

²Rambam, Haifa, Israel

Aims Managing referrals for gastrointestinal endoscopy involves reviewing patient-specific information and determining the appropriate procedure settings, such as the need for an anesthesiologist, management of relevant medications, and the intensity of bowel preparation. This task is resource-intensive and error-prone. This study aimed to evaluate the accuracy of large language models (LLMs) in assessing referrals and providing pre-endoscopy recommendations.

Methods We evaluated 100 consecutive referrals for esophagogastroduodenoscopy (EGD) and colonoscopy received at a single gastroenterology institute. Referrals were sent by various healthcare providers and typically included medical background information, the indication for the procedure, demographic details, diagnoses, and a medication list. Two gastroenterology consultants reviewed the referrals and provided recommendations on several aspects: the need for an anesthesiologist, management of antiaggregants and anticoagulants, management of GLP-1 receptor agonists, and the requirement for intensified bowel preparation for colonoscopy. Recommendations were based on a combination of published guidelines and best practices, and the experts' answers served as the gold standard. We evaluated two LLMs: OpenAI GPT-4o on the Microsoft Azure platform and Gemini 1.5 on the Google Vertex cloud platform. The first 20 referrals served as a development set on which prompts incorporating relevant guidelines and best practices were developed and optimized, with prompt engineering based primarily on chain-of-thought reasoning. The final optimized prompt was then used to test the models on the remaining 80 referrals. Accuracy rates and 95% confidence intervals (CIs) were calculated using bootstrapping (1,000 resamples), and McNemar's test was used to compare the accuracies of the two models.
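The bootstrapped accuracy interval described in the Methods can be illustrated with a minimal Python sketch. The abstract does not say which bootstrap variant was used, so this assumes a simple percentile bootstrap. The function name `bootstrap_accuracy_ci`, the fixed seed, and the example flags are hypothetical and are not study data.

```python
import random


def bootstrap_accuracy_ci(correct, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for accuracy (assumed variant, not confirmed by the abstract).

    correct: list of 0/1 flags, one per referral (1 = model agreed with the experts).
    Returns (point_estimate, lower_bound, upper_bound).
    """
    if not correct:
        raise ValueError("need at least one observation")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(correct)
    point = sum(correct) / n

    # Resample referrals with replacement and recompute accuracy each time.
    stats = []
    for _ in range(n_resamples):
        sample = rng.choices(correct, k=n)
        stats.append(sum(sample) / n)
    stats.sort()

    # Take the alpha/2 and 1 - alpha/2 percentiles of the resampled accuracies.
    lo_idx = int(n_resamples * alpha / 2)
    hi_idx = n_resamples - 1 - lo_idx
    return point, stats[lo_idx], stats[hi_idx]


if __name__ == "__main__":
    # Hypothetical example: 70 of 80 recommendations match the gold standard.
    flags = [1] * 70 + [0] * 10
    point, lower, upper = bootstrap_accuracy_ci(flags)
    print(f"accuracy {point:.2f} (95% CI {lower:.2f}, {upper:.2f})")
```

Under this approach, a model that is correct on every test referral produces only all-correct resamples, so the interval collapses to (1.00, 1.00).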

Results Accuracy rates (95% CIs) for GPT-4o and Gemini 1.5, respectively, were as follows. Procedure identification: 1.00 (1.00, 1.00) and 0.99 (0.96, 1.00). Need for an anesthesiologist: 1.00 (1.00, 1.00) and 0.87 (0.80, 0.94). Antiaggregant management: 1.00 (1.00, 1.00) and 0.63 (0.38, 0.88). Anticoagulant management: 1.00 (1.00, 1.00) and 1.00 (1.00, 1.00). GLP-1 receptor agonist management: 0.96 (0.91, 1.00) and 0.95 (0.90, 0.99). Need for intensified bowel preparation: 0.90 (0.82, 0.97) and 0.84 (0.74, 0.92). GPT-4o was significantly more accurate than Gemini 1.5 in determining the need for an anesthesiologist and in managing antiaggregants.
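The paired comparison of the two models relies on McNemar's test, which looks only at referrals where exactly one model agreed with the experts. The sketch below assumes the exact (binomial) form of the test, since the abstract does not specify the variant. The function name `mcnemar_exact` and the example flags are hypothetical and are not study data.

```python
from math import comb


def mcnemar_exact(a_correct, b_correct):
    """Exact two-sided McNemar test on paired 0/1 correctness flags.

    a_correct, b_correct: per-referral flags for model A and model B
    (1 = agreed with the gold standard), in the same referral order.
    Returns (a_only, b_only, p_value).
    """
    if len(a_correct) != len(b_correct):
        raise ValueError("paired inputs must have equal length")

    # Discordant pairs: referrals where exactly one model was correct.
    a_only = sum(1 for x, y in zip(a_correct, b_correct) if x == 1 and y == 0)
    b_only = sum(1 for x, y in zip(a_correct, b_correct) if x == 0 and y == 1)
    n = a_only + b_only
    if n == 0:
        return a_only, b_only, 1.0  # no discordant pairs, no evidence of a difference

    # Under the null hypothesis, discordant pairs split as Binomial(n, 0.5).
    k = min(a_only, b_only)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return a_only, b_only, min(1.0, 2 * tail)


if __name__ == "__main__":
    # Hypothetical example with 80 paired referrals:
    # 70 both correct, 8 only model A correct, 1 only model B correct, 1 both wrong.
    model_a = [1] * 70 + [1] * 8 + [0] * 1 + [0] * 1
    model_b = [1] * 70 + [0] * 8 + [1] * 1 + [0] * 1
    a_only, b_only, p = mcnemar_exact(model_a, model_b)
    print(f"discordant pairs: {a_only} vs {b_only}, p = {p:.4f}")
```

Because concordant referrals do not enter the statistic, two models with identical accuracy on an aspect and no discordant pairs cannot be distinguished by this test.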

Conclusions These findings demonstrate that large language models can potentially streamline the multi-faceted management of referrals for gastrointestinal endoscopy, with GPT-4o showing superior accuracy in certain aspects.

Publication history

Article published online:
27 March 2025

Georg Thieme Verlag KG
Oswald-Hesse-Straße 50, 70469 Stuttgart, Germany
