DOI: 10.1055/s-0045-1805330
Large Language Models Can Accurately Manage Pre-Endoscopy Referrals: A Comparative Study
Authors
Aims Managing referrals for gastrointestinal endoscopy includes reviewing patient-specific information and determining the appropriate procedure settings, such as the need for an anesthesiologist, management of relevant medications, and the intensity of bowel preparation. This is a resource-intensive and error-prone task. This study aimed to evaluate the accuracy of large language models (LLMs) in assessing referrals and providing pre-endoscopy recommendations.
Methods We evaluated 100 consecutive referrals for esophagogastroduodenoscopy (EGD) and colonoscopy received at a single gastroenterology institute. Referrals were sent from various healthcare providers and typically included medical background information, indication for the procedure, demographic details, diagnoses, and a medication list. Two gastroenterology consultants reviewed the referrals and provided recommendations on several aspects: the need for an anesthesiologist, management of anti-aggregants and anticoagulants, management of GLP-1 receptor agonists, and the requirement for intensified bowel preparation for colonoscopy. Recommendations were based on a combination of published guidelines and best practices. The answers provided by the experts were considered the gold standard. We evaluated two LLMs: OpenAI GPT-4o on the Microsoft Azure platform, and Gemini 1.5 on the Google Vertex cloud platform. The first 20 referrals were used to develop and optimize prompts that incorporated relevant guidelines and best practices, with prompt engineering primarily based on chain-of-thought reasoning. The final optimized prompt was then used to test the models on the remaining 80 referrals. Accuracy rates and 95% confidence intervals (CIs) were calculated using bootstrapping (1,000 resamples), and McNemar’s test was employed to compare the accuracies of the two models.
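The bootstrapped accuracy interval described above can be illustrated with a minimal sketch. It assumes a percentile bootstrap over per-referral correctness flags; the abstract does not state which bootstrap variant was used, and the function name `bootstrap_accuracy_ci`, the fixed seed, and the example data are hypothetical.

```python
import random


def bootstrap_accuracy_ci(correct, n_resamples=1000, alpha=0.05, seed=0):
    """Return (accuracy, ci_low, ci_high) using a percentile bootstrap.

    `correct` is a list of 0/1 flags, one per referral, where 1 means the
    model's recommendation matched the expert gold standard.
    Assumption: percentile method; the study's exact variant is not stated.
    """
    rng = random.Random(seed)
    n = len(correct)
    point = sum(correct) / n

    # Resample referrals with replacement and recompute accuracy each time.
    stats = []
    for _ in range(n_resamples):
        sample = [correct[rng.randrange(n)] for _ in range(n)]
        stats.append(sum(sample) / n)
    stats.sort()

    # Take the alpha/2 and 1 - alpha/2 empirical quantiles.
    lo_idx = max(0, int(round((alpha / 2) * n_resamples)))
    hi_idx = min(n_resamples - 1, int(round((1 - alpha / 2) * n_resamples)) - 1)
    return point, stats[lo_idx], stats[hi_idx]


# Hypothetical data: 72 of 80 test referrals answered correctly.
flags = [1] * 72 + [0] * 8
acc, ci_low, ci_high = bootstrap_accuracy_ci(flags)
print(f"accuracy={acc:.2f}, 95% CI=({ci_low:.2f}, {ci_high:.2f})")
```

When every flag is 1, every resample also has accuracy 1.00, so the interval collapses to a single point. This is how an interval of (1.00, 1.00) can arise from a bootstrap.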
Results The accuracy rates (95% CIs) for GPT-4o and Gemini 1.5, respectively, were as follows: procedure identification: 1.00 (1.00, 1.00) and 0.99 (0.96, 1.00); need for an anesthesiologist: 1.00 (1.00, 1.00) and 0.87 (0.80, 0.94); anti-aggregants: 1.00 (1.00, 1.00) and 0.63 (0.38, 0.88); anticoagulants: 1.00 (1.00, 1.00) and 1.00 (1.00, 1.00); GLP-1 receptor agonist management: 0.96 (0.91, 1.00) and 0.95 (0.90, 0.99); need for intensified bowel preparation: 0.90 (0.82, 0.97) and 0.84 (0.74, 0.92). Comparison of accuracy rates showed significantly higher performance by GPT-4o for determining the need for an anesthesiologist and managing anti-aggregants.
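A paired comparison of two models on the same referrals with McNemar's test can be sketched as follows. This is a minimal illustration assuming the exact (binomial) form of the test; the abstract does not say whether the exact or the chi-square version was used, and the function name `mcnemar_exact` and the example flags are hypothetical, not study data.

```python
from math import comb


def mcnemar_exact(a_correct, b_correct):
    """Exact two-sided McNemar test on paired 0/1 correctness flags.

    Returns (b, c, p_value), where b counts referrals model A got right
    and model B got wrong, and c counts the reverse. Only these
    discordant pairs carry information about a difference in accuracy.
    """
    pairs = list(zip(a_correct, b_correct))
    b = sum(1 for x, y in pairs if x and not y)
    c = sum(1 for x, y in pairs if not x and y)
    n = b + c
    if n == 0:
        return b, c, 1.0  # no discordant pairs: no evidence of a difference

    # Under the null hypothesis, b follows Binomial(n, 0.5).
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return b, c, min(1.0, 2 * tail)


# Hypothetical flags: model A is correct on all 80 referrals,
# model B misses 10 of them.
model_a = [1] * 80
model_b = [1] * 70 + [0] * 10
b, c, p = mcnemar_exact(model_a, model_b)
print(f"b={b}, c={c}, p={p:.4f}")
```

Because the test uses only discordant pairs, two models with identical answers on every referral yield a p-value of 1.0 regardless of their shared accuracy.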
Conclusions These findings demonstrate that large language models can potentially streamline the multi-faceted management of referrals for gastrointestinal endoscopy, with GPT-4o showing superior accuracy in certain aspects.
Publication History
Article published online:
27 March 2025
© 2025. European Society of Gastrointestinal Endoscopy. All rights reserved.
Georg Thieme Verlag KG
Oswald-Hesse-Straße 50, 70469 Stuttgart, Germany