Eur J Pediatr Surg 2025; 35(05): 382-389
DOI: 10.1055/a-2551-2131
Original Article

Solving Complex Pediatric Surgical Case Studies: A Comparative Analysis of Copilot, ChatGPT-4, and Experienced Pediatric Surgeons' Performance

Richard Gnatzy
1   Department of Pediatric Surgery, Leipzig University, Leipzig, Germany
,
Martin Lacher
1   Department of Pediatric Surgery, Leipzig University, Leipzig, Germany
,
Michael Berger
2   Department of Pediatric Surgery, University Hospital Essen, Essen, Germany
,
Michael Boettcher
3   Department of Pediatric Surgery, University Medical Centre Mannheim, Mannheim, Germany
,
Oliver J. Deffaa
1   Department of Pediatric Surgery, Leipzig University, Leipzig, Germany
,
Joachim Kübler
4   Department of Pediatric Surgery, Hospital Bremen-Mitte, Bremen, Germany
,
Omid Madadi-Sanjani
5   Department of Pediatric Surgery, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
,
Illya Martynov
6   Centre for Pediatric Surgery, Department of Pediatric Surgery and Urology, University Hospital Giessen-Marburg, Baldingerstraße, Marburg, Germany
7   Centre for Pediatric Surgery, Department of Pediatric Surgery, University Hospital Giessen-Marburg, Giessen, Germany
,
Steffi Mayer
1   Department of Pediatric Surgery, Leipzig University, Leipzig, Germany
,
Mikko P. Pakarinen
8   Department of Pediatric Surgery, University of Helsinki Children's Hospital Unit of Pediatric Surgery, Helsinki, Finland
,
1   Department of Pediatric Surgery, Leipzig University, Leipzig, Germany
,
Tomas Wester
9   Department of Pediatric Surgery, Karolinska University Hospital, Stockholm, Sweden
10   Department of Women's and Children's Health, Karolinska Institutet, Stockholm, Sweden
,
Augusto Zani
11   Department of Surgery, Division of Pediatric Surgery, Washington University School of Medicine, St. Louis, Missouri, United States
,
1   Department of Pediatric Surgery, Leipzig University, Leipzig, Germany
3   Department of Pediatric Surgery, University Medical Centre Mannheim, Mannheim, Germany

Abstract

Introduction

The emergence of large language models (LLMs) has led to notable advancements across multiple sectors, including medicine. Yet their impact on pediatric surgery remains largely unexplored. This study aims to assess the ability of the artificial intelligence (AI) models ChatGPT-4 and Microsoft Copilot to propose diagnostic procedures and primary and differential diagnoses, and to answer clinical questions, using complex clinical case vignettes of classic pediatric surgical diseases.

Methods

We conducted the study in April 2024. We evaluated the performance of the LLMs using 13 complex clinical case vignettes of pediatric surgical diseases and compared their responses with those of a cohort of experienced pediatric surgeons. Additionally, pediatric surgeons rated the diagnostic recommendations of the LLMs for completeness and accuracy. To determine differences in performance, we performed statistical analyses.

Results

ChatGPT-4 achieved a higher test score (52.1%) than Copilot (47.9%) but a lower score than the pediatric surgeons (68.8%). The overall differences in performance between ChatGPT-4, Copilot, and the pediatric surgeons were statistically significant (p < 0.01). ChatGPT-4 outperformed Copilot in generating differential diagnoses (p < 0.05). No statistically significant differences were found between the AI models in their suggestions for diagnostics or in the primary diagnosis. Overall, pediatric surgeons rated the recommendations of the LLMs as average.

Conclusion

This study reveals significant limitations in the performance of AI models in pediatric surgery. Although LLMs exhibit potential across various areas, their reliability and accuracy in handling clinical decision-making tasks are limited. Further research is needed to improve AI capabilities and establish their usefulness in the clinical setting.

Publication History

Received: 11 February 2025

Accepted: 04 March 2025

Accepted Manuscript online: 05 March 2025

Article published online: 02 April 2025

© 2025. Thieme. All rights reserved.

Georg Thieme Verlag KG
Oswald-Hesse-Straße 50, 70469 Stuttgart, Germany