Eur J Pediatr Surg
DOI: 10.1055/a-2646-2052
Original Article

Artificial Intelligence Enhances Diagnostic Accuracy of Contrast Enemas in Hirschsprung Disease Compared to Clinical Experts

1 Department of Pediatric Surgery, Miguel Servet University Hospital, Zaragoza, Spain

Matej Varga
2 Department of Experimental Physics, Slovak Academy of Sciences, Kosice, Slovakia

Beatriz Izquierdo-Hernández
3 Department of Radiology, Miguel Servet University Hospital, Zaragoza, Spain

Cristina Gutierrez-Alonso
3 Department of Radiology, Miguel Servet University Hospital, Zaragoza, Spain

Ainara Gonzalez-Esgueda
1 Department of Pediatric Surgery, Miguel Servet University Hospital, Zaragoza, Spain

Maria Victoria Cobos-Hernández
3 Department of Radiology, Miguel Servet University Hospital, Zaragoza, Spain

1 Department of Pediatric Surgery, Miguel Servet University Hospital, Zaragoza, Spain

Yurema Gonzalez-Ruiz
1 Department of Pediatric Surgery, Miguel Servet University Hospital, Zaragoza, Spain

Paolo Bragagnini-Rodriguez
1 Department of Pediatric Surgery, Miguel Servet University Hospital, Zaragoza, Spain

María del-Peral-Samaniego
1 Department of Pediatric Surgery, Miguel Servet University Hospital, Zaragoza, Spain

Carolina Corona-Bellostas
1 Department of Pediatric Surgery, Miguel Servet University Hospital, Zaragoza, Spain

Abstract

Introduction

Contrast enema (CE) is widely used in the evaluation of suspected Hirschsprung disease (HD). Deep learning is a promising tool to standardize image assessment and support clinical decision-making. This study assesses the diagnostic performance of a deep neural network (DNN), with and without clinical data, and compares its interpretation with that of pediatric surgeons and radiologists.

Materials and Methods

In this retrospective study, 1,471 CE images from patients younger than 15 years were analyzed, with 218 images used for testing. A DNN, pediatric radiologists, and surgeons independently reviewed the testing set, with and without clinical data. Diagnostic performance was assessed using receiver operating characteristic (ROC) and precision-recall (PR) curves, and interobserver agreement was evaluated using Fleiss' kappa. Rectal biopsy served as the reference standard.
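As an illustration of the two statistics named above, the following minimal sketch computes an area under the ROC curve and Fleiss' kappa in plain Python. It is not the authors' analysis code: the function names `auc_roc` and `fleiss_kappa` are hypothetical, the AUC is obtained through the Mann-Whitney pairwise-comparison identity rather than by tracing the curve, and all numbers are synthetic toy values, not study data.

```python
from collections import Counter


def auc_roc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney identity.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case (ties count as 0.5).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fleiss_kappa(ratings):
    """Fleiss' kappa for several raters.

    ratings: one list per image, holding one category label per rater.
    Every image must be rated by the same number of raters (at least two).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("each image needs the same number (>= 2) of ratings")
    counts = [Counter(row) for row in ratings]
    categories = sorted({label for row in ratings for label in row})
    # Observed agreement: mean share of agreeing rater pairs per image.
    p_bar = sum(
        (sum(c[k] ** 2 for k in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ) / n_items
    # Chance agreement from the overall category proportions.
    p_e = sum(
        (sum(c[k] for c in counts) / (n_items * n_raters)) ** 2
        for k in categories
    )
    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_bar - p_e) / (1.0 - p_e)


# Synthetic toy data: 1 = biopsy-confirmed HD, 0 = no HD.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]   # hypothetical model outputs
toy_ratings = [[1, 1, 1], [0, 0, 0], [1, 1, 0], [0, 0, 1]]  # 3 raters, 4 images

print("AUC-ROC:", auc_roc(toy_labels, toy_scores))
print("Fleiss' kappa:", fleiss_kappa(toy_ratings))
```

In practice a statistics library would normally be used for both quantities; the explicit form is shown only to make the definitions concrete.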

Results

The DNN achieved high diagnostic accuracy (area under the receiver operating characteristic curve [AUC-ROC] = 0.87) in CE interpretation, with improved performance when combining anteroposterior and lateral images (AUC-ROC = 0.92). Clinical data integration further enhanced model sensitivity and negative predictive value. The super-surgeon (majority voting of colorectal surgeons) outperformed most individual clinicians (sensitivity 81.8%, specificity 79.1%), while the super-radiologist (majority voting of radiologists) showed moderate accuracy. Interobserver analysis revealed substantial agreement between the model and surgeons (Cohen's kappa = 0.73), and overall consistency among experts and the model (Fleiss' kappa = 0.62).
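The pooled "super-reader" idea and the reported metrics can be sketched as follows. This is a hedged illustration only: the helper names (`majority_vote`, `diagnostic_metrics`, `cohen_kappa`) are hypothetical, an odd number of readers is assumed so that no tie-breaking rule is needed, and the reader calls are synthetic toy values rather than results from the study.

```python
def majority_vote(reader_calls):
    """Pool binary calls (1 = HD, 0 = no HD) from an odd number of readers.

    reader_calls: one list per reader, all of the same length.
    Returns one pooled call per image.
    """
    n_readers = len(reader_calls)
    if n_readers % 2 == 0:
        raise ValueError("use an odd number of readers to avoid ties")
    return [1 if 2 * sum(calls) > n_readers else 0
            for calls in zip(*reader_calls)]


def diagnostic_metrics(truth, calls):
    """Sensitivity, specificity, and negative predictive value.

    Assumes each denominator is non-zero (both classes present and at
    least one negative call).
    """
    tp = sum(1 for t, c in zip(truth, calls) if t == 1 and c == 1)
    fn = sum(1 for t, c in zip(truth, calls) if t == 1 and c == 0)
    tn = sum(1 for t, c in zip(truth, calls) if t == 0 and c == 0)
    fp = sum(1 for t, c in zip(truth, calls) if t == 0 and c == 1)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "npv": tn / (tn + fn),
    }


def cohen_kappa(a, b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n           # observed agreement
    p_e = sum((a.count(k) / n) * (b.count(k) / n)         # chance agreement
              for k in set(a) | set(b))
    return (p_o - p_e) / (1.0 - p_e)


# Synthetic toy data: biopsy result and three hypothetical readers.
truth = [1, 1, 1, 1, 0, 0, 0, 0]
reader_1 = [1, 1, 0, 1, 0, 0, 1, 0]
reader_2 = [1, 0, 1, 1, 0, 1, 0, 0]
reader_3 = [1, 1, 1, 0, 0, 0, 0, 1]

pooled = majority_vote([reader_1, reader_2, reader_3])
print("pooled calls:", pooled)
print("reader 1:", diagnostic_metrics(truth, reader_1))
print("pooled:", diagnostic_metrics(truth, pooled))
print("kappa, reader 1 vs pooled:", cohen_kappa(reader_1, pooled))
```

In this toy set each reader makes two errors on different images, so the pooled vote cancels them out. Real readers often err on the same difficult images, in which case pooling helps less.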

Conclusions

Artificial intelligence-assisted CE interpretation achieved higher specificity than the clinicians, with comparable sensitivity. Its consistent performance and substantial agreement with experts support its potential role in improving CE assessment in HD.



Publication History

Received: 12 April 2025

Accepted: 29 June 2025

Accepted Manuscript online: 01 July 2025

Article published online: 15 July 2025

© 2025. Thieme. All rights reserved.

Georg Thieme Verlag KG
Oswald-Hesse-Straße 50, 70469 Stuttgart, Germany