DOI: 10.1055/s-0045-1805427
GastroNet-5M: A Multicenter Dataset for Foundation Model Development in Gastrointestinal Endoscopy
Aims Developing deep learning systems for medical imaging typically demands extensive datasets, yet large-scale collection of data and corresponding expert annotations remains challenging and costly. Foundation models, which are large pretrained models that capture broad, transferable knowledge from domain-specific data (e.g., endoscopic imagery), have shown promise in addressing these data limitations. However, the field of endoscopy still lacks accessible datasets suitable for training such models. In this study, we describe new experiments with GastroNet-5M, a comprehensive dataset of 5,002,545 endoscopic images, and further explore its potential to support foundation model development in endoscopy.
Methods GastroNet-5M is composed of anonymized endoscopic images collected from eight Dutch hospitals between 2012 and 2020. Using a self-supervised learning approach, this dataset enabled the development of a foundation model for various endoscopic AI applications. In this study, we compared GastroNet-5M-pretrained models against the current standard of ImageNet-pretrained models. First, diagnostic performance was compared across 11 endoscopic classification and segmentation tasks, such as Barrett’s neoplasia detection, colorectal polyp characterization, and gastric cancer invasion depth prediction. Second, data efficiency was assessed by repeating these experiments with stepwise reductions in training set size to examine model performance with less data. Finally, robustness to data heterogeneity (e.g., training and testing on images from different endoscope manufacturers) was evaluated on 4 additional test datasets.
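The stepwise reduction of training set size used for the data-efficiency comparison can be illustrated with a minimal sketch. The abstract does not state which fractions or sampling scheme were used, so the fractions, the nesting of subsets, and the function name `nested_subsets` below are assumptions made purely for illustration.

```python
import random


def nested_subsets(indices, fractions, seed=0):
    """Return {fraction: subset} where every smaller subset is contained
    in every larger one (hypothetical helper, not from the study)."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)  # one fixed random order shared by all subsets
    subsets = {}
    for frac in sorted(fractions, reverse=True):
        n = max(1, round(frac * len(shuffled)))
        subsets[frac] = shuffled[:n]  # a prefix of the same order, hence nested
    return subsets


# Assumed fractions; each subset would be used to fine-tune both the
# GastroNet-5M-pretrained and the ImageNet-pretrained model for comparison.
subsets = nested_subsets(range(1000), [1.0, 0.5, 0.25, 0.1])
for frac, subset in subsets.items():
    print(f"{frac:.0%} of training data -> {len(subset)} images")
```

Drawing every subset as a prefix of one shuffled order keeps the smaller training sets nested inside the larger ones, so differences between steps reflect the amount of data rather than a different random draw.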
Results Models pretrained with GastroNet-5M demonstrated a significant performance increase, surpassing all ImageNet benchmark models across all endoscopic downstream tasks (p<0.001). On average, GastroNet-5M models displayed a 3.5% higher AUC score for classification tasks and an 11.5% higher Dice score for segmentation tasks compared to the ImageNet models. In addition, GastroNet-5M models required significantly less downstream training data for 10 out of 11 downstream tasks (p=0.10). Finally, GastroNet-5M models displayed higher classification scores across all 4 robustness test datasets.
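For readers less familiar with the two reported metrics, the following sketch shows how a Dice score (segmentation overlap) and an AUC (classification ranking quality) are commonly computed. It is a plain-Python illustration on toy data, not the evaluation code of the study; the function names and example values are assumptions.

```python
def dice_score(pred, target):
    """Dice coefficient for two binary masks given as flat lists of 0/1."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def auc_score(labels, scores):
    """AUC as the probability that a random positive case is scored higher
    than a random negative case (ties count as one half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: one overlapping pixel out of 2 predicted and 1 true pixel.
print(dice_score([1, 1, 0, 0], [1, 0, 0, 0]))             # 2*1 / (2+1)
# Toy data: 3 of the 4 positive/negative pairs are ranked correctly.
print(auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))     # 0.75
```

Both metrics range from 0 to 1. The abstract does not specify whether the reported 3.5% and 11.5% gains are absolute or relative differences, so the sketch makes no assumption about that.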
Conclusions GastroNet-5M, a multicenter dataset of over 5 million unlabeled endoscopic images, offers a valuable resource for pretraining deep learning models in endoscopy. The use of GastroNet-5M enhances model accuracy, reduces required dataset size, and improves robustness of endoscopic AI systems. GastroNet-5M will be made publicly accessible for further research and development.
Publication History
Article published online: 27 March 2025
© 2025. European Society of Gastrointestinal Endoscopy. All rights reserved.
Georg Thieme Verlag KG
Oswald-Hesse-Straße 50, 70469 Stuttgart, Germany