DOI: 10.1055/s-0045-1803304
Deep Learning Consensus-Based Framework for the Annotation of a Routine Clinical Vestibular Schwannoma MRI Dataset
Introduction: Data annotation is critical for developing machine learning models in medical imaging, where annotation accuracy directly affects model performance. However, obtaining high-quality annotations is costly and requires clinical expertise. Delineating vestibular schwannoma (VS) in magnetic resonance imaging (MRI) is particularly challenging due to tumor size variability, patient anatomy, and the heterogeneity of retrospective data, especially when VS coexists with other pathologies like meningioma. Accurate labeling is essential to avoid confounding factors that could hinder model performance.
Methodology: Previously, we used a labor-intensive and costly iterative pipeline to manually annotate heterogeneous scans from multiple institutions, referred to as the multi-center routine clinical (MC-RC) VS dataset (UCLH-MC-RC). In this study, using the UCLH-MC-RC and two additional single-center gamma knife (SC-GK) datasets (LDN-SC-GK, ETZ-SC-GK), we annotated a new MC-RC dataset (KCH-MC-RC). To achieve this, we introduced an iterative pipeline with deep learning-based segmentation to reduce both the annotators' workload and inter-rater variability ([Fig. 1]).


We utilized the default 3D full-resolution UNet from the nnU-Net framework for segmentation. The initial training dataset, comprising expert-annotated images from the UCLH-MC-RC, LDN-SC-GK, and ETZ-SC-GK datasets, was used to train the model ([Table 1]). In each round, the model was bootstrapped by incorporating additional cases from the KCH-MC-RC dataset.
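As an illustration of how the pooled training data could be organized, the sketch below assembles a minimal nnU-Net v2 `dataset.json`. The field names follow the public nnU-Net v2 dataset format; the case count and channel label are placeholders, not the study's actual numbers:

```python
import json

def build_dataset_json(num_training, file_ending=".nii.gz"):
    """Assemble a minimal nnU-Net v2 dataset.json for a single-channel
    MRI, binary VS-segmentation task. Field names follow the public
    nnU-Net v2 conventions; the values here are illustrative."""
    return {
        "channel_names": {"0": "MRI"},          # one imaging channel
        "labels": {"background": 0, "VS": 1},   # binary tumor label
        "numTraining": num_training,
        "file_ending": file_ending,
    }

# Pooled initial training set: UCLH-MC-RC + LDN-SC-GK + ETZ-SC-GK
# (the count below is a placeholder, not the study's actual total)
pooled = build_dataset_json(num_training=400)
print(json.dumps(pooled, indent=2))
```

A file like this, together with the image/label folders, is what nnU-Net's preprocessing and training entry points consume.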


In Round 1 of model training, 427 scans were processed and quality-assessed by three independent experts, as shown in [Fig. 2]. A consensus meeting involving a consultant neurosurgeon (J.S.) was subsequently convened to review complex scans.
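The three-rater quality check can be thought of as a simple triage rule: unanimous decisions stand, while anything else is escalated. This is one plausible reading of the workflow, not the study's stated rule, and the function name and threshold are our own:

```python
def triage(ratings):
    """Triage one scan from three independent expert ratings.

    ratings: list of booleans (True = annotation accepted).
    Unanimous accept/reject decisions stand; any disagreement sends
    the scan to the consensus meeting for joint review.
    """
    accepts = sum(ratings)
    if accepts == len(ratings):
        return "accept"
    if accepts == 0:
        return "reject"
    return "consensus_meeting"   # split decision -> review with the neurosurgeon

print(triage([True, True, True]))    # accept
print(triage([True, False, True]))   # consensus_meeting
```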


After the consensus meeting, accepted KCH-MC-RC cases were combined with the initial training data to enhance the segmentation model through bootstrapping ([Table 1]). Rejected sessions were then reprocessed with this bootstrapped model. An expert-trained radiologist manually assessed the Round 2 annotations and accepted or corrected them using the ITK-SNAP annotation tool.
In Round 3, accepted and corrected cases from Round 2 were added to the previously accepted cases from Round 1 and combined with the initial training dataset to further refine the model through bootstrapping.
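The round structure described above amounts to a generic bootstrapping loop: train on everything accepted so far, predict the still-unreviewed cases, fold the newly accepted cases back into the training set, and repeat. The sketch below is our own schematic of that loop; `train`, `predict`, and `review` are toy stand-ins, not the study's code:

```python
def bootstrap_annotation(initial_training, unlabeled, rounds, train, predict, review):
    """Iterative deep-learning-assisted annotation.

    Each round trains on all accepted cases, predicts the pending
    cases, and lets expert review split them into accepted (added to
    the training set) and rejected (carried into the next round).
    """
    training = list(initial_training)
    pending = list(unlabeled)
    for _ in range(rounds):
        model = train(training)
        accepted, pending = review(predict(model, pending))
        training.extend(accepted)        # bootstrap: fold accepted cases back in
        if not pending:
            break
    return training, pending

# Toy stand-ins: the "model" is just the training-set size, and
# "review" accepts even-numbered case ids.
train = lambda data: len(data)
predict = lambda model, cases: cases
review = lambda preds: ([c for c in preds if c % 2 == 0],
                        [c for c in preds if c % 2 != 0])

done, still_pending = bootstrap_annotation([0, 1], [2, 3, 4, 5], 3,
                                           train, predict, review)
print(done, still_pending)   # [0, 1, 2, 4] [3, 5]
```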
Two independent unseen test datasets were used to evaluate model performance of the bootstrapped models: (1) 50 cases drawn from the UCLH-MC-RC, ETZ-SC-GK, LDN-SC-GK datasets and (2) 30 cases drawn from the KCH-MC-RC dataset.
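Segmentation performance on such held-out sets is conventionally scored with the Dice similarity coefficient (the abstract does not name its metric, so this is an assumption). A minimal implementation over foreground-voxel coordinate sets:

```python
def dice(pred, truth):
    """Dice similarity coefficient between two binary masks,
    each given as a set of foreground voxel coordinates."""
    pred, truth = set(pred), set(truth)
    if not pred and not truth:
        return 1.0                       # both empty: perfect agreement
    return 2 * len(pred & truth) / (len(pred) + len(truth))

# Toy 1-D example: 3 of 4 predicted voxels overlap the 4 true voxels
print(dice({1, 2, 3, 9}, {1, 2, 3, 4}))   # 0.75
```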
Results: The bootstrapped models did not improve segmentation results on the first unseen test set, but performance on the KCH-MC-RC set improved with each round ([Fig. 3]).


Conclusion: This work demonstrated that iterative bootstrapping was effective in refining the model for the specific characteristics of the KCH-MC-RC dataset. This approach could improve a deep learning segmentation model’s accuracy and adaptability when dealing with complex, heterogeneous medical data.
Publication History
Article published online:
07 February 2025
© 2025. Thieme. All rights reserved.
Georg Thieme Verlag KG
Oswald-Hesse-Straße 50, 70469 Stuttgart, Germany