Comparison of Prostate MRI Lesion Segmentation Agreement Between Multiple Radiologists and a Fully Automatic Deep Learning System

Patrick Schelb; Anoshirwan Andrej Tavakoli; Teeravut Tubtawee; Thomas Hielscher; Jan-Philipp Radtke; Magdalena Görtz; Viktoria Schütz; Tristan Anselm Kuder; Lars Schimmöller; Albrecht Stenzinger; Markus Hohenfellner; Heinz-Peter Schlemmer; David Bonekamp

doi:10.1055/a-1290-8070

Rofo 2021; 193(05): 559-573

Urogenital Tract

Authors

Patrick Schelb
¹Division of Radiology, German Cancer Research Center (DKFZ), Heidelberg, Germany

Anoshirwan Andrej Tavakoli
¹Division of Radiology, German Cancer Research Center (DKFZ), Heidelberg, Germany

Teeravut Tubtawee
¹Division of Radiology, German Cancer Research Center (DKFZ), Heidelberg, Germany

Thomas Hielscher
²Division of Biostatistics, German Cancer Research Center (DKFZ), Heidelberg, Germany

Jan-Philipp Radtke
³Department of Urology, University of Heidelberg Medical Center, Heidelberg, Germany

Magdalena Görtz
³Department of Urology, University of Heidelberg Medical Center, Heidelberg, Germany

Viktoria Schütz
³Department of Urology, University of Heidelberg Medical Center, Heidelberg, Germany

Tristan Anselm Kuder
⁴Division of Medical Physics, German Cancer Research Center (DKFZ), Heidelberg, Germany

Lars Schimmöller
⁵University Dusseldorf, Medical Faculty, Department of Diagnostic and Interventional Radiology, Dusseldorf, Germany

Albrecht Stenzinger
⁶Institute of Pathology, University of Heidelberg Medical Center, Heidelberg, Germany

Markus Hohenfellner
³Department of Urology, University of Heidelberg Medical Center, Heidelberg, Germany

Heinz-Peter Schlemmer
¹Division of Radiology, German Cancer Research Center (DKFZ), Heidelberg, Germany

David Bonekamp
¹Division of Radiology, German Cancer Research Center (DKFZ), Heidelberg, Germany

Abstract

Purpose A recently developed deep learning model (U-Net) approximated the clinical performance of radiologists in the prediction of clinically significant prostate cancer (sPC) from prostate MRI. Here, we compare the agreement between U-Net lesion segmentations and manual lesion segmentations performed by different radiologists.

Materials and Methods A total of 165 patients with suspected sPC underwent targeted and systematic fusion biopsy following 3 Tesla multiparametric MRI (mpMRI). Five sets of segmentations were generated retrospectively: segmentations of clinical lesions, independent segmentations by three radiologists, and fully automated bi-parametric U-Net segmentations. Per-lesion agreement was calculated for each rater by averaging the Dice coefficients with all overlapping lesions from other raters. Agreement was compared using descriptive statistics and linear mixed models.
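The per-lesion agreement measure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: lesions are represented here as sets of voxel indices, and the names `dice`, `per_lesion_agreement`, `box`, `reader_2` and `reader_3` are hypothetical and chosen only for illustration. The Dice coefficient of two lesions A and B is 2|A∩B| / (|A| + |B|).

```python
from statistics import mean


def dice(a, b):
    """Dice coefficient between two lesions given as sets of voxel indices."""
    if not a and not b:
        raise ValueError("both lesions are empty")
    return 2 * len(a & b) / (len(a) + len(b))


def per_lesion_agreement(lesion, other_raters):
    """Average Dice of `lesion` with every overlapping lesion of other raters.

    `other_raters` maps a rater name to a list of that rater's lesions.
    Returns None when no lesion from another rater overlaps.
    """
    scores = [
        dice(lesion, other)
        for lesions in other_raters.values()
        for other in lesions
        if lesion & other  # only overlapping lesions contribute
    ]
    return mean(scores) if scores else None


def box(x0, x1, y0, y1):
    """Hypothetical helper: a 2D rectangular lesion as a set of (x, y) voxels."""
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}


# Toy data (assumed, not from the study): one lesion and two other readers.
lesion = box(0, 4, 0, 4)  # 16 voxels
others = {
    "reader_2": [box(2, 6, 0, 4)],                       # half overlap, Dice 0.5
    "reader_3": [box(0, 4, 0, 4), box(10, 12, 10, 12)],  # identical, plus one disjoint
}
agreement = per_lesion_agreement(lesion, others)
print(agreement)  # 0.75: mean of 0.5 and 1.0, the disjoint lesion is ignored
```

Restricting the average to overlapping lesions means the measure describes how similarly jointly detected lesions are outlined, not whether a lesion was detected at all.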

Results Manual segmentations showed only moderate agreement, with mean Dice coefficients of 0.48–0.52, reflecting the difficult visual task of determining the outline of otherwise jointly detected lesions. U-Net segmentations were significantly smaller than manual segmentations (p < 0.0001) and had a significantly lower mean Dice coefficient of 0.22 (all p < 0.0001). These differences remained after correction for lesion size and did not differ between sPC and non-sPC lesions or between peripheral and transition zone lesions.

Conclusion Knowledge of the order of magnitude of agreement among manual segmentations by different radiologists is important for setting expectations for artificial intelligence (AI) systems in the task of prostate MRI lesion segmentation. Perfect agreement (Dice coefficient of one) should not be expected for AI. The lower Dice coefficients of U-Net compared to manual segmentations are only partially explained by smaller segmentation sizes and may result from a focus on the lesion core and a small relative shift of the lesion center. Although it is primarily important that AI detects sPC correctly, the Dice coefficient for overlapping lesions from multiple raters can be used as a secondary measure of segmentation quality in future studies.
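The effect of segmentation size and center shift on the Dice coefficient can be made concrete with a small worked example. The sketch below uses assumed toy geometry (square lesions on a 2D grid), not study data, and the helper names `dice` and `square` are hypothetical. If a smaller segmentation A lies entirely inside a larger segmentation B and |A| = k|B|, then Dice = 2k / (1 + k), so a small core segmentation caps the attainable Dice even with perfect centering. Any shift that moves part of the core outside B lowers it further.

```python
def dice(a, b):
    """Dice coefficient between two lesions given as sets of voxel indices."""
    return 2 * len(a & b) / (len(a) + len(b))


def square(x0, y0, side):
    """Hypothetical helper: square lesion with lower corner (x0, y0)."""
    return {(x, y) for x in range(x0, x0 + side) for y in range(y0, y0 + side)}


manual = square(0, 0, 6)         # 36 voxels, stands in for a manual outline
core_centered = square(2, 2, 2)  # 4 voxels, fully inside, k = 4/36
core_shifted = square(5, 2, 2)   # same size, shifted so only 2 voxels overlap

dice_centered = dice(manual, core_centered)  # 2*4 / (36+4) = 0.2
dice_shifted = dice(manual, core_shifted)    # 2*2 / (36+4) = 0.1

print(dice_centered, dice_shifted)
```

In this toy case the size difference alone limits the Dice coefficient to 0.2, and the shift halves it again, which illustrates how both factors named in the conclusion can contribute.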

Key Points:

Intermediate human Dice coefficients reflect the difficulty of outlining jointly detected lesions.
Lower Dice coefficients of deep learning motivate further research to approximate human perception.
Comparable predictive performance of deep learning appears independent of Dice agreement.
Dice agreement independent of significant cancer presence indicates indistinguishability of some benign imaging findings.
Improving DWI to T2 registration may improve the observed U-Net Dice coefficients.

Citation Format

Schelb P, Tavakoli AA, Tubtawee T et al. Comparison of Prostate MRI Lesion Segmentation Agreement Between Multiple Radiologists and a Fully Automatic Deep Learning System. Fortschr Röntgenstr 2021; 193: 559 – 573

Summary

Purpose A recently developed in-house artificial neural network (U-Net) showed a good detection rate for clinically significant prostate cancer (sPC), comparable to clinical radiological assessment. This work compares the congruence of the lesion volumes (segmentations) generated by U-Net and by several radiologists.

Materials and Methods A total of 165 patients with suspected sPC underwent multiparametric MRI (mpMRI) at 3 Tesla, followed by targeted and systematic MR/TRUS fusion biopsy. Five segmentations per examination were generated: segmentations of clinical lesions, independent and blinded retrospective PI-RADS reads by 3 radiologists, and U-Net. Lesion-based agreement for each reader was determined using the Dice coefficient with overlapping lesions from other readers. Agreement was compared using descriptive statistics and linear mixed models.

Results The mean Dice coefficient for radiologists showed only moderate congruence at 0.48–0.52, reflecting the difficult visual task of determining the boundary of lesions that were otherwise concordantly detected. U-Net segmentations were significantly smaller than manual segmentations (p < 0.0001) and showed a lower mean Dice coefficient of 0.22, significantly lower than for manual segmentations (all p < 0.0001). These differences persisted after adjustment for segmentation size and were not influenced by the presence of sPC or by zonal location in the peripheral or transition zone.

Conclusion Knowledge of the order of magnitude of agreement among manual segmentations by different radiologists is important for setting the expectation for artificial intelligence (AI) approaches. Perfect agreement (Dice coefficient of 1) should not be expected for AI. The lower Dice coefficients of U-Net are only partially explained by its smaller segmentation size and might result from a focus of U-Net on the lesion core and a small shift of the lesion center. Although the correct detection of sPC by AI is of primary importance, the Dice coefficient with multiple readers can be used as a secondary quality measure in future studies.

Key Points:

Intermediate Dice coefficients of the radiologists reflect the difficulty of concordantly defining the boundary of jointly detected lesions.
The observed lower Dice coefficients motivate further development of deep learning systems with the aim of better approximating human perception.
Comparable prediction of clinically significant prostate cancer appears independent of the agreement of Dice coefficients.
The independence of the Dice coefficient from the presence of significant prostate cancer indicates that some benign image characteristics cannot be distinguished from malignant ones.
Technical improvements in image registration between DWI and T2 may increase the U-Net Dice coefficients in the future.

Key words

MRI - prostate - prostate cancer - deep learning - artificial intelligence - convolutional neural network

Publication History

Received: 15 July 2020

Accepted: 29 September 2020

Article published online:
19 November 2020

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany
