Abstract
Background Although abundant literature is currently available on the use of deep learning for
breast cancer detection in mammography, the quality of this literature varies widely.
Purpose To evaluate published literature on breast cancer detection in mammography for reproducibility
and to ascertain best practices for model design.
Methods The PubMed and Scopus databases were searched to identify records that described
the use of deep learning to detect lesions or classify images into cancer or noncancer.
A modification of the Quality Assessment of Diagnostic Accuracy Studies tool (mQUADAS-2)
was developed for this review and applied to the included studies. Reported results of
these studies (area under the receiver operating characteristic [ROC] curve [AUC], sensitivity,
specificity) were recorded.
Results A total of 12,123 records were screened, of which 107 fit the inclusion criteria.
The training and test datasets, the key idea behind the model architecture, and the results were recorded
for these studies. Based on the mQUADAS-2 assessment, 103 studies had a high risk of bias
due to nonrepresentative patient selection. Four studies were of adequate quality,
of which three trained their own model, and one used a commercial network. Ensemble
models were used in two of these. Common strategies used for model training included
patch classifiers, image classification networks (ResNet in 67%), and object detection
networks (RetinaNet in 67%). The highest reported AUC was 0.927 ± 0.008 on a screening
dataset, while it reached 0.945 (0.919–0.968) on an enriched subset. Combined radiologist
and artificial intelligence (AI) readings achieved a higher AUC (0.955) and specificity (98.5%)
than either alone. None of the studies provided explainability beyond localization
accuracy, and none examined the interaction between AI and radiologists in a real-world setting.
Conclusion While deep learning holds much promise in mammography interpretation, evaluation
in a reproducible clinical setting and explainable networks are urgently needed.
Keywords
artificial intelligence - breast cancer - deep learning - mammography - neural networks
- systematic review