Rofo 2021; 193(06): 652-657
DOI: 10.1055/a-1293-8953
Review

How does Radiomics actually work? – Review

Ulrike Irmgard Attenberger
1   Department of Diagnostic and Interventional Radiology, Medical Faculty, University Hospital Bonn, Germany
,
Georg Langs
2   Department of Biomedical Imaging and Image-guided Therapy, Computational Imaging Research Lab, Medical University of Vienna, Wien, Austria
 

Abstract

Personalized precision medicine requires highly accurate diagnostics. While radiological research has focused on scanner and sequence technologies in recent decades, applications of artificial intelligence are increasingly attracting scientific interest as they could substantially expand the possibility of objective quantification and diagnostic or prognostic use of image information.

In this context, the term “radiomics” describes the extraction of quantitative features from imaging data such as those obtained from computed tomography or magnetic resonance imaging examinations. These features are associated with predictive goals such as diagnosis or prognosis using machine learning models. It is believed that the integrative assessment of the feature patterns thus obtained, in combination with clinical, molecular and genetic data, can enable a more accurate characterization of the pathophysiology of diseases and more precise prediction of therapy response and outcome.

This review describes the classical radiomics approach and discusses the existing very large variability of approaches. Finally, it outlines the research directions in which the interdisciplinary field of radiology and computer science is moving, characterized by increasingly close collaborations and the need for new educational concepts. The aim is to provide a basis for responsible and comprehensible handling of the data and analytical methods used.

Key points:

  • Radiomics is playing an increasingly important role in imaging research.

  • Radiomics has great potential to meet the requirements of precision medicine.

  • Radiomics analysis is still subject to great variability.

  • There is a need for quality-assured application of radiomics in medicine.

Citation Format

  • Attenberger UI, Langs G. How does Radiomics actually work? – Review. Fortschr Röntgenstr 2021; 193: 652 – 657



Introduction

Personalized precision medicine requires highly accurate diagnostics. While radiological research in recent decades has concentrated on scanner and sequence technologies for more accurate disease diagnosis, scientific interest is now increasingly directed at implementations of artificial intelligence (AI) for optimized diagnostics. The possible applications of AI techniques in radiology are manifold: automated lesion detection and characterization, creation of biobanks, dose optimization, structured reporting and radiomics [2] [3]. For the sake of completeness, it should not be forgotten that AI techniques are also used in the latest generation of scanners to optimize data acquisition itself.

The term “radiomics” describes the extraction of quantitative features from image data, such as computed tomography (CT), magnetic resonance imaging (MRI) or positron emission tomography (PET) examinations, and their correlation with clinical, genetic or molecular data using AI methods such as machine learning or deep learning. The concept appears promising: using AI methods, information can be extracted from image data that goes far beyond what the human eye can detect. It is assumed that the assessment of these characteristics and feature patterns obtained from the image data, in combination with clinical, molecular and genetic data, can enable a more precise characterization of the pathophysiology of diseases as well as a statement on therapy response and probable outcome. Some of the applied techniques have been known for decades but have been developed substantially in recent years, opening up new approaches to the automated exploitation of image information. Publications on this topic go back to the end of the 1940s, and models such as neural networks were also intensively researched in the 1980s [4]. Increased computing power, together with methodological advances and the growing availability of large amounts of data for training models, has led to a resumption of this work with impressive results [5], enabling a more timely and efficient use of these techniques – a basis for subsequent potential clinical implementation. The scope of application in imaging diagnostics is diverse and ranges from oncological to cardiac and musculoskeletal diagnostics.

Radiomics is playing an increasingly important role in imaging research due to its great potential to meet the requirements of precision medicine. Numerous studies provide an overview of the underlying concepts [6] [7]. However, it should be noted that every single step of a radiomics analysis is subject to great variability. Responsible, comprehensible handling of the data and analysis methods used is therefore an indispensable basic requirement. Because of this novel way of dealing with image data, even closer collaboration with data scientists specializing in medical image computing will be required in the future, as well as a restructuring of radiological training.

Radiomics, which describes a subset of AI implementation possibilities in radiology, follows an explicit scheme according to which image data is processed, segmented and analyzed. This overview article will present and explain this analysis.



Radiomics Hands-on

The 6 Phases of Radiomics Analysis

A radiomics analysis can essentially be divided into 6 steps: (i) data acquisition, (ii) definition of a region of interest (ROI), (iii) data (pre)processing, (iv) feature extraction, (v) selection of the features relevant to the problem and (vi) classification ([Fig. 1]) [8].

Fig. 1 The 6 phases of a radiomics analysis. Depending on the intermediate or final results, some or all of the analytical steps may have to be repeated.

Data Acquisition

The way in which the data are acquired has a significant influence on the result of the radiomics analysis; it is therefore desirable to use imaging protocols that are standardized, reproducible and comparable [9]. For example, a study by Waugh et al. showed that a longer repetition time (TR) enables better discrimination of texture features in breast MRI [10]. Baessler et al. systematically investigated the influence of MRI sequence choice on feature robustness [11]. A high-resolution FLAIR sequence provided the highest feature robustness, whereas the lower-resolution T2-weighted sequence acquired for comparison achieved the poorest feature stability. There were also differences in robustness among the various feature groups (matrices): the shape and GLZLM (Gray-Level Zone Length Matrix) groups achieved the highest robustness, while the histogram-based features were the least robust [11]. For this reason, Lambin et al. call for a stratified approach to data selection: detailed disclosure of imaging sequences, robust segmentation, e. g. by multiple evaluators, phantom studies and imaging at different time points [9].



ROI Definition

Optimal ROI Size and Feature Maps

After data acquisition, the region of interest (ROI) is defined, which describes the area over which further analysis will occur. Most radiomics work deals with oncological questions, and the ROI is typically placed on a lesion so that the subsequent analysis is restricted to it. Here, too, there is great variability in the methodology of ROI definition, which in turn has a significant influence on the result. Three different ROI variants can be selected: an ROI that follows the contour of the lesion, a rectangular ROI that encloses the entire lesion (bounding box), and a partial ROI placed in the center of a section of the lesion [8]. Although bounding boxes are easier to create and often sufficient, precise segmentation of the lesion enables evaluation of its shape and a more accurate analysis of contrast at the lesion margins, and thus a better understanding of the lesion. In addition to ROI shape and placement, ROI size also has a significant impact on the result. Sikiö et al. demonstrated a correlation between ROI size and feature stability [12]. Using a spatial resolution of 0.5 × 0.7 mm² and a slice thickness of 4 mm, feature stability was lowest with an 80 × 80 pixel ROI; the most stable features were achieved with an ROI of 180 × 180 pixels [12].
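To illustrate the difference between a contour-following ROI and a bounding box, the following minimal sketch (Python/NumPy; the image and lesion mask are hypothetical) derives a rectangular bounding box from a binary mask and contrasts it with the voxels covered by the mask itself:

```python
import numpy as np

def bounding_box(mask):
    """Row/column slices of the tightest rectangle enclosing a 2D binary mask."""
    rows = np.any(mask, axis=1)
    cols = np.any(mask, axis=0)
    r0, r1 = np.where(rows)[0][[0, -1]]
    c0, c1 = np.where(cols)[0][[0, -1]]
    return slice(r0, r1 + 1), slice(c0, c1 + 1)

# Hypothetical image and lesion mask
image = np.random.default_rng(0).normal(size=(128, 128))
mask = np.zeros((128, 128), dtype=bool)
mask[40:80, 50:90] = True

box = bounding_box(mask)
roi_contour = image[mask]   # contour-following ROI: lesion voxels only
roi_box = image[box]        # bounding box: rectangle that also includes surrounding tissue
```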



Segmentation Methods

Segmentation has two tasks: 1) it can make the analysis more specific by allowing explicit access to voxels inside or outside a lesion; 2) the shape of the segmented lesion is itself a relevant source of features. The segmentation of structures in medical image data is an intensively researched field and accordingly offers different possibilities. Manual segmentation is complemented by automated segmentation methods using dedicated algorithms such as region growing, level sets for smooth structures or, most recently, successful deep learning methods such as so-called U-Nets [13] [14]. To date, manual segmentation by an expert reader has been considered the gold standard [15]. However, inter-reader reliability, the reproducibility of the segmentation and the time required to manually segment large amounts of data are problematic [16] [17]. To reduce this bias, Lambin et al. recommend multiple segmentation, multi-reader analysis, exclusion of high-noise segmentations and the use of data from different breathing cycles [9]. In principle, depending on the available data, segmentation can be performed in both 2D and 3D image data. While 2D analysis allows less differentiation of lesion shape, it is more independent of often highly variable imaging parameters such as slice thickness.
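As a simple illustration of one of the classical methods mentioned above, the following sketch implements a threshold-based region-growing step around a seed point (the image, seed and tolerance are hypothetical; production pipelines would typically rely on dedicated segmentation tools or trained U-Nets):

```python
import numpy as np
from scipy import ndimage

def region_grow(image, seed, tolerance):
    """Keep the connected region around `seed` whose intensities lie within
    `tolerance` of the seed intensity (simple threshold-based region growing)."""
    low, high = image[seed] - tolerance, image[seed] + tolerance
    candidate = (image >= low) & (image <= high)
    labels, _ = ndimage.label(candidate)   # connected components of the candidate voxels
    return labels == labels[seed]          # component containing the seed

# Hypothetical usage
image = np.random.default_rng(1).normal(size=(64, 64))
image[20:40, 20:40] += 3.0                 # bright "lesion"
mask = region_grow(image, seed=(30, 30), tolerance=1.5)
```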



Image Processing and Preprocessing

Image preprocessing precedes the actual feature extraction. Depending on the data set, this includes interpolation, signal normalization and gray value reduction.

Interpolation of the image data allows standardization of the spatial resolution across data sets. Studies have shown that higher resolution allows optimized feature extraction; in a study by Mayerhoefer et al., the best results were obtained with interpolation factors of 2–4 [1]. Signal normalization, in turn, is relevant for the comparability of the features obtained in the analysis. Three main approaches are described in the literature: min/max, the Z score and mean ± 3σ [18]. The “mean ± 3σ” method means that the intensities are normalized within μ ± 3σ, where μ describes the mean value of the gray values within the ROI and σ their standard deviation. Consequently, gray values outside the range [μ – 3σ, μ + 3σ] are not considered for the analysis.
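The three normalization schemes can be written down compactly; the following minimal sketch (NumPy; the ROI array is a hypothetical placeholder) illustrates min/max scaling, the Z score and the “mean ± 3σ” approach, in which gray values outside [μ – 3σ, μ + 3σ] are discarded:

```python
import numpy as np

def normalize_minmax(roi):
    """Scale intensities linearly to [0, 1]."""
    return (roi - roi.min()) / (roi.max() - roi.min())

def normalize_zscore(roi):
    """Center on the mean and scale by the standard deviation."""
    return (roi - roi.mean()) / roi.std()

def normalize_mean_3sigma(roi):
    """Keep only values within mu +/- 3*sigma and rescale them to [0, 1];
    values outside this range are excluded from the analysis."""
    mu, sigma = roi.mean(), roi.std()
    lo, hi = mu - 3 * sigma, mu + 3 * sigma
    kept = roi[(roi >= lo) & (roi <= hi)]
    return (kept - lo) / (hi - lo)
```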

The reduction of gray values in the form of so-called “binning” during feature extraction improves the signal-to-noise ratio (SNR). The gray value range occurring in the image is thereby mapped as a frequency distribution. Numbers of gray values used in the literature are 16, 32, 64, 128 and 256. In their study, Chen et al. recommend using 32 gray values [19], whereas Mahmoud-Ghoneim et al. use 128 [20].
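A minimal sketch of such a gray value reduction to a fixed number of bins (NumPy; the ROI and the bin number are hypothetical):

```python
import numpy as np

def discretize(roi, n_bins=32):
    """Map intensities to integer gray levels 1..n_bins (fixed bin number binning)."""
    edges = np.linspace(roi.min(), roi.max(), n_bins + 1)
    return np.digitize(roi, edges[1:-1]) + 1   # levels 1 .. n_bins

# A histogram of the result yields the frequency distribution used by the texture matrices.
```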

Image preprocessing has a significant influence on feature robustness. Using a phantom, Wichtmann et al. systematically investigated the influence of spatial resolution, gray value reduction and signal normalization on feature robustness [21]. They demonstrated that only 4 features, skewness (histogram), volume [ml] (shape), volume [vox] (shape) and run length non-uniformity (RLNU; Gray Level Run Length Matrix, GLRLM), remained robust over the variation of all parameters.

This clearly shows that specific recommendations for image processing are necessary.



Feature Extraction

Features that are typically used for radiomics analyses can be divided into 4 primary groups: first-order statistics, shape features, texture features, and features obtained by wavelet transformation of relevant image sections [16]. The group of texture features comprises the Gray Level Co-occurrence Matrix (GLCM), the GLRLM, the Gray Level Size Zone Matrix (GLSZM), the Gray Level Dependence Matrix (GLDM) and the Neighboring Gray Tone Difference Matrix (NGTDM). Multiple features are subsumed under each of these matrices. It should be noted that there is great variation in nomenclature, methodology and software implementation [22]. [Table 1] provides a typical overview of the features of the individual matrices [23]. At the same time, there are efforts towards invariance of features with respect to acquisition protocols and corresponding standardization initiatives [24]. The selection of feature extractors is based on the expectation of which characteristics are relevant for the analysis; accordingly, extractors are often chosen or constructed that are invariant to, for example, global rotation or very low-frequency gray value changes.

Table 1

Overview of the features of the individual matrices [21].

  • first order statistics features: energy, entropy, kurtosis, maximum, mean

  • shape and size based features: compactness, maximum 3D diameter, spherical disproportion, sphericity, surface area

  • textural features
    – grey-level co-occurrence matrix based: autocorrelation, cluster prominence
    – gray-level run-length matrix based: gray level non uniformity, run length non uniformity

  • wavelet features: first order and textural features computed on wavelet-transformed image sections
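In practice these features are rarely implemented by hand; libraries such as PyRadiomics or scikit-image provide them. Purely as an illustration of the principle, the following minimal sketch computes a few GLCM-based texture features on a hypothetical, already discretized ROI using scikit-image:

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

# Hypothetical discretized ROI with 32 gray levels (0..31), e.g. after binning
rng = np.random.default_rng(2)
roi = rng.integers(0, 32, size=(64, 64), dtype=np.uint8)

# Gray level co-occurrence matrix for 4 directions at pixel distance 1
glcm = graycomatrix(roi, distances=[1],
                    angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                    levels=32, symmetric=True, normed=True)

# A few GLCM-derived texture features, averaged over the 4 directions
contrast = graycoprops(glcm, "contrast").mean()
energy = graycoprops(glcm, "energy").mean()
homogeneity = graycoprops(glcm, "homogeneity").mean()
```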

Baessler et al. impressively demonstrated the diagnostic value of texture features for the diagnosis of myocarditis using MRI. Their study showed that texture features were able to differentiate patients with biopsy-proven myocarditis from a healthy control group just as conventional MRI parameters did. However, unlike the texture features, the conventional MRI parameters did not allow differentiation between a healthy control group and patients with negative biopsy but clinical suspicion of myocarditis. A statistically significant difference was found only for the texture features, especially RLNU and gray level non-uniformity [25]. Radiomics thus allowed a more precise diagnostic differentiation between patients with myocarditis and healthy controls compared to the current standard.



Feature Selection

A major problem in radiomics analysis is the risk of overfitting the data, which occurs especially when the number of features exceeds the number of cases and severely limits the meaningfulness of the analysis. Overfitting can be avoided by reducing dimensionality, i. e. by selecting the features to be used for analysis and prediction. This selection can follow two approaches. First, features that are reproducible, robust and non-redundant can be selected without knowledge of the target question, which allows feature reduction without bias [8] [16]. Second, selecting features based on how “informative”, i. e. relevant, they are for the question at hand is an effective strategy, but also carries the risk of overfitting. Machine learning methods such as random forests allow an effective selection of informative features while providing robustness against large numbers of non-informative features [26]. In this case, however, as described below, an evaluation of the resulting predictive accuracy on an independent validation data set that was used neither to train the model nor to select the features is essential [27].
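A minimal sketch of such an importance-based selection with a random forest (scikit-learn; the feature matrix X and labels y are hypothetical placeholders). As emphasized above, this target-dependent selection must be performed on training data only:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

rng = np.random.default_rng(3)
X = rng.normal(size=(120, 400))    # 120 cases, 400 radiomic features (hypothetical)
y = rng.integers(0, 2, size=120)   # binary target, e.g. benign/malignant (hypothetical)

forest = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)
selector = SelectFromModel(forest, prefit=True, threshold="median")
X_selected = selector.transform(X)  # keeps features with above-median importance
```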

Test-retest data sets can be used to assess the stability of features, and only the stable features are then used for further analysis. The concordance correlation coefficient (CCC), the dynamic range (DR) and the correlation coefficient across all samples are suitable for testing robustness and reproducibility. Studies have shown that the number of features can thus be reduced considerably, e. g. from 397 to 39 [16]. Furthermore, intra- and inter-observer variability can be tested using the intraclass correlation coefficient (ICC) and Bland-Altman plots. In addition to the statistical approaches listed here, machine learning methods such as random forests can also be used to identify the features relevant to the question at hand, e. g. the differentiation of benign/malignant.
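The CCC between test and retest measurements of a feature follows directly from its definition; a minimal sketch (the arrays x and y, holding the same feature across patients in the two scans, are hypothetical):

```python
import numpy as np

def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient between two measurement series."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = np.mean((x - mx) * (y - my))
    return 2 * cov / (vx + vy + (mx - my) ** 2)

# Features with, for example, a CCC above a chosen threshold in a test-retest
# experiment could be retained for further analysis.
```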



Classification/Modeling

In addition to the statistical approaches listed above, supervised learning approaches are currently the most widespread: a model is trained on data sets for which both the input vector (features) and the output value (target) are known. After this training, the resulting algorithm is applied to a test data set. At this point the extracted features are used for prediction, whereby a key property of relevant methods such as support vector machines or random forests is that they not only evaluate the relationship between isolated features and the prediction target, but can exploit feature groups as multivariate patterns. Very rapid progress is currently being made here as well: enabled by deep learning techniques, the construction of features, their selection and prediction are increasingly combined into common models.
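A minimal sketch of such a supervised set-up with a support vector machine (scikit-learn; the feature matrix and labels are hypothetical placeholders for selected radiomic features and a clinical target):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(4)
X = rng.normal(size=(120, 40))     # selected radiomic features (hypothetical)
y = rng.integers(0, 2, size=120)   # target, e.g. responder / non-responder (hypothetical)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

model = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
model.fit(X_train, y_train)
test_scores = model.predict_proba(X_test)[:, 1]  # probabilities for later ROC/AUC analysis
```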



Validation

The final step is validation using a validation data set. The predictive performance of the algorithm is tested using ROC/AUC (receiver operating characteristic/area under the curve) analysis [28]. A strict separation between the data used for training or developing the prediction models and selecting the features, and the data used for validation, is essential. This is necessary to avoid an overly optimistic assessment of the predictive accuracy. As a middle course, cross-validation can be used, in which training and test data are iteratively separated. It must be taken into account that the respective test data could still be the basis for modeling decisions and therefore do not allow a completely independent assessment – a separate validation data set is required for this.
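To prevent information from the test folds leaking into feature selection, selection and classification can be wrapped in a single pipeline that is refit within every cross-validation fold and scored by AUC; a minimal sketch (scikit-learn; the data are hypothetical):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(5)
X = rng.normal(size=(120, 400))    # all extracted features (hypothetical)
y = rng.integers(0, 2, size=120)   # binary target (hypothetical)

# Feature selection and classification are refit within every fold, so no
# information from the respective test fold leaks into the selection step.
pipeline = make_pipeline(SelectKBest(f_classif, k=20),
                         RandomForestClassifier(n_estimators=500, random_state=0))
auc = cross_val_score(pipeline, X, y, cv=5, scoring="roc_auc")
print(auc.mean(), auc.std())  # a separate held-out validation set is still needed for the final model
```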

Parmar et al. tested the stability and predictive performance of different feature selection and classifier methods [28]. Their results showed that among the feature selection methods, the Wilcoxon test-based method (WLCX) and mutual information maximization (MIM) achieved the highest stability. Among the classifiers, the Bayesian classifier achieved the best performance with an AUC of 0.64 (SD ± 0.05).

Due to the great variability of radiomics analysis, standardization of data collection, evaluation criteria and reporting is necessary. To this end, Lambin et al. have defined a “Radiomics Quality Score” (RQS) [9], which describes a standardized analytical process from data selection through imaging, feature extraction, analysis and modeling to report generation. Each of these steps is divided into several sub-steps for which scoring points are awarded. The maximum achievable score (total RQS) is 36. The definition and introduction of the RQS is an essential step towards a quality-assured application of radiomics in medicine: it aims to counter the variability problem of the analysis – which already begins with the primary image data acquisition – through dedicated reporting of the individual steps. The introduction of the RQS seems particularly relevant in view of the expected future connection of clinical decision support systems with radiomic data [9].



Where does this lead?

In addition to standard radiomics approaches that use predefined features, recent developments in deep learning play an increasingly important role in the use of complex image data, as they make it possible to combine feature design and predictive model training and to implement both with effective model architectures [7] [29]. On the one hand, this enables the use of image information that is not covered by traditional features. On the other hand, there is the problem of interpreting deep learning models, the solution of which is increasingly the focus of research [30].



Summary

Radiomics is playing an increasingly important role in medical imaging due to its great potential to meet the requirements of precision medicine. However, it should be noted that every single step of a radiomics analysis is subject to great variability. Responsible, comprehensible handling of the data used is therefore an indispensable basic requirement. In the future, radiomics will require even closer collaboration with data scientists specializing in medical image computing, as well as a restructuring of radiological training.



Conflict of Interest

The authors declare that they have no conflict of interest.


Correspondence

Prof. Ulrike Irmgard Attenberger
Department of Diagnostic and Interventional Radiology, University Hospital Bonn
Venusberg-Campus 1
53127 Bonn
Germany   
Phone: +49/2 28/28 71 58 71   

Publication History

Received: 02 March 2020

Accepted: 05 October 2020

Article published online:
02 December 2020

© 2020. Thieme. All rights reserved.

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany

