Keywords
clear cell renal cell carcinoma - RCC subtyping - machine learning
Introduction
Clear cell renal cell carcinoma (ccRCC) is the most common histopathologic type of
RCC.[1] The histologic type of RCC acts as an independent predictor of distant metastasis
and cancer-related death, with ccRCC having a worse prognosis than both chromophobe
and papillary RCC.[2][3] Development of various immunomodulators and targeted therapeutic agents also makes
it mandatory to differentiate between subtypes of RCC as different histologic types
have varying responses to various agents.[4][5] While image-guided biopsy or surgical excision can provide the exact histopathology
in most cases, they are invasive in nature. Imaging techniques like multiphasic computed
tomography (CT) and magnetic resonance imaging (MRI) suffer from high subjectivity
and overlapping findings among different RCC subtypes and grades.[6][7][8][9]
Quantitative imaging parameters using dual-energy CT (DECT) and perfusion CT have
been previously investigated to distinguish various renal masses, differentiate enhancing
masses from cysts, and predict the histology and grade of RCCs.[10][11][12][13][14] In the era of precision medicine, machine learning (ML) models can be trained on
the plethora of quantitative markers available on imaging to differentiate benign
from malignant tumors and to predict histologic type, nuclear grade, genetic or molecular
signatures, and prognosis. There are few studies on ML using DECT in renal masses,
and most of them focus on the grading and prognostication of ccRCC. Few studies have
evaluated the use of ML to differentiate ccRCC from non-ccRCC.[15][16][17][18]
In this study, our aim was to evaluate the performance of DECT-based imaging markers
to predict the histologic subtype of RCC as well as grade of ccRCC and to assess whether
the accuracy could be improved using ML algorithms.
Materials and Methods
Dataset Retrieval
Ethical approval was waived due to the retrospective and observational study design.
We searched our hospital's pathology report database, from January 2017 to November
2022, for histopathological nephrectomy specimens with proven RCC. These histopathological
reports served as reference standards to classify RCCs into ccRCCs and non-ccRCCs
and to further grade ccRCCs using the Fuhrman classification system. Dual-energy abdominal
CT images of these patients were retrieved from our hospital's picture archiving and
communication system (PACS) using their hospital identification numbers. Patients
without a baseline preoperative scan in the PACS were excluded. Ultimately, 112 RCC
patients who had a preoperative CT abdomen performed at our hospital and subsequently
underwent surgery were included in the analysis. Of these, 25 were non-ccRCCs (comprising
10 papillary, 8 chromophobe, and 7 other non–clear cell RCCs), and 87 were ccRCCs. Among the 87 ccRCCs, 61
were classified as low grade (Fuhrman grades I–II), 18 as high grade (Fuhrman grades
III–IV), and 8 cases were ungraded in the pathology reports and thus excluded from
the Fuhrman grade correlation analysis. [Fig. 1] summarizes the inclusion and exclusion criteria and the final number of analyzed
CT scans in different phases.
Fig. 1 Flowchart showing the study design. ccRCC, clear cell renal cell carcinoma; DECT,
dual-energy computed tomography.
Image Acquisition
DECT was performed as per the routine institutional practice on a dual-source dual-energy
256-slice CT machine after intravenous administration of 1.8 to 2 mL/kg of nonionic
iodinated contrast at a rate of 3.8 mL/s using a pressure injector. The DECT scan
acquisition parameters are detailed in [Table 1]. Three postcontrast phases were acquired: corticomedullary phase at 35 seconds,
nephrographic phase at 90 seconds, and a delayed phase at 15 minutes. Virtual noncontrast
(VNC) images were generated from the dual-energy images, and noncontrast images were
not acquired separately. The CT scans retrieved for the study were anonymized by removing
their DICOM metadata and assigning each scan a new, unique study identification number.
Table 1
Dual-energy CT (DECT) acquisition parameters
| Parameter | Value |
| --- | --- |
| Method of DECT | Dual source |
| Detector collimation (mm) | 0.6 |
| Tube voltage (kV), tube A | 140 |
| Tube voltage (kV), tube B | 100 |
| Tube current-time product (mAs), tube A | 230 |
| Tube current-time product (mAs), tube B | 178 |
| Gantry rotation time (s) | 0.5 |
| Pitch | 0.6 |
| Acquisition mode | Helical |
| Reconstructed section thickness (mm) | 1.5 |
| Matrix size | 256 × 256 |
| Reconstruction kernel | D30f |
| Reconstruction algorithm | Filtered back projection |
Image Analysis
The DECT images were processed using a dedicated dual-energy software package (syngo.via
VB10A, Siemens Healthineers). The software generated color-coded iodine maps by employing
the iodine subtraction algorithm (Liver VNC, Siemens Healthineers). Two independent
readers (with 10 and 7 years of experience in body imaging), who were blinded to the
final histopathology results, performed the analysis. The readers drew circular regions
of interest (ROIs) on the largest axial section of the tumor. In homogeneous lesions,
the ROIs encompassed as much tumor area as possible. For heterogeneous lesions, the
ROIs were carefully placed to include only the most avidly enhancing areas while excluding
necrotic regions. Representative images showing the ROI placement are shown in [Fig. 2]. The iodine concentration (IC) within the tumor, measured in milligrams per milliliter,
and the iodine ratio (IR), calculated as the IC in the tumor divided by the IC in
the aorta at the level of the renal artery supplying the kidney with the tumor, were
recorded for each ROI.
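The IR calculation described above can be sketched as follows. This is a minimal illustration rather than the vendor software's implementation: the function name and the ROI readings are hypothetical, and expressing the ratio as a percentage is an assumption (the text does not state the scale of IR).

```python
def iodine_ratio(ic_tumor_mg_ml: float, ic_aorta_mg_ml: float,
                 as_percent: bool = True) -> float:
    """Normalize the tumor iodine concentration (IC) to the aortic IC.

    Both inputs are ROI measurements in mg/mL taken on the same phase.
    `as_percent` is an assumption; set it to False for a plain fraction.
    """
    if ic_aorta_mg_ml <= 0:
        raise ValueError("Aortic iodine concentration must be positive.")
    ratio = ic_tumor_mg_ml / ic_aorta_mg_ml
    return ratio * 100.0 if as_percent else ratio


# Hypothetical ROI readings (not study data).
ir = iodine_ratio(ic_tumor_mg_ml=2.5, ic_aorta_mg_ml=4.0)
print(ir)  # 62.5
```

Dividing by the aortic IC is intended to reduce the influence of patient-specific factors such as contrast dose and cardiac output on the tumor measurement.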
Fig. 2 Representative dual-energy CT image with iodine overlay map showing placement of
the circular region of interest (ROI) over the most enhancing part of the tumor in
(a) clear cell renal cell carcinoma (RCC) with (b) corresponding corticomedullary and (c) nephrographic phase images showing the hyperenhancing nature of the lesion. Similar
dual-energy CT image with (d) iodine overlay map, (e) corticomedullary, and (f) nephrographic phase images in a case of non–clear cell RCC.
Machine Learning Models
Inter-reader agreement for the two parameters, IC and IR, was assessed using the
intraclass correlation coefficient (ICC). Six ML models, including Logistic Regression (LR), Support
Vector Machine (SVM), Random Forest Classifier (RFC), AdaBoost, Naive Bayes (NB),
and Artificial Neural Network (ANN), were trained to perform binary classification
between ccRCC and non-ccRCC, and to predict the grade of ccRCC as low or high, using
the two DECT parameters, IC and IR. We further attempted to ensemble the outputs of
each of these models using another set of ML architectures to see if the accuracy
was improved when multiple ML techniques were applied together.
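One common way to ensemble the outputs of several classifiers with a second-level model is stacking. The sketch below illustrates the idea with scikit-learn's `StackingClassifier`; it is not the study's code. The data are synthetic stand-ins, only four of the six model types are used as base learners for brevity, and the choice of AdaBoost as the second-level model, the 5-fold out-of-fold scheme, and the held-out split are all assumptions.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic stand-in data (NOT study data): columns are IC and IR.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal([2.8, 66.0], [1.2, 25.0], size=(87, 2)),  # label 1: "ccRCC"
    rng.normal([1.4, 34.0], [0.8, 22.0], size=(25, 2)),  # label 0: "non-ccRCC"
]).clip(min=0.0)
y = np.array([1] * 87 + [0] * 25)

# Hold out a stratified test set so the ensemble is scored on unseen cases.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Level 1: base classifiers.
base_models = [
    ("lr", LogisticRegression()),
    ("svm", SVC(probability=True, random_state=0)),
    ("rfc", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("nb", GaussianNB()),
]

# Level 2: a meta-classifier trained on out-of-fold base-model outputs (cv=5).
stack = make_pipeline(
    MinMaxScaler(),
    StackingClassifier(
        estimators=base_models,
        final_estimator=AdaBoostClassifier(random_state=0),
        cv=5,
    ),
)
stack.fit(X_train, y_train)

proba = stack.predict_proba(X_test)[:, 1]  # probability of label 1
pred = (proba >= 0.5).astype(int)          # 0.5 decision threshold
held_out_accuracy = float((pred == y_test).mean())
print(f"Held-out accuracy on synthetic data: {held_out_accuracy:.3f}")
```

Training the second-level model on out-of-fold predictions, and scoring on cases that neither level has seen, keeps the ensemble's reported accuracy from simply reflecting memorized training labels.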
Each of the ML architectures used in the study is briefly described below.
- SVM: It is a method that finds the best way to separate different groups of data points
by drawing a line (or in more complex scenarios, a plane, or a hyperplane) between
them. This line is drawn to maximize the distance from the nearest points of any group,
ensuring the clearest distinction. This model is robust to outliers and quickly converges
to its final model.
- RFC: It uses many decision trees to make predictions. Each tree votes on the outcome,
and the majority vote is taken as the final prediction. This approach reduces errors
and improves accuracy by combining the strengths of multiple trees.
- AdaBoost: It is a boosting technique that combines several weak models (in our case, Haar cascade
classifiers) to create a strong predictive model. It works by giving more weight to
the mistakes of the previous models, focusing on the harder-to-predict instances to
improve overall performance.
- NB: It is a probabilistic model based on Bayes' theorem. It assumes that features are
independent of each other, which simplifies the calculation of the probability of
different outcomes. Despite this simple assumption, it often performs surprisingly
well for many types of problems.
- LR: It is a statistical method used to predict the probability of a binary outcome. It
models the relationship between the input features and the probability of the outcome
using a logistic function, making it suitable for classification tasks.
- ANN: It is a model inspired by the human brain. It consists of layers of interconnected
nodes (neurons) that process data by passing it through multiple layers, learning
complex patterns and relationships in the data to make accurate predictions.
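The six model types listed above could be set up in scikit-learn roughly as follows. This is a hedged sketch on synthetic data, not the study's configuration: all hyperparameters are illustrative defaults, the 5-fold cross-validation scheme is an assumption, and `AdaBoostClassifier` is used with scikit-learn's default weak learner (a depth-1 decision tree), which differs from the weak learner named in the list.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic stand-in data (NOT study data): columns are IC and IR.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal([2.8, 66.0], [1.2, 25.0], size=(87, 2)),  # label 1: "ccRCC"
    rng.normal([1.4, 34.0], [0.8, 22.0], size=(25, 2)),  # label 0: "non-ccRCC"
]).clip(min=0.0)
y = np.array([1] * 87 + [0] * 25)

models = {
    "LR": LogisticRegression(),
    "SVM": SVC(probability=True, random_state=0),
    "RFC": RandomForestClassifier(n_estimators=100, random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),  # default weak learner: decision stump
    "NB": GaussianNB(),
    "ANN": MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0),
}

# Stratified folds keep the 87:25 class ratio roughly constant in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

scores = {}
for name, clf in models.items():
    # The scaler sits inside the pipeline so it is refit on each training fold.
    pipe = make_pipeline(MinMaxScaler(), clf)
    scores[name] = float(cross_val_score(pipe, X, y, cv=cv, scoring="accuracy").mean())

for name, acc in scores.items():
    print(f"{name}: mean cross-validated accuracy = {acc:.3f}")
```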
Each of the features (IC and IR) was normalized between 0 and 1 using min–max normalization
before being passed to the ML models. This step is crucial as it puts all parameters
on an even scale, allowing each one to contribute equally to the model, regardless
of its original scale or units. We did not add any priors to any of the networks.
The training process was conducted using scikit-learn (sklearn), a standard ML library
that provides tools for building and deploying ML models. The outputs of the ML models
were thresholded at 0.5 to assign a predicted class to each of
the samples. The performance of the ML models was assessed in terms of accuracy, precision,
recall, and F1 score. Area under the receiver operating characteristic (ROC) curve
and Pearson's correlation coefficient were also calculated.
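The normalization and thresholding steps amount to two small transformations, sketched below in plain Python. The helper names and input values are hypothetical, and treating an output of exactly 0.5 as the positive class is an assumption.

```python
def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def apply_threshold(probabilities, threshold=0.5):
    """Assign class 1 when the model output reaches the threshold, else class 0."""
    return [1 if p >= threshold else 0 for p in probabilities]


# Hypothetical iodine concentrations in mg/mL (not study data).
ic_scaled = min_max_normalize([1.0, 2.0, 3.0, 5.0])
print(ic_scaled)  # [0.0, 0.25, 0.5, 1.0]

# Hypothetical model outputs.
labels = apply_threshold([0.2, 0.5, 0.9])
print(labels)  # [0, 1, 1]
```

When a model is evaluated on held-out cases, the minimum and maximum are normally taken from the training cases only and then reused for the held-out cases, so that no information from the evaluation data enters the preprocessing.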
Results
Patient demographics and tumor characteristics: A total of 112 patients with RCCs were included in the study (mean age: 65 years;
male:female: 61:51). Of these, 87 were pathologically proven ccRCCs, while 25 were
non-ccRCCs. Non-ccRCCs included papillary RCC, chromophobe RCC, and collecting duct
RCC ([Table 2]).
Table 2
Patient characteristics
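| Characteristic | Value |
| --- | --- |
| Total number of patients | 112 |
| Mean age (y) | 65 |
| Gender (male:female) | 61:51 |
| Histologic type of RCC (n = 112): clear cell (ccRCC) | 87 |
| Histologic type of RCC (n = 112): non–clear cell RCC | 25 |
| Non–clear cell RCC: papillary | 10 |
| Non–clear cell RCC: chromophobe | 8 |
| Non–clear cell RCC: other non–clear cell subtypes | 7 |
| Histologic grade of ccRCC (n = 79): low | 61 |
| Histologic grade of ccRCC (n = 79): high | 18 |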
Grading was available in 79 cases of ccRCC, of which 61 were low-grade tumors and
18 were high-grade tumors.
RCC Subtyping (ccRCC vs. non-ccRCC)
Performance of Individual DECT Parameters
The mean IC and IR values were 2.49 mg/mL and 58.7 for reader 1 and 2.44 mg/mL and
57.8 for reader 2, respectively. The correlation coefficient between the two readers
was 0.89 each for IC and IR ([Table 3]). This is represented graphically in [Fig. 3], where the IC ([Fig. 3a]) and IR ([Fig. 3b]) values measured by reader 1 and reader 2 form the x- and y-axes, respectively; all the
data points are seen clustered around a straight line, suggesting high agreement between
the two readers.
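For reference, an intraclass correlation coefficient for two readers can be computed from a two-way analysis-of-variance decomposition. The sketch below implements the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1). Which ICC form was used in this study is not stated, so the choice is an assumption, and the ratings shown are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is a list of rows, one row per tumor and one column per reader.
    """
    n = len(ratings)      # number of subjects (tumors)
    k = len(ratings[0])   # number of raters (readers)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols                    # residual

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical paired IC readings in mg/mL: (reader 1, reader 2) per tumor.
paired = [[1.0, 2.0], [2.0, 2.0], [3.0, 4.0], [4.0, 4.0]]
print(round(icc_2_1(paired), 3))  # 0.842
```

Unlike Pearson's correlation, this form penalizes a systematic offset between readers, so a reader who consistently measures higher lowers the coefficient even when the two sets of readings are perfectly linearly related.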
Fig. 3 Scatter plots showing inter-reader agreement for (a) iodine concentration and (b) iodine ratio. Values from reader 1 are plotted along the x-axis (IC_1 and IR_1),
while values determined by reader 2 are plotted along the y-axis (IC_2 and IR_2).
Table 3
Dual-energy CT parameters
| Statistic | Iodine concentration (mg/mL): clear cell RCC (n = 87) | Iodine concentration (mg/mL): non–clear cell RCC (n = 25) | Iodine ratio: clear cell RCC (n = 87) | Iodine ratio: non–clear cell RCC (n = 25) |
| --- | --- | --- | --- | --- |
| Mean | 2.76 | 1.44 | 65.70 | 33.57 |
| Median | 2.60 | 1.4 | 60.58 | 32.15 |
| Interquartile range | 1.22 | 1.35 | 31.0 | 32.55 |
| Standard deviation | 1.17 | 0.82 | 25.05 | 21.94 |
Inter-reader agreement: iodine concentration, 0.894 (p < 0.0001); iodine ratio, 0.889 (p < 0.0001).
Abbreviations: CT, computed tomography; RCC, renal cell carcinoma.
Note: The values of iodine concentration and iodine ratio were higher for ccRCC than
non-ccRCC. Inter-reader agreement was very high for both parameters and the results
were statistically significant.
For distinguishing ccRCC and non-ccRCCs, IC alone had an accuracy of 77.7% at a threshold
of 6.8 mg/mL, while IR alone had an accuracy of 77.5% at a threshold of 155.65, with
higher values indicating ccRCC in both parameters.
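To help interpret accuracy, precision, recall, and F1 score on an imbalanced dataset, the sketch below computes them with support-weighted averaging across the two classes. The averaging scheme used for the reported metrics is not stated, so weighted averaging is an assumption. The example evaluates a purely hypothetical baseline that labels every one of the 112 tumors as ccRCC, given the 87:25 class split of this cohort.

```python
def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1 over all classes."""
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / n
    precision = recall = f1 = 0.0
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec_c = tp / (tp + fp) if tp + fp else 0.0  # 0 when the class is never predicted
        rec_c = tp / (tp + fn) if tp + fn else 0.0
        f1_c = 2 * prec_c * rec_c / (prec_c + rec_c) if prec_c + rec_c else 0.0
        weight = (tp + fn) / n                       # class support as a fraction
        precision += weight * prec_c
        recall += weight * rec_c
        f1 += weight * f1_c
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Class split of the cohort: 87 ccRCC (label 1) and 25 non-ccRCC (label 0).
y_true = [1] * 87 + [0] * 25
# Hypothetical baseline: every tumor is labeled ccRCC.
y_pred = [1] * 112

metrics = weighted_metrics(y_true, y_pred)
print({name: round(value, 3) for name, value in metrics.items()})
# {'accuracy': 0.777, 'precision': 0.603, 'recall': 0.777, 'f1': 0.679}
```

Under weighted averaging, recall always equals accuracy, and a classifier that predicts only the majority class still reaches an accuracy equal to the majority-class prevalence. Per-class sensitivity and specificity are therefore a useful complement when comparing models on this kind of class split.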
Performance of ML Models
Among the individual ML models, RFC and AdaBoost performed similarly, each with an
accuracy of 89.2% and a recall of 0.892; their F1 scores were 0.881 and 0.886, precision
0.896 and 0.889, and area under the curve (AUC) 0.888 and 0.970, respectively. The
other ML models also performed well, with accuracies of 85.6, 82.8, 83.8, and 79.3%
for LR, SVM, NB, and ANN, respectively ([Table 4]). Combining the ML models improved the performance of all ML models. AdaBoost performed
the best with an accuracy of 100%. The performance of other ML models also improved
with an accuracy of 87.4, 89.2, 91.0, 87.4, and 86.5% for LR, SVM, RFC, NB, and ANN,
respectively ([Table 5]). The results for histologic subtyping of RCC are depicted graphically using box
and whisker plots in [Fig. 4], showing the accuracy of IC ([Fig. 4a]) and IR ([Fig. 4b]) individually without the use of ML, as well as the highest performing individual
algorithm, the RFC ([Fig. 4c]), and the best performing combined ML algorithm, AdaBoost ([Fig. 4d]). Wide separation of the boxes of ccRCC and non-ccRCC, as seen in [Fig. 4d], indicates higher accuracy.
Fig. 4 Box and whisker plots for renal cell carcinoma (RCC) subtyping using (a) iodine concentration (IC), (b) iodine ratio (IR), (c) Random Forest classifier alone (RF), and (d) AdaBoost boosted by other machine learning models.
Table 4
Accuracy of DECT biomarkers and individual machine learning models for differentiating
clear cell and non–clear cell RCC
| Metric | Iodine concentration | Iodine ratio | Logistic regression | Support Vector Machine | Random Forest Classifier | AdaBoost | Naive Bayes | Artificial Neural Network |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Accuracy | 0.777 | 0.775 | 0.856 | 0.828 | 0.892 | 0.892 | 0.838 | 0.793 |
| F1 score | 0.679 | 0.676 | 0.842 | 0.814 | 0.881 | 0.886 | 0.838 | 0.717 |
| Precision | 0.603 | 0.600 | 0.850 | 0.815 | 0.896 | 0.889 | 0.838 | 0.836 |
| Recall | 0.777 | 0.775 | 0.856 | 0.829 | 0.892 | 0.892 | 0.838 | 0.793 |
| Threshold | 6.800 | 155.65 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| Area under ROC | 0.825 | 0.832 | 0.846 | 0.823 | 0.888 | 0.970 | 0.848 | 0.820 |
Abbreviations: DECT, dual-energy computed tomography; RCC, renal cell carcinoma; ROC,
receiver operating characteristic curve.
Note: The accuracy of individual DECT parameters for prediction of the RCC subtype
was moderate (IC and IR in the first two columns). All the machine learning models
performed better than the individual DECT parameters. The Random Forest classifier
and AdaBoost models had the highest accuracy and area under the curve.
Table 5
Accuracy of combined machine learning models for differentiating clear cell and non–clear
cell RCC
| Metric | Logistic regression | Support Vector Machine | Random Forest classifier | AdaBoost | Naive Bayes | Artificial Neural Network |
| --- | --- | --- | --- | --- | --- | --- |
| Accuracy | 0.874 | 0.892 | 0.910 | 1.0 | 0.874 | 0.865 |
| F1 score | 0.858 | 0.881 | 0.910 | 1.0 | 0.872 | 0.853 |
| Precision | 0.880 | 0.896 | 0.910 | 1.0 | 0.871 | 0.860 |
| Recall | 0.874 | 0.892 | 0.910 | 1.0 | 0.874 | 0.865 |
| Threshold | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| Area under ROC | 0.866 | 0.885 | 0.972 | 1.0 | 0.911 | 0.870 |
Abbreviations: DECT, dual-energy computed tomography; RCC, renal cell carcinoma; ROC,
receiver operating characteristic curve.
Note: Combining multiple machine learning algorithms to create another level of architecture
improved the performance of all the models, with AdaBoost performing the best.
Fuhrman Grade Prediction for ccRCCs (Low vs. High Grade)
Performance of Individual DECT Parameters
To differentiate between high- and low-grade ccRCCs, IC alone had an accuracy of 77.9%,
while IR alone had an accuracy of 77.6%. The discriminatory threshold values for IC
and IR were 6.80 mg/mL and 128.55, respectively ([Table 6]).
Table 6
Accuracy of DECT biomarkers and individual machine learning models for differentiating
high- and low-grade clear cell RCC
| Metric | Iodine concentration | Iodine ratio | Logistic regression | Support Vector Machine | Random Forest | AdaBoost | Naïve Bayes | Artificial Neural Network |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Accuracy | 0.779 | 0.776 | 0.776 | 0.776 | 0.776 | 0.789 | 0.776 | 0.776 |
| F1 score | 0.682 | 0.678 | 0.678 | 0.678 | 0.678 | 0.708 | 0.678 | 0.678 |
| Precision | 0.607 | 0.603 | 0.603 | 0.603 | 0.603 | 0.834 | 0.603 | 0.603 |
| Recall | 0.779 | 0.776 | 0.776 | 0.776 | 0.776 | 0.790 | 0.776 | 0.776 |
| Threshold | 6.80 | 128.55 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| Area under ROC | 0.524 | 0.463 | 0.544 | 0.994 | 0.922 | 0.898 | 0.515 | 0.538 |
Abbreviations: DECT, dual-energy computed tomography; RCC, renal cell carcinoma; ROC,
receiver operating characteristic curve.
Note: For grade prediction, individual DECT parameters and machine learning models
had similar performance with moderate accuracy.
Performance of ML Models
The best performing ML model was AdaBoost with an accuracy of 78.9% and F1 score of
0.708. The other ML models (LR, SVM, RFC, NB, and ANN) each had an accuracy of
77.6%. Combining ML techniques improved the accuracy of all the models except LR and
ANN. The best performing model was again AdaBoost with an accuracy of 100%. The accuracies
of SVM, RFC, and NB were 97.4, 98.7, and 94.7%, respectively ([Table 7]). The results for grading of ccRCC are depicted graphically using box and whisker
plots in [Fig. 5], showing the accuracy of IC ([Fig. 5a]) and IR ([Fig. 5b]) individually without the use of ML, as well as an individual
algorithm, the RFC ([Fig. 5c]), and the best performing combined ML algorithm, AdaBoost ([Fig. 5d]). Wide separation of the boxes of low-grade and high-grade ccRCC, as seen in [Fig. 5d], indicates higher accuracy.
Fig. 5 Box and whisker plots for grading of clear cell renal cell carcinoma (ccRCC) using
(a) iodine concentration (IC), (b) iodine ratio (IR), and (c) Random Forest classifier alone (RF) and (d) AdaBoost in combination with other machine learning algorithms.
Table 7
Accuracy of DECT biomarkers and combined machine learning models for differentiating
high-grade and low-grade clear cell RCC
| Metric | Logistic regression | Support vector machine | Random Forest classifier | AdaBoost | Naive Bayes | Artificial Neural Network |
| --- | --- | --- | --- | --- | --- | --- |
| Accuracy | 0.776 | 0.974 | 0.987 | 1.0 | 0.947 | 0.776 |
| F1 score | 0.678 | 0.974 | 0.987 | 1.0 | 0.948 | 0.678 |
| Precision | 0.603 | 0.974 | 0.987 | 1.0 | 0.951 | 0.603 |
| Recall | 0.776 | 0.974 | 0.987 | 1.0 | 0.947 | 0.776 |
| Threshold | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| Area under ROC | 0.975 | 0.986 | 1.0 | 1.0 | 0.988 | 0.973 |
Abbreviations: DECT, dual-energy computed tomography; RCC, renal cell carcinoma; ROC,
receiver operating characteristic curve.
Note: Combining the ML techniques improved the accuracy of the models for grade prediction
of clear cell RCC (ccRCC), with AdaBoost performing the best.
Discussion
In this study, we have assessed the performance of quantitative DECT biomarkers and
various ML models to distinguish the type (ccRCC vs. non-ccRCC) and grade (low vs.
high) of ccRCCs. IC and IR were used as objective DECT parameters to represent the
enhancement characteristics of RCC, which are generally evaluated subjectively in
order to identify ccRCCs and non-ccRCCs. The inter-reader agreement for both DECT
parameters was high, suggesting that the measurements are reproducible between readers. The accuracy
of IC alone and IR alone for distinguishing ccRCC and non-ccRCCs was moderate. Among
the individual ML models, RFC and AdaBoost had the highest accuracy, and combining ML techniques
improved their performance. The combined AdaBoost model had an accuracy of 100%, meaning that it
predicted the correct type of RCC and the correct grade of ccRCC in all cases in our dataset.
Previous studies on DECT in RCC have found that IC and IR can be used to predict the
RCC subtype with a high accuracy, which can further be improved with the combination
of perfusion CT and radiomics.[11] A recent study also developed a nomogram for the preoperative classification of
high-grade versus low-grade tumors among ccRCCs using clinical parameters such as age,
systemic immune-inflammation index, and slope of spectral CT curve in the cortical
phase, using data from 73 ccRCC patients.[19] Most of the published literature on ML for prediction of RCC subtype or grade is
based on the use of radiomics and texture parameters. Several studies have trained
ML models using textural data with a high degree of accuracy.[18][20][21]
However, few studies have used DECT data to train ML models.
In the present study, we have trained several types of ML models including SVM, RFC,
AdaBoost, NB, LR, and ANN. We have further ensembled the outputs of each of these
models with another set of ML architectures to further improve their accuracy. The
algorithms were trained on imaging-based quantitative markers, including IC and IR,
after normalizing their values using min–max normalization. The performance was then
assessed at a given threshold of 0.5, and we found that there was high accuracy of
prediction of ccRCC versus non-ccRCC, which was further improved when an additional
set of ML architectures was added to the ensemble output. It is well known that ccRCCs
are hyperenhancing as compared to non-ccRCCs, and this was probably reflected in the
quantitative DECT parameters used to train the ML models. We did not perform external
validation due to the small sample size and retrospective design of the study; however,
future prospective studies using larger datasets could be used to validate these results
in a more “real-world” setting.
As seen in our study, the combination of DECT parameters with ML has the potential
to improve the accuracy of predicting the histological subtype and grade of RCCs noninvasively.
A recently published study by Bing et al trained ML models using DECT-based radiomics
to predict the number of interstitial collagen fibers and pseudocapsule thickness
in ccRCCs. They found that the combined model, which used data from iodine-based material
decomposition images and mixed energy images, had the best performance with a specificity
of 0.87 and a sensitivity of 0.75.[22] In our study, we used DECT data, combining IC and IR values to train various ML
models with a high accuracy of predicting RCC subtypes and ccRCC grades. We also ensembled
multiple ML algorithms together to a higher level of architecture to further boost
the performance of the ML models.
Limitations
This study was limited by its retrospective nature and small sample size. Therefore,
prediction of various subtypes of non-ccRCC was not possible. The dataset included
only RCCs, and the algorithm was trained only to differentiate ccRCC from non-ccRCC.
We did not include any renal lesions other than RCCs, such as angiomyolipomas or other
non-RCC malignancies, and thus the algorithm would not be able to differentiate such
lesions from RCCs.
While surgery would still be required irrespective of the histologic subtype of RCC,
the differentiation between ccRCC and non-ccRCC is clinically relevant as it affects
the therapeutic choice, response to treatment, and prognosis. While studies have found
that there is no significant difference in the prognosis of papillary and chromophobe
RCCs, both carry a significantly better prognosis than ccRCC.[2][10] Biopsy is often required prior to starting immunotherapy in metastatic disease,
and being able to distinguish ccRCC from non-ccRCC would be useful in such cases.
DECT was performed only on a dual-source machine. This limits its generalizability
as different DECT techniques may yield different values of IC.
Overfitting may have contributed to the very high accuracies in the present study,
as we did not perform external validation. Validation of these results
using a larger, external dataset could test for this issue, which is inherent
to ML models.
Thus, a larger prospective cohort study including non-RCC renal masses can be conducted
to develop more robust and widely applicable ML models that can predict the different
subtypes of RCCs including the less common non-ccRCCs and differentiate RCCs from
other renal masses.
Conclusion
DECT-based quantitative biomarkers have a moderate diagnostic accuracy to differentiate
ccRCC from non-ccRCC and to predict the grade of the tumor. The use of ML models can
significantly improve the accuracy of the DECT parameters. Thus, DECT parameters combined
with ML can provide noninvasive, accurate, and reliable biomarkers for RCC subtype
prediction, which is crucial for prognostication and choice of targeted therapy.