Keywords
total knee arthroplasty - surgical site infection - machine learning - artificial
intelligence
In the United States alone, over 680,000 total knee arthroplasty (TKA) procedures
are performed each year, with growth projections until 2030 ranging from 84.9 to 147%.[1] Surgical site infection (SSI) is one of the most common health care-associated infections
(HAI) among orthopaedic patients,[2] estimated to occur in 1 to 3.5% of total joint arthroplasties (TJAs).[3] By 2030, the number of TKA patients with SSI is expected to rise with the growing demand
for primary TKA procedures, and the rate of SSI following
primary TKA is projected to range between 2.2 and 6.8%.[4] The American College of Surgeons National Surgical Quality Improvement Program divides
SSI into superficial and deep SSIs, with deep SSI also commonly being known as periprosthetic
joint infection (PJI). While superficial SSI generally requires a less aggressive
treatment approach, PJI often necessitates invasive management with intravenous antibiotics
and reoperations including TKA component exchanges.[5] PJI is the most common indication for revision in TKA patients,[6] which results in increased length of hospital stay and resource utilization. From
a patient's standpoint, superficial and deep SSIs following TKA have been associated
with significant morbidity and mortality.
Although the understanding of how patients differ in terms of their susceptibility
to SSI prior to TKA remains incomplete, previous studies have identified several risk
factors that can be categorized as modifiable and non-modifiable.[7] Despite the continuous development of infection prevention modalities such as patient
optimization, advances in surgical technique, sterile protocols, and operative procedures,
SSI after TKA still poses a substantial burden to patients, surgeons, and the health
care system. Therefore, recent studies have developed predictive risk calculators
to estimate the probability of SSI preoperatively in an effort to reduce SSI following
TKA; however, their study findings demonstrate limited success.[8] Additionally, these risk calculators provide limited clinical utility due to their
cumbersome nature. With the development of computational science and its availability
to clinical fields, machine learning (ML) models such as artificial neural networks
(ANNs) represent a form of artificial intelligence that is particularly suited for
preoperative medical risk stratification and resource allocation. ML models have demonstrated
high accuracy in predicting SSI following lumbar spinal fusion.[9] However, there is a paucity of studies utilizing ML models for the prediction of
SSI following TKA. Therefore, this study aimed to develop and validate ML models for
the prediction of SSI in patients following primary TKA.
Materials and Methods
Patient Cohort
The present study was approved by the Institutional Review Board. A retrospective
review of 10,089 primary TKA procedures was performed. All TKA surgeries were performed
between 2016 and 2019 at our tertiary academic center
by a total of 11 fellowship-trained arthroplasty surgeons. Patients with simultaneous
bilateral surgery, partial joint arthroplasty, and missing perioperative data were
excluded from the analysis. All patients had a minimum follow-up of 2 years (range: 2.1–4.7 years). A total of 10,021 primary TKA patients remained for the development
and validation of ML algorithms to predict SSI following primary TKA. The primary
outcome of interest in this study was the prediction of superficial and deep SSIs
in patients following primary TKA. SSI was defined in concordance with the criteria
of the Musculoskeletal Infection Society.[10]
Clinical Variables
Using our institution's electronic medical record system for patient chart review,
patient and procedural variables associated with the development of SSI following
TKA were collected. Collected patient variables included age, gender, body mass index
(BMI), ethnicity, insurance status (Medicare, Medicaid, and Private), social status,
American Society of Anesthesiologists Physical Status score (ASA score), medical comorbidities,
Charlson comorbidity index (CCI), and preoperative medications ([Table 1]). Procedural variables included for analysis involved laterality, indication for
primary TKA, prior injections/surgeries on the knee, prior ambulatory/inpatient stays,
anesthesia type, tranexamic acid usage, component fixation method (cemented vs. non-cemented),
blood loss, transfusion rates, and tourniquet use.
Table 1
Baseline characteristics of study population
| Characteristic | Primary TKA patients (N = 10,021) |
| Demographics | |
| Age (y) | 74.2 ± 22.7 |
| Gender | 3,992 males; 6,029 females |
| BMI (kg/m²) | 32.3 ± 6.4 |
| Laterality | 4,727 left; 5,294 right |
| ASA score (%) | ASA 1: 616 (6.1%); ASA 2: 6,168 (61.5%); ASA 3: 3,079 (30.2%); ASA 4: 226 (2.2%) |
| Charlson comorbidity index | 1.9 ± 1.5 |
| Insurance status (Medicare; Medicaid; Private) | 1,847; 561; 7,613 |
| Ethnicity (White, African American, Hispanic, and Asian) | 9,686, 163, 112, 60 |
| Follow-up time (y) | 2.8 ± 1.1 |
| Comorbidities | |
| Smoking (%) | 502 (5.0%) |
| Drinking (%) | 1,413 (14.1%) |
| Drug abuse (%) | 173 (0.1%) |
| Diabetes mellitus (%) | 710 (7.8%) |
| Depression (%) | 594 (6.0%) |
| Renal failure (%) | 438 (4.8%) |
| Malignant tumor (%) | 819 (8.1%) |
| Hypertension | 4,648 (46.3%) |
| Surgical variables | |
| Blood loss (mL) | 109.0 ± 96.0 |
| Operation time (min) | 78.9 ± 32.6 |
| Spinal anesthesia (%) | 82.8 |
| Tranexamic acid usage (%) | 77.1 |
| Tourniquet use (%) | 94.1 |
| Transfusion rates (%) | 3.8 |
| Cemented component fixation (%) | 96.2 |
| Indication for primary TKA (osteoarthritis) | 92.5 |
Abbreviations: ASA, American Society of Anesthesiologists Physical Status score; BMI,
body mass index; TKA, total knee arthroplasty.
Model Development
For the SSI classification analysis, we employed five state-of-the-art supervised
ML methods: (1) ANN, (2) stochastic gradient boosting, (3) support vector machines
(SVMs), (4) random forest (RF), and (5) elastic-net penalized logistic regression.
These ML methods were selected based on prior studies showing the ability of these
modeling techniques to accurately predict arthroplasty patient outcomes. The dataset
was randomly divided into two groups using an 80:20 stratified split,
which resulted in a training dataset (8,016 TKAs) and a testing dataset (2,005 TKAs).
Recursive feature elimination was used to select the subset of parameters for final
modeling. Five-fold cross-validation was repeated five times to develop and assess
all candidate models.
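The splitting scheme described above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' pipeline: the function names `stratified_split` and `repeated_stratified_kfold` are hypothetical, the labels are synthetic (only the cohort size and event count are taken from the text), stratification of the cross-validation folds is an assumption, and recursive feature elimination is omitted.

```python
import random

def stratified_split(labels, train_pct=80, seed=0):
    """Return (train_idx, test_idx), splitting each outcome class separately."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = len(idx) * train_pct // 100  # floor of this class's training share
        train += idx[:cut]
        test += idx[cut:]
    return sorted(train), sorted(test)

def repeated_stratified_kfold(labels, indices, k=5, repeats=5, seed=0):
    """Yield (fit_idx, val_idx) pairs for `repeats` rounds of stratified k-fold."""
    rng = random.Random(seed)
    for _ in range(repeats):
        folds = [[] for _ in range(k)]
        for cls in sorted(set(labels[i] for i in indices)):
            idx = [i for i in indices if labels[i] == cls]
            rng.shuffle(idx)
            for pos, i in enumerate(idx):
                folds[pos % k].append(i)  # deal class members round-robin
        for f in range(k):
            val = sorted(folds[f])
            fit = sorted(i for g in range(k) if g != f for i in folds[g])
            yield fit, val

# Synthetic labels: 404 SSI events among 10,021 procedures (assumed ordering).
labels = [1] * 404 + [0] * (10021 - 404)
train_idx, test_idx = stratified_split(labels)
splits = list(repeated_stratified_kfold(labels, train_idx))
```

With these assumptions, flooring each class's 80% share happens to give 8,016 training and 2,005 testing records, and the repeated cross-validation yields 25 fit/validation pairs with the event rate preserved in every fold.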
The five ML models were assessed using the area under the receiver operating characteristic
curve (AUC). An AUC of 1 represents a perfect ML model, while ML models no better
than chance have an AUC of 0.5.[8] ML model calibration was assessed using a calibration plot. Overall
model performance was assessed using the Brier score.[11] Perfect ML models have a Brier score of 0. Decision curve analysis was performed
to measure the expected utility of the candidate models' predictions if clinical management
were to change based on ML model predictions. The interpretability of ML models was
assessed at both global and local levels.
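To make the discrimination and overall performance metrics concrete, the sketch below computes the AUC as the probability that a randomly chosen patient with SSI receives a higher predicted risk than a randomly chosen patient without SSI, and the Brier score as the mean squared difference between predicted risk and observed outcome. The null-model Brier score is computed here by predicting the observed event rate for every patient, which is a common convention and an assumption about how the reference value in the tables was obtained. The function names and the eight-patient toy data are illustrative only.

```python
def auc(y_true, y_prob):
    """Rank-based AUC: P(score of a positive > score of a negative); ties count 0.5."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def brier(y_true, y_prob):
    """Mean squared error between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def null_brier(y_true):
    """Brier score of a model that predicts the event rate for everyone."""
    prevalence = sum(y_true) / len(y_true)
    return brier(y_true, [prevalence] * len(y_true))

# Toy data (hypothetical): 1 = SSI, 0 = no SSI.
y = [0, 0, 0, 0, 1, 0, 1, 0]
p = [0.02, 0.05, 0.10, 0.03, 0.40, 0.30, 0.20, 0.01]

print("AUC:", auc(y, p))
print("Brier:", brier(y, p), "null model:", null_brier(y))
```

A model adds information only if its Brier score falls below the null-model value; for a rare outcome that reference value is already small, so Brier scores are best read relative to it.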
Statistical Analysis
All statistical analyses were performed using SPSS (SPSS version 18.0; IBM Corp., Armonk,
NY), Matlab (MathWorks Inc., Natick, MA), Python (Python Software Foundation, Wilmington,
DE), and Anaconda (Anaconda Inc., Austin, TX).
Results
A total of 10,021 patients who underwent primary TKA were analyzed. Of those 10,021
patients, 404 (4.0%) developed SSI at an average follow-up of 2.8 ± 1.1
years, including 223 superficial SSIs and 181 PJIs. The mean age of the patient
cohort was 74.2 ± 22.7 years, with a mean BMI of 32.3 ± 6.4 kg/m². Patient demographics and surgical variables for the TKA patient cohort are summarized
in [Table 1]. The causative pathogens for PJIs are summarized in [Table 2]. On cross-validation of the training set, the AUCs of the candidate models ranged
from 0.78 for SVMs to 0.84 for ANNs ([Table 3]). The calibration intercept ranged from −0.18 to 0.17, with the best intercept for
ANNs (intercept of 0.07; [Table 3]). The lowest Brier score was achieved by ANNs (Brier score of 0.054). In the
testing set, the AUCs of the five candidate models ranged from 0.78 to 0.84 ([Table 4]). The highest AUC was achieved by ANNs (AUC = 0.84; [Table 4]). The Brier scores in the testing set varied between 0.054 and 0.056, with
the lowest Brier score for ANNs. The accuracy of the five ML models exceeded
94%.
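The calibration intercept and slope reported above can be illustrated with a logistic recalibration sketch: the observed outcome is regressed on the logit of the predicted probability, so that an intercept of 0 and a slope of 1 indicate ideal calibration. The text does not state how these quantities were estimated, so the joint two-parameter fit below, the function name `calibration_intercept_slope`, and the toy data are all assumptions.

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def calibration_intercept_slope(y_true, y_prob, iters=50):
    """Fit logit(P(y = 1)) = a + b * logit(p) by Newton-Raphson; return (a, b)."""
    x = [logit(p) for p in y_prob]
    a, b = 0.0, 1.0  # start from perfect calibration
    for _ in range(iters):
        mu = [sigmoid(a + b * xi) for xi in x]
        # Gradient of the log-likelihood with respect to (a, b).
        g0 = sum(y - m for y, m in zip(y_true, mu))
        g1 = sum((y - m) * xi for y, m, xi in zip(y_true, mu, x))
        # Information matrix (negative Hessian) entries.
        w = [m * (1 - m) for m in mu]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system explicitly.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

# Toy data (hypothetical): three risk groups of 10 patients with 2, 5, and 8 events.
y = [1] * 2 + [0] * 8 + [1] * 5 + [0] * 5 + [1] * 8 + [0] * 2
p_calibrated = [0.2] * 10 + [0.5] * 10 + [0.8] * 10
p_too_extreme = [1 / 17] * 10 + [0.5] * 10 + [16 / 17] * 10  # logits doubled

a_good, b_good = calibration_intercept_slope(y, p_calibrated)
a_ext, b_ext = calibration_intercept_slope(y, p_too_extreme)
```

In this toy setting the well-calibrated predictions return an intercept near 0 and a slope near 1, while predictions whose logits are twice as extreme as the observed risks return a slope near 0.5. A slope above 1, as seen for several models in the tables, points the other way: predictions that are too moderate.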
Table 2
Causative pathogens for the development of surgical site infection following primary total knee arthroplasty
| Causative pathogen | Revision surgery for SSI (N = 181) |
| Unfavorable | |
| Methicillin-resistant Staphylococcus aureus (MRSA) | 10 (5.5%) |
| Pseudomonas aeruginosa | 8 (4.4%) |
| Anaerobes | 10 (5.5%) |
| Negative culture | 29 (16.0%) |
| Other gram-negative organisms | 13 (7.1%) |
| Mixed growth | 20 (11.0%) |
| Favorable | |
| Streptococcus species | 17 (9.4%) |
| Staphylococcus species | 15 (8.3%) |
| Coagulase-negative Staphylococci | 9 (4.9%) |
| Other gram-positive organisms | 14 (7.7%) |
| Propionibacterium acnes | 8 (4.4%) |
| Staphylococcus aureus | 23 (12.7%) |
| Other | 5 (2.7%) |
Abbreviation: SSI, surgical site infection.
Table 3
Discrimination and calibration of machine learning algorithms on the training set for TKA patients
| Metric | Artificial neural network | Stochastic gradient boosting | Support vector machine | Random forest | Elastic-net penalized logistic regression |
| AUC | 0.85 (0.82–0.82) | 0.80 (0.77–0.83) | 0.79 (0.77–0.81) | 0.81 (0.79–0.84) | 0.80 (0.77–0.83) |
| Intercept | 0.07 (−0.01 to 0.15) | 0.16 (−0.05 to 0.37) | −0.18 (−0.30 to −0.06) | −0.12 (−0.20 to −0.04) | 0.17 (−0.02 to 0.36) |
| Slope | 1.03 (0.91 to 1.15) | 1.22 (1.07 to 1.37) | 1.11 (1.01 to 1.21) | 0.85 (0.75 to 0.95) | 1.09 (1.04 to 1.14) |
| Brier | 0.054 (0.053 to 0.056) | 0.056 (0.055 to 0.057) | 0.056 (0.054 to 0.057) | 0.056 (0.055 to 0.058) | 0.055 (0.054 to 0.056) |
Abbreviations: AUC, area under the receiver operating characteristic curve; TKA, total
knee arthroplasty.
Note: Data was expressed as mean (95% confidence interval). Null model Brier score = 0.058.
Table 4
Discrimination and calibration of machine learning algorithms on the testing set for TKA patients
| Metric | Artificial neural network | Stochastic gradient boosting | Support vector machine | Random forest | Elastic-net penalized logistic regression |
| AUC | 0.84 | 0.79 | 0.78 | 0.80 | 0.80 |
| Intercept | 0.09 | 0.18 | −0.21 | −0.17 | 0.18 |
| Slope | 1.06 | 1.27 | 1.15 | 0.90 | 1.10 |
| Brier | 0.054 | 0.055 | 0.056 | 0.055 | 0.054 |
Abbreviations: AUC, area under the receiver operating characteristic curve; TKA, total
knee arthroplasty.
Note: Data was expressed as mean (95% confidence interval). Null model Brier score = 0.059.
Decision curve analysis showed a higher net benefit for all five ML models when compared
with the default strategies of changing management for all patients or no patients.
Variables significantly associated with the development of SSI were old age (>75 years),
male gender, CCI, smoking, alcohol use, diabetes, Medicare insurance, and BMI ([Fig. 1]). The strongest predictors of SSI were CCI, BMI (>30 kg/m²), and smoking ([Fig. 2]). Numerous medical comorbidities demonstrated only a small impact on the development
of SSI following primary TKA: drug abuse (6.4%), depression (5.3%), renal failure
(5.1%), malignant tumor (3.8%), and hypertension (1.2%). In terms of model performance,
there was no significant difference in the candidate ML models' predictions between superficial
SSIs and PJIs with respect to the AUC (p = 0.35), calibration intercept (p = 0.47), calibration slope (p = 0.51), and Brier score (p = 0.44; [Fig. 3]).
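The net benefit compared in decision curve analysis can be sketched as below. At a chosen threshold probability, net benefit is the true-positive rate per patient minus the false-positive rate per patient weighted by the odds of the threshold; changing management for no patients has a net benefit of 0 by definition. The threshold of 0.2, the function names, and the ten-patient toy data are illustrative assumptions, not values from the study.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on every patient whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Net benefit of changing management for every patient."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Toy data (hypothetical): 2 SSI events among 10 patients.
y = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
p = [0.60, 0.30, 0.40, 0.10, 0.05, 0.05, 0.02, 0.02, 0.01, 0.01]
threshold = 0.2

nb_model = net_benefit(y, p, threshold)        # 2 true positives, 1 false positive
nb_all = net_benefit_treat_all(y, threshold)   # default: treat everyone
nb_none = 0.0                                  # default: treat no one
```

Repeating the calculation over a range of thresholds and plotting the three curves gives the decision curve; a model is preferred wherever its curve lies above both default strategies.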
Fig. 1 Machine learning model for the prediction of surgical site infection following primary
total knee arthroplasty.
Fig. 2 Global variable importance plot for the prediction of surgical site infection following
primary TKA. TKA, total knee arthroplasty.
Fig. 3 Calibration plot for the neural network model for the prediction of surgical site
infection following primary TKA. TKA, total knee arthroplasty.
An example of a local, individual patient-level explanation for the model predictions
by ANN is shown in [Fig. 4]. For a 63-year-old obese (BMI: 41 kg/m²) TKA patient with a CCI of 3.03, diabetes, and Medicare insurance, who had no history
of alcohol use or smoking, the predicted probability of SSI following primary TKA was
16.3%. A high CCI, high BMI (>35 kg/m²), diabetes, and Medicare insurance status increased the probability of SSI, whereas
age and no prior history of alcohol use or smoking decreased the probability of SSI.
Fig. 4 Example of individual patient-specific explanation generated by the neural network
model for TKA patients. TKA, total knee arthroplasty.
Discussion
Given the growing emphasis on predicting SSIs following TKA and on optimizing risk factors
preoperatively,[12] there is an increasing interest in applying ML models to TKA patient care with regard
to postsurgical infections. In the present study, we demonstrate excellent performance
for all five ML candidate models on discrimination, calibration, and decision curve
analysis in terms of predicting superficial and deep SSIs in patients following primary
TKA, with the ANN demonstrating the strongest performance of all ML models (AUC = 0.84).
Prior works aiming to predict SSI following hip and knee TJA did not achieve excellent
model performance. A retrospective database study by Inacio et al[13] used a prescription-based comorbidity measure to predict PJI within 90 days following
hip and knee TJA, reporting an AUC of less than 0.63 for their modeling techniques.
Similarly, Shah et al did not achieve excellent model predictions (AUC = 0.73) in
their ML study, which intended to predict postoperative complications such as SSI
in patients following primary hip and knee TJA.[14] Our current ML models additionally show a higher AUC when compared with recent
ML models for the prediction of SSIs following spinal fusion (AUC = 0.77) and neurological
operations (AUC = 0.76).[9][15] Based on the high accuracy of our ANN models, when compared with prior literature,
the presented ANN models have the potential to be used in real-time patient-specific
SSI prediction in primary TKA patients.
Based on the results of the presented ML models, the strongest predictors of superficial
and deep SSIs following primary TKA were CCI, obesity (BMI >30 kg/m²), and smoking. The model performance did not significantly differ between predictions
for superficial and deep SSIs, probably because superficial and deep SSIs share many risk factors, as
shown in prior systematic reviews.[16] The CCI, the most influential predictor of SSIs, initially designed to predict mortality,
is a highly validated mechanism for quantifying patient comorbidities.[17] More recently, the CCI has been studied as a predictive tool for various events
following primary TJA, including complications,[18] readmission rates,[19] functional outcomes,[20] discharge disposition,[21] and prolonged length of stay.[19] Additionally, the CCI has been shown to be a predictor of greater hospital charges
and costs associated with TKA.[22] Furthermore, several previous studies support our observation that a higher number
of comorbidities, identified through the CCI, was associated with a higher risk of
SSI.[7][13]
In the present ANN model, obesity (BMI >30 kg/m², defined as per recommendations of the US Centers for Disease Control and Prevention[23]) was the second strongest predictor of superficial and deep SSIs following primary
TKA. Several meta-analyses have analyzed the relationship between SSI and obesity.[24] Although there is no clear cutoff value, a higher BMI is known to be associated
with an increased risk of SSI following TKA, and BMIs greater than 40 kg/m² are strongly correlated with SSIs.[25] A plausible explanation may lie in the observations of previous literature
which demonstrated that obese TKA patients had increased blood loss, longer surgical
times, increased comorbid conditions, and prolonged postoperative wound drainage.[26][27] With regard to smoking as a strong predictor of SSI following index TKA, a large database study by Kremers et al reported
findings comparable to those of the present study, with an increased risk of SSI following primary total hip arthroplasty (THA) and TKA in
smokers.[28] Several prior works reported that approximately 7% of TKA patients were current
tobacco users,[29] and smoking placed patients at a 3.5% increased risk of SSI following TKA when
compared with non-smoking TKA patients.[30] Furthermore, the most common cause of revision TKA in smokers is PJI.[31]
There is a strong agreement in terms of risk factors of SSI development following
primary TKA between the present ML study and prior retrospective work.[26][28] However, the present study identified a greater importance of obesity than previously
reported.[8][32] With previous studies reporting a risk of 1–2% for the development of PJI solely
due to obesity,[32] this present ML study shows a greater significance of a high BMI (3.8%). This may
be based on the increased accuracy of data analysis as provided by ML algorithms,
which possess the ability to accurately identify complex relationships between clinical
variables, even in noisy and incomplete datasets.[33] Furthermore, ML algorithms provide risk factor estimates within seconds, thereby
providing a viable tool to assist clinical decision-making. Consequently, ML tools
have seen a rapid rise over recent years in many fields of medicine. The use of ML
algorithms in clinical environments has the potential to support clinical decision-making
through a data-driven approach that provides highly accurate results in real
time. Therefore, it can overcome prior approaches that solely rely on the experience
of orthopaedic surgeons, which may be of benefit especially in patients with complex
medical history. However, the use of ML technology may raise concerns that
patients at risk of postoperative complications could face limited access to health care
institutions. Under current reimbursement models,
patients with more postoperative complications are less profitable
to health care institutions; therefore, these patients may face challenges in
accessing health care providers.
The results of the current study provide practical information that may be clinically
useful for the preoperative identification of patients with a high probability of
superficial or deep SSI following primary TKA, solely predicted by ML models from
patient demographic data and medical comorbidities. To be used as a reliable predictive
tool in clinical practice, a model must accurately distinguish patients with a good prognosis
from those with a poor prognosis. Since the occurrence of SSI is strongly associated with poor
outcomes following primary TKA, this stratification using ANN models could be helpful
to arthroplasty surgeons and patients during preoperative counseling and patient
optimization based on preoperative patient data. Additionally, for high-risk TKA
patients, determined through their estimated risk for SSI, additional preventative health
care resources could be applied to optimize modifiable risk factors and minimize the
risk of SSI. These additional preventative health care resources may include preoperative
rehabilitation programs as well as preoperative counseling and educational seminars
to optimize TKA patients prior to surgery.
The present study has several potential limitations. First, this study has inherent
disadvantages of a retrospective study design such as bias and an inability to control
confounding factors. Second, although a large number of primary TKA patients were
included in this single-institution study, the number of patients with postoperative
SSI was still relatively small. Selection bias from a limited database, an inherent
limitation of ML models, can limit the prognostic generalizability of the presented
ML models. To address this selection bias, larger datasets sampling patients with
a broad demographic spectrum may be needed in future studies. Nonetheless, the presented
five ML algorithms demonstrated an accuracy of greater than 94%, highlighting the
strong predictive ability of these computational tools. Third, the ML models were
only validated internally, which may limit the generalizability in clinical practice
as there may be differences in the patient population between our tertiary referral
center and alternative health care providers across the country. Finally, most of
the patient risk factors evaluated in this study were listed as binary including the
presence of depression or diabetes or alcohol consumption; thus, the effect of disease
severity was not evaluated in this study. However, similar limitations were reported
in prior retrospective studies on this topic.[34][35][36]
In conclusion, this study developed and validated five ML models for the prediction
of patient-specific SSI following primary TKA. The study findings show excellent model
performance, with ANNs performing best. This highlights the potential
of these computational models to assist in preoperative patient optimization and counseling
to maximize outcomes in TKA patients.