Introduction: Thrombophilia diagnosis requires the analysis of clinical data, specialized laboratory
testing, and advanced clinical decision-making. The diagnostic approach can differ
based on individual clinical practice, institutional guidelines, and available resources,
all of which may influence patient outcomes. This variability presents an opportunity
for optimization with AI-powered solutions.
Method: We have implemented a novel machine learning algorithm to replicate the process of
thrombophilia diagnosis. This approach is an extension of XGBoost, a gradient boosting
algorithm for supervised learning. The new algorithm, XGBOrdinal, was specifically
designed for ordinal classification tasks, transforming them into a series of binary
classification problems. Input data comprised 14 clinical and 26 laboratory parameters
retrospectively collected from the electronic medical records of patients with suspected
hereditary thrombophilia who were referred to our ambulatory hemostasis clinic from
November 2019 to February 2024. Patients with malignancy, arterial thrombosis, thrombosis within
6 weeks of testing, and suspected or confirmed antiphospholipid syndrome were excluded
from this analysis. The target variables consisted of 1) the venous thromboembolism
(VTE) risk categories: not increased, slightly increased, and increased; 2) diagnosis:
thrombophilia excluded, low risk thrombophilia, and intermediate/high risk thrombophilia;
and 3) treatment recommendation: routine prophylaxis, routine prophylaxis extended
to low-risk situations, extended or indefinite anticoagulation at low dose, and indefinite
anticoagulation at full dose.
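
The reduction of an ordinal task to a series of binary problems can be sketched as follows. This is a minimal illustration that assumes the common threshold decomposition, in which each binary problem asks "is the label greater than class k?". The abstract does not state how XGBOrdinal builds or combines its binary problems, so the function names, the combination rule, and the example probabilities are hypothetical.

```python
# Ordinal categories of the VTE risk target, ordered from lowest to highest.
VTE_RISK = ["not increased", "slightly increased", "increased"]


def to_binary_targets(label_index, n_classes):
    """Encode one ordinal label as n_classes - 1 binary targets.

    Target k answers the question "is the label greater than class k?".
    """
    return [1 if label_index > k else 0 for k in range(n_classes - 1)]


def combine_threshold_probs(p_greater):
    """Convert estimates of P(y > k) into one probability per class.

    Uses P(y = c) = P(y > c - 1) - P(y > c). Independently trained binary
    models need not give non-increasing estimates, so negative differences
    are clipped to 0 (an assumption of this sketch).
    """
    n_classes = len(p_greater) + 1
    probs = []
    for c in range(n_classes):
        upper = 1.0 if c == 0 else p_greater[c - 1]
        lower = 0.0 if c == n_classes - 1 else p_greater[c]
        probs.append(max(upper - lower, 0.0))
    return probs


def predict_class(p_greater):
    """Return the index of the most probable ordinal class."""
    probs = combine_threshold_probs(p_greater)
    return max(range(len(probs)), key=probs.__getitem__)


# Each of the three risk categories becomes two binary targets.
encoded = [to_binary_targets(i, len(VTE_RISK)) for i in range(len(VTE_RISK))]

# Hypothetical outputs of the two binary models for one patient:
# P(risk > "not increased") = 0.8, P(risk > "slightly increased") = 0.3.
predicted = VTE_RISK[predict_class([0.8, 0.3])]
```

In practice, each binary target would be fitted with its own gradient-boosted classifier; the sketch covers only the label encoding and the recombination step.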
Results: 454 patients were included in the study. The relative contribution of selected
clinical and laboratory input features to the overall performance of the algorithm
for all three target variables is shown in [Fig. 2]. Notably, age at first thrombosis
was by far the most important feature for all three target variables: among all 40
input features, its partial feature gain exceeded that of the second-most important
feature by more than 4-fold in each case ([Fig. 2]). Model performance in terms of
sensitivity, specificity, and precision for all three target variables is shown in [Fig. 1].
Fig. 2 Comparative partial feature gain metrics for all three target variables showing
the relative contribution of selected clinical and laboratory features to overall
performance of the algorithm. Of note, among all 40 input parameters, the partial
feature gain for age at first thrombosis was 4.4, 5.9, and 4.6 times higher than that
of the second-most important parameter for VTE risk, diagnosis, and treatment recommendation,
respectively.
Fig. 1 Average sensitivity, specificity, and precision for venous thromboembolism (VTE)
risk, diagnosis, and treatment recommendation.
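
For readers unfamiliar with how these three metrics extend to a multi-class target, the following is a minimal sketch. It assumes that "average" means an unweighted (macro) average of one-vs-rest metrics over the ordinal classes; the abstract does not specify the averaging scheme, and the labels below are invented toy data, not study results.

```python
def macro_one_vs_rest_metrics(y_true, y_pred, n_classes):
    """Macro-averaged sensitivity, specificity, and precision.

    Each class in turn is treated as "positive" and all others as "negative".
    A metric with a zero denominator is counted as 0.0 for that class.
    """
    sens, spec, prec = [], [], []
    for c in range(n_classes):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        tn = sum(1 for t, p in pairs if t != c and p != c)
        sens.append(tp / (tp + fn) if tp + fn else 0.0)
        spec.append(tn / (tn + fp) if tn + fp else 0.0)
        prec.append(tp / (tp + fp) if tp + fp else 0.0)
    return {
        "sensitivity": sum(sens) / n_classes,
        "specificity": sum(spec) / n_classes,
        "precision": sum(prec) / n_classes,
    }


# Toy example with three ordinal classes coded 0, 1, 2 (hypothetical data).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 1]
metrics = macro_one_vs_rest_metrics(y_true, y_pred, n_classes=3)
```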
Conclusion: Our findings demonstrate that XGBOrdinal, a novel, locally developed extension
of XGBoost designed for ordinal classification, can accurately replicate the clinical
thought process for hereditary thrombophilia diagnosis, treatment recommendation, and
VTE risk stratification, particularly with an error tolerance of ±1. In addition, the
relative importance of clinical and laboratory features
to the algorithm may allow for a streamlined diagnostic process in the future. This
highlights the potential for optimization and standardization of thrombophilia diagnosis
and thrombosis risk assessment on a larger scale. We expect the performance to improve
as the sample size increases.
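
One way to score predictions under an error tolerance is sketched below. It assumes that a tolerance of ±1 means a prediction counts as correct when it lies within one ordinal category of the reference label; the abstract does not define the tolerance formally, and the labels are invented toy data, not study results.

```python
def accuracy_within_tolerance(y_true, y_pred, tolerance=0):
    """Fraction of predictions within `tolerance` ordinal steps of the truth.

    Labels are integer codes that follow the ordinal ranking, for example
    0 = routine prophylaxis up to 3 = indefinite full-dose anticoagulation.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= tolerance)
    return hits / len(y_true)


# Toy example with four ordinal treatment categories coded 0..3 (hypothetical).
y_true = [0, 1, 2, 3, 3, 0]
y_pred = [0, 2, 2, 1, 3, 1]

exact = accuracy_within_tolerance(y_true, y_pred)                  # exact match only
relaxed = accuracy_within_tolerance(y_true, y_pred, tolerance=1)   # within one category
```

The relaxed score can never be lower than the exact score, because every exact match also falls within the tolerance.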