Development and evaluation of a novel machine learning algorithm for outpatient thrombophilia diagnosis, management, and venous thrombosis risk stratification

H L McRae; F Kahl; M Kapsecker; H Rühl; S Jonas; B Pötzsch

doi:10.1055/s-0044-1801664

Hamostaseologie 2025; 45(S 01): S79

Abstracts

Topic: T-09 Innovation and Novelties

Authors

H L McRae¹; F Kahl²; M Kapsecker²,³; H Rühl¹; S Jonas²; B Pötzsch¹

¹University Hospital Bonn, Institute for Experimental Hematology and Transfusion Medicine, Bonn, Germany
²University Hospital Bonn, Institute for Digital Medicine, Bonn, Germany
³Technical University of Munich, TUM School of Computation, Information and Technology, Munich, Germany

Abstract

Full Text

Introduction: Thrombophilia diagnosis requires the analysis of clinical data, specialized laboratory testing, and advanced clinical decision-making. The diagnostic approach can differ based on individual clinical practice, institutional guidelines, and available resources, all of which may influence patient outcomes. This variability presents an opportunity for optimization with AI-powered solutions.

Method: We implemented a novel machine learning algorithm to replicate the process of thrombophilia diagnosis. The algorithm, XGBOrdinal, extends XGBoost, a gradient boosting algorithm for supervised learning, and was designed specifically for ordinal classification tasks, which it transforms into a series of binary classification problems. Input data comprised 14 clinical and 26 laboratory parameters, collected retrospectively from the electronic medical records of patients with suspected hereditary thrombophilia who were referred to our ambulatory hemostasis clinic between November 2019 and February 2024. Patients with malignancy, arterial thrombosis, thrombosis within 6 weeks of testing, or suspected or confirmed antiphospholipid syndrome were excluded from this analysis. The target variables were 1) venous thromboembolism (VTE) risk category: not increased, slightly increased, or increased; 2) diagnosis: thrombophilia excluded, low-risk thrombophilia, or intermediate/high-risk thrombophilia; and 3) treatment recommendation: routine prophylaxis, routine prophylaxis extended to low-risk situations, extended or indefinite low-dose anticoagulation, or indefinite full-dose anticoagulation.
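The abstract does not say how XGBOrdinal builds or recombines its binary sub-problems. One common scheme is the cumulative-threshold decomposition described by Frank and Hall: for K ordered classes, train K-1 binary models, the k-th predicting whether the label exceeds class k, then turn those cumulative probabilities back into class probabilities. The sketch below shows only that decomposition under this assumption; each binary target list would be passed to its own binary learner (for example an `xgboost.XGBClassifier`), which is omitted here. The function names and the 0/1/2 coding of the VTE risk categories are hypothetical.

```python
from typing import List, Sequence


def ordinal_to_binary_targets(labels: Sequence[int], n_classes: int) -> List[List[int]]:
    """Hypothetical helper: for each threshold k in 0..K-2, build the binary target 1[y > k]."""
    if n_classes < 2:
        raise ValueError("ordinal classification needs at least two classes")
    return [[1 if y > k else 0 for y in labels] for k in range(n_classes - 1)]


def combine_cumulative(p_greater: Sequence[float]) -> List[float]:
    """Hypothetical helper: turn P(y > k), k = 0..K-2, into class probabilities P(y = c)."""
    # Boundary conditions: P(y > -1) = 1 and P(y > K-1) = 0.
    cum = [1.0] + list(p_greater) + [0.0]
    probs = [cum[c] - cum[c + 1] for c in range(len(cum) - 1)]
    # Independently trained binary models can violate monotonicity,
    # producing negative differences; clip at zero and renormalize.
    # The unclipped values sum to 1, so the clipped total is always >= 1.
    probs = [max(p, 0.0) for p in probs]
    total = sum(probs)
    return [p / total for p in probs]


def predict_class(p_greater: Sequence[float]) -> int:
    """Return the index of the most probable ordinal class."""
    probs = combine_cumulative(p_greater)
    return max(range(len(probs)), key=probs.__getitem__)


if __name__ == "__main__":
    # Hypothetical coding: 0 = not increased, 1 = slightly increased, 2 = increased.
    print(ordinal_to_binary_targets([0, 1, 2, 1], n_classes=3))
    print(combine_cumulative([0.8, 0.3]))
    print(predict_class([0.8, 0.3]))
```

With the hypothetical labels `[0, 1, 2, 1]`, the two binary targets are `[0, 1, 1, 1]` (risk above "not increased") and `[0, 0, 1, 0]` (risk above "slightly increased"). Cumulative outputs of 0.8 and 0.3 yield class probabilities of roughly 0.2, 0.5, and 0.3, so the middle category is predicted. Encoding the targets this way preserves the ordering of the categories, which a plain multiclass model would ignore.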

Results: A total of 454 patients were included in the study. The relative contribution of selected clinical and laboratory input features to the overall performance of the algorithm for all three target variables is shown in Fig. 2. Notably, age at first thrombosis was the most important feature for all three target variables: among all 40 input features, its contribution exceeded that of the second-most important feature by more than 4-fold in each case (Fig. 2). Model performance in terms of sensitivity, specificity, and precision for all three target variables is shown in Fig. 1.

Fig. 2 Comparative partial feature gain metrics for all three target variables showing the relative contribution of selected clinical and laboratory features to overall performance of the algorithm. Of note, when compared to all 40 input parameters, the partial feature gain for age at first thrombosis was 4.4, 5.9, and 4.6 times higher than the second-most important parameter for VTE risk, diagnosis, and treatment recommendation, respectively.

Fig. 1 Average sensitivity, specificity, and precision for venous thromboembolism (VTE) risk, diagnosis, and treatment recommendation.
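The abstract does not state how the per-category metrics in Fig. 1 were averaged. A minimal sketch, assuming a one-vs-rest calculation per category followed by an unweighted (macro) mean, could look like this; the function name and the convention of scoring an empty denominator as 0.0 are assumptions.

```python
from typing import Dict, Sequence


def macro_ovr_metrics(y_true: Sequence[int], y_pred: Sequence[int], n_classes: int) -> Dict[str, float]:
    """Hypothetical helper: one-vs-rest sensitivity, specificity, and precision, macro-averaged."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    sens, spec, prec = [], [], []
    for c in range(n_classes):
        # Treat category c as "positive" and every other category as "negative".
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        tn = len(y_true) - tp - fn - fp
        # Assumed convention: a metric with an empty denominator counts as 0.0.
        sens.append(tp / (tp + fn) if tp + fn else 0.0)
        spec.append(tn / (tn + fp) if tn + fp else 0.0)
        prec.append(tp / (tp + fp) if tp + fp else 0.0)
    return {
        "sensitivity": sum(sens) / n_classes,
        "specificity": sum(spec) / n_classes,
        "precision": sum(prec) / n_classes,
    }


if __name__ == "__main__":
    # Toy labels for three ordered categories (not study data).
    print(macro_ovr_metrics([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], n_classes=3))
```

A weighted mean (by category frequency) is an equally plausible choice and would give different numbers when the categories are imbalanced.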

Conclusion: Our findings demonstrate that XGBOrdinal, a novel, locally developed extension of XGBoost designed for the classification of ordinal data, can accurately replicate the clinical reasoning process for hereditary thrombophilia diagnosis, treatment recommendation, and VTE risk stratification, particularly with an error tolerance of ±1 category. In addition, the relative importance of individual clinical and laboratory features to the algorithm may allow the diagnostic process to be streamlined in the future. This highlights the potential for optimizing and standardizing thrombophilia diagnosis and thrombosis risk assessment on a larger scale. We expect performance to improve as the sample size increases.
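The error tolerance of ±1 is not defined further in the abstract. One plausible reading is that a prediction counts as correct when it falls within one ordinal category of the reference label. A minimal sketch of that metric, with a hypothetical function name and toy labels for the four treatment-recommendation categories coded 0 to 3:

```python
from typing import Sequence


def accuracy_within(y_true: Sequence[int], y_pred: Sequence[int], tolerance: int = 1) -> float:
    """Hypothetical helper: share of predictions within `tolerance` ordinal steps of the label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= tolerance)
    return hits / len(y_true)


if __name__ == "__main__":
    # Toy labels, not study data.
    truth = [0, 1, 2, 3, 3]
    preds = [0, 2, 0, 3, 2]
    print(accuracy_within(truth, preds, tolerance=0))  # exact-match accuracy
    print(accuracy_within(truth, preds, tolerance=1))  # accuracy within +/-1 category
```

For these toy labels, exact-match accuracy is 0.4 and accuracy within ±1 category is 0.8, which illustrates why a tolerance-based score can look considerably better for ordinal targets: adjacent-category errors are forgiven while larger misclassifications still count.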
