DOI: 10.1055/a-2644-4444
Machine Learning-Based Ensemble Predictive Model for Cardiovascular Disease Prevention
Funding None.

Abstract
Cardiovascular diseases (CVDs) are a primary cause of death globally, with an increasing incidence in India. Machine learning (ML) has emerged as a viable approach for CVD prediction; however, dataset size and generalizability limit model robustness. This study aims to develop an enhanced ML prediction model for CVD detection using ensemble methods. Six datasets were considered, comprising 7,916 records with clinical parameters. The records were grouped into Dataset 1 (n = 3,676) and Dataset 2 (n = 4,240) based on available features to establish a feature set. Dataset 1 was analyzed using two approaches: binary classification of the target variable (0: absence of CVD, 1: presence of CVD) and multiclass classification of the target variable (based on CVD severity). Dataset 2 was analyzed using binary classification of the target variable (risk of CVD in 10 years). Identical data preprocessing and exploratory data analysis steps were performed for both dataset groups. Subsequently, 18 ML algorithms were used to develop distinct models for both dataset groups, from which LazyPredict picked the top 10 performing models. The Voting Classifier was used to build an ensemble model to integrate the models and enhance predictive performance. For Dataset 1, our framework obtained an accuracy of 96.5% in binary classification and 85.5% in multiclass classification. Similarly, our framework achieved an accuracy of 81.18% for Dataset 2. Utilizing ensemble modeling and an extensive dataset, our framework surpasses traditional and existing ML models in predictive stability, mitigating bias and improving decision support in CVD detection.
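To illustrate how a voting ensemble can combine the outputs of several base models, the following is a minimal sketch in plain Python. The abstract does not state whether hard (majority-label) or soft (averaged-probability) voting was used, so both rules are shown; scikit-learn's VotingClassifier exposes the same choice through its `voting` parameter. The function names, the tie-breaking rule, and the three example model outputs are assumptions made for illustration and are not taken from the study.

```python
from collections import Counter


def hard_vote(predictions):
    """Majority vote over the class labels predicted by each base model."""
    counts = Counter(predictions)
    top = max(counts.values())
    # Tie-break by smallest label: an arbitrary choice made here for determinism.
    return min(label for label, count in counts.items() if count == top)


def soft_vote(probabilities):
    """Average the per-model class probabilities and return the argmax class."""
    n_models = len(probabilities)
    n_classes = len(probabilities[0])
    mean = [sum(p[k] for p in probabilities) / n_models for k in range(n_classes)]
    return max(range(n_classes), key=lambda k: mean[k])


# Hypothetical outputs of three base models for one patient record
# (class 0: absence of CVD, class 1: presence of CVD).
labels = [1, 0, 1]
probs = [[0.40, 0.60], [0.70, 0.30], [0.45, 0.55]]

print("hard vote:", hard_vote(labels))
print("soft vote:", soft_vote(probs))
```

In this made-up example the two rules disagree: two of the three models predict class 1, so hard voting returns 1, while the one confident model pulls the averaged probability toward class 0, so soft voting returns 0. This is why the choice of voting rule matters when the base models differ in how well calibrated they are.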
Keywords
cardiovascular disease - machine learning - ensemble learning - early detection - risk stratification
Authors' Contributions
N.K., R.A., and R.V. were involved in the conceptualization of the study. Methodology was developed by N.K., R.A., and L.K.S. Formal analysis and investigation were carried out by N.K., L.K.S., and R.V. N.K. prepared the original draft of the manuscript, while R.A. and R.V. contributed to reviewing and editing the draft. N.K. and R.A. provided the necessary resources, and R.A. supervised the overall preparation of the manuscript.
Data Availability Statement
The data used for this study are detailed in the publication, accompanied by references and supplementary material.
Publication History
Article published online:
14 July 2025
© 2025. International College of Angiology. This article is published by Thieme.
Thieme Medical Publishers, Inc.
333 Seventh Avenue, 18th Floor, New York, NY 10001, USA