Machine Learning-Based Ensemble Predictive Model for Cardiovascular Disease Prevention

Neeraj Kumar; Rekha Agarwal; Lokesh Kumar Sharma; Rashmi Vashisth

doi:10.1055/a-2644-4444

Int J Angiol

Original Article

¹Data Centre Unit, Indian Council of Medical Research Headquarters, New Delhi, India (Neeraj Kumar)
²Amity Institute of Information Technology, Amity University, Noida, Uttar Pradesh, India (Rekha Agarwal, Rashmi Vashisth)
³Department of Information Technology, Indian Council of Medical Research, National Institute of Occupational Health, Ahmadabad, Gujarat, India (Lokesh Kumar Sharma)

Funding: None.

Abstract

Cardiovascular diseases (CVDs) are a leading cause of death globally, with an increasing incidence in India. Machine learning (ML) has emerged as a viable approach for CVD prediction; however, limited dataset size and generalizability constrain model robustness. This study aims to develop an enhanced ML prediction model for CVD detection using ensemble methods. Six datasets comprising 7,916 records with clinical parameters were considered. According to the features available in each source, the records were grouped into Dataset 1 (n = 3,676) and Dataset 2 (n = 4,240) to establish a consistent feature set for each group. Dataset 1 was analyzed using two approaches: binary classification of the target variable (0: absence of CVD, 1: presence of CVD) and multiclass classification of the target variable (based on CVD severity). Dataset 2 was analyzed using binary classification of the target variable (10-year risk of CVD). Identical data preprocessing and exploratory data analysis steps were performed for both dataset groups. Eighteen ML algorithms were then used to develop distinct models for each dataset group, and LazyPredict was used to identify the 10 best-performing models. A Voting Classifier was used to integrate these models into an ensemble and enhance predictive performance. For Dataset 1, our framework obtained an accuracy of 96.5% in binary classification and 85.5% in multiclass classification; for Dataset 2, it achieved an accuracy of 81.18%. By combining ensemble modeling with an extensive dataset, our framework surpasses traditional and existing ML models in prediction stability, mitigates bias, and improves decision support in CVD detection.
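The abstract describes collapsing a severity target into a binary one, keeping the 10 best of 18 models, and combining them with a Voting Classifier. The dependency-free sketch below illustrates those three ideas under explicit assumptions: a hypothetical 0 to 4 severity grade where 0 means no disease, a made-up validation leaderboard standing in for LazyPredict's output (LazyPredict itself is not called), and hard (majority) voting over per-model predictions. The helper names (`to_binary`, `select_top_k`, `hard_vote`, `accuracy`), model names, scores, and predictions are illustrative placeholders, not the authors' code or results; the abstract does not state whether hard or soft voting, or any weighting, was used.

```python
# Illustrative sketch only: helper names are hypothetical, not the authors' code.
from collections import Counter
from typing import Dict, List, Sequence


def to_binary(severity: int) -> int:
    """Collapse a severity grade into absence (0) / presence (1) of CVD.

    Assumes a hypothetical 0-4 severity scale where 0 means no disease.
    """
    return 0 if severity == 0 else 1


def select_top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Return the names of the k models with the highest validation score.

    Ties in score are broken alphabetically so the ranking is deterministic.
    """
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


def hard_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Majority vote across models; one inner sequence of labels per model.

    Works for binary and multiclass labels. Ties are broken toward the
    smallest label (an arbitrary but deterministic choice).
    """
    n_samples = len(predictions[0])
    voted = []
    for i in range(n_samples):
        counts = Counter(model_preds[i] for model_preds in predictions)
        top = max(counts.values())
        voted.append(min(label for label, c in counts.items() if c == top))
    return voted


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy inputs for illustration only; they are not data from the study.
    severity = [0, 2, 0, 4, 1, 0]
    y_true = [to_binary(s) for s in severity]  # [0, 1, 0, 1, 1, 0]

    # Hypothetical leaderboard standing in for a LazyPredict-style ranking.
    val_scores = {"rf": 0.91, "xgb": 0.93, "svc": 0.88, "nb": 0.80, "lgbm": 0.92}
    chosen = select_top_k(val_scores, k=3)  # ['xgb', 'lgbm', 'rf']

    # Hypothetical test-set predictions from each selected model.
    per_model = {
        "xgb": [0, 1, 0, 1, 0, 0],
        "lgbm": [0, 1, 1, 1, 1, 0],
        "rf": [0, 0, 0, 1, 1, 0],
    }
    ensemble = hard_vote([per_model[name] for name in chosen])
    print(chosen, ensemble, accuracy(y_true, ensemble))
```

In this toy run each of the three models misclassifies one of the six samples, but their errors fall on different samples, so the majority vote recovers all six labels; this error-cancelling effect is what voting ensembles rely on. In practice, scikit-learn's `VotingClassifier` (with `voting="hard"` or `voting="soft"`) wraps fitted estimators and performs the voting step directly.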

Keywords

cardiovascular disease - machine learning - ensemble learning - early detection - risk stratification

Authors' Contributions

N.K., R.A., and R.V. were involved in the conceptualization of the study. Methodology was developed by N.K., R.A., and L.K.S. Formal analysis and investigation were carried out by N.K., L.K.S., and R.V. N.K. prepared the original draft of the manuscript, while R.A. and R.V. contributed to reviewing and editing the draft. N.K. and R.A. provided the necessary resources, and R.A. supervised the overall preparation of the manuscript.

Data Availability Statement

The data used for this study are detailed in the publication, accompanied by references and supplementary material.

Supplementary Material


Publication History

Article published online: 14 July 2025

Thieme Medical Publishers, Inc.
333 Seventh Avenue, 18th Floor, New York, NY 10001, USA

