Int J Angiol
DOI: 10.1055/a-2644-4444
Original Article

Machine Learning-Based Ensemble Predictive Model for Cardiovascular Disease Prevention

Neeraj Kumar (1), Rekha Agarwal (2), Lokesh Kumar Sharma (3), Rashmi Vashisth (2)

1 Data Centre Unit, Indian Council of Medical Research Headquarters, New Delhi, India
2 Amity Institute of Information Technology, Amity University, Noida, Uttar Pradesh, India
3 Department of Information Technology, Indian Council of Medical Research, National Institute of Occupational Health, Ahmadabad, Gujarat, India

Funding: None.

Abstract

Cardiovascular diseases (CVDs) are a leading cause of death worldwide, and their incidence in India is increasing. Machine learning (ML) has emerged as a viable approach to CVD prediction; however, limited dataset size and poor generalizability restrict model robustness. This study aims to develop an improved ML prediction model for CVD detection using ensemble methods. Six datasets comprising a total of 7,916 records with clinical parameters were considered. Based on their available features, the records were grouped into Dataset 1 (n = 3,676) and Dataset 2 (n = 4,240), so that each group shared a common feature set. Dataset 1 was analyzed using two approaches: binary classification of the target variable (0: absence of CVD, 1: presence of CVD) and multiclass classification of the target variable (based on CVD severity). Dataset 2 was analyzed using binary classification of the target variable (10-year risk of CVD). Identical data preprocessing and exploratory data analysis steps were performed for both dataset groups. Subsequently, 18 ML algorithms were used to develop distinct models for each dataset group, and LazyPredict was used to select the 10 best-performing models. A Voting Classifier was then used to combine these models into an ensemble and improve predictive performance. For Dataset 1, our framework achieved an accuracy of 96.5% in binary classification and 85.5% in multiclass classification. For Dataset 2, it achieved an accuracy of 81.18%. By combining ensemble modeling with an extensive dataset, our framework surpasses traditional and existing ML models in prediction stability, mitigates bias, and improves decision support in CVD detection.
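The modeling pipeline described in the abstract trains many candidate classifiers, keeps the best-ranked ones, and combines them with a Voting Classifier. The sketch below illustrates that pipeline and is not the authors' code.

It rests on several assumptions that the article does not state:

- scikit-learn is the toolkit.
- A small hypothetical pool of six candidates is ranked by 5-fold cross-validated accuracy, standing in for LazyPredict's leaderboard.
- Only the top k = 3 models are kept, rather than 10.
- The ensemble uses hard (majority) voting.
- Synthetic data from `make_classification` replace the clinical records.

The helper names `candidate_models`, `rank_models`, and `build_voting_ensemble` are hypothetical.

```python
"""Sketch: rank candidate classifiers, keep the top k, and combine them by voting.

Assumptions (not from the article): scikit-learn is the toolkit, candidates are
ranked by cross-validated accuracy (a stand-in for LazyPredict's ranking),
hard voting is used, and the data are synthetic.
"""
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def candidate_models():
    """Hypothetical candidate pool; the article used 18 algorithms."""
    return {
        "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "knn": make_pipeline(StandardScaler(), KNeighborsClassifier()),
        "tree": DecisionTreeClassifier(random_state=0),
        "forest": RandomForestClassifier(n_estimators=100, random_state=0),
        "extra": ExtraTreesClassifier(n_estimators=100, random_state=0),
        "gnb": GaussianNB(),
    }


def rank_models(models, X, y, cv=5):
    """Return (name, mean CV accuracy) pairs sorted from best to worst.

    cross_val_score fits clones, so the estimators in `models` stay unfitted.
    """
    scores = {
        name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
        for name, model in models.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def build_voting_ensemble(models, ranking, k):
    """Hard-voting ensemble of the k best-ranked models (majority of predicted labels)."""
    top = [(name, models[name]) for name, _ in ranking[:k]]
    return VotingClassifier(estimators=top, voting="hard")


if __name__ == "__main__":
    # Synthetic stand-in for a binary target (0 = no CVD, 1 = CVD).
    # Passing n_classes=3 to make_classification mimics a severity-style multiclass target.
    X, y = make_classification(n_samples=600, n_features=12, n_informative=6, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=42
    )

    models = candidate_models()
    # Rank on the training split only, so the test split stays untouched for the final score.
    ranking = rank_models(models, X_train, y_train)
    ensemble = build_voting_ensemble(models, ranking, k=3)
    ensemble.fit(X_train, y_train)
    print("top models:", [name for name, _ in ranking[:3]])
    print("held-out accuracy:", round(ensemble.score(X_test, y_test), 3))
```

Hard voting returns the label predicted by the majority of members, and an odd k avoids ties in the binary case. Setting `voting="soft"` instead averages predicted probabilities, which requires every member to implement `predict_proba`. The article does not say which variant was used.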

Authors' Contributions

N.K., R.A., and R.V. were involved in the conceptualization of the study. Methodology was developed by N.K., R.A., and L.K.S. Formal analysis and investigation were carried out by N.K., L.K.S., and R.V. N.K. prepared the original draft of the manuscript, while R.A. and R.V. contributed to reviewing and editing the draft. N.K. and R.A. provided the necessary resources, and R.A. supervised the overall preparation of the manuscript.


Data Availability Statement

The data used in this study are described in this publication, together with the cited references and the supplementary material.


Supplementary Material



Publication History

Article published online: 14 July 2025

© 2025. International College of Angiology. This article is published by Thieme.

Thieme Medical Publishers, Inc.
333 Seventh Avenue, 18th Floor, New York, NY 10001, USA