Research Article

Comparative Evaluation of Ensemble Machine Learning Approaches for Heart Disease Prediction

DOI: 10.3791/70124

April 10th, 2026

Summary

This protocol outlines a computational workflow for building and assessing ensemble machine learning models for heart disease prediction, using publicly accessible benchmark data within a reproducible preprocessing and evaluation framework.

Abstract

This paper presents a computational benchmarking assessment of ensemble learning algorithms for heart disease prediction, combining different machine learning classifiers through hard voting, soft voting, and stacking in a single framework. The evaluation was conducted using a publicly available cardiovascular dataset obtained from the Kaggle repository (https://www.kaggle.com/datasets/sid321axn/heart-statlog-cleveland-hungary-final) comprising 1,190 instances and 11 clinical features. The process involves data preprocessing, which includes handling missing values, removing outliers, scaling variables, and class balancing to ensure uniform input. Feature selection based on Random Forest (RF) is used to eliminate unnecessary features. Among the evaluated models, the stacking ensemble classifier achieved the highest overall accuracy of 91.88% on the test dataset. Although additional metrics such as precision, recall, and F1-score were computed for comparative analysis, the emphasis of this study remains on methodological benchmarking rather than clinical validation.
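
The preprocessing and feature selection steps named above can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic data from make_classification, the 1.5 × IQR outlier rule, random oversampling as the balancing method, and the mean-importance threshold are all assumptions chosen only to make each step concrete.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler


def drop_missing(X, y):
    """Remove rows that contain any NaN feature value."""
    keep = ~np.isnan(X).any(axis=1)
    return X[keep], y[keep]


def drop_iqr_outliers(X, y, k=1.5):
    """Remove rows with any feature outside [Q1 - k*IQR, Q3 + k*IQR] (assumed rule)."""
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    iqr = q3 - q1
    keep = ((X >= q1 - k * iqr) & (X <= q3 + k * iqr)).all(axis=1)
    return X[keep], y[keep]


def oversample_minority(X, y, rng):
    """Balance classes by duplicating randomly chosen rows of the smaller classes."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx = []
    for c, n in zip(classes, counts):
        members = np.flatnonzero(y == c)
        extra = rng.choice(members, size=target - n, replace=True)
        idx.append(np.concatenate([members, extra]))
    idx = np.concatenate(idx)
    return X[idx], y[idx]


def select_by_rf_importance(X, y, threshold, seed=0):
    """Return a boolean mask of features whose RF importance reaches the threshold."""
    rf = RandomForestClassifier(n_estimators=100, random_state=seed).fit(X, y)
    return rf.feature_importances_ >= threshold, rf.feature_importances_


# Synthetic stand-in with the same shape as the benchmark data (1,190 x 11).
rng = np.random.default_rng(0)
X, y = make_classification(n_samples=1190, n_features=11, n_informative=5,
                           weights=[0.6, 0.4], random_state=0)
# Inject a few missing values so the first step has something to remove.
X[rng.integers(0, 1190, size=20), rng.integers(0, 11, size=20)] = np.nan

X, y = drop_missing(X, y)
X, y = drop_iqr_outliers(X, y)
X = StandardScaler().fit_transform(X)
X, y = oversample_minority(X, y, rng)

# Keep features with at least average importance (assumed threshold).
keep, importances = select_by_rf_importance(X, y, threshold=1.0 / X.shape[1])
X_selected = X[:, keep]
print("rows after preprocessing:", X.shape[0])
print("features kept:", int(keep.sum()), "of", keep.size)
```

For brevity the sketch transforms the whole array at once. In a benchmarking run, the scaler, the balancing step, and the feature selector would normally be fit on the training split only, so that no information from the test split leaks into the model.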

Various base classifiers, including Decision Tree, Random Forest, AdaBoost, and XGBoost, are trained and tested independently. These models are then combined using hard voting, soft voting, and stacking. In stacking, Logistic Regression serves as the meta-model and is trained on out-of-fold, cross-validated predictions of the base classifiers to limit overfitting.
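
The three combination strategies can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the authors' code: the data are synthetic, the hyperparameters are arbitrary, XGBoost is left out so that the example depends on scikit-learn alone, and StackingClassifier is an assumed choice (the Materials list names VotingClassifier and LogisticRegression but not the stacking implementation).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, RandomForestClassifier,
                              StackingClassifier, VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the benchmark data, split 80/20 with stratification.
X, y = make_classification(n_samples=1190, n_features=11, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)


def base_learners():
    """Return fresh, unfitted base classifiers (hyperparameters are illustrative)."""
    return [
        ("dt", DecisionTreeClassifier(max_depth=5, random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("ada", AdaBoostClassifier(n_estimators=50, random_state=0)),
    ]


ensembles = {
    # Hard voting: majority vote over predicted class labels.
    "hard_voting": VotingClassifier(estimators=base_learners(), voting="hard"),
    # Soft voting: argmax of the averaged predicted class probabilities.
    "soft_voting": VotingClassifier(estimators=base_learners(), voting="soft"),
    # Stacking: the meta-model is fit on 5-fold out-of-fold base predictions.
    "stacking": StackingClassifier(
        estimators=base_learners(),
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,
    ),
}

scores = {}
for name, model in ensembles.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # accuracy on the held-out 20%
    print(f"{name}: test accuracy = {scores[name]:.3f}")
```

Because the data here are synthetic, the printed accuracies say nothing about the figures reported in the article. The sketch only shows how the same base learners feed each combination strategy under one split.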

Accuracy is the primary criterion for comparison, so that individual classifiers and their combination strategies are compared uniformly under the same preprocessing and validation environment. Although further performance metrics are reported for comparison, the emphasis of the approach lies in developing and evaluating ensemble strategies, not in clinical assessment.
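
For reference, the comparison metrics follow directly from the confusion-matrix counts. The sketch below uses plain Python and a small made-up label vector. The function name binary_metrics and the example labels are hypothetical and are not taken from the study.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels: 4 true positives, 4 true negatives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

With these labels all four metrics equal 0.8, since the single false positive and the single false negative offset each other.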

This protocol supports systematic comparison of ensemble machine learning algorithms, data preprocessing choices, and ensemble configurations on publicly available cardiovascular datasets.

Introduction

Publicly available cardiovascular disease datasets are widely used as benchmark problems in machine learning research for evaluating classification algorithms and predictive modelling techniques1,2,3. Such datasets, which contain clinical and demographic attributes, provide a standardized and reproducible basis for comparing preprocessing strategies, feature selection methods, and ensemble learning architectures under controlled experimental conditions. Consequently, they are commonly employed to assess algorithmic behaviour rather than to support clinical inference or real-w....

Protocol

This protocol describes a reproducible computational workflow for benchmarking ensemble machine learning models using a publicly available cardiovascular disease dataset.

Selection of the dataset
The study employed a publicly available cardiovascular disease dataset obtained from the Kaggle repository. The dataset comprised 1,190 instances and included 11 features. For model training, 80% of the data was used, while the remaining 20% was reserved for performance evaluation. Table 1 presents a detailed description of the dataset features. The dataset was loaded into Python (version 3.9) using the pa....

Results

This section presents the effects of data preprocessing and feature selection, compares the performance of individual and ensemble classifiers and summarizes benchmarking outcomes across standard evaluation metrics. Data preprocessing, including missing value removal, class balancing and feature scaling, produced consistent input distributions across all classifiers. Models trained on preprocessed data exhibited reduced variability in performance across repeated 5-fold cross-validation runs and train–test splits compared.......

Discussion

This study demonstrates that stacking-based ensemble learning consistently achieved superior and more stable classification performance compared with individual classifiers and voting-based ensembles under standardized preprocessing and benchmarking conditions.

The consistency of ensemble performance observed in this study reinforces the importance of methodological rigor in comparative machine learning research19,20,

Disclosures

The authors declare that they have no conflicts of interest related to this research work. No financial, personal, or professional relationships have influenced the results, analysis, or conclusions presented in this manuscript.

Acknowledgements

The authors acknowledge the use of publicly available datasets and open-source software resources that supported this study. The authors also thank their respective institutions, including Sri Sri University, Bhubaneswar, and SOA University, Bhubaneswar, for providing the academic environment and research facilities necessary to conduct this work.

Materials

List of materials used in this article
Name | Company | Catalog Number | Comments
AdaBoostClassifier | scikit-learn Developers | N/A | Ensemble boosting classifier used for benchmarking
Jupyter Notebook | Project Jupyter | N/A | Computational notebook environment
Kaggle Heart Statlog (Cleveland–Hungary) Dataset | Kaggle | N/A | Public cardiovascular dataset (1190 instances, 11 features). URL: https://www.kaggle.com/datasets/sid321axn/heart-statlog-cleveland-hungary-final
LogisticRegression | scikit-learn Developers | N/A | Meta-classifier used in stacking ensemble
Matplotlib | Matplotlib Development Team | N/A | Data visualization library
NumPy | NumPy Developers | N/A | Numerical computation library
pandas (Version 1.5.3) | pandas Development Team | N/A | Data preprocessing and handling
Python (Version 3.9) | Python Software Foundation | N/A | Programming environment used for implementation
RandomForestClassifier | scikit-learn Developers | N/A | Base classifier and feature importance computation
Seaborn | Seaborn Development Team | N/A | Heatmap visualization of correlation matrix
StandardScaler | scikit-learn Developers | N/A | Feature scaling function
VotingClassifier | scikit-learn Developers | N/A | Hard and soft voting ensemble implementation
XGBoostClassifier | DMLC | N/A | Gradient boosting classifier used as base learner

References

  1. Libby, P., Bonow, R. O., et al. Braunwald’s Heart Disease: A Textbook of Cardiovascular Medicine. Elsevier Health Sci. 9, (2011).
  2. Nayak, O., Pallapothala, T., Gupta, G. P. Heart disease prediction framework using soft voting-based ensemble learning techniques. Convergence of Big Data Technologies and Computational Intellig....

Tags

Ensemble Learning, Heart Disease Prediction, Machine Learning Algorithms, Stacking Classifier, Hard Voting, Soft Voting, Random Forest, Data Preprocessing, Feature Selection, Cardiovascular Dataset
