Research Article

Stacking Ensemble Approach for Predicting Loan Approval Using Machine Learning Techniques

DOI: 10.3791/68832

September 23rd, 2025

Summary

This study develops a stacking ensemble model that integrates XGBoost, CatBoost (Gradient Boosting Model), LightGBM (Efficient Gradient Boosting Model), AdaBoost, and Extra Trees to predict loan approvals using Kaggle data. The model achieves 98% accuracy and identifies key predictors such as income and credit score, supporting fair and efficient loan approval and rejection decisions.

Abstract

Digital lending and fintech innovations have upended established banking systems, reshaping financial inclusion and credit availability worldwide. This study examines how peer-to-peer (P2P) and digital lending platforms are evolving, with emphasis on how technologies such as artificial intelligence and machine learning are transforming loan approval. A review of the literature highlights the opportunities and problems in the digital lending ecosystem, including algorithmic risk assessment, customer trust, financial exclusion, and regulatory loopholes. To address these issues, this paper proposes a machine learning approach that uses a stacking ensemble model to forecast loan approvals. A publicly accessible Kaggle dataset containing applicant demographics, financial characteristics, and credit histories was pre-processed through exploratory analysis, label encoding, and train-test partitioning. The ensemble uses the Gradient Boosting Model, Efficient Gradient Boosting, AdaBoost, and Extra Trees classifiers as base learners, with XGBoost serving as the meta-learner. The model was assessed using accuracy, precision, recall, F1-score, and error metrics (MAE: Mean Absolute Error; MSE: Mean Squared Error; RMSE: Root Mean Squared Error), and achieved an accuracy of 98%. Correlation analysis indicates that assets, income, and CIBIL scores have a significant impact on loan approvals. The model outperformed conventional methods and showed balanced performance and generalization across both classes. The paper concludes by emphasizing the usefulness of such models for automated, data-driven credit decisions.
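The evaluation measures named above can be computed directly from true and predicted labels. The following is a minimal sketch in plain Python, assuming the two classes are encoded as 0 and 1; the function name and the toy label vectors are hypothetical and carry no results from the study.

```python
import math


def evaluate_binary(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, F1, MAE, MSE, and RMSE for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    n = len(pairs)
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = n - tp - fp - fn

    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )

    # Error metrics treat the encoded labels as numbers.
    mae = sum(abs(t - p) for t, p in pairs) / n
    mse = sum((t - p) ** 2 for t, p in pairs) / n
    rmse = math.sqrt(mse)

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mae": mae,
        "mse": mse,
        "rmse": rmse,
    }


# Toy labels (hypothetical): 10 applications, 2 of them misclassified.
y_true = [1, 1, 1, 0, 0, 0, 1, 0, 1, 1]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0, 1, 1]
metrics = evaluate_binary(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

For labels restricted to 0 and 1, every error has magnitude 1, so MAE and MSE both equal the misclassification rate (1 minus accuracy) and RMSE is its square root. The three error metrics therefore restate accuracy rather than adding independent information.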

Introduction

In the latest phase of the banking industry's technology transformation, disruptive new financial service providers from outside the established banking system have entered the market [1]. BigTech (large tech companies that primarily focus on lending directly or with financial institutions) and FinTech (financial technology, including models like P2P lending and online credit alternatives to traditional banks) companies are making substantial inroads into the finance sector, posing a challenge to traditional banking despite banks' efforts to adapt to the digital landscape [2]. This rapid evolution signals a shift in th....

Protocol

Data collection

This study used the Loan Approval Prediction Dataset available on Kaggle. The dataset was extracted in February 2025 and consists of 4269 records for evaluating loan data and forecasting loan approval outcomes. Its 12 columns cover applicants' demographic profiles (such as dependents and self-employment status), financial background (including CIBIL scores), and loan-specific attributes (such as loan amount and loan term). The dataset was imported using the Pandas library and visually inspected with df.head() to understand its structure and quality.
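The import and inspection step can be sketched as follows. So that the example runs without the downloaded file, a three-row in-memory CSV stands in for the dataset; its column names and values are invented for illustration and should not be read as the actual Kaggle schema or data.

```python
import io

import pandas as pd

# In-memory stand-in for the downloaded CSV file. Column names and values
# are illustrative assumptions, not rows from the Kaggle dataset.
SAMPLE_CSV = """dependents,self_employed,income,loan_amount,loan_term,cibil_score,loan_status
2,No,500000,1500000,12,750,Approved
0,Yes,300000,1200000,8,420,Rejected
3,No,800000,2000000,20,610,Approved
"""

# With the real file, the buffer would be replaced by the path to the CSV.
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# First rows, dimensions, and column types give a quick view of structure.
print(df.head())
print(df.shape)
print(df.dtypes)

# Missing values per column are a simple first check of data quality.
print(df.isna().sum())
```

Checking df.shape and df.dtypes alongside df.head() makes it easier to confirm that the record and column counts match expectations and to spot text columns that will need label encoding.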

....

Results

Feature correlation analysis

The feature correlation heatmap (Figure 2) provided useful information about the interrelationships between attributes. Strong positive correlations were found between annual income, loan amount, and asset-related variables such as luxury assets value and bank asset value, demonstrating that an applicant's financial profile is im.......
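The pairwise values behind a correlation heatmap of this kind can be sketched in plain Python, assuming the Pearson coefficient is the measure used (the excerpt does not state this). The column names and numbers below are invented for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Nested dict of pairwise correlations, one cell per heatmap square."""
    names = list(columns)
    return {
        a: {b: pearson(columns[a], columns[b]) for b in names} for a in names
    }


# Toy feature columns (hypothetical values, arbitrary units).
features = {
    "income": [10, 20, 30, 40, 50],
    "loan_amount": [25, 45, 70, 85, 110],
    "luxury_assets_value": [30, 55, 95, 120, 160],
}

matrix = correlation_matrix(features)
for row_name, row in matrix.items():
    cells = "  ".join(f"{value:6.3f}" for value in row.values())
    print(f"{row_name:>20}  {cells}")
```

Each cell of the resulting matrix corresponds to one square of the heatmap; values near 1 indicate features that rise together, which is the pattern described for the financial variables.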

Discussion

The stacking ensemble model for loan approval prediction performs well across the evaluation metrics, demonstrating high accuracy and reliability. The correlation heatmap revealed that financial indicators such as annual income, loan amount, and asset values are strongly interrelated, emphasizing their importance in loan evaluation, whereas the CIBIL scores have a strong negative correlation with loan status, reinforcing their role in creditworthiness assessment. The model's confusion matrix had a lo.......

Disclosures

The author declares no conflict of interest related to this research.

Acknowledgements

This research was supported by VIT-AP University, Amaravati, India.

....

Materials

List of materials used in this article
Name | Company | Catalog Number | Comments
Kaggle | | | https://www.kaggle.com/
Pandas | | | https://pandas.pydata.org/
Model library | IBM | | https://www.ibm.com

References

  1. European Systemic Risk Board. Reports of the Advisory Scientific Committee. Elsevier. (2012).
  2. Vives, X. The impact of FinTech on banking. Eur Econ. 2, 97-105 (2017).
  3. Jacobides, M. G., Drexler, M., Rico, J. Rethinking the future of financial services: A structural and evolutionary perspective on regulation.

Tags

Stacking Ensemble, Loan Approval Prediction, Machine Learning Techniques, Digital Lending, Peer To Peer Lending, Algorithmic Risk Assessment, Gradient Boosting, XGBoost Model, Credit Scoring, Financial Inclusion
