Method Article

Early Diagnosis of Hypothyroidism in the Eastern Province of Saudi Arabia Using Computational Intelligence Techniques

DOI: 10.3791/70065

May 22nd, 2026

Summary

This study investigates four machine learning algorithms for the proactive diagnosis of hypothyroidism: K-nearest neighbors (KNN), support vector machine (SVM), extreme gradient boosting (XGBoost), and a soft voting ensemble classifier. The algorithms were shortlisted through a critical literature review and applied to a locally collected dataset from Saudi Arabia.

Abstract

Hypothyroidism is one of the most common yet underdiagnosed medical conditions in Saudi Arabia. It is more prevalent in older women and pregnant women, as well as in patients with diabetes or sleep apnea. Hypothyroidism is characterized by inadequate production of thyroid hormones by the thyroid gland, which can lead to other chronic illnesses if left untreated. This study therefore proposes machine learning techniques to diagnose the disease preemptively using a straightforward clinical dataset from Saudi Arabia. Given the size of the dataset, this work serves as a proof of concept. KNN, SVM, XGBoost, and a soft voting ensemble classifier were chosen because, in the literature, they showed promising performance in the proactive diagnosis of hypothyroidism and associated diseases compared with other algorithms. The best-performing model was the soft voting ensemble classifier, which achieved an accuracy of 94.7%. SVM, KNN, and XGBoost achieved accuracies of 94%, 93.42%, and 92.1%, respectively. These results were obtained using 10-fold cross-validation and forward sequential feature selection.

Introduction

Thyroid dysfunction (TD) is one of the most common chronic endocrine illnesses, with varying prevalence across populations [1]. The two most common thyroid gland disorders are hypothyroidism and hyperthyroidism. Hypothyroidism is a prevalent condition that results from insufficient thyroid hormone production. While it is generally manageable with treatment, in severe instances it can be fatal if not addressed. The most common symptoms of hypothyroidism in adults include fatigue, constipation, lethargy, weight gain, dry skin, and voice changes. However, clinical presentations may vary by age, gender, and other factors [3]. It ....

Protocol

1. Methodology

NOTE: As described in the Table of Materials, the experiments for this research were conducted in Python in a Jupyter Notebook, which was used to develop machine learning models for the preemptive diagnosis of hypothyroidism. Microsoft Excel and the Python scikit-learn library were used in the preprocessing stage. K-nearest neighbors (KNN), support vector machine (SVM), extreme gradient boosting (XGBoost), and a soft voting ensemble classifier composed of KNN and SVM were then trained on the dataset, with GridSearchCV and 10-fold cross-validation used to obtain optimal hyperparameters. Additionally, the featu....
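To make the soft voting step concrete, the following is a minimal, library-free sketch of how a soft voting ensemble combines two classifiers: it averages their per-class probabilities and predicts the class with the highest average. The function names (`soft_vote`, `predict_label`), the class labels, and the probability values are hypothetical illustrations and are not taken from the study. In practice this step would typically be delegated to a library implementation such as scikit-learn's `VotingClassifier` with `voting="soft"`; that this class was used here is an assumption.

```python
from typing import Dict, Optional, Sequence


def soft_vote(probas: Sequence[Dict[str, float]],
              weights: Optional[Sequence[float]] = None) -> Dict[str, float]:
    """Return the (weighted) average of per-class probabilities."""
    if not probas:
        raise ValueError("need probabilities from at least one model")
    if weights is None:
        weights = [1.0] * len(probas)  # equal weight for every model
    if len(weights) != len(probas):
        raise ValueError("one weight per model is required")
    total = float(sum(weights))
    classes = sorted(probas[0])
    return {
        c: sum(w * p[c] for w, p in zip(weights, probas)) / total
        for c in classes
    }


def predict_label(probas: Sequence[Dict[str, float]],
                  weights: Optional[Sequence[float]] = None) -> str:
    """Predict the class with the highest averaged probability."""
    averaged = soft_vote(probas, weights)
    return max(averaged, key=averaged.get)


# Hypothetical probabilities for a single patient (illustrative values only).
knn_proba = {"healthy": 0.60, "hypothyroid": 0.40}
svm_proba = {"healthy": 0.10, "hypothyroid": 0.90}

averaged = soft_vote([knn_proba, svm_proba])
label = predict_label([knn_proba, svm_proba])
```

In this illustration the two models disagree on the most likely class, so a majority (hard) vote would be tied; soft voting resolves the case in favor of the model that is more confident.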

Results

Performance of models
The four models created in this study were compared and analyzed in terms of accuracy, recall, and testing. The GridSearchCV algorithm was used to identify the best hyperparameters for each model using all features. Each model was evaluated with stratified 10-fold cross-validation. Table 5 presents the performance of each model.
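The evaluation scheme described above can be sketched in plain Python as follows. The sketch splits the samples into 10 folds that preserve the class proportions, then reports mean accuracy and mean recall over the folds. Everything in it is an assumption made for illustration: the synthetic one-feature data, the toy nearest-centroid model (a stand-in, not one of the four models in the study), and all function names. The study itself would most likely rely on scikit-learn's cross-validation utilities.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    offset = 0
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        # Deal each class round-robin so every fold gets its share.
        for i, idx in enumerate(idxs):
            folds[(i + offset) % k].append(idx)
        offset += len(idxs)
    return folds


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if tp + fn else 0.0


def fit_centroids(X, y):
    """Toy stand-in model: the mean feature value of each class."""
    sums, counts = defaultdict(float), defaultdict(int)
    for x, label in zip(X, y):
        sums[label] += x
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}


def predict_centroid(model, x):
    return min(model, key=lambda label: abs(x - model[label]))


def cross_validate(X, y, k=10, seed=0):
    """Return mean accuracy and mean recall over k stratified folds."""
    accs, recs = [], []
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit_centroids([X[i] for i in train_idx],
                              [y[i] for i in train_idx])
        y_true = [y[i] for i in test_idx]
        y_pred = [predict_centroid(model, X[i]) for i in test_idx]
        accs.append(accuracy(y_true, y_pred))
        recs.append(recall(y_true, y_pred))
    return sum(accs) / k, sum(recs) / k


# Synthetic, clearly separable data: 70 negatives and 30 positives.
X = [0.01 * i for i in range(70)] + [5 + 0.01 * i for i in range(30)]
y = [0] * 70 + [1] * 30
mean_acc, mean_rec = cross_validate(X, y, k=10)
```

Stratification matters when one class is less frequent than the other: without it, a fold may contain few or no positive cases, which makes per-fold recall unstable.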

Discussion

This study investigates machine learning algorithms on a Saudi clinical dataset to develop a preemptive diagnostic model for hypothyroidism. Four models were created: KNN, SVM, XGBoost, and a soft voting ensemble classifier. All four models identified age and respiratory rate as significant indicators in the preemptive diagnosis of the disease. Among the four models, the voting classifier demonstrated superior performance across all metrics used in this study. Utilizing just five features, it achieved an accuracy o.......
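The reduced feature set comes from forward sequential feature selection, as stated in the Abstract. A minimal sketch of that greedy procedure is given here: starting from an empty set, it repeatedly adds the single feature that most improves a scoring function until the requested number of features is reached. The `forward_select` function, the toy `score` function, and its numeric values are hypothetical. In the study the score would presumably be a cross-validated model metric, and the selection may have been performed with a library routine (Mlxtend is listed in the Materials), which is an assumption.

```python
from typing import Callable, List, Tuple


def forward_select(n_features: int,
                   score: Callable[[Tuple[int, ...]], float],
                   k: int) -> List[int]:
    """Greedily pick k feature indices, one best addition at a time."""
    selected: List[int] = []
    remaining = list(range(n_features))
    while remaining and len(selected) < k:
        # Try each remaining feature together with those already chosen.
        best = max(remaining, key=lambda f: score(tuple(selected + [f])))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical per-feature usefulness (illustrative numbers only).
GAINS = {0: 0.30, 1: 0.25, 2: 0.05, 3: 0.20}


def score(subset: Tuple[int, ...]) -> float:
    """Toy score: summed gains, with a penalty when features 0 and 1,
    assumed here to be redundant, are both selected."""
    total = sum(GAINS[f] for f in subset)
    if 0 in subset and 1 in subset:
        total -= 0.22
    return total


selected = forward_select(n_features=4, score=score, k=3)
```

The toy score shows why the procedure evaluates subsets rather than ranking features individually: feature 1 looks strong on its own, but it adds little once feature 0 is already selected, so weaker but complementary features are chosen first.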

Disclosures

The dataset was obtained from the hospital under IRB-2020-09-429. The authors have no conflicts of interest to report regarding the current study.

Acknowledgements

The authors would like to acknowledge the support of healthcare professionals in validating the findings of the study.

Materials

List of materials used in this article
| Name | Company | Catalog Number | Comments |
| --- | --- | --- | --- |
| Laptop/Machine | Dell | XPS9320 | RAM 16 GB, 12th Gen Intel(R) Core(TM) i7-1260P |
| Excel 365 | Microsoft | Excel 365 | Used to store raw data in CSV format |
| Python 3.12.12 | Google Colab Notebook | Python 3.12.12 | Used for model building and training |
| Numpy 2.0.2 | Google Colab Notebook | Numpy 2.0.2 | Used for model building and training |
| Pandas 2.2.2 | Google Colab Notebook | Pandas 2.2.2 | Used for model building and training |
| Sklearn 1.6.1 | Google Colab Notebook | Sklearn 1.6.1 | Used for model building and training |
| Mlxtend 0.23.4 | Google Colab Notebook | Mlxtend 0.23.4 | Used for model building and training |
| XGBoost 3.1.2 | Google Colab Notebook | XGBoost 3.1.2 | Used for model building and training |
| SPSS | IBM, USA | Sklearn 1.6.1 | Used for statistical analysis |

References

  1. Biondi, B., Kahaly, G. J., Robertson, R. P. Thyroid dysfunction and diabetes mellitus: two closely associated disorders. Endocr Rev. 40 (3), 789-824 (2018).
  2. Khan, A., Khan, M. M. A., Akhtar, S. Thyroid disorders, etiology and prevalence. J. Med. Sci. 2 (2), 89-94 (2002....

Tags

Hypothyroidism Diagnosis, Computational Intelligence, Machine Learning, Saudi Arabia, Soft Voting Classifier, SVM Algorithm, KNN Algorithm, Gradient Boosting, Clinical Dataset, Feature Selection