Research Article
This article presents an automated feature extraction pipeline that incorporates Nyquist-shift enhancement and machine-learning ensemble models to distinguish between Alzheimer's disease, frontotemporal dementia, and Healthy classes without requiring human intervention.
Alzheimer's disease (AD) and fronto-temporal dementia (FTD) are common neurodegenerative disorders that impair memory, cognitive function, and executive processing. The purpose of this study is to develop a fully automated machine learning pipeline that predicts Alzheimer's disease at an early stage without requiring clinical intervention. The methodology provides a quantitative analysis of subtle shifts in neural activity. The goal is a reliable system that uses EEG data to classify patients into AD, FTD, and Healthy Control groups, eliminating the need for human intervention or clinical assessments. A major innovation of the system lies in its signal processing approach and automated feature pipeline. Specifically, a strategic modification of the Nyquist frequency is used to enhance EEG signal resolution, in combination with a hybrid fusion layer that integrates multi-domain EEG features and demographic data. Subsequently, a two-way ANOVA-based feature selection refines this hybrid set. This enhancement facilitates more effective feature extraction, contributing to higher classification accuracy. In the proposed method, the signals are epoched to enrich the training dataset, and the standard random forest model thereby reaches 99.72% training accuracy. To ensure the robustness and generalizability of the method, a hybrid fusion model is proposed.
The common symptoms of Alzheimer's disease include memory loss, cognitive impairment, insomnia, confusion, disorientation, and personality changes. Early prediction can therefore help keep patients from deteriorating to the most severe stages, in which they need round-the-clock care. The disease can be diagnosed in various ways, such as MRI1, CT, PET2, and EEG3. In particular, EEG signals in combination with the MMSE have proven to be a non-invasive means of predicting Alzheimer's disease. EEG activity is divided into alpha, beta, theta, gamma, and delta frequency bands. Alpha waves are detected when a person is in a relaxed state3 and range from 8 to 12 Hz4. Beta waves are active during the alert state of the brain and range from 13 to 30 Hz4. Theta and delta waves are active in sleep and deep sleep, respectively, with frequency ranges of 4-7 Hz for theta and below 3.5 Hz for delta4. Traditional diagnostic methods rely on imaging and clinical evaluations such as the MMSE, which are often subjective and costly, whereas EEG signals capture brain activity in real time and offer an efficient alternative.
This research proposes a fully automated diagnostic pipeline that takes only EEG signals as input, thereby circumventing the need for clinical scoring. Various methods can be used to extract the frequency bands from the recorded signal; one is a bandpass filter whose cutoffs are defined relative to the Nyquist frequency. The Nyquist frequency is half the sampling frequency and marks the highest frequency that can be represented without aliasing. Normalizing the band edges by this value is intended to give clearer pre-processed signals. In clinical practice, the EEG test in combination with the Mini-Mental State Examination (MMSE) is used by neurologists to evaluate the stages of Alzheimer's disease (AD). The European Federation of Neurological Societies5 has established guidelines for diagnosing the various classes of this disease.
This article classifies subjects into three groups: Alzheimer's disease (AD), fronto-temporal dementia (FTD), and healthy subjects (control category). Fronto-temporal dementia is a condition that primarily affects the frontal and temporal lobes of the brain6,7. Figure 1 illustrates the regions of the brain typically affected in FTD, including the frontal and temporal lobes. In this study, time-domain and frequency-domain features from the EEG signal are combined with demographic features. To ensure the objectivity and autonomy of the machine learning framework, the Mini-Mental State Examination (MMSE) score was deliberately excluded from the feature set. Although MMSE is a widely used clinical metric for cognitive assessment, it is inherently influenced by the clinician's evaluation and introduces a dominant bias in classification tasks. By omitting MMSE, the proposed method aims to develop a self-reliant, data-driven diagnostic model capable of functioning independently of subjective clinical inputs, thereby supporting the future development of fully automated, AI-based screening tools. The final features, excluding the MMSE score, are labelled and provided to the proposed EEG-hyfusion model. The details are described in the Protocol section.
Literature survey
Electroencephalography (EEG) offers a non-invasive, cost-effective tool for early diagnosis of neurological conditions. EEG feature extraction and frequency-band analysis have emerged as crucial techniques in neuroscience and clinical research, offering insights into brain activity across a range of cognitive and pathological states. Leveraging machine learning to analyze EEG data has gained traction in recent years, with applications spanning from cognitive state monitoring to early diagnosis of neurological disorders such as Alzheimer's disease. This literature survey highlights key contributions and gaps in existing research related to EEG feature extraction, classification, and model development.
EEG signal cognitive relevance
Although Subha et al.8 emphasized the time-frequency domain feature RMS, which efficiently captures the non-stationary nature of EEG signals, their work does not build a classification model. Therefore, this study proposes ML models to determine how accurately the disease can be diagnosed.
Barry et al.9 demonstrated the relationship between EEG frequency bands and cognitive states, linking specific frequency bands (e.g., alpha waves) to relaxation and attention. Buzsáki10 explored the physiological significance of neural oscillations and their role in brain functions, underscoring the utility of frequency-specific analysis in EEG studies. While frequency band analysis is well-established, the need for efficient and scalable feature extraction pipelines remains an area of focus, particularly in high-dimensional datasets.
EEG feature extraction techniques
Senkaya, Kurnaz et al.11 used spectral features, such as PSD and entropy, to predict Alzheimer's disease. These spectral powers help to capture the slowing of brain activity more accurately. Although that work11 addressed spectral features effectively, it did not incorporate time-domain features such as RMS. Therefore, this study proposes time-domain features extracted from the alpha, beta, gamma, theta, and delta frequency bands.
Luck et al. and Kappenman et al.12 reviewed EEG biomarkers of attention and working memory, emphasizing the significance of frequency-band features in cognitive state discrimination. Cassani et al.13 explored EEG as a diagnostic tool for Alzheimer's disease, leveraging frequency-band features to identify neural signatures of cognitive decline. Michel and Murray14 discussed the utility of multichannel EEG analysis for understanding cognitive processes and neural connectivity, laying the groundwork for machine learning applications. Despite these advances, the integration of automated feature extraction pipelines with scalable machine learning models, such as Random Forest and XG-Boost, remains underexplored. The existing literature highlights the value of combining frequency-band analysis with machine learning techniques for cognitive and clinical applications.
Machine learning and hybrid fusion models for dementia classification
Machine learning models, particularly ensemble-based methods such as XGBoost, have shown promise in classifying EEG-derived features15. These models address the complexity and variability of EEG signals by learning non-linear relationships between input features and output labels. Although that study15 proposed a suitable model, its reported accuracy is limited.
Craik et al.16 provided a comprehensive review of deep learning and traditional machine learning models for EEG analysis, highlighting the success of gradient-boosting frameworks in structured data. Abiri et al.17 applied XG-Boost to EEG data for brain-computer interface applications, achieving robust classification performance due to its ability to handle feature redundancy and noise. Aghababaei et al.18 demonstrated the usage of ensemble methods, including XG-Boost, in EEG-based emotion recognition tasks, reporting improvements in accuracy and generalization over traditional approaches. However, challenges such as imbalanced datasets, underrepresentation of certain classes, and the interpretability of machine learning models persist, necessitating further research into model optimization and regularization techniques.
Zheng et al.19 used resting-state, eyes-closed EEG (19 channels) to compute time-frequency functional connectivity measures and trained machine-learning classifiers to separate AD, FTD, and healthy controls. However, they underscored the need for methods that enhance signal representation and provide robust ensemble classification.
EEG-based classification has been extensively applied in cognitive neuroscience and clinical domains. While previous studies provide a strong foundation for machine-learning-based EEG analysis, several gaps remain. Many studies rely on handcrafted features (e.g., RMS, PSD), which may not capture all the relevant information. Automated feature engineering and deep learning methods could complement traditional approaches.
Imbalance in EEG datasets, particularly in multi-class settings, often skews classification results. Strategies such as class-specific weighting and data augmentation could improve performance for underrepresented classes. However, there remains a need for efficient, interpretable, and generalizable pipelines that integrate feature extraction and model classification.
The proposed method separates EEG signals into frequency bands to improve spectral resolution. Subsequently, the captured EEG signals are epoched into 10 segments to enhance temporal resolution and consistency in feature computation. The proposed method implements RF, XGBoost, and SVM-based classifiers on the derived EEG features, thereby contributing to the development of robust, scalable methods for EEG analysis. The RF model, after hyperparameter adjustment, proved to be the best-performing of the standard models. A hybrid, fused, stacked meta-learner model that combines RF and XG-Boost is therefore proposed to further improve prediction accuracy.
The dataset used in this method was collected in accordance with institutional ethical standards. The dataset providers obtained consent from participants prior to data acquisition.
1. Dataset specification
To demonstrate the robustness of the system, two datasets were used. The first was collected from the EEG data repository20 and comprises 88 participants recorded in the resting-state, eyes-closed condition, of whom 36 were diagnosed with AD, 23 with FTD, and 29 were healthy. The EEG recordings were acquired using a 19-channel system following the international 10-20 montage. In the dataset shown in Figure 2, all signals were sampled at 500 Hz and provided in BIDS format with preliminary pre-processing by the dataset contributors.
The second dataset was collected from an external public repository21 and comprises 35 participants at resting state, of whom 13 are diagnosed with Alzheimer's disease (AD), 7 are mild cognitive impairment (MCI) patients, and 15 are healthy elderly participants. Only the resting-state baseline segments of the second dataset were used to maintain consistency with the first dataset20. The second dataset21 required a complete pre-processing pipeline, as shown in Figure 3.
2. Pre-processing
Existing studies rely heavily on complex pre-processing pipelines, manual artifact-correction steps, or ICA-ASR to remove muscle movements, which limit reproducibility in routine clinical workflows. To address these limitations, the proposed pipeline focuses on a streamlined EEG-only approach that eliminates the need for computationally intensive artifact-removal procedures and instead emphasizes controlled filtering, epoch segmentation, and frequency-specific feature computation.
3. Feature extraction
The pre-processed EEG signals are passed to feature-extraction methods. EEG signals are bandpass-filtered into 5 standard frequency bands: Delta (0.5-4 Hz), Theta (4-8 Hz), Alpha (8-13 Hz), Beta (13-25 Hz), and Gamma (25-40 Hz).
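The band separation described above can be sketched with SciPy. This is a minimal illustration, not the authors' implementation: the Butterworth design, the filter order of 4, the zero-phase filtering, and the helper names `bandpass` and `split_into_bands` are all assumptions. Only the band edges and the 500 Hz sampling rate of the primary dataset come from the text.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 500.0  # sampling rate (Hz) reported for the primary dataset

# Band edges in Hz, as listed in the feature-extraction step.
BANDS = {
    "delta": (0.5, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 13.0),
    "beta": (13.0, 25.0),
    "gamma": (25.0, 40.0),
}


def bandpass(data, low, high, fs=FS, order=4):
    """Zero-phase Butterworth bandpass along the last (time) axis.

    Cutoffs are normalized by the Nyquist frequency (fs / 2).
    The filter family and order are assumptions made for this sketch.
    """
    nyquist = fs / 2.0
    sos = butter(order, [low / nyquist, high / nyquist], btype="band", output="sos")
    return sosfiltfilt(sos, data, axis=-1)


def split_into_bands(data, fs=FS):
    """Return a dict mapping band name to the band-limited copy of `data`."""
    return {name: bandpass(data, lo, hi, fs) for name, (lo, hi) in BANDS.items()}


# Synthetic check: a pure 10 Hz sine (one channel, 10 s) lies in the alpha band.
t = np.arange(0, 10, 1 / FS)
demo = np.sin(2 * np.pi * 10 * t)[np.newaxis, :]
demo_bands = split_into_bands(demo)
```

On the synthetic 10 Hz sine, nearly all of the energy stays in the alpha output and the delta output is close to zero, which is a quick way to confirm that the cutoffs are normalized correctly.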
For each filtered signal, the Root Mean Square (RMS) value was computed across all EEG channels. The mathematical formulation of RMS is given in Equation (1):
$$\mathrm{RMS} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} x_i^{2}} \tag{1}$$
Here, $x_i$ is the EEG signal amplitude at the $i$-th time sample, and $N$ is the total number of samples in the signal segment.
RMS was selected as the primary feature due to its ability to quantify the energy of oscillatory activity in each frequency band. Alzheimer's disease is often associated with increased delta and theta activity and reduced alpha and beta activity. Fronto-temporal dementia (FTD) may exhibit distinct patterns across these bands20,21. Because PSD is mathematically redundant with RMS, it is not included as a separate feature. To keep the dataset compact, other features, such as Hjorth parameters and entropy, are also excluded.
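The RMS feature, computed per epoch, can be sketched in NumPy as follows. The 10 s non-overlapping epoch length and the averaging of per-channel RMS values into one value per epoch are assumptions made for this sketch; the helper names are hypothetical.

```python
import numpy as np


def make_epochs(data, fs, epoch_sec=10.0):
    """Cut (n_channels, n_samples) into non-overlapping epochs.

    Returns an array of shape (n_epochs, n_channels, samples_per_epoch);
    trailing samples that do not fill a whole epoch are dropped.
    """
    n_per = int(round(epoch_sec * fs))
    n_epochs = data.shape[-1] // n_per
    trimmed = data[:, : n_epochs * n_per]
    return trimmed.reshape(data.shape[0], n_epochs, n_per).transpose(1, 0, 2)


def rms(x, axis=-1):
    """Root mean square along `axis`: sqrt(mean(x**2))."""
    return np.sqrt(np.mean(np.square(x), axis=axis))


def epoch_band_rms(band_data, fs, epoch_sec=10.0):
    """One RMS value per epoch: RMS over time per channel, then the channel mean."""
    epochs = make_epochs(band_data, fs, epoch_sec)
    return rms(epochs, axis=-1).mean(axis=1)


# Synthetic check: a constant signal of amplitude 2 on 19 channels, 25 s at 500 Hz.
demo = 2.0 * np.ones((19, 12_500))
demo_rms = epoch_band_rms(demo, fs=500.0)  # two full epochs, each with RMS 2.0
```

Applying `epoch_band_rms` to each band-limited signal gives one row of band features per epoch, which is how epoching multiplies the number of training samples per participant.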
4. Hybrid fusion
The extracted time-domain RMS features were combined with the frequency-domain features obtained from bandpass filtering to form a hybrid feature space for model input. The compilation is detailed in Figure 3. The final normalized dataset is organized in a tabular format, with rows representing participants and columns representing the time-frequency band features and demographic features such as age, gender, and group. The pipeline shown in Figure 2 efficiently extracts EEG features relevant for Alzheimer's research. The extracted features provide insights into neural activity within key frequency bands and can be used for machine learning and statistical analysis.
5. Feature Selection
To improve model performance and reduce feature dimensionality, a two-way Analysis of Variance (ANOVA)-based approach was applied, with Group and Age as independent factors and the RMS values of the frequency bands as dependent variables. This analysis evaluated the effects of Group, Age, and the Age-Group interaction on each feature. Features with p-values < 0.05 for at least one factor were selected for classification. The detailed ANOVA results are shown in the results section. The selected features are the most informative variables for classification and helped improve the accuracy and generalization of the final predictive model. The additional non-numeric attributes, Gender and participant_id, were excluded from the analysis. The target variable Group was label-encoded for classification.
6. Model description
To classify subjects into 3 classes, namely Alzheimer's disease, Control, and fronto-temporal dementia, three supervised machine learning models (XG-Boost, Random Forest, and Support Vector Machine (SVM)) and a stacked model were built. All models used 70% of the dataset for training and 30% for testing. Each model was selected for its proven performance in healthcare data analysis and its ability to handle non-linear and high-dimensional feature spaces. The models were also evaluated on the validation dataset to assess their generalizability.
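The model performance was evaluated using a confusion matrix, comprising true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). Performance metrics included accuracy and a classification report detailing precision, recall, and F1-scores for each class. Equations (2)-(5) define these metrics.
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{2}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{3}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{4}$$
$$\mathrm{F1\text{-}score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{5}$$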
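For a three-class problem, the per-class metrics can be read directly off the confusion matrix. The sketch below assumes the scikit-learn convention (rows are true classes, columns are predicted classes) and uses a made-up matrix purely for illustration; the numbers are not results from this study.

```python
import numpy as np


def per_class_metrics(cm):
    """Overall accuracy plus per-class precision, recall, and F1.

    `cm[i, j]` counts samples of true class i predicted as class j.
    Assumes every class is predicted at least once (no zero denominators).
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp  # predicted as the class, but truly another class
    fn = cm.sum(axis=1) - tp  # truly the class, but predicted as another class
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = tp.sum() / cm.sum()
    return accuracy, precision, recall, f1


# Hypothetical 3-class confusion matrix (order: AD, HC, FTD); illustration only.
demo_cm = [[9, 1, 0],
           [1, 8, 1],
           [0, 2, 8]]
demo_acc, demo_prec, demo_rec, demo_f1 = per_class_metrics(demo_cm)
```

Here the overall accuracy is the fraction of samples on the diagonal (25 of 30), while precision and recall are computed one class at a time in a one-vs-rest fashion.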
1. XG-Boost classifier
This method implemented a multi-class classification model using the XGBoost algorithm to predict EEG-derived cognitive states from extracted features, as shown in output 1 of Figure 3. Output 1 contains RMS features and the MMSE score. While building the XGBoost model, the MMSE score is included to test the non-linear behaviour of the RMS features. It is used as a benchmark and baseline for the system development. MMSE was excluded in later stages to meet the goal of a fully automated, clinician-independent system.
XG-Boost is a gradient boosting framework optimized for efficiency and accuracy, making it well-suited for handling structured data in classification tasks. A few hyperparameters were tuned on the EEG dataset using cross-validation and grid search. The key parameter values listed below were chosen to control overfitting via regularization and to capture the complexity of the EEG features without making the model overly complex.
The key parameters of the model are: (1) Maximum tree depth = 8, which limits the depth of the decision trees to reduce overfitting and enhance generalization. A shallower tree would miss necessary information, whereas a deeper tree may overfit. (2) L2 regularization (λ = 10), which adds a penalty for large coefficients to minimize overfitting by controlling model complexity. (3) L1 regularization (α = 5), which introduces additional sparsity in the model to improve interpretability and robustness. (4) Number of estimators (n_estimators=8), which limits the number of boosting iterations to maintain computational efficiency while balancing performance.
2. Random Forest classifier
Random Forest is an ensemble classifier that builds multiple decision trees on random subsets of data and features, aggregating their outputs for the final prediction. It offers robustness to noise, handles non-linear data well, and reduces over-fitting by averaging diverse tree predictions. In this work, the Random Forest model was configured with key parameters by using the trial-and-error method. Such parameters are explained below:
1) n_estimators=100
It is chosen to limit the number of decision trees in the forest. The more trees, the better the generalization and the more stable the predictions.
2) max_depth=10
It indicates the maximum depth of each tree. A tree with low depth may underfit, whereas a very deep tree may memorize the training data and overfit. A depth of 10 was therefore chosen as a compromise, especially for noisy or complex signals like EEG.
3) random_state=40
It fixes the random number generation seed used for bootstrap sampling and tree construction. This ensures reproducibility.
4) n_jobs=-1
By setting the value of n_jobs to -1, the model is making use of all the CPU cores efficiently.
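The configuration above maps directly onto scikit-learn's `RandomForestClassifier`. The sketch below uses the four listed parameter values and a 70/30 split; the stratified split, the reuse of seed 40 for the split, and the synthetic stand-in features are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def train_random_forest(X, y):
    """Fit the RF with the listed parameters on a 70/30 split.

    Returns the fitted model, training accuracy, and test accuracy.
    Stratifying the split is an assumption, not stated in the text.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.30, stratify=y, random_state=40
    )
    model = RandomForestClassifier(
        n_estimators=100,  # number of trees
        max_depth=10,      # maximum depth of each tree
        random_state=40,   # reproducible bootstrap sampling and tree building
        n_jobs=-1,         # use all available CPU cores
    )
    model.fit(X_train, y_train)
    train_acc = accuracy_score(y_train, model.predict(X_train))
    test_acc = accuracy_score(y_test, model.predict(X_test))
    return model, train_acc, test_acc


# Synthetic stand-in for the band-RMS feature table: 3 classes, 5 features.
rng = np.random.default_rng(0)
demo_y = np.repeat([0, 1, 2], 100)
demo_X = rng.normal(size=(300, 5)) + 3.0 * demo_y[:, None]
rf_model, rf_train_acc, rf_test_acc = train_random_forest(demo_X, demo_y)
```

Reporting both accuracies side by side makes any gap between training and test performance visible, which is the usual first check for overfitting.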
7. Support Vector Machine (SVM)
Support Vector Machine is a margin-based classifier known for its performance in binary and multi-class classification22. This model used a radial basis function kernel because of its ability to capture non-linear patterns in EEG signals. However, the model exhibited suboptimal performance, likely due to the lack of feature scaling and the limited dataset size. While SVM is theoretically powerful, its sensitivity to hyperparameter tuning and data distribution may explain the lower accuracy observed compared to tree-based models.
8. Proposed HY-fusion model
Though the Random Forest model achieved high accuracy, a stacked model was built with Random Forest and XG-Boost as base learners to ensure generalizability and scalability. Their outputs are fed as input to a logistic regression model, which acts as the meta-learner. The architecture of the model is shown in Figure 4.
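A stacked architecture of this kind can be sketched with scikit-learn's `StackingClassifier`. To keep the example dependent on scikit-learn alone, `GradientBoostingClassifier` is used here as a stand-in for XG-Boost; it is a different implementation of gradient boosting, not the model used in the study. The 5-fold internal cross-validation, the base-learner settings, and the synthetic data are likewise assumptions.

```python
import numpy as np
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def build_stacked_model(seed=40):
    """RF and a boosted-tree model as base learners, logistic regression on top."""
    base_learners = [
        ("rf", RandomForestClassifier(
            n_estimators=100, max_depth=10, random_state=seed, n_jobs=-1)),
        # Stand-in for XG-Boost (assumption made to avoid an extra dependency).
        ("gb", GradientBoostingClassifier(n_estimators=50, random_state=seed)),
    ]
    return StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),  # meta-learner
        cv=5,                          # out-of-fold predictions feed the meta-learner
        stack_method="predict_proba",  # pass class probabilities upward
    )


# Synthetic stand-in features: 3 well-separated classes, 5 features.
rng = np.random.default_rng(0)
demo_y = np.repeat([0, 1, 2], 100)
demo_X = rng.normal(size=(300, 5)) + 3.0 * demo_y[:, None]
X_tr, X_te, y_tr, y_te = train_test_split(
    demo_X, demo_y, test_size=0.30, stratify=demo_y, random_state=40
)
stack = build_stacked_model().fit(X_tr, y_tr)
stack_test_acc = stack.score(X_te, y_te)
```

Training the meta-learner on out-of-fold base predictions, rather than on predictions for data the base learners have already seen, is what keeps the stacking step from simply inheriting the base models' overfitting.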
Along with the first dataset20, the stacked model also took input from the second dataset23. The dataset from Mendeley21 was pre-processed into a model-ready form by extracting RMS features and epoching to 10 s. The results obtained by each model are shown in the results section.
To justify the epoching step, the RF model's performance was compared with and without epoching of the dataset. In the non-epoched condition, features were extracted from the entire continuous recording and given to the RF model, which yielded 53% accuracy. This demonstrates that epoching provides a more stable and informative basis for model creation.
In the ANOVA-based feature selection, features with p < 0.05 are considered significant. For instance, alpha_rms exhibited a significant Group effect (p = 9.007147 × 10⁻³⁷), while Age-Group and interaction effects were not statistically significant, suggesting that alpha_rms variations are primarily due to disease state rather than age. Similarly, beta_rms demonstrated significance for both Group (p = 1.773449 × 10⁻⁸) and Age-Group (p = 4.457329 × 10⁻²), indicating its sensitivity to both pathological and age-related changes. Gamma_rms is significant for Group (p = 4.212443 × 10⁻¹⁰) and the interaction term (p = 4.527135 × 10⁻⁴), suggesting that the effect of disease on gamma activity depends partially on age group. The features theta_rms and delta_rms did not show a significant Group main effect; however, they achieved p < 0.05 for either the Age-Group effect or the Group × Age-Group interaction. Therefore, all the features are considered to be significant for model development (Table 1).
The XGBoost model described in step 3 achieves 87% accuracy on the training dataset and 72% on the test dataset. This shows the strength of EEG features in supporting a clinician-independent system. The proposed model used RF and XGBoost to achieve higher accuracy. Because the RF model was found to be best suited to these non-linear EEG features, its accuracy and other performance metrics, such as F1-score, precision, and recall, are shown in Table 2.
From Table 3, it is clear that the SVM model showed poor performance in classifying diseases. Therefore, the SVM model is deliberately excluded from further research. The RF model has been shown to be best suited to these non-linear EEG features, and it is used as the base model in the proposed hy-fusion stacked model. This classifier demonstrates strong performance in distinguishing between cognitive states, indicating superior learning from handcrafted EEG features and robustness to class imbalance and feature noise.
Further, the validation dataset24 is used with the stacked model to assess the generalizability and robustness of the model. The results obtained are shown in Table 4. The stacked model proved to be robust on the unseen dataset. Importantly, when evaluated on the independent validation Dataset-2, the stacked model maintained strong performance, with an overall accuracy of 88.09% and balanced per-class F1-scores. This behaviour indicates that the stacked model, while not the top scorer on the Dataset-1 test split, exhibits favourable generalizability to unseen data, an essential property for clinical screening systems. While the stacked model's Dataset-1 test accuracy is marginally lower, it provides better discrimination of FTD and HC cases on external data, which is crucial for clinical screening. To validate the model's performance, the confusion matrices are shown in Figure 5 and Figure 6. These figures support the accurate behaviour of the RF model and the stacked model.
DATA AVAILABILITY:
The dataset related to this study is available in a public repository20.

Figure 1: Frontal and temporal lobes of the brain involved in frontotemporal dementia (FTD).

Figure 2: Pre-processing pipeline.

Figure 3: Three-tier feature extraction method.

Figure 4: Model architecture diagram.

Figure 5: Confusion matrix of the random forest model on Dataset 1.

Figure 6: Confusion matrix of the stacked model on Dataset 1.
| Feature | Factor | p-value |
| Delta_rms | Group | 1.04 × 10⁻³ |
| | Age-Group | 4.72 × 10⁻⁹ |
| | Group × Age-Group | 9.55 × 10⁻⁴ |
| Theta_rms | Group | 1.15 × 10⁻¹⁸ |
| | Age-Group | 3.71 × 10⁻⁸ |
| | Group × Age-Group | 5.35 × 10⁻⁵ |
| Beta_rms | Group | 1.773449 × 10⁻⁸ |
| | Age-Group | 4.457329 × 10⁻² |
| | Group × Age-Group | 1.161183 × 10⁻¹ |
| Gamma_rms | Group | 4.212443 × 10⁻¹⁰ |
| | Age-Group | 4.199649 × 10⁻¹ |
| | Group × Age-Group | 4.527135 × 10⁻⁴ |
| Alpha_rms | Group | 9.007147 × 10⁻³⁷ |
| | Age-Group | 4.374821 × 10⁻² |
| | Group × Age-Group | 1.089394 × 10⁻¹ |
Table 1: Analysis of variance test significance.
| Model | Metric | AD | HC | FTD |
| RF | Precision | 0.92 | 0.91 | 0.93 |
| | Recall | 0.94 | 0.91 | 0.89 |
| | F1 Score | 0.93 | 0.91 | 0.91 |
| SVM | Precision | 0.42 | 0.40 | 0.40 |
| | Recall | 0.39 | 0.41 | 0.40 |
| | F1 Score | 0.40 | 0.42 | 0.40 |
| XG Boost | Precision | 0.90 | 0.92 | 0.90 |
| | Recall | 0.91 | 0.89 | 0.91 |
| | F1 Score | 0.90 | 0.91 | 0.90 |
| Proposed Stacked Model | Precision | 0.93 | 0.92 | 0.93 |
| | Recall | 0.92 | 0.91 | 0.91 |
| | F1 Score | 0.93 | 0.93 | 0.92 |
Table 2: Performance metrics of the models.
| Model | Train Accuracy | Test Accuracy |
| SVM | 40.91% | 40.91% |
| RF | 99.72% | 92.05% |
| Stacked Model | 99.83% | 91.86% |
Table 3: Accuracy measures on the OpenNeuro dataset.
| Class | Metric | Stacked Model |
| Alzheimer's Disease (AD) | Precision | 0.87 |
| | Recall | 0.86 |
| | F1-Score | 0.89 |
| Healthy Control (HC) | Precision | 0.82 |
| | Recall | 0.87 |
| | F1-Score | 0.84 |
| Fronto-Temporal Dementia (FTD) | Precision | 0.89 |
| | Recall | 0.79 |
| | F1-Score | 0.83 |
| Overall Accuracy | | 88.09% |
Table 4: Validation dataset performance.
| Authors | Dataset | Feature Type | Classifier | Accuracy |
| Proposed method | Dataset1 and Dataset2 | Frequency-domain RMS features | RF | 99.72% |
| A1 Quazzet ai [25] | Clinical EEG dataset | PSD | Squeeze Net, CNN | 91.20% |
| Khosravi et al. [26] | Clinical EEG | PSD | LSTM | 95.23% |
| Kongwudhikunakorn et al. [27] | EEG | PSD | CNN | 89% |
Table 5: Comparative study of the proposed approach and existing EEG-based dementia classification methods.
In this proposed study, the motivation for using the Hy-Fusion stacked architecture is to combine the complementary strengths of tree-based learners, such as RF, and to reduce prediction variance through a logistic-regression meta-learner. The stacking strategy is commonly adopted when base learners have different inductive biases and error patterns. The meta-learner learns to correct systematic errors of base classifiers and to produce more calibrated ensemble outputs.
The process of creating epochs not only enriched the dataset but also helped push the model's performance to a higher level. Table 5 shows a comparative study of the proposed approach with existing EEG-based dementia classification methods25,26,27.
The present work addresses a more challenging three-class problem: AD, FTD, and healthy controls. It also incorporates Nyquist-shift enhancement, which is not used in the earlier methods. This highlights the novelty and effectiveness of the proposed method.
Conclusion
This study proposed machine learning-based models to classify EEG data into three neuro-cognitive conditions, namely Alzheimer's disease (AD), Fronto-temporal Dementia (FTD), and Healthy Control (HC). The method performed feature selection based on ANOVA test results of EEG signals to derive meaningful indicators of brain activity, which were then used to train and evaluate multiple classifiers, including XG-Boost, Random Forest, SVM, and the proposed stacked model, to ensure the method's generalizability. Finally, the models were validated on unseen datasets from other public repositories23.
The model evaluation shows that the Random Forest classifier and the stacked model (XG-Boost and RF with logistic regression as meta-learner) significantly outperformed the others. Both also demonstrated strong performance in identifying all three classes, including challenging cases like FTD. These findings highlight the potential of machine learning methods in EEG-based dementia screening and diagnosis.
Future work
Though the results are promising, there is room for improvement by integrating IoT-based sensors into the models. IoT sensors (e.g., wearable ECG, pulse oximetry, or activity trackers) may be incorporated in future versions of the system to enable continuous monitoring, but they are not part of the current EEG-based method.
The authors confirm that there is no conflict of interest to declare for this publication.
The authors would like to thank the editor and the anonymous reviewers for their comments, which helped improve the quality. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
| Bandpass filtering tools | MNE-Python | Built-in filters | Used for preprocessing |
| Computer workstation | — | Windows/Linux | Analysis computation |
| EEG Acquisition System | Provided by dataset creators | — | Not used by authors (dataset provided) |
| Epoching functions | MNE-Python | Built-in functions | Used for segmentation |
| GitHub or Google Drive | — | — | Storage for code/data |
| Google Colab | Online | | Cloud computing environment |
| MATLAB (used for signal checks) | MathWorks | R202x | Optional |
| Mendeley EEG Dataset | Mendeley Data | | External validation dataset |
| MNE-Python | https://mne.tools | v1.x | EEG preprocessing/analysis |
| NumPy | NumPy developers | Latest | Array computation |
| OpenNeuro EEG Dataset | OpenNeuro | ds004504 (v1.0.8) | Primary dataset for AD/FTD/HC EEG |
| Python | Python Foundation | v3.10+ | Programming environment |
| scikit-learn | sklearn developers | v1.x | Machine learning models |
| SciPy | SciPy developers | Latest | Signal processing |