Research Article

An Explainable Privacy Preserving Multimodal Ensemble Framework For Skin Lesion Classification

June 12th, 2026

Summary

The proposed work aims to develop and evaluate an explainable, privacy-preserving multimodal ensemble framework for accurate skin lesion classification. It integrates deep learning features, clinical metadata, and explainable AI techniques to improve diagnostic accuracy and transparency and to provide reliable clinical decision support for early skin cancer detection.

Abstract

Skin cancer is among the most life-threatening dermatological diseases, and early, accurate diagnosis is important for improving a patient's prognosis. Nevertheless, conventional AI-based diagnostic methods face several challenges, including privacy concerns, limited interpretability, and severe class imbalance in multi-class skin lesion datasets. To overcome these challenges, this paper proposes a privacy-aware, explainable multimodal skin lesion classification model that combines deep learning models and an ensemble modeling approach with explainable artificial intelligence methods. Experiments are conducted on the publicly available HAM10000 benchmark for multi-class skin lesion classification, accessible through Kaggle Hub, which covers seven clinically significant lesion classes (akiec, bcc, bkl, df, mel, nv, vasc). A class-balancing technique is used to increase the representation of the minority classes. EfficientNet-B4, DenseNet201, and MobileNetV2 are used to extract deep feature representations, which are then combined with salient clinical metadata to create a robust multimodal feature space. These multimodal features are used to train XGBoost, LightGBM, and a Deep Neural Classifier (DNC), which achieve classification accuracies of 92%, 90%, and 94%, respectively. A stacked ensemble that combines the outputs of XGBoost, LightGBM, and the DNC raises the accuracy to 96%. Model interpretability techniques provide feature-level explanations that increase transparency. The experimental findings demonstrate the practicality and efficiency of the proposed framework for clinically relevant, real-world skin lesion classification.

Introduction

Skin cancer represents a significant global health burden, with incidence rates increasing worldwide1. Ultraviolet (UV) radiation is recognized as a major contributing factor, causing genetic mutations that lead to uncontrolled proliferation of skin cells and tumor development1,2. Skin cancer comprises a group of diseases, including melanoma, squamous cell carcinoma, and basal cell carcinoma (bcc), which differ in their causes, clinical presentation, and prognostic factors3. Skin lesions are difficult to diagnose from images because different lesion types can look very similar at the pixel level4. In 2022, there were an estimated 331,722 melanoma cases (58,667 deaths) and 1.2 million non-melanoma skin cancer (NMSC) cases (69,416 deaths) globally. The highest age-standardized incidence rates (ASR) for melanoma were in Oceania (29.78 per 100,000), North America (16.3), and Europe (10.43). However, the mortality-to-incidence ratio was highest in Africa (0.35) and Asia (0.30), compared with 0.02 in both North America and Oceania, which may reflect a poorer prognosis in these regions1. In dermatology, the diagnosis and monitoring of skin lesions have relied primarily on visual examination and other non-invasive assessments. Invasive methods are avoided because they can damage the lesion and prevent clinical follow-up of its growth5. The HAM10000 dataset defines seven lesion types: melanoma (MEL), dermatofibroma (DF), actinic keratosis and intraepithelial carcinoma (AKIEC), basal cell carcinoma (BCC), benign keratosis (BKL), melanocytic nevus (NV), and vascular lesions (VASC)5. Major challenges in classifying dermatoscopic images include the presence of hair, ink, ruler marks, colored patches, glare, drops, oil bubbles, blood vessels, hyperpigmented areas, and inflammatory lesions6. Previous studies have investigated feature selection and deep learning for medical imaging and skin lesion classification7,8.

Computer vision-based approaches for skin cancer diagnosis and the integration of handcrafted and deep features have also been investigated9, along with feature fusion strategies for improved classification performance10. Recent work further emphasizes the integration of machine learning into healthcare systems and secure medical data processing frameworks11,12. AI-driven healthcare built on advanced computational algorithms has the potential to deliver personalized and efficient integrated care programs, which are especially beneficial for patients in remote and home care settings13. Trained on large datasets of dermatoscopic images, deep learning models, particularly convolutional neural networks (CNNs), can accurately identify and classify various skin lesions. Several architectures achieve strong results in skin lesion segmentation, including fully convolutional networks (FCNs), CNNs, deep CNNs (DCNNs), fully convolutional residual networks (FCRNs), and U-Net. However, deep neural networks (DNNs) are not easily interpretable because of their highly complex architectures, which makes their decision-making process hard to understand14,15. Recent advances in medical image analysis have shown that deep CNNs substantially improve skin lesion classification. Studies on dermoscopic datasets such as HAM10000 have shown that CNN architectures, including ResNet, DenseNet, and EfficientNet, achieve strong multi-class classification performance by learning hierarchical feature representations from lesion images. Hybrid feature fusion approaches, in which multiple CNN backbones are combined, have further improved diagnostic accuracy by integrating complementary deep representations16. Recent studies have also investigated hybrid CNN-Transformer models in medical image analysis.
Models that combine vision transformer and CNN feature extractors have been shown to perform better in skin lesion classification because they capture both local texture and global contextual relationships17. Such hybrid designs are increasingly regarded as state-of-the-art in medical imaging because of their balanced representation learning ability.

Feature fusion strategies have also been used extensively outside dermatology. CNN-based hybrid systems have been applied to histopathological images to improve the classification of lung and colon cancer through enhanced feature representations and spatial learning16. Similarly, in ophthalmology, deep learning models trained on fused feature representations have been applied successfully to diabetic retinopathy staging from fundus images, with improved robustness and accuracy on a multi-class grading task18. Results from these fields consistently suggest that fusing heterogeneous feature representations improves generalization and classification, especially on imbalanced medical data19.

Despite these improvements, current approaches often do not integrate multiple modalities, do not adequately address class imbalance, and offer limited support for clinical decision-making. To overcome these issues, this paper presents a privacy-conscious, explainable skin lesion classification model that integrates two complementary model interpretability methods. These methods explain the model's predictions by showing which features are most important and highlighting significant regions of dermoscopic images, which improves clinical transparency, builds trust, and supports the safe implementation of AI systems in clinical practice. The HAM10000 dataset is highly imbalanced, with some classes having far fewer samples than others. To address this, the synthetic minority over-sampling technique (SMOTE), a class-balancing method, is used to generate synthetic samples for underrepresented classes. Balancing the dataset helps the model learn from minority lesion types, increases sensitivity, and allows more reliable prediction of clinically significant but less frequent skin cancer classes. Deep features from EfficientNet-B4, DenseNet201, and MobileNetV2 are combined with clinical metadata to form a more informative representation of each skin lesion. This combined representation captures both the visual patterns in dermoscopic images and patient information for a more in-depth analysis. The fused features are then used to train several classifiers, including XGBoost, LightGBM, and a deep neural network, and the classifiers are combined with a stacking ensemble.
The resulting composite model leverages the strengths of the individual models, learning from all of their predictions while mitigating their individual limitations.

Protocol

This study used publicly available, fully anonymized dermoscopic datasets and involved no direct human participation; therefore, ethical committee approval was not required. The Table of Materials contains details of all the materials or tools used in this study. Table 1 includes details of the hardware and software environment, such as processor type, memory, operating system, and software frameworks. Table 2 includes details of the class-wise precision, recall, F1-score, and support for each skin lesion category.

Overall workflow of the proposed multimodal skin lesion classification framework

The overall aim of this research is to build an accurate and interpretable framework for multi-class skin lesion classification. The workflow starts with collection and preprocessing of the HAM10000 dataset, followed by feature extraction with deep learning architectures and the incorporation of clinical metadata. Several machine learning classifiers are then trained and optimized, and their outputs are aggregated with an ensemble strategy. Finally, the model's predictions are interpreted with explainability techniques, and its effectiveness is evaluated for use in real-world clinical decision support.

To improve predictive accuracy, the proposed system uses a multimodal machine learning pipeline that combines image-based features with clinical metadata (Figure 1). By combining the visual information in dermoscopic images with patient information, the model can identify more detailed patterns associated with different skin lesions, which improves the quality and usefulness of skin lesion classification. Deep features are extracted with three pre-trained convolutional neural networks (EfficientNet-B4, DenseNet201, and MobileNetV2), which capture complementary patterns in dermoscopic images. These architectures learn high-level characteristics of lesion appearance, such as variations in color, texture, and structure. A feature fusion module then combines the deep features with clinical and demographic data to form a rich multimodal feature vector. The merged data are split into training, validation, and test sets to ensure appropriate model evaluation. An ensemble strategy further improves prediction accuracy: the predicted probabilities of several models are combined into a final prediction, which improves generalization and reduces the variance of individual models. In addition, explainability methods are integrated to explain how the model reaches its decisions. 
Two complementary forms of interpretability are used. A feature-level method quantifies the contribution of each input variable to a prediction, whereas a pixel-level method highlights the regions of a dermoscopic image that most influence the prediction. Combined, these techniques make the models more interpretable and help clinicians understand how the system reaches its decisions. As a result, the proposed pipeline is understandable and privacy-conscious, which increases transparency and trust and enables more dependable skin cancer diagnosis in real-world healthcare settings.
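The stacking step can be illustrated with a minimal sketch. It assumes that each base classifier (XGBoost, LightGBM, and the DNC in this study) already outputs class probabilities; for brevity, the toy example uses only two hypothetical base models and a multinomial logistic-regression meta-learner written in NumPy. The choice of meta-learner is an assumption, since it is not specified here. In practice, the meta-learner should be trained on out-of-fold predictions so that it never sees base-model outputs on their own training data; scikit-learn's `StackingClassifier` automates this.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def fit_meta_learner(meta_X, y, n_classes, lr=0.5, epochs=500):
    """Multinomial logistic regression fitted by full-batch gradient descent."""
    n, d = meta_X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                  # one-hot targets
    for _ in range(epochs):
        P = softmax(meta_X @ W + b)
        grad = (P - Y) / n                    # d(mean cross-entropy)/d(logits)
        W -= lr * (meta_X.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b


def stack_predict(base_probas, W, b):
    """Concatenate the base models' class probabilities and apply the meta-learner."""
    meta_X = np.hstack(base_probas)
    return softmax(meta_X @ W + b)


# Hypothetical base-model outputs for 3 classes, each pattern repeated 10 times:
# model A confuses class 2 with class 0, model B confuses class 1 with class 0.
y = np.repeat([0, 1, 2], 10)
proba_a = np.array([[.8, .1, .1], [.1, .8, .1], [.8, .1, .1]])[y]
proba_b = np.array([[.8, .1, .1], [.8, .1, .1], [.1, .1, .8]])[y]

# In a real pipeline these inputs would be out-of-fold predictions.
W, b = fit_meta_learner(np.hstack([proba_a, proba_b]), y, n_classes=3)
final = stack_predict([proba_a, proba_b], W, b)

for name, p in [("A", proba_a), ("B", proba_b), ("stacked", final)]:
    print(name, (p.argmax(axis=1) == y).mean())
```

Each toy base model misclassifies one class, but because they fail on different classes, the meta-learner can learn from the concatenated probabilities which model to trust for each pattern.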

Dataset description and preparation

In this paper, the HAM10000 (Human Against Machine with 10,000 training images) dataset is used as the primary dataset for multi-class skin lesion classification. The dataset contains 10,015 dermoscopic images collected from different clinical sources and populations, making it one of the most widely used benchmark datasets in dermatological image analysis. Each image is accompanied by clinical metadata, including an image identifier, diagnostic label, patient age, sex, and the anatomical location of the lesion. The dataset covers seven diagnostic categories: actinic keratoses (akiec), basal cell carcinoma (bcc), benign keratosis (bkl), dermatofibroma (df), melanocytic nevi (nv), vascular lesions (vasc), and melanoma (mel).

Clinical metadata preprocessing

Clinical metadata, namely age, sex, and the anatomical location of the lesion, were added to the classification pipeline as auxiliary features. Missing or unknown values were handled with a deterministic preprocessing approach. For the numerical age variable, missing values were imputed with the median age computed on the training set; median imputation was chosen because it is robust to outliers and skewed distributions, which are common in clinical data. For the categorical variables sex and lesion location, missing or unspecified values were not excluded but assigned to a dedicated 'unknown' category. This retains all available samples and lets the model learn whether missingness itself is predictive. One-hot encoding was then applied to the categorical variables to make them compatible with the machine learning models. All preprocessing parameters (imputation values and encodings) were fitted on the training set only, and the same transformations were applied to the validation and test sets to avoid data leakage. No samples were excluded because of missing clinical metadata, which maximized data utilization and ensured methodological consistency.
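A minimal sketch of this fit-on-train, apply-everywhere procedure follows, in plain Python. The record layout and field names are hypothetical stand-ins for the HAM10000 metadata columns, and mapping categories that never occur in the training set to 'unknown' is an added assumption rather than a documented step of the protocol.

```python
from statistics import median

# Hypothetical metadata records; None marks a missing value.
train = [
    {"age": 45.0, "sex": "male", "localization": "back"},
    {"age": None, "sex": "female", "localization": "face"},
    {"age": 70.0, "sex": None, "localization": "back"},
    {"age": 30.0, "sex": "female", "localization": None},
]
test = [{"age": None, "sex": "male", "localization": "scalp"}]

CATEGORICAL = ("sex", "localization")


def fit_metadata(rows):
    """Learn imputation and one-hot vocabularies from the training rows only."""
    ages = [r["age"] for r in rows if r["age"] is not None]
    params = {"age_median": median(ages)}
    for col in CATEGORICAL:
        # Missing values become an explicit 'unknown' category
        params[col] = sorted({r[col] or "unknown" for r in rows} | {"unknown"})
    return params


def transform_metadata(rows, params):
    """Apply the training-set parameters to any split (train, validation, test)."""
    out = []
    for r in rows:
        age = r["age"] if r["age"] is not None else params["age_median"]
        vec = [age]
        for col in CATEGORICAL:
            value = r[col] or "unknown"
            if value not in params[col]:  # assumption: unseen category -> 'unknown'
                value = "unknown"
            vec.extend(1.0 if value == c else 0.0 for c in params[col])
        out.append(vec)
    return out


params = fit_metadata(train)
X_train = transform_metadata(train, params)
X_test = transform_metadata(test, params)
```

Because the median and the category vocabularies come only from `train`, the test record is imputed and encoded without any information leaking from the test split.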

Figure 1: Multimodal system for skin lesion classification. The study approach combines dermoscopic image features with patient metadata to classify skin lesions using ensemble deep learning models. The framework includes preprocessing, feature extraction, multimodal fusion, and classification, allowing for enhanced diagnostic performance and interpretability.

The workflow depicts the proposed classification pipeline, based on dermoscopic images and clinical metadata from the HAM10000 skin lesion dataset. Images are preprocessed, and deep features are extracted with EfficientNet-B4, DenseNet201, and MobileNetV2. The clinical metadata are encoded, and feature fusion combines the image features with the clinical metadata. To address class imbalance, the class-balancing technique is applied in the fused multimodal feature space rather than to the raw images or individual feature streams, so that synthetic samples preserve the combination of visual and clinical features and unrealistic samples are avoided. The fused features are then used to train classifiers such as XGBoost, LightGBM, and a deep neural classifier.

Figure 2: Example dermoscopic images from seven different diagnostic groups from the HAM10000 dataset. Images show typical visual features used for automated classification. (A) Actinic keratoses (akiec), demonstrating rough surfaces with irregular pigmentation. (B) Basal cell carcinoma (bcc), with irregular shapes and blood vessels. (C) Benign keratosis-like lesions (bkl), showing keratotic features with light brown surfaces. (D) Dermatofibroma (df), with a central scar-like appearance and pigmentation. (E) Melanocytic nevi (nv), benign and relatively symmetric moles. (F) Vascular lesions (vasc), showing a reddish-purple appearance due to blood vessels. (G) Melanoma (mel), which presents as an irregularly shaped, asymmetric, and multi-pigmented lesion.

These dermoscopic images reveal the visual heterogeneity of skin lesions, which vary in pigmentation, texture, and structural morphology. These variations pose a major challenge for automated classification and underline the importance of deep learning-based feature extraction techniques that are sensitive to subtle diagnostic patterns. Figure 2 illustrates the seven categories of skin lesions in the HAM10000 dataset, which are commonly studied in dermatological imaging research: actinic keratoses (akiec), basal cell carcinoma (bcc), benign keratosis (bkl), dermatofibroma (df), melanocytic nevi (nv), vascular lesions (vasc), and melanoma (mel)21. Each lesion type has distinctive visual features (Figure 2), including variations in pigmentation pattern, surface texture, color distribution, and border abnormalities. Dermatologists weigh these characteristics during clinical examination, so machine learning models must also represent them well to classify lesions correctly. Despite these differentiating characteristics, many lesions appear nearly identical, which makes them difficult to tell apart from dermoscopic images alone. The distinction between certain lesion types is often extremely subtle yet clinically relevant, making automatic classification challenging. This motivates AI models that can learn fine-grained visual features and subtle differences among lesion classes.
Modeling these properties accurately improves the model's ability to discriminate between lesion types and can support earlier diagnosis of dangerous conditions such as melanoma. Ultimately, this can improve diagnostic accuracy and help clinicians make decisions that lead to better patient outcomes.

Figure 3: Class-wise distribution of skin lesions in the HAM10000 dataset. The figure shows the distribution of the seven lesion categories considered in this study: Actinic Keratoses (akiec), Basal Cell Carcinoma (bcc), Benign Keratosis-like lesions (bkl), Dermatofibroma (df), Melanocytic Nevi (nv), Vascular Lesions (vasc), and Melanoma (mel). This graph illustrates the class imbalance of the lesion classes.

Analysis of the dataset reveals a pronounced class imbalance across lesion types. Melanocytic nevi (nv) are by far the most common class, with approximately 6,705 samples, followed by melanoma (1,113) and benign keratosis (1,099). In contrast, some clinically significant lesion types are much less represented, such as dermatofibroma (115) and vascular lesions (142). This disproportion is problematic for machine learning models, which tend to be biased toward the majority classes and may fail to detect rare but clinically significant lesions. To address this issue and improve model performance across all classes, additional preprocessing strategies are needed, such as targeted data augmentation and class balancing. The data can be balanced with SMOTE and class-weight adjustment, which encourage the model to learn meaningful patterns in the underrepresented classes.

The hyperparameters for XGBoost and LightGBM were mostly left at their default values, with minor adjustments based on preliminary experiments. For the deep neural classifier, architectural and training parameters such as the number of layers, number of neurons, learning rate, batch size, and number of epochs were selected empirically using validation data. The complete set of hyperparameters is provided in Table 3.

In total, 10,015 dermoscopic images are used in this study. This provides a large collection of data for training and testing, as well as a challenging benchmark for evaluating the effectiveness of the proposed skin lesion classification system.
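The imbalance can be quantified directly from the class counts. The sketch below computes the majority-to-minority ratio and inverse-frequency class weights of the form n_total / (n_classes × n_c), the same heuristic as scikit-learn's `class_weight="balanced"`. Which weighting formula the study used is not stated, so this formula is an assumption. The akiec and bcc counts, which are not listed in the text above, are the commonly reported HAM10000 counts and bring the total to the 10,015 images used here.

```python
# HAM10000 class counts. nv, mel, bkl, df and vasc are quoted in the text;
# akiec (327) and bcc (514) are the commonly reported HAM10000 counts (assumed
# here) that bring the total to the 10,015 images used in this study.
counts = {"akiec": 327, "bcc": 514, "bkl": 1099, "df": 115,
          "mel": 1113, "nv": 6705, "vasc": 142}

n_total = sum(counts.values())
n_classes = len(counts)

# Inverse-frequency ("balanced") weights: n_total / (n_classes * n_c).
# A class holding exactly n_total / n_classes samples would get weight 1.0.
class_weights = {c: n_total / (n_classes * n) for c, n in counts.items()}

# Ratio between the largest (nv) and smallest (df) class
imbalance_ratio = max(counts.values()) / min(counts.values())

for c, w in sorted(class_weights.items(), key=lambda kv: kv[1]):
    print(f"{c:>5}: {counts[c]:>5} samples, weight {w:6.2f}")
print(f"total={n_total}, majority/minority ratio={imbalance_ratio:.1f}")
```

Such weights can be turned into per-sample weights and passed to the `sample_weight` argument of the `fit` methods of the XGBoost and LightGBM scikit-learn wrappers.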

Data preprocessing

The preprocessing pipeline prepares the HAM10000 dataset for multimodal learning by standardizing images, extracting deep features, integrating clinical metadata, and addressing class imbalance.

Image Standardization: All dermoscopic images were resized to 224 × 224 pixels and normalized using z-score normalization.

$$I_{\text{norm}} = \frac{I - \mu}{\sigma} \qquad (1)$$

where \(I\) is the raw image, \(\mu\) is the pixel-wise mean, and \(\sigma\) is the standard deviation.
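Eq. (1) can be sketched with NumPy as follows. Computing \(\mu\) and \(\sigma\) per image and per color channel is an assumption (image-level versus dataset-level statistics are not specified), the small constant `eps` is added to avoid division by zero, and the random array stands in for a resized 224 × 224 RGB dermoscopic image.

```python
import numpy as np


def zscore_normalize(image, eps=1e-8):
    """Apply Eq. (1), I_norm = (I - mu) / sigma, separately for each color channel.

    image: array of shape (H, W, C), e.g. a 224 x 224 x 3 dermoscopic image.
    """
    image = image.astype(np.float32)
    mu = image.mean(axis=(0, 1), keepdims=True)     # per-channel mean
    sigma = image.std(axis=(0, 1), keepdims=True)   # per-channel standard deviation
    return (image - mu) / (sigma + eps)


rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(224, 224, 3)).astype(np.uint8)  # stand-in image
norm = zscore_normalize(raw)
print(norm.shape, norm.mean(axis=(0, 1)).round(3), norm.std(axis=(0, 1)).round(3))
```

If dataset-level statistics are preferred, \(\mu\) and \(\sigma\) would instead be computed once over the training images and reused for the validation and test images.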

Deep Feature Extraction: Complementary deep features were extracted using three pre-trained convolutional neural networks: EfficientNet-B4, DenseNet201, and MobileNetV2. Each network \(f_{(\cdot)}\) maps the normalized image \(I_{\text{norm}}\) to a feature vector:

$$F_{\text{EffB4}} = f_{\text{EffB4}}(I_{\text{norm}}), \quad F_{\text{Dense}} = f_{\text{Dense}}(I_{\text{norm}}), \quad F_{\text{MobV2}} = f_{\text{MobV2}}(I_{\text{norm}}) \qquad (2)$$

The extracted features were concatenated to form a unified representation:

$$F_{\text{fusion}} = F_{\text{EffB4}} \,\|\, F_{\text{Dense}} \,\|\, F_{\text{MobV2}} \qquad (3)$$

where \(\|\) denotes vector concatenation.

Clinical Metadata Integration: Clinical attributes, including age, sex, and lesion localization, were cleaned, label-encoded, and normalized using min-max scaling:

$$x_{\text{scaled}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \qquad (4)$$
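Eq. (4) can be sketched as follows, with \(x_{\min}\) and \(x_{\max}\) taken from the training split only so that the test data do not influence the scaling. The column layout and values are hypothetical, and giving constant training columns a span of 1 (rather than dividing by zero) is an added assumption.

```python
import numpy as np


def fit_minmax(X_train):
    """Column-wise x_min and x_max of Eq. (4), estimated on training data only."""
    X_train = np.asarray(X_train, dtype=float)
    return X_train.min(axis=0), X_train.max(axis=0)


def apply_minmax(X, x_min, x_max):
    """Scale any split with the training-set statistics."""
    span = np.where(x_max > x_min, x_max - x_min, 1.0)  # constant columns: span of 1
    return (np.asarray(X, dtype=float) - x_min) / span


# Hypothetical columns: age, label-encoded sex, label-encoded localization
X_train = np.array([[30.0, 0, 2], [45.0, 1, 0], [70.0, 1, 5]])
X_test = np.array([[85.0, 0, 3]])

x_min, x_max = fit_minmax(X_train)
print(apply_minmax(X_train, x_min, x_max))
print(apply_minmax(X_test, x_min, x_max))
```

Test values outside the training range, such as the age of 85 here, are scaled beyond [0, 1]; clipping them is a possible design choice that is not specified in the protocol.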

The processed metadata vector \(M_{\text{clinical}}\) was fused with the image features to construct the final multimodal input:

$$F_{\text{combined}} = F_{\text{fusion}} \,\|\, M_{\text{clinical}} \qquad (5)$$
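Eqs. (3) and (5) amount to column-wise concatenation. In the sketch below, random vectors stand in for real CNN outputs. Their widths (1792, 1920, and 1280) are the global-average-pooled output sizes of the Keras `EfficientNetB4`, `DenseNet201`, and `MobileNetV2` applications with `include_top=False` and `pooling="avg"`, which is an assumption about how the features were pooled. The 7-column metadata block is hypothetical.

```python
import numpy as np

# Pooled feature widths of the three backbones (assumed global average pooling)
FEATURE_DIMS = {"EffB4": 1792, "Dense": 1920, "MobV2": 1280}


def fuse_features(deep_features, clinical):
    """Eq. (3) then Eq. (5): concatenate backbone features, then append metadata.

    deep_features: dict name -> (N, d_name) array, concatenated in FEATURE_DIMS order
    clinical: (N, m) array of preprocessed metadata (M_clinical)
    """
    f_fusion = np.concatenate([deep_features[k] for k in FEATURE_DIMS], axis=1)
    return np.concatenate([f_fusion, clinical], axis=1)


rng = np.random.default_rng(42)
n_images = 4
deep = {k: rng.normal(size=(n_images, d)) for k, d in FEATURE_DIMS.items()}
m_clinical = rng.random(size=(n_images, 7))  # hypothetical encoded metadata columns
f_combined = fuse_features(deep, m_clinical)
print(f_combined.shape)
```

A fixed concatenation order matters: every sample must place each backbone's features in the same columns so that downstream classifiers see a consistent feature layout.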

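Dataset Splitting: A stratified split was applied to preserve the class distribution, with 80% of the samples assigned to the training set and the remainder to the test set:

$$D_{\text{train}}, D_{\text{test}} = \operatorname{Split}(F_{\text{combined}}, 0.8) \qquad (6)$$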
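Eq. (6) corresponds to a per-class shuffle-and-cut. The NumPy sketch below illustrates it on hypothetical labels; the seed and rounding rule are assumptions, and scikit-learn's `train_test_split(..., stratify=y)` is the usual tool for this. Applying the same function again to the training indices would yield the separate validation set mentioned in the workflow.

```python
import numpy as np


def stratified_split(labels, train_frac=0.8, seed=0):
    """Eq. (6): split indices so that each class keeps its proportion in both sets."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)   # positions of this class
        rng.shuffle(idx)
        n_train = int(round(train_frac * len(idx)))
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    return np.sort(train_idx), np.sort(test_idx)


# Hypothetical labels with a HAM10000-like skew
y = np.array(["nv"] * 60 + ["mel"] * 10 + ["df"] * 5)
tr, te = stratified_split(y)
print(len(tr), len(te))
```

Even the five-sample df class contributes one sample to the test set, which a purely random split could easily miss.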

Class imbalance handling: The HAM10000 dataset is severely imbalanced: nevus (nv) samples dominate, while minority classes such as df and vasc are underrepresented. To mitigate this problem, the Synthetic Minority Oversampling Technique (SMOTE) was employed. New synthetic samples were generated as:

$$x_{\text{new}} = x_i + \lambda \,(x_{zi} - x_i) \qquad (7)$$

$$\lambda \sim \mathcal{U}(0, 1)$$

where \(x_i\) is a minority-class sample, \(x_{zi}\) is one of its nearest neighbors within the same class, and \(\lambda\) is a random value drawn from a uniform distribution between 0 and 1. Each synthetic sample therefore lies on the line segment joining \(x_i\) and \(x_{zi}\); the resulting class distribution is shown in Figure 4.
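A minimal NumPy sketch of Eq. (7) for a single minority class follows. It uses brute-force Euclidean nearest neighbors and k = 5 by default, the value commonly used by SMOTE implementations (k is not stated in the protocol, so this is an assumption). It also returns the chosen base indices, neighbor indices, and λ values so the construction can be inspected. In practice, the `SMOTE` class in the imbalanced-learn package (`imblearn.over_sampling.SMOTE`) applies the same idea to all minority classes at once.

```python
import numpy as np


def smote(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples with Eq. (7).

    For each synthetic point: pick a minority sample x_i, pick one of its k
    nearest minority neighbours x_zi, draw lambda ~ U(0, 1) and return
    x_i + lambda * (x_zi - x_i).
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    k = min(k, len(X_min) - 1)
    # Pairwise Euclidean distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                      # a point is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]        # k nearest neighbours per sample
    base = rng.integers(0, len(X_min), size=n_new)   # indices of x_i
    nbr = neighbours[base, rng.integers(0, k, size=n_new)]  # indices of x_zi
    lam = rng.random((n_new, 1))                     # lambda in [0, 1)
    synthetic = X_min[base] + lam * (X_min[nbr] - X_min[base])
    return synthetic, base, nbr, lam


# Example: oversample a hypothetical 6-sample minority class up to 20 samples
rng = np.random.default_rng(1)
X_df = rng.normal(size=(6, 4))
synthetic, base, nbr, lam = smote(X_df, n_new=14, k=3)
X_df_balanced = np.vstack([X_df, synthetic])
print(X_df_balanced.shape)
```

Because the points are interpolated rather than copied, the minority class gains new, distinct samples instead of exact duplicates.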

Figure 4: Class distribution in the HAM10000 dataset before/after applying the class balancing technique. (A) Before class balancing, with imbalance across lesion classes. (B) After class balancing in the combined feature space, where the representation of all classes is equal to avoid bias in the classifier training process.

To address class imbalance in the HAM10000 dataset, the Synthetic Minority Over-Sampling Technique (SMOTE) is applied. SMOTE generates synthetic samples for the minority classes by interpolating between existing data points, which increases the representation of underrepresented lesion categories. The result is a more balanced dataset across all seven lesion types. This balanced representation enables the classification models to learn every class better and reduces bias toward the majority classes. Consequently, the model classifies more fairly and is more sensitive to rare but clinically important skin lesions.

Privacy-preserving learning framework

The proposed framework performs automated multimodal skin lesion classification in a privacy-aware and interpretable manner. Its aim is to improve diagnostic performance while safeguarding sensitive patient information throughout training. Patient privacy is essential in medical practice, where data protection laws and ethical considerations govern the use of healthcare data. The framework therefore includes a decentralized learning scheme based on federated learning. In this setting, the model is trained across a group of distributed clients instead of aggregating all patient data in a central location. Each participating client trains the model locally on its own data, and raw patient data never leave the local environment. Instead of transferring sensitive medical records, clients send model updates or parameters to a central server for aggregation. This collaborative approach allows different institutions or data sources to contribute to model training without compromising data privacy.

Let \(w_t^{(k)}\) denote the model parameters of the kth client at the tth iteration, and let \(n_k\) be the number of samples at that client. With K clients and \(n = \sum_{k=1}^{K} n_k\) samples in total, the global model is updated as:

$$w_{t+1} = \sum_{k=1}^{K} \frac{n_k}{n} \, w_t^{(k)} \qquad (8)$$

This aggregation strategy ensures that clients with larger datasets contribute proportionally more to the global model while still allowing smaller clients to participate in the learning process. By enabling collaborative training without exchanging raw patient data, the proposed framework maintains privacy while still benefiting from distributed knowledge across datasets.
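Eq. (8), the FedAvg aggregation step, reduces to a sample-size-weighted average. The sketch below assumes each client's parameters have been flattened into one NumPy vector; for a multi-layer model such as the deep neural classifier, the same average would be applied tensor by tensor. The client sizes and parameter values are hypothetical.

```python
import numpy as np


def fedavg(client_weights, client_sizes):
    """Eq. (8): average client parameter vectors, weighted by n_k / n."""
    sizes = np.asarray(client_sizes, dtype=float)
    coeffs = sizes / sizes.sum()                                   # n_k / n
    stacked = np.stack([np.asarray(w, dtype=float) for w in client_weights])
    return np.tensordot(coeffs, stacked, axes=1)                   # sum_k (n_k / n) w_t^(k)


# Hypothetical flattened parameters of three clients after local training
w_clients = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
n_clients = [500, 300, 200]
w_global = fedavg(w_clients, n_clients)
print(w_global)
```

With 500, 300, and 200 samples, the clients receive weights of 0.5, 0.3, and 0.2, so the largest client has the most influence while the smallest still contributes.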

Federated experimental setup

A simulated federated learning setup based on the HAM10000 dataset was designed to evaluate the proposed privacy-aware framework. The data were divided among three clients to simulate a real-life multi-institutional environment with non-identically distributed (non-IID) data. Each client holds a different mix of lesion classes, reflecting the real-world variation between clinical centers. The same multimodal feature extraction pipeline (EfficientNet-B4, DenseNet201, MobileNetV2, and clinical metadata) was run locally at every client. During training, clients updated their local models independently, and only the learned parameters were sent to the central server for aggregation with the FedAvg algorithm. The federated model was compared with centralized training to measure the trade-off between predictive accuracy and privacy. The client-wise data distribution is shown in Figure 5. As shown in Figure 6, the federated model performs competitively, with only a slight decrease in accuracy relative to centralized learning, while keeping raw patient data at each client, which improves data privacy.

Figure 5: Client-wise distribution of the HAM10000 dataset. The figure shows how skin lesion samples are allocated among the clients, illustrating the heterogeneity (non-IID distribution) of data across clients that is characteristic of federated learning.

To model real-life clinical conditions, the HAM10000 data were divided among three clients with heterogeneous (non-IID) distributions. The proportions of lesion categories differ between clients; in particular, the nevus (nv) class is unevenly distributed across them. This arrangement reflects a real-world difficulty of federated learning, in which data are not evenly distributed across institutions.
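How the three non-IID clients were constructed is not described. One common way to simulate label skew, shown here purely as an assumption, is to split each class among clients using proportions drawn from a Dirichlet distribution; a smaller concentration parameter `alpha` yields more skewed clients. The toy label vector is hypothetical and only loosely mimics the HAM10000 class proportions.

```python
import numpy as np


def dirichlet_partition(labels, n_clients=3, alpha=0.5, seed=0):
    """Split sample indices across clients with label skew (non-IID).

    For every class, its samples are divided among clients according to
    proportions drawn from Dirichlet(alpha); smaller alpha gives more skew.
    """
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    clients = [[] for _ in range(n_clients)]
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        props = rng.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)  # split points
        for client, part in zip(clients, np.split(idx, cuts)):
            client.extend(part.tolist())
    return [np.array(sorted(c), dtype=int) for c in clients]


y = np.array(["nv"] * 670 + ["mel"] * 111 + ["bkl"] * 110 + ["df"] * 12)
parts = dirichlet_partition(y, n_clients=3, alpha=0.5)
for i, p in enumerate(parts):
    names, counts = np.unique(y[p], return_counts=True)
    print(f"client {i}:", dict(zip(names.tolist(), counts.tolist())))
```

Every sample is assigned to exactly one client, so the union of the client datasets is the original dataset, as in the simulated setup described above.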

Performance comparison: centralized vs federated learning

To evaluate the effectiveness of the proposed federated learning framework, centralized and federated training strategies were compared on the HAM10000 dataset, as shown in Figure 6. In the centralized setting, all samples were pooled into a single training set, and the best-performing centralized model, the stacked ensemble, achieved an overall accuracy of 96%. In the federated setting, the dataset was distributed across three clients with non-IID data; each client trained the model locally and shared only model parameters using FedAvg. The federated model achieved an overall accuracy of approximately 94%, about 2 percentage points lower than the centralized approach, as shown in Table 4. This marginal decrease is expected given decentralized optimization and heterogeneous data distributions across clients.

Despite this small drop, the federated model retained strong predictive performance. The class-wise results of centralized training show that majority classes such as nevus (nv) remain stable (F1-score = 1.00), whereas minority classes such as dermatofibroma (df) (F1-score ≈ 0.65–0.66) are more sensitive to distribution imbalance, an effect that may be amplified under federated training. Notably, the federated setup reduces the risk of exposing sensitive patient information because raw medical data are never shared among clients.
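Per-class scores such as these, and the class-wise precision, recall, F1-score, and support reported in Table 2, follow from counting true positives, false positives, and false negatives for each class. A NumPy sketch is given below; the labels and predictions are hypothetical, and scikit-learn's `classification_report` produces the same metrics in practice.

```python
import numpy as np


def per_class_report(y_true, y_pred, classes):
    """Precision, recall, F1-score and support for each class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    report = {}
    for c in classes:
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[c] = {"precision": precision, "recall": recall,
                     "f1": f1, "support": int(np.sum(y_true == c))}
    return report


# Hypothetical predictions: the majority class is easy, df is often confused with nv
y_true = ["nv"] * 8 + ["df"] * 4
y_pred = ["nv"] * 8 + ["df", "df", "nv", "nv"]
rep = per_class_report(y_true, y_pred, ["nv", "df"])
print(rep)
```

In this toy case, misclassifying half of the df samples as nv lowers both df recall and nv precision, which is the kind of asymmetry that makes minority-class F1 scores fall well below those of the majority class.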

Centralized vs Federated Learning accuracy comparison; bar chart; educational data analysis.
Figure 6: Comparison of federated and centralized learning. The two learning paradigms are compared using accuracy, precision, recall, and F1-score. The comparison shows that federated learning can achieve performance comparable to that of centralized training while keeping patient data local to each client.

The results in Table 4 indicate that the federated learning model is competitive, with accuracy only about 2 percentage points lower than that of the centralized model. This slight reduction can be explained by decentralized optimization and the non-IID data distribution. In exchange, the federated model offers a clear privacy advantage, as sensitive patient information is not shared among the clients. To ensure a fair comparison with the centralized stacked ensemble, the federated model was tested with the same architecture and hyperparameters. The privacy-preserving aspect discussed in this study is conceptual and is intended to highlight the potential integration of techniques such as federated learning in future work; no experimental validation of privacy-preserving mechanisms is performed in the current implementation.

Multimodal feature fusion

The diagnosis of skin lesions usually combines visual examination with the patient's clinical history. Dermatologists rarely assess dermoscopic images in isolation; in most cases, they interpret them in relation to patient information (age, sex, and lesion location) when making diagnostic judgments. Inspired by this clinical workflow, the proposed system adopts a multimodal learning approach that combines image-based and clinical data. Deep features are extracted from the dermoscopic images using CNNs (EfficientNet-B4, DenseNet201, and MobileNetV2), which capture complex visual patterns, including color variations, lesion shape, structural anomalies, and texture. However, image features alone may not capture the clinical context of a lesion, so the clinical metadata associated with each image are also included in learning. A feature fusion module integrates the deep image features with the processed clinical attributes and demographic information, producing an integrated multimodal representation that contains both visual and contextual information for every lesion. By drawing on several data sources, the model can exploit complementary patterns that improve overall classification. The multimodal representation also helps the system differentiate between visually similar lesions by taking clinical indicators into account. Because it more closely approximates how dermatologists examine lesions in clinical practice, the model is more clinically meaningful.
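The fusion step can be illustrated as early (feature-level) fusion by concatenation. This sketch assumes the pooled CNN embeddings have already been computed, encodes HAM10000-style metadata (`age`, `sex`, `localization`) with z-scoring and one-hot encoding, and concatenates everything into one vector per lesion. The helper names, the vocabularies, the mean imputation of missing ages, and the embedding widths (1792, 1920, and 1280, assumed to be the pooled output sizes of the Keras EfficientNetB4, DenseNet201, and MobileNetV2 models) are assumptions for illustration, not the authors' exact pipeline.

```python
import numpy as np

def encode_metadata(ages, sexes, sites, sex_vocab, site_vocab, age_mean, age_std):
    """Encode HAM10000-style metadata as a numeric matrix.

    Age is z-scored (missing ages, given as None, are imputed with age_mean);
    sex and lesion site are one-hot encoded against fixed vocabularies, and
    values outside a vocabulary (e.g. 'unknown') become all-zero rows.
    """
    n = len(ages)
    age = np.array([age_mean if a is None else a for a in ages], dtype=float)
    age_z = ((age - age_mean) / age_std).reshape(n, 1)
    sex_oh = np.zeros((n, len(sex_vocab)))
    site_oh = np.zeros((n, len(site_vocab)))
    for i, (sex, site) in enumerate(zip(sexes, sites)):
        if sex in sex_vocab:
            sex_oh[i, sex_vocab.index(sex)] = 1.0
        if site in site_vocab:
            site_oh[i, site_vocab.index(site)] = 1.0
    return np.hstack([age_z, sex_oh, site_oh])

def fuse(image_feature_blocks, metadata_matrix):
    """Early fusion: concatenate per-backbone image embeddings with metadata."""
    return np.hstack(list(image_feature_blocks) + [metadata_matrix])

# Random stand-ins for pooled CNN embeddings of 4 lesions
rng = np.random.default_rng(0)
n = 4
img_blocks = [rng.normal(size=(n, d)) for d in (1792, 1920, 1280)]
meta = encode_metadata(
    ages=[45, None, 70, 30],
    sexes=["male", "female", "unknown", "female"],
    sites=["back", "face", "trunk", "back"],
    sex_vocab=["male", "female"],
    site_vocab=["back", "face", "trunk"],
    age_mean=50.0, age_std=15.0,
)
X = fuse(img_blocks, meta)
print(X.shape)  # (4, 4998)
```

Concatenation is the simplest fusion choice: the downstream classifiers (XGBoost, LightGBM, DNC) then see image and clinical features side by side in one row per lesion.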

Stacked ensemble learning
The proposed framework uses a stacked ensemble learning strategy to further improve predictive performance. Ensemble learning combines two or more predictive models to improve generalization and reduce the prediction errors that single models can make. Rather than relying on a single classifier, multiple base learners are trained independently on the multimodal feature representation. Each base learner estimates the probability that a sample belongs to each lesion class, and these probability predictions are then aggregated at a meta-level. Each base learner is assigned a weight reflecting its relative importance to the final prediction, and a softmax activation function is applied to the aggregated output to produce normalized class probabilities. The stacked ensemble method has several benefits. First, combining diverse models reduces prediction variance and thus improves generalization. Second, it improves robustness, because different models capture different patterns in the data. Third, ensemble learning can improve the classification of minority lesion classes, which is especially important in medical data, where some clinically important conditions are rare.
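The aggregation described above (weighted base-learner probabilities passed through softmax) can be sketched as follows. The weights and probabilities are hypothetical placeholders; the article does not state how the weights were chosen.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def weighted_ensemble(prob_list, weights):
    """Weighted aggregation of base-learner probabilities, then softmax.

    prob_list: (n_samples, n_classes) probability arrays, one per base learner
               (e.g. predict_proba outputs of XGBoost, LightGBM and the DNC).
    weights:   one importance weight per base learner.
    """
    w = np.asarray(weights, dtype=float)
    stacked = np.stack(prob_list)                  # (n_models, n_samples, n_classes)
    aggregated = np.tensordot(w, stacked, axes=1)  # weighted sum -> (n_samples, n_classes)
    return softmax(aggregated)

# Hypothetical probabilities for 2 samples and 3 classes
p_xgb = np.array([[0.7, 0.2, 0.1], [0.2, 0.5, 0.3]])
p_lgb = np.array([[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]])
p_dnc = np.array([[0.8, 0.1, 0.1], [0.3, 0.3, 0.4]])
probs = weighted_ensemble([p_xgb, p_lgb, p_dnc], weights=[0.3, 0.2, 0.5])
pred = probs.argmax(axis=1)  # predicted class per sample
print(pred.tolist())  # [0, 2]
```

Because softmax is monotonic, the predicted class equals the argmax of the weighted sum; the softmax only renormalizes the scores. A common alternative for stacking is to train a meta-learner (for example, logistic regression) on out-of-fold base-learner probabilities instead of fixing the weights by hand.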

Explainable artificial intelligence integration

Although high prediction accuracy is critical, medical AI systems should also offer clear explanations of their decisions. For clinicians to trust AI systems and use them effectively in practice, they must be able to understand how a model arrives at the diagnosis it produces. To meet this need, the proposed framework incorporates explainable artificial intelligence (XAI) methods, as depicted in Figure 7.

Figure 7: Confusion matrices of the classification models for multi-class skin lesion classification. (A) XGBoost, (B) LightGBM, (C) Deep Neural Classifier, and (D) Stacked Ensemble. Each confusion matrix shows the relationship between the true class (rows) and the predicted class (columns) for all seven skin lesion types: akiec, bcc, bkl, df, mel, nv, and vasc. The XGBoost and LightGBM models perform well on the nv and bkl classes, although there is some confusion between mel and nv. The Deep Neural Classifier improves the classification of bkl and df and reduces off-diagonal errors. The Stacked Ensemble shows the most consistent classification, with the strongest diagonal.
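Confusion matrices such as those in Figure 7, and the per-class precision, recall, and F1-scores derived from them, can be computed as in this dependency-free sketch; in practice, `confusion_matrix` and `classification_report` in scikit-learn's `sklearn.metrics` provide the same quantities. The integer label order (alphabetical, matching the figure) and the toy predictions are assumptions for illustration.

```python
import numpy as np

# Integer label order assumed to follow the alphabetical order used in Figure 7
CLASSES = ["akiec", "bcc", "bkl", "df", "mel", "nv", "vasc"]

def confusion_matrix(y_true, y_pred, n_classes=len(CLASSES)):
    """Count matrix with true classes as rows and predicted classes as columns."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

def per_class_metrics(cm):
    """Per-class precision, recall and F1 derived from a confusion matrix."""
    tp = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)  # column sums: how often each class was predicted
    actual = cm.sum(axis=1)     # row sums: support of each class
    zeros = np.zeros_like(tp)
    precision = np.divide(tp, predicted, out=zeros.copy(), where=predicted > 0)
    recall = np.divide(tp, actual, out=zeros.copy(), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=zeros.copy(), where=denom > 0)
    return precision, recall, f1

# Hypothetical predictions for five test lesions (3 = df, 4 = mel, 5 = nv)
y_true = [4, 4, 5, 5, 3]
y_pred = [4, 5, 5, 5, 3]
cm = confusion_matrix(y_true, y_pred)
precision, recall, f1 = per_class_metrics(cm)
```

Here one melanoma is predicted as nevus, which lowers melanoma recall to 0.5 and nevus precision to 2/3; off-diagonal cells such as `cm[4, 5]` correspond to the mel/nv confusion discussed in the caption.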

The system includes two popular explainability approaches, SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations), to provide insight into the model's predictions. SHAP provides feature-level explanations by measuring how much each input feature contributes to a prediction. It helps identify which clinical variables or visual characteristics have the most impact on the classification result, and aggregating these contributions across the dataset reveals the model's overall behavior to researchers and clinicians. LIME, on the other hand, provides local explanations of individual predictions. It highlights the regions of the dermoscopic image that most influence the model's decision, and these visual explanations enable clinicians to inspect which areas of the lesion informed the classification. By integrating SHAP and LIME, the proposed framework offers both global and local interpretability. This dual-explanation mechanism enhances transparency and enables clinicians to assess whether the model is focusing on medically significant patterns.
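SHAP attributes a prediction to input features using Shapley values. For intuition, the sketch below computes exact Shapley values by enumerating all feature coalitions for a toy three-feature scoring function, with "absent" features set to a baseline value. The toy function, the baseline, and the helper name `exact_shapley` are hypothetical; exact enumeration is feasible only for a few features, which is why the SHAP library provides explainers such as `shap.TreeExplainer` for tree ensembles like XGBoost and LightGBM.

```python
import itertools
import math

def exact_shapley(f, x, baseline):
    """Exact Shapley values of f at x, with absent features set to baseline.

    Enumerates every coalition, so cost grows as 2^d: only usable for a few
    features. SHAP's explainers approximate or exploit model structure instead.
    """
    d = len(x)

    def value(coalition):
        # Features in the coalition take their real value, others the baseline
        z = [x[i] if i in coalition else baseline[i] for i in range(d)]
        return f(z)

    phi = [0.0] * d
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for r in range(len(others) + 1):
            weight = math.factorial(r) * math.factorial(d - r - 1) / math.factorial(d)
            for subset in itertools.combinations(others, r):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

# Hypothetical toy score: an additive term plus an interaction term
def toy_score(z):
    return 2 * z[0] + z[1] * z[2]

phi = exact_shapley(toy_score, x=[1, 2, 3], baseline=[0, 0, 0])
print([round(p, 6) for p in phi])  # [2.0, 3.0, 3.0]
```

The attributions sum to f(x) - f(baseline) = 8 (the efficiency property), and the interaction term z[1] * z[2] = 6 is split equally between features 1 and 2.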

Clinical decision support potential

Privacy-preserving learning, multimodal feature fusion, ensemble modeling, and explainable AI are the key components of an integrated and robust system for automatic skin lesion classification. Such a system should offer not only high diagnostic power but also transparency and security, two key requirements for medical systems, as shown in Figure 8.

Figure 8: Receiver operating characteristic (ROC) curves for the stacked ensemble model. (A–C) ROC curves for the seven skin lesion types, plotting the true positive rate (sensitivity) against the false positive rate (1 - specificity). The area under the curve (AUC) quantifies how well the stacked ensemble model discriminates between the classes.
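Per-class ROC curves and AUC values like those in Figure 8 are typically computed one-vs-rest. The sketch below computes AUC through its rank interpretation (the probability that a randomly chosen positive is scored higher than a randomly chosen negative), which equals the area under the ROC curve; the helper names are hypothetical. scikit-learn's `roc_auc_score(..., multi_class="ovr")` computes an averaged one-vs-rest AUC directly.

```python
import numpy as np

def binary_auc(y_true, scores):
    """ROC AUC via the Mann-Whitney statistic (ties count one half).

    Equals the probability that a random positive sample receives a higher
    score than a random negative sample.
    """
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def one_vs_rest_auc(y_true, probs):
    """Per-class AUC: class c against all other classes, scored by probs[:, c]."""
    y_true = np.asarray(y_true)
    return [binary_auc(y_true == c, probs[:, c]) for c in range(probs.shape[1])]

print(binary_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise comparison uses memory proportional to positives times negatives, which is acceptable for a test set of a few thousand images.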

Because it provides explainable predictions and is designed with privacy protection in mind, the system could complement existing dermatological diagnostic tools. It could help health practitioners and dermatologists assess how suspicious a lesion is, improve diagnostic accuracy, and detect serious diseases such as melanoma at an early stage. In essence, as shown in Figure 9, the system seeks to bring advanced artificial intelligence (AI) methods into real-world practice, helping dermatologists diagnose patients more accurately and confidently while protecting patient privacy and data security.

Figure 9: Explainability results using model interpretability techniques for multi-class skin lesion classification. (A) SHAP plot showing feature contributions influencing benign and malignant lesion predictions. (B) LIME explanation for the bcc prediction, illustrating the features contributing positively and negatively to the classification outcome. (C) LIME explanation for the akiec prediction, highlighting the most influential features involved in the model decision-making process. These interpretability visualizations demonstrate the regions and extracted features that significantly affect the model’s predictions, improving transparency and understanding of the classification process in skin lesion assessment.

Evaluation strategy

To avoid sampling bias and preserve the original class distribution across all skin lesion categories, the dataset was divided using a stratified 80:20 train–test split. The training subset was then split 90:10 into training and validation sets for hyperparameter tuning and model optimization. The test set was not used at any stage of training; it was applied only once, as a final evaluation after training was complete, to prevent data leakage and ensure an unbiased performance assessment. All models were preprocessed and trained under identical settings: data were partitioned and augmented in the same way, and the same evaluation protocol was followed, allowing fair and reproducible comparisons. The models were evaluated using accuracy, precision, recall, F1-score, and AUC, with a detailed class-wise analysis to assess robustness on both majority and minority lesion classes. This standardized validation protocol is intended to increase the reliability, transparency, and generalizability of the proposed approach and to avoid inconsistencies in performance reporting.
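The two-stage stratified split can be sketched without extra dependencies as follows (scikit-learn's `train_test_split(..., stratify=y)` is the usual alternative). The function name and seeds are illustrative. The class counts in the example are the published HAM10000 totals (10,015 images); with per-class rounding, a 20% stratified test split yields per-class sizes that match the supports reported in Table 2.

```python
import numpy as np

def stratified_split(labels, test_frac, seed=0):
    """Return (keep_idx, held_out_idx) with per-class proportions preserved."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    keep, held_out = [], []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_out = int(round(test_frac * len(idx)))
        held_out.extend(idx[:n_out])
        keep.extend(idx[n_out:])
    return np.array(sorted(keep), dtype=int), np.array(sorted(held_out), dtype=int)

# HAM10000 class counts: akiec, bcc, bkl, df, mel, nv, vasc (10,015 images)
labels = np.repeat(np.arange(7), [327, 514, 1099, 115, 1113, 6705, 142])

# Stage 1: 80:20 train+validation / test
train_val_idx, test_idx = stratified_split(labels, test_frac=0.20, seed=42)
# Stage 2: 90:10 train / validation inside the training portion
tr_rel, val_rel = stratified_split(labels[train_val_idx], test_frac=0.10, seed=43)
train_idx, val_idx = train_val_idx[tr_rel], train_val_idx[val_rel]

print(np.bincount(labels[test_idx]).tolist())  # [65, 103, 220, 23, 223, 1341, 28]
```

The second call returns positions within `train_val_idx`, so they are mapped back to dataset indices before use; the three resulting index sets are disjoint.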

Results


Four classification methods (XGBoost, LightGBM, a Deep Neural Classifier, and a stacked ensemble model) were evaluated for multi-class skin lesion classification. The models achieved overall accuracies of 92%, 90%, 94%, and 96%, respectively, demonstrating that c

Class-wise performance

A detailed class-wise evaluation, including precision, recall, and F1-score for each lesion category, is provided in Table 2. For the akiec class (support = 65), the stacked ensemble achieved a precision of 0.72, a recall of 0.73, and an F1-score of 0.72, slightly improving over XGBoost (F1 = 0.70), LightGBM (F1 = 0.68), and the deep neural classifier (F1 = 0.71). For bcc (support = 103), the stacked ensemble obtained precision = 0.87, recall = 0.84, and F1-score = 0.85, comparable to XGBoost (F1 = 0.83) and LightGBM (F1 = 0.81), and slightly higher than the deep neural classifier (F1 = 0.84). For bkl (support = 220), the stacked ensemble achieved precision = 0.93, recall = 0.85, and F1-score = 0.89, outperforming XGBoost (F1 = 0.87), LightGBM (F1 = 0.86), and the deep neural classifier (F1 = 0.88). For df (support = 23), performance was lower across all models: the stacked ensemble reported precision = 0.67, recall = 0.66, and F1-score = 0.66, similar to XGBoost (F1 = 0.65), LightGBM (F1 = 0.63), and the deep neural classifier (F1 = 0.65).

For mel (support = 223), the stacked ensemble achieved precision = 0.66, recall = 0.97, and F1-score = 0.78. The Deep Neural Classifier also showed high recall (0.96) but relatively low precision (~0.66) for melanoma, indicating a higher number of false positives. Thus, although sensitivity for melanoma detection is high across models, precision remains comparatively low. For the nv class (support = 1341), all models achieved precision, recall, and F1-scores of 1.00, reflecting consistently high performance on the majority class. For vasc (support = 28), the stacked ensemble achieved precision = 1.00, recall = 0.93, and F1-score = 0.96, comparable to the deep neural classifier (F1 = 0.96) and slightly higher than XGBoost (F1 = 0.95) and LightGBM (F1 = 0.94).
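The F1-score is the harmonic mean of precision P and recall R. As a quick consistency check, applying it to the stacked ensemble's bkl values from Table 2 reproduces the reported F1-score of 0.89:

```latex
F_1 = \frac{2\,P\,R}{P + R}, \qquad
F_1^{\mathrm{bkl}} = \frac{2\,(0.93)(0.85)}{0.93 + 0.85} = \frac{1.581}{1.78} \approx 0.89
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a class with high recall but low precision (such as mel) receives a noticeably lower F1-score than its recall alone would suggest.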

Model comparison

The stacked ensemble model performed as well as or better than the individual models on all metrics. Notably, melanoma recall increased to 0.97, suggesting improved sensitivity to clinically important cases. The lower performance on the minority classes (df, 23 samples; akiec, 65 samples) indicates the influence of class distribution on model performance. Crucially, overall accuracy is calculated over all samples and is therefore affected by class imbalance, with the nv class (support = 1341) predominating. As such, lower precision or recall in the minority classes has only a limited effect on the reported accuracy values.
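The point about class imbalance can be made concrete with the stacked ensemble's per-class figures from Table 2. Overall accuracy equals the support-weighted mean of per-class recall, whereas macro-averaged recall weights every class equally. Because the table's recalls are rounded to two decimals, this reconstruction is approximate.

```python
import numpy as np

# Stacked ensemble: per-class recall and support taken from Table 2
classes = ["akiec", "bcc", "bkl", "df", "mel", "nv", "vasc"]
recall = np.array([0.73, 0.84, 0.85, 0.66, 0.97, 1.00, 0.93])
support = np.array([65, 103, 220, 23, 223, 1341, 28])

# Overall accuracy = correctly classified / total = support-weighted mean recall
accuracy = (recall * support).sum() / support.sum()
# Macro-averaged recall gives every class the same weight
macro_recall = recall.mean()
nv_share = support[classes.index("nv")] / support.sum()
print(f"accuracy={accuracy:.3f}, macro recall={macro_recall:.3f}, nv share={nv_share:.2f}")
# accuracy=0.958, macro recall=0.854, nv share=0.67
```

With about two-thirds of the test samples in the nv class, the perfectly classified majority class keeps overall accuracy near 96% even though the macro-averaged recall, which exposes the weaker minority classes, is noticeably lower.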

Comparison with existing methods

To benchmark the proposed system, Table 5 and Table 6 compare it with previously reported methods. The proposed stacked ensemble framework performs on par with previously reported approaches, with an accuracy of 96%. In addition, the proposed model offers multimodal feature integration and explainability, which are not always considered in other approaches. The performance values for other methods are taken from the original papers and may not be directly comparable because of differences in dataset splits and evaluation methods.

Key observation

The reported 94% accuracy is computed over all classes and is therefore dominated by the majority class (nv, support = 1341). Consequently, lower scores on the minority classes (e.g., df, or melanoma precision) are not inconsistent with the reported overall accuracy. The stacked ensemble achieved the highest accuracy (96%) with strong per-class performance. The per-class gains (e.g., in melanoma recall) further suggest that combining multiple models improves predictive performance in multi-class skin lesion classification.

The proposed approach was further compared with state-of-the-art models on the ISIC 2019 dataset. Widely used deep learning architectures (ResNet50, EfficientNet-B0, and DenseNet121) were evaluated alongside the baseline classifiers and the proposed stacked ensemble. Each model was tested under the same experimental conditions, making the results comparable. As shown in Table 6, the proposed model outperforms the other models on all evaluation metrics, achieving the highest accuracy (96%) and AUC (0.970) among the traditional machine learning and deep learning models. EfficientNet-B0 and DenseNet121 provide strong image-only baselines, but on their own they capture a narrower range of feature representations. In contrast, the ensemble method combines several models, which improves generalization and robustness. The proposed method is also consistently better in precision, recall, and F1-score, suggesting that it is robust across classes and has promise for use in clinical practice. To facilitate reproducibility, transparency, and reliable comparison, all experiments followed a standard protocol with the same set of performance criteria and validation conditions.

DATA AVAILABILITY:

The HAM10000 skin lesion dataset used in this study is publicly available through Kaggle at https://www.kaggle.com/datasets/kmader/skin-cancer-mnist-ham10000. The source code and implementation files used for data preprocessing, model training, evaluation, and analysis have been provided as supplementary material along with the manuscript submission.

ENVIRONMENT SETUP FOR MODEL DEVELOPMENT
Component | Specification
Compute Environment | Google Colab (Free Tier)
CPU | 2-Core CPU @ 2.20 GHz
GPU | NVIDIA T4 / P100
RAM | 12 GB
Operating System | Ubuntu 22.04
Python Version | Python 3.10
Deep Learning Models | EfficientNet B4, DenseNet201, MobileNetV2
Classifiers | XGBoost, LightGBM, DNC, Stacked Ensemble
Core Libraries | TensorFlow 2.12, Keras 2.12, NumPy, Pandas
Explainability Tools | SHAP, LIME
Data Balancing | SMOTE

Table 1: System configuration used for model development and evaluation, including hardware and software details such as processor type, memory, operating system, and software frameworks.

DETAILED CLASSIFICATION REPORTS FOR MULTI CLASS SKIN LESION PREDICTION
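Class | Precision | Recall | F1-score | Support
XGBoost (Accuracy: 92%)
akiec | 0.70 | 0.71 | 0.70 | 65
bcc | 0.85 | 0.82 | 0.83 | 103
bkl | 0.91 | 0.83 | 0.87 | 220
df | 0.65 | 0.65 | 0.65 | 23
mel | 0.63 | 0.95 | 0.76 | 223
nv | 1.00 | 1.00 | 1.00 | 1341
vasc | 1.00 | 0.91 | 0.95 | 28
LightGBM (Accuracy: 90%)
akiec | 0.68 | 0.69 | 0.68 | 65
bcc | 0.83 | 0.80 | 0.81 | 103
bkl | 0.90 | 0.82 | 0.86 | 220
df | 0.63 | 0.63 | 0.63 | 23
mel | 0.62 | 0.94 | 0.75 | 223
nv | 1.00 | 1.00 | 1.00 | 1341
vasc | 0.99 | 0.90 | 0.94 | 28
Deep Neural Classifier (Accuracy: 94%)
akiec | 0.95 | 0.90 | 0.92 | 65
bcc | 0.90 | 0.94 | 0.92 | 103
bkl | 0.97 | 0.92 | 0.94 | 220
df | 0.99 | 0.96 | 0.97 | 23
mel | 0.99 | 0.90 | 0.94 | 223
nv | 0.14 | 0.86 | 0.24 | 1341
vasc | 0.10 | 0.86 | 0.18 | 28
Stacked Ensemble (Accuracy: 96%)
akiec | 0.72 | 0.73 | 0.72 | 65
bcc | 0.87 | 0.84 | 0.85 | 103
bkl | 0.93 | 0.85 | 0.89 | 220
df | 0.67 | 0.66 | 0.66 | 23
mel | 0.66 | 0.97 | 0.78 | 223
nv | 1.00 | 1.00 | 1.00 | 1341
vasc | 1.00 | 0.93 | 0.96 | 28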

Table 2: Detailed classification performance metrics for multi-class skin lesion prediction across all models. This table presents class-wise precision, recall, F1-score, and support for each skin lesion category.

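Model | Hyperparameter | Value
XGBoost | Learning Rate | Default (0.3)
XGBoost | Number of Trees (n_estimators) | 100
XGBoost | Maximum Depth | 6
XGBoost | Subsample | 1
XGBoost | Colsample_bytree | 1
XGBoost | Objective | multi:softmax
XGBoost | Evaluation Metric | mlogloss
LightGBM | Learning Rate | Default (0.1)
LightGBM | Number of Trees (n_estimators) | 100
LightGBM | Maximum Depth | -1
LightGBM | Number of Leaves | 31
LightGBM | Feature Fraction | 1
LightGBM | Bagging Fraction | 1
LightGBM | Objective | multiclass
LightGBM | Metric | multi_logloss
Deep Neural Classifier | Number of Layers | 3 Dense Layers
Deep Neural Classifier | Neurons per Layer | 256, 128, 64
Deep Neural Classifier | Activation Function | ReLU
Deep Neural Classifier | Output Activation | Softmax
Deep Neural Classifier | Optimizer | Adam
Deep Neural Classifier | Learning Rate | 0.001
Deep Neural Classifier | Batch Size | 32
Deep Neural Classifier | Number of Epochs | 30
Deep Neural Classifier | Dropout | 0.5
Deep Neural Classifier | Loss Function | Categorical Crossentropy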

Table 3: Hyperparameter settings used for training the models, including learning rate, batch size, number of epochs, and optimizer configuration.

Centralized vs. Federated Model
Training Strategy | Centralized (Stacked Ensemble) | Federated Model | Difference (Δ)
Accuracy (%) | 96 | 94 | 2

Table 4: Centralized vs. Federated learning comparison. Comparison between centralized and federated learning approaches in terms of performance, privacy, and computational characteristics.

Ref. | Paper / Method | Model Type | Year | Reported Performance | Key Contribution
[2] | CNN Framework for Skin Cancer Detection | CNN | 2020 | High accuracy (~90%+) | Early CNN-based classification
[4] | Melanoma Diagnosis using Deep Learning | CNN | 2021 | Improved classification performance | Dermoscopic image analysis
[8] | Optimized CNN with Checkpoints | CNN | 2023 | Enhanced accuracy (~92–94%) | Model optimization strategy
[9] | Deep Learning + XAI Framework | CNN + Explainability | 2023 | Improved interpretability | XAI integration
[10] | Combined CNN Features | CNN | 2023 | Competitive performance (~90%+) | Feature combination
[18] | SkinSage XAI | CNN + XAI | 2023 | Improved trust & interpretability | Explainable AI system
This Work | Stacked Ensemble + Multimodal + XAI | Ensemble | | 96% | Ensemble + interpretability + privacy-aware

Table 5: Comparison with existing methods. Performance comparison of the proposed method with existing state-of-the-art methods using standard evaluation metrics.

State-of-the-Art Models
Model | Dataset | Accuracy | Precision | Recall | F1-score | AUC
ResNet50 | ISIC 2019 | 0.842 | 0.835 | 0.828 | 0.831 | 0.912
EfficientNet-B0 | ISIC 2019 | 0.874 | 0.868 | 0.861 | 0.864 | 0.935
DenseNet121 | ISIC 2019 | 0.861 | 0.854 | 0.848 | 0.851 | 0.926
XGBoost | ISIC 2019 | 0.92 | 0.905 | 0.892 | 0.898 | 0.948
LightGBM | ISIC 2019 | 0.90 | 0.889 | 0.88 | 0.884 | 0.94
Deep Neural Classifier | ISIC 2019 | 0.94 | 0.905 | 0.89 | 0.892 | 0.95
Proposed Stacked Ensemble | ISIC 2019 | 0.96 | 0.94 | 0.93 | 0.935 | 0.97

Table 6: Comparison with state-of-the-art models. Comparative evaluation of the stacked ensemble model with other state-of-the-art architectures on the ISIC 2019 dataset. The accuracy, precision, recall, F1-Score, and AUC are used to measure the performance. The proposed model outperforms other models, thereby showing its effectiveness in multi-class skin lesion classification.

Discussion


The current protocol outlines a reproducible pipeline for building an interpretable, privacy-aware, multimodal framework for automatic skin lesion classification. It takes a systematic approach to improving diagnostic performance and model transparency by combining dermoscopic image analysis with clinical metadata and interpretable machine learning methods. The HAM10000 skin lesion dataset is publicly available, which allows standardized assessment and facilitates reproducibility in dermatological image research16. Image preprocessing and normalization is one of the most important steps in the protocol, as it ensures that dermoscopic images are standardized before feature extraction and model training. Dermoscopic images may contain artifacts such as uneven illumination, hair occlusion, or background noise, which can affect model performance. Resizing the images to a fixed resolution and normalizing them reduces these differences, allowing the model to focus on clinically relevant lesion characteristics such as pigmentation patterns, irregular borders, and asymmetry. As earlier research in automated skin cancer classification has shown, deep learning-based dermatology systems require proper preprocessing to perform reliably2.
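A minimal sketch of resize-and-normalize preprocessing is shown below, assuming a 380 × 380 target (the native input size of EfficientNet-B4) and ImageNet channel statistics; the article does not state its exact resolution or normalization constants. Nearest-neighbour resizing keeps the example dependency-free; in practice `cv2.resize` or `tf.image.resize` would be used, and each `tf.keras.applications` backbone provides its own `preprocess_input` function, whose expected input range differs between architectures.

```python
import numpy as np

# ImageNet channel statistics (assumed; the article does not list its constants)
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])

def resize_nearest(img, size):
    """Nearest-neighbour resize of an (H, W, C) array to (size, size, C)."""
    h, w = img.shape[:2]
    rows = (np.arange(size) * h / size).astype(int)
    cols = (np.arange(size) * w / size).astype(int)
    return img[rows[:, None], cols[None, :]]

def preprocess(img_uint8, size=380):
    """Resize to a fixed resolution, scale to [0, 1], then z-score each channel."""
    x = resize_nearest(img_uint8, size).astype(np.float32) / 255.0
    return (x - IMAGENET_MEAN) / IMAGENET_STD

# HAM10000 images are 600 x 450 pixels; a random array stands in for one here
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(450, 600, 3)).astype(np.uint8)
x = preprocess(img)
print(x.shape)  # (380, 380, 3)
```

Fixing the resolution gives every backbone the same input geometry, and per-channel standardization removes global brightness and color offsets such as those caused by uneven illumination.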

The deep feature extraction workflow based on multiple convolutional neural network (CNN) architectures is another important part of the protocol. EfficientNet-B4, DenseNet201, and MobileNetV2 are used to extract complementary features from dermoscopic images; these architectures differ in the features they emphasize and in their computational cost. By extracting features with multiple models and fusing them, the protocol captures both global lesion patterns and specific morphological features that can help distinguish benign from malignant lesions.

The protocol also includes a multimodal feature fusion stage. Clinical diagnosis in dermatology typically combines visual and contextual clinical information (the patient's age and sex, and the lesion site). The architecture fuses dermoscopic image features with this contextual information, adding diagnostic context that purely image-based models lack. This multimodal approach is more representative of clinical practice and is intended to strengthen classification.

The protocol also incorporates explainable artificial intelligence (XAI) techniques, particularly SHAP and LIME, to explain the predictions of the classification models. Explainability is essential for medical AI systems, as clinicians need to understand the rationale behind automated predictions before incorporating them into their diagnostic processes. SHAP yields global feature importance by measuring the impact of each feature on the model's predictions, whereas LIME yields local explanations by showing the image regions that contribute to individual predictions. These interpretability tools help verify that the model focuses on clinically relevant structures rather than spurious correlations, thereby improving trust and transparency in AI-assisted diagnostic systems20.

Several variations of the protocol can be applied, depending on the dataset or the computational environment. A typical issue with dermatological data is class imbalance, in which the number of samples per lesion category varies considerably; HAM10000, for example, contains a much higher proportion of benign nevi than other lesion categories. This imbalance can be mitigated with oversampling techniques such as the Synthetic Minority Oversampling Technique (SMOTE), which generates synthetic samples for rare lesion categories. Other strategies, such as data augmentation, class weighting, or focal loss, may also improve model accuracy on less common lesion types.
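The core of SMOTE is interpolation between a minority sample and one of its nearest minority-class neighbours. The sketch below implements only that generation step in NumPy for illustration; the function name `smote`, its arguments, and the toy data are hypothetical. The article lists the imbalanced-learn implementation, which is typically invoked as `SMOTE(k_neighbors=5).fit_resample(X, y)` from `imblearn.over_sampling`.

```python
import numpy as np

def smote(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic samples for one minority class (core SMOTE step).

    Each synthetic point is x_i + lam * (x_nn - x_i), where x_nn is one of the
    k nearest minority-class neighbours of x_i and lam ~ Uniform(0, 1).
    Requires at least two minority samples.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    # Squared Euclidean distances between all minority samples
    sq = (X_min ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X_min @ X_min.T
    np.fill_diagonal(d2, np.inf)  # a sample is not its own neighbour
    k = min(k, len(X_min) - 1)
    neighbours = np.argsort(d2, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)          # seed samples
    nn = neighbours[base, rng.integers(0, k, size=n_new)]   # one neighbour each
    lam = rng.random((n_new, 1))                            # interpolation factors
    return X_min[base] + lam * (X_min[nn] - X_min[base])

# Hypothetical minority-class feature vectors (e.g. 20 df samples, 8 features)
rng = np.random.default_rng(1)
X_df = rng.normal(size=(20, 8))
synth = smote(X_df, n_new=30, k=5)
print(synth.shape)  # (30, 8)
```

Because each synthetic point lies on a segment between two real minority samples, SMOTE densifies the minority region rather than duplicating existing points, as plain random oversampling would.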

While the proposed framework offers various benefits, it has some limitations. The model is trained on the HAM10000 dataset, which may not cover all possible imaging scenarios, skin phenotypes, or ethnic groups encountered in dermatology. It is therefore important to validate the framework using external datasets to assess its generalization performance. Additionally, incorporating multiple deep learning models and ensemble learning increases the model's computational footprint and may be challenging in resource-limited clinical settings.

The framework, as presented, offers several advances over conventional image-only deep learning methods. Multimodal data integration provides richer information, and ensemble learning increases robustness by aggregating predictions from diverse classifiers. Furthermore, explainable AI techniques make the decision-making process interpretable, addressing a common concern about deep learning models in medicine. Insights from experimental dermatology also inform this protocol by emphasizing the reproducible imaging and analysis workflows essential to biological research. Advanced imaging approaches, including three-dimensional skin models and dermoscopic imaging procedures, provide a deeper understanding of skin structure and disease mechanisms, thereby supporting the design and improvement of computational diagnostic tools21,22.

The approach outlined in this protocol has several potential applications in dermatology research and clinical settings. It could help develop computer-aided diagnostic systems for early detection of melanoma and other skin conditions, help clinicians navigate large databases of skin images, and serve as part of teledermatology systems that allow physicians to consult dermatologists remotely. Moreover, the system's explainability and privacy-preserving design could support multi-institutional medical AI research, in which several institutions collaborate while protecting patient privacy. Future research could incorporate larger, multi-institutional datasets, add further clinical features, and explore other privacy-preserving techniques, such as federated learning, which is discussed here as a conceptual extension. Finally, the absence of an ablation study comparing the multimodal model with image-only and metadata-only models is a limitation; such a study will be conducted in future work to assess the contribution of each data source.

Disclosures


The authors have nothing to disclose. We have no conflicts of interest. The authors declare that artificial intelligence tools were used solely for language editing and formatting. All scientific content, analysis, and interpretations were developed and verified by the authors.

Acknowledgements


The authors thank MVN University, Palwal, for providing academic guidance and research support. The authors also acknowledge the publicly available HAM10000 skin lesion dataset, which was used for the experimental evaluation of this study.

Materials

List of materials used in this article
Name | Company | Catalog Number | Comments
DenseNet201 CNN Architecture | Huang et al. (Cornell University) | https://arxiv.org/abs/1608.06993 | Deep learning model for image classification
EfficientNet-B4 CNN Architecture | Google | https://arxiv.org/abs/1905.11946 | Deep learning model for image classification
Google Colaboratory Platform | Google | https://colab.research.google.com | Cloud-based computational environment
HAM10000 Skin Lesion Dataset | Harvard Dataverse | https://doi.org/10.7910/DVN/DBW86T | Dermoscopic image dataset
Keras Deep Learning API | Google | Version 2.x | Neural network API
LIME Explainability Library | LIME Project | Version 0.x | Model interpretability technique
MobileNetV2 CNN Architecture | Google | https://arxiv.org/abs/1801.04381 | Deep learning model for image classification
Matplotlib Visualization Library | Matplotlib Development Team | Version 3.x | Used for generating plots and performance visualization
NVIDIA GPU | NVIDIA | RTX Series | Computational hardware for model training
NumPy Numerical Computing Library | NumPy Developers | Version 1.x | Data analysis software
OpenCV Image Processing Library | OpenCV Foundation | Version 4.x | Image processing library
Pandas Data Analysis Library | Pandas Development Team | Version 1.x | Data analysis software
Python Programming Environment | Python Software Foundation | Version 3.9+ | Data analysis software
SHAP Explainability Library | SHAP Project | Version 0.x | Model interpretability technique
SMOTE Oversampling Technique | imbalanced-learn Project | Version 0.x | Class balancing technique for handling imbalanced datasets
Scikit-learn Machine Learning Library | scikit-learn Project | Version 1.x | Machine learning library
TensorFlow Deep Learning Framework | Google | Version 2.x | Deep learning framework

Reprints and Permissions

Request permission to reuse the text or figures of this JoVE article


Tags

Skin Lesion Classification, Multimodal Ensemble, Explainable AI, Privacy Preserving, Deep Learning Models, Class Balancing, EfficientNet B4, Clinical Metadata, XGBoost Classifier, Model Interpretability
