An Explainable AI-Based Transfer Learning Method for Breast Cancer Prediction

P. Rajeswari; Surbhi Bhatia Khan; Prasad P S; Mahesh T R; Ali Algarni; Ahlam Almusharraf

doi:10.3791/70011

Method Article

June 22nd, 2026

P. Rajeswari¹, Surbhi Bhatia Khan²,³, Prasad P S⁴, Mahesh T R⁴, Ali Algarni⁵,⁶, Ahlam Almusharraf⁷

¹Department of Computer Science and Engineering, SRM Institute of Science and Technology, Ramapuram campus, ²School of Science, Engineering and Environment, University of Salford, ³Division of Research and Development, Lovely Professional University, ⁴Department of Computer Science & Engineering, Faculty of Engineering and Technology, JAIN (Deemed-to-be University), ⁵Department of Informatics and Computer Systems, College of Computer Science, King Khalid University, ⁶Center for Artificial Intelligence, King Khalid University, ⁷Department of Management, College of Business Administration, Princess Nourah bint Abdulrahman University

Summary

An explainable transfer-learning protocol using EfficientNet-B0 and Grad-CAM is developed for classifying breast ultrasound images into benign, malignant, and normal categories, providing interpretable diagnostic heatmaps suitable for clinical decision support applications.

Abstract

Breast cancer identification from ultrasound images requires both high accuracy and transparency to support clinical decision-making. This work presents a deep learning system that classifies breast ultrasound images as benign, malignant, or normal using the EfficientNet-B0 architecture fine-tuned on the Breast Ultrasound Images (BUSI) dataset. To mitigate class imbalance and stabilize training, data augmentation comprising random horizontal flipping, rotation, and color jittering is applied. Gradient-weighted Class Activation Mapping (Grad-CAM) is used to generate visual explanations by highlighting regions of interest such as tumor margins and texture patterns. The model achieved an average accuracy of 99%, demonstrating high efficacy in lesion classification. The integration of Explainable AI (XAI) improves diagnostic confidence and helps bridge the gap between clinical practice and AI. The results indicate the potential of combining high-performing deep learning models with interpretability methods to develop reliable breast cancer diagnosis tools suitable for clinical practice.

Introduction

Breast cancer is one of the most significant life-threatening diseases affecting women worldwide. According to the World Health Organization (WHO), breast cancer is a leading cause of cancer-related deaths among women, and early diagnosis is critical to improving the survival rate¹. Ultrasound imaging has become a routine diagnostic procedure for breast cancer because it is safe, inexpensive, and can produce real-time images. Compared with mammography, ultrasound is highly effective at differentiating solid masses from fluid-filled growths and therefore serves as an excellent diagnostic aid in the presence of breast pathology. However, ultrasound image interpretation is highly subjective and depends on high-level expertise, which causes diagnostic variability and possible human error.

Over the past few years, AI and deep learning have demonstrated vast capabilities in automating medical image analysis. Deep learning-based models, particularly CNNs, excel in object segmentation, detection, and classification². However, the adoption of AI in healthcare is inhibited by the "black-box" nature of these models. The term "black-box" refers to the lack of transparency in how a model reaches a specific conclusion; while the inputs and outputs are known, the internal logic and specific features used to make a clinical prediction remain hidden from the user.

The motivation for this work stems from the urgent need for reliable instruments to support the early diagnosis of breast cancer, which remains a leading cause of female mortality globally. Although deep learning models are highly effective, their lack of interpretability limits integration into clinical settings³. The present work addresses these gaps by employing EfficientNet-B0 for high-performance classification and integrating Explainable AI (XAI) techniques, such as Grad-CAM, to provide visual insights into the model’s decision-making process. By balancing transparency and accuracy, this study aims to develop a trustworthy AI system that provides sound diagnostic support to clinicians. XAI methods reveal the underlying reasoning of model predictions, allowing clinicians to verify results and bridge the gap between clinical practice and automated systems⁴. In breast ultrasound classification, this interpretability ensures that AI-based diagnoses are both accurate and clinically justifiable.

Despite advancements in deep learning, breast ultrasound image classification still faces challenges. Class imbalance, in which minority classes such as malignant tumors are underrepresented, leads to biased models⁵. The lack of interpretability in deep learning models makes them hard to apply in clinical practice, as clinicians require transparent decision-making. Limited dataset size and diversity restrict the generalizability of the models, while complex ultrasound image features such as tumor borders and texture are hard to learn⁶. Addressing these problems calls for a robust and interpretable framework that ensures high accuracy with clear insight into model predictions⁷.

The study aims to develop a deep learning framework capable of classifying breast ultrasound images into benign, malignant, and normal categories while improving generalization through targeted augmentation for minority classes. The framework integrates Gradient-weighted Class Activation Mapping to provide visual explanations of predictions and evaluates performance using accuracy, precision, recall, and F1-score, demonstrating its suitability as an interpretable clinical decision-support system. In contrast to earlier explainable ultrasound-based studies of breast cancer, in which Grad-CAM is mostly used as a post-hoc visualization⁸, the proposed framework combines lesion-aware preprocessing and class-aware learning at the training phase. Ground-truth masks from the Breast Ultrasound Images (BUSI) dataset are overlaid on the ultrasound images to highlight lesion boundaries and internal echo patterns during feature extraction, and selective augmentation is applied to minority diagnostic groups to reduce class imbalance without changing clinically significant appearance features⁹. The combination of mask-guided lesion enhancement, EfficientNet-B0 transfer learning, and Grad-CAM explainability improves diagnostic robustness and yields interpretations that are consistent with lesion morphology. This supports the hypothesis that anatomically guided lesion enhancement and explainability can make AI-assisted breast cancer diagnostics reliable and trustworthy¹⁰.

Medical imaging traditionally relied on conventional image processing methods, including filtering, segmentation, and morphological operations, to identify visible structures for diagnosis¹¹. However, these processes generally required manual tuning and struggled to accommodate the inherent variability and complexity of medical images. For instance, breast ultrasound interpretation was historically dependent on the subjective expertise of radiologists, where the identification of malignant and benign lesions relied on visual assessment prone to human error or misinterpretation.

The advent of deep learning has transformed medical imaging, primarily through the deployment of Convolutional Neural Networks (CNNs)¹². Among these, EfficientNet-B0 represents a significant innovation, utilizing a compound scaling method that balances network depth, width, and resolution to achieve efficient and accurate computation. In the context of breast ultrasound imaging, the model utilizes sequential convolutional layers to automatically learn hierarchical features, eliminating the need for subjective manual feature extraction while enhancing diagnostic accuracy and reliability¹³. Despite this potential, deep learning in medical imaging remains an evolving field with unresolved challenges. The quality and availability of data are paramount, as deep learning models require large repositories of annotated images to generalize well. Furthermore, the "black-box" nature of these models poses significant hurdles to clinical acceptability, and biases in training data raise ethical concerns because they may lead to skewed or discriminatory diagnostic predictions¹⁴. Consequently, stringent testing and validation protocols are needed to ensure that models remain objective and clinically sound. To address these limitations, current developments in deep learning focus on improving model interpretability and robustness. Techniques such as Gradient-weighted Class Activation Mapping (Grad-CAM), as applied in this study, enable visualization of specific image regions that influence model outcomes, thereby fostering a deeper understanding of model behavior and supporting clinical decision-making. Additionally, federated learning frameworks facilitate privacy-conscious collaborations between institutions, enabling the use of heterogeneous data without the need to share raw datasets. Such advancements are progressively enabling more dependable, interpretable, and unbiased deep learning applications in medical imaging, ultimately enhancing clinical standards and patient care¹⁵.

Recent progress in explainable artificial intelligence has highlighted the significance of interpretable medical decision systems¹⁶. Fusion-based interpretable frameworks have shown better classification and transparency by combining multiple feature representations and multiple explanation mechanisms¹⁷. Recent research has demonstrated the high potential of combining deep learning models with XAI methods to boost diagnostic confidence in breast cancer diagnosis and other medical prediction tasks. These studies support the need for both high predictive performance and reasoning that clinicians can interpret, which is consistent with the aim of the current study to create an effective and understandable ultrasound classification system¹⁸.

Table 1 presents the literature review and existing technologies.

Study	Objective	Remark
Humayra Afrin et al.(2023)¹⁹	Review deep learning applications for breast mass ultrasound diagnosis and prognosis.	Lesion classification showed the highest performance, with fewer studies on prognosis.
Gelan Ayana et al. (2022)²⁰	Review ultrasound-guided nanocarriers for breast cancer chemotherapy.	Ultrasound-responsive nanocarriers improve drug delivery but face physiological barriers.
Valerio Di Paola et al. (2022)²¹	Analyse imaging techniques for accurate N-staging in breast cancer.	MRI and US play key roles, but false positives/negatives affect diagnostic accuracy.
Adyasha Sahu et al. (2024)²²	Propose an ensemble deep learning classifier for breast cancer detection.	Achieved high accuracy using AlexNet, ResNet, and MobileNetV2 with LoG and high-boosting.
Alireza Rezazadeh et al.(2022)²³	Develop an explainable ML pipeline for breast cancer diagnosis using ultrasound images.	Uses texture features and decision tree ensembles to improve interpretability of predictions.
Asaf Raza et al. (2023)²⁴	Develop DeepBreastCancerNet for robust breast cancer detection.	Achieved 99.63% accuracy using 24-layer architecture with multiple optimizations.
Kiran Jabeen et al.(2022)²⁵	Propose a deep learning framework using DarkNet-53 and feature fusion.	Achieved 99.1% accuracy using optimized feature selection techniques.
Gelan Ayana et al.(2022)²⁶	Develop multistage transfer learning (MSTL) for ultrasound breast cancer classification.	Improved accuracy using medical image-based transfer learning with EfficientNetB2, InceptionV3, and ResNet50.
Saksham Gupta et al. (2023)²⁷	Use modified ResNet50 for ultrasound breast cancer classification.	Achieved 97.8% accuracy, reducing diagnosis time and improving early detection.
Karl Kratkiewicz et al. (2022)²⁸	Review advanced imaging methods like ultrasound tomography (UST) and photoacoustic tomography (PAT).	UST and PAT enhance breast cancer detection with quantitative imaging.

Table 1: Literature review of existing technologies.

The proposed model builds on advances in deep learning for breast ultrasound image classification and addresses the main challenges in this area: data quality, model interpretability, and computational efficiency. In contrast to older techniques that rely on manual feature extraction, this work uses EfficientNet-B0, an architecture that jointly scales depth, width, and resolution. The model uses Grad-CAM to visualize the basis of its predictions, which increases transparency and clinical confidence. Although existing solutions, including ensemble models and transfer learning, have become highly accurate, they are often difficult to interpret or demand a large amount of computing power. The presented work addresses these gaps by being both highly accurate (99%) and interpretable, so that the model is reliable and viable for practical clinical decision-making. By concentrating on class imbalance and practical use, the proposed approach aims to set a new standard for AI-driven breast cancer diagnosis.

The study uses the publicly available BUSI dataset, a previously collected set of anonymized clinical images. No human subjects or animals were directly involved in this study; hence, institutional ethical approval (IRB/IACUC) was not required. All procedures are in accordance with common standards for the secondary use of publicly available medical imaging data.

Protocol

1. Dataset characteristics

The study utilized the publicly available Breast Ultrasound Images (BUSI) dataset, a well-documented repository for the analysis and classification of breast cancer²⁹. The dataset comprised 780 ultrasound images collected from 600 women aged 25–75 years, representing a diverse population for breast cancer diagnosis and providing the labeled examples required to train and validate deep learning models for medical imaging. The images had an average resolution of 500 x 500 pixels and were stored in PNG format to preserve visual quality. Figure 1 shows representative benign, malignant, and normal ultrasound images from the dataset.

Figure 1: Representative samples from the BUSI dataset. (A) Benign, (B) Malignant, and (C) Normal images. Panels illustrate morphological diversity used for training.

2. Experimental workflow

To ensure reproducibility, the investigation executed the processing pipeline in sequential stages. First, the system loaded the BUSI ultrasound images and corresponding ground-truth masks and resized them to 224×224 pixels, followed by ImageNet normalization. Second, the framework overlaid each image with its mask to emphasize lesion regions before splitting the dataset into training (70%), validation (15%), and testing (15%) subsets using stratified sampling. Third, the study applied targeted augmentation consisting of horizontal flipping, rotation (±15°), and color jittering primarily to the minority classes. Fourth, the researchers fine-tuned the EfficientNet-B0 model, which was pretrained on ImageNet, by replacing the final classification layer with a three-class output layer. The training process employed the Adam optimizer (learning rate 0.00005), step learning-rate decay (γ = 0.1 every 7 epochs), cross-entropy loss, a batch size of 8, and early stopping with a patience of 2. Finally, Grad-CAM generated attention heatmaps from the last convolutional layer to visualize diagnostically relevant regions and support clinical interpretation. Table of Materials lists the essential datasets, software libraries, and hardware configurations required to reproduce the study's breast ultrasound classification and interpretability framework.
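
The stratified 70/15/15 partition described above can be sketched in plain Python as follows. This is a minimal illustration rather than the study's code: the function name `stratified_split`, the fixed seed, the rounding rule, and the toy label list are all assumptions made for the example.

```python
import random
from collections import defaultdict

def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=0):
    """Return (train, val, test) index lists with per-class proportions preserved.

    `labels` holds one class name per image. The seed and the rounding rule
    are illustrative assumptions, not taken from the protocol.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test subset
    return train, val, test

# Toy example: 20 images per class (hypothetical counts, not the BUSI distribution).
labels = ["benign"] * 20 + ["malignant"] * 20 + ["normal"] * 20
train_idx, val_idx, test_idx = stratified_split(labels)
```

Splitting each class separately keeps the class ratios of the full dataset in every subset, which matters when one class dominates.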

The BUSI dataset exhibited a natural class imbalance, a characteristic frequently observed in medical datasets. The distribution favored the benign class, followed by malignant and normal cases. Specifically, the dataset contained 437 benign, 210 malignant, and 133 normal images, for a total of 780. While data augmentation increased training variability, it did not alter the fundamental sample count in each class. This distribution reflected real-world clinical scenarios where benign conditions occur more frequently than malignant tumors. Although such an imbalance accurately portrayed the prevalence of breast abnormalities in clinical environments, it presented challenges for deep learning training, as it could lead to biased predictions and suboptimal performance for minority classes. Consequently, mitigating this imbalance became essential to formulate an effective and generalizable model that performed well across every diagnostic category. Figure 2 illustrates the overall workflow of the proposed framework, including preprocessing, augmentation, EfficientNet-B0 feature extraction, classification, and Grad-CAM interpretability.

Figure 2: Proposed model workflow. Sequential pipeline from input and mask-overlay preprocessing to EfficientNet-B0 training and Grad-CAM output.

3. Data Preprocessing

The study resized all images to 224 x 224 pixels, adhering to the standard input dimensions for the EfficientNet-B0 architecture. This resizing ensured dataset consistency and reduced computational requirements, as defined by equation 1.

$$I_{\text{resized}} = \text{Resize}(I_{\text{original}}, (224, 224)) \tag{1}$$

The framework normalized pixel values using the mean and standard deviation of the ImageNet dataset ([0.485, 0.456, 0.406] for mean and [0.229, 0.224, 0.225] for std). This normalization scaled the input data to enhance training stability and convergence speed, calculated according to equation 2.

$$\hat{x} = \frac{x - \mu}{\sigma} \tag{2}$$
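
As a concrete illustration of equation 2, the sketch below applies the per-channel ImageNet statistics quoted above to RGB pixels. It assumes pixel values have already been scaled to the range [0, 1], which the protocol does not state explicitly; the helper names are hypothetical.

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb):
    """Apply x_hat = (x - mu) / sigma per channel; `rgb` values are assumed in [0, 1]."""
    return tuple((x - m) / s for x, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD))

def normalize_image(image):
    """Normalize an image stored as rows of (R, G, B) tuples."""
    return [[normalize_pixel(px) for px in row] for row in image]

# A pixel equal to the channel means maps to zero in every channel.
centered = normalize_pixel(IMAGENET_MEAN)
white = normalize_pixel((1.0, 1.0, 1.0))
```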

To improve feature extraction and the identification of regions of interest, the researchers superimposed the original images onto their corresponding ground-truth masks. Figure 3 illustrates the ground-truth lesion masks and the resulting overlaid ultrasound images that highlighted tumor boundaries.

Figure 3: Mask and overlaid images. (A) Original ultrasound, (B) ground-truth mask, and (C) resulting overlay highlighting tumor boundaries for spatial attention.

This overlay identified critical regions, such as tumor margins and texture patterns, providing the model with localized spatial context to improve classification accuracy, as shown in equation 3.

$$I_{\text{overlaid}} = I \cdot M + I \cdot (1 - M) \tag{3}$$

The study utilized the mask-overlay operation exclusively during the preparation of training data to supplement lesion boundary knowledge during feature learning. In contrast, the validation and inference phases employed only the original ultrasound images without masks, maintaining a conventional image-classification pipeline. Because the binary segmentation masks in the BUSI dataset represented structures already inherent in the ultrasound images, they did not introduce new class labels. To prevent information leakage, the investigation partitioned the dataset into training, validation, and testing sets, ensuring that no overlapping pairs of images or masks were shared across subsets. Consequently, the overlay functioned as a spatial attention guide that highlighted anatomically significant areas without compromising the independence of the evaluation data.
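
The overlay operation of equation 3 can be sketched per pixel as shown below. The `background_weight` parameter is a hypothetical addition for illustration only: with a value of 1.0 the function evaluates equation 3 exactly as printed (in which case the expression reduces to the input image), while smaller values dim the non-lesion region. The protocol does not specify any such weighting.

```python
def overlay_with_mask(image, mask, background_weight=1.0):
    """Combine a grayscale image with a binary lesion mask.

    Computes I*M + background_weight * I*(1 - M) per pixel.
    `background_weight` is a hypothetical parameter, not part of the protocol.
    """
    return [
        [px * m + background_weight * px * (1 - m) for px, m in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]

image = [[0.2, 0.8], [0.6, 0.4]]  # toy 2x2 intensities in [0, 1]
mask = [[0, 1], [1, 0]]           # 1 marks lesion pixels
as_printed = overlay_with_mask(image, mask)                    # equation 3 as written
dimmed = overlay_with_mask(image, mask, background_weight=0.5)  # lesion emphasized
```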

4. Data Augmentation

The study implemented targeted data augmentation to mitigate class imbalance within the training set, focusing specifically on the malignant and normal minority classes. The framework utilized a broad range of transformations to artificially increase the variability of the training data and improve the model's capacity to generalize across diverse imaging conditions.

First, the system performed horizontal flipping with a probability of 0.9 (90%), allowing the model to recognize features regardless of scanning direction. This transformation followed equation 4:

$$I_{\text{flipped}} = \text{Flip}(I) \tag{4}$$

Second, the investigation applied random rotations within a range of ±15° to simulate variations in scanning angles and patient positioning. This process forced the model to learn rotation-invariant features, such as tumor margins and texture patterns, as defined by equation 5:

$$I_{\text{rotated}} = \text{Rotate}(I, \theta) \tag{5}$$

Third, the researchers utilized color jittering to simulate variations in lighting, contrast, and color balance. The framework adjusted brightness, contrast, and saturation by a factor of 0.2, and hue by 0.1. These stochastic adjustments improved the model's robustness to appearance changes, calculated according to equation 6:

$$x_{\text{jittered}} = x + \Delta_{\text{brightness}} + \Delta_{\text{contrast}} + \Delta_{\text{saturation}} + \Delta_{\text{hue}} \tag{6}$$

The selection of these specific hyperparameter ranges (horizontal flip probability 0.9, rotation ±15°, brightness/contrast/saturation 0.2, hue 0.1) resulted from an empirical stability assessment. The investigation determined that these settings optimized the balance between diversity generation and anatomical realism, ensuring steady convergence and stable validation accuracy without inducing unrealistic lesion deformations.
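
A minimal sketch of two of these transformations, the random horizontal flip and a brightness jitter, is given below in plain Python. It is illustrative only. Rotation, contrast, saturation, and hue are omitted for brevity, and the brightness change is modeled as a multiplicative factor drawn from [1 - 0.2, 1 + 0.2], a convention used by common augmentation libraries that is assumed here rather than taken from the protocol.

```python
import random

def horizontal_flip(image):
    """Mirror each row of an image stored as a list of pixel rows."""
    return [row[::-1] for row in image]

def jitter_brightness(image, factor):
    """Scale intensities by `factor` and clip the result to [0, 1]."""
    return [[min(1.0, max(0.0, px * factor)) for px in row] for row in image]

def augment(image, rng, flip_prob=0.9, brightness=0.2):
    """Apply a random flip, then a random brightness factor in [1 - b, 1 + b]."""
    if rng.random() < flip_prob:
        image = horizontal_flip(image)
    factor = rng.uniform(1.0 - brightness, 1.0 + brightness)
    return jitter_brightness(image, factor)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample = [[0.1, 0.5, 0.9], [0.2, 0.4, 0.6]]
augmented = augment(sample, rng)
```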

5. Model Architecture

The study utilized EfficientNet-B0, a modern CNN model renowned for its efficiency, scalability, and performance compared to other image classification architectures. This model employed a method of scaling the depth, width, and resolution of the network proportionally, which allowed for high accuracy with a significantly reduced number of parameters compared to standard architectures like ResNet and VGG. This scaling approach enabled the framework to handle high-resolution images while remaining computationally efficient, fitting the specific needs of medical imaging. Figure 4 illustrates the architecture of EfficientNet-B0, highlighting the sequence of convolutional and MBConv blocks used for hierarchical feature extraction.

Figure 4: EfficientNet-B0 architecture. Schematic of MBConv blocks and compound scaling factors used for hierarchical feature extraction.

To adapt EfficientNet-B0 to the breast ultrasound classification task, the researchers modified the model structure. The system replaced the original classification layer, intended for the 1,000-class ImageNet dataset, with a fully connected (dense) layer consisting of 3 output neurons. These neurons corresponded to the three target classes within the BUSI dataset: benign, normal, and malignant. The investigation set all layers of the pretrained EfficientNet-B0 backbone as trainable and optimized them jointly with the new classification layer, following a full fine-tuning strategy; the training process did not apply staged freezing or gradual unfreezing. The framework performed training using the Adam optimizer with a learning rate of 0.00005 and a step learning-rate scheduler that reduced the rate by a factor of 0.1 every 7 epochs. The model applied a SoftMax activation to the output layer, transforming raw scores into a probability distribution, as calculated in equation 7. Table 2 describes the model architecture.

$$P(y = i \mid z) = \frac{e^{z_i}}{\sum_{j} e^{z_j}} \tag{7}$$
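
Equation 7 can be illustrated with a short, numerically stable implementation. The logits and the class ordering in `CLASSES` are hypothetical values chosen for the example, not outputs of the trained model.

```python
import math

def softmax(logits):
    """Convert raw scores z into probabilities e^{z_i} / sum_j e^{z_j}."""
    shift = max(logits)  # subtracting the max avoids overflow; the result is unchanged
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

CLASSES = ("benign", "malignant", "normal")  # assumed ordering
probs = softmax([2.0, 0.5, -1.0])            # hypothetical logits for one image
predicted = CLASSES[probs.index(max(probs))]
```

The predicted class is the one with the highest probability, and the probabilities always sum to one.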

Layer Type	Description
Input Layer	Accepts images of size 224 x 224 x 3
Convolutional Layers	Includes initial convolution and MBConv blocks for feature extraction.
Global Average Pooling	Reduces spatial dimensions to a 1D feature vector.
Fully Connected Layer	Maps the feature vector to 3 output neurons (normal, benign, malignant).
SoftMax Activation	Converts logits into probabilities for multi-class classification.

Table 2: Description of the model architecture.

The model was trained with the Adam optimizer, an initial learning rate of 0.00005, and categorical cross-entropy loss over the three classes, calculated as per equation 9. EfficientNet-B0 was initialized with ImageNet-pretrained weights and fully fine-tuned: all backbone layers were trained and no staged freezing was applied. The original classifier was replaced with a fully connected layer of three output neurons followed by SoftMax activation, representing the benign, malignant, and normal classes. The images were resized to 224 x 224 pixels, normalized using the ImageNet mean and standard deviation, and loaded in mini-batches of 8. A step learning-rate scheduler multiplied the learning rate by 0.1 every 7 epochs. Early stopping on the validation loss, with a patience of 2 epochs, was used to avoid overfitting. Grad-CAM activations were produced from the last convolutional layer by calculating gradients, applying global average pooling to obtain channel weights, and upsampling the activation maps to the input resolution.

To confirm training stability, training was repeated over several runs, which yielded matching validation performance. Training and validation loss curves were monitored to identify overfitting, and a stratified hold-out test set that was independent of training was used for the final evaluation. The difference between validation and test performance was small, indicating consistent convergence and reproducible optimization. Equation 10 defines the stepwise learning-rate schedule. Equation 15 gives the early-stopping criterion, which is triggered by two consecutive increases in validation loss, indicating possible overfitting.

$$d = \alpha^{\phi}, \quad w = \beta^{\phi}, \quad r = \gamma^{\phi} \tag{8}$$

$$L_{CE} = -\sum_{i} y_i \log(\hat{y}_i) \tag{9}$$

$$\eta_t = \eta_0 \cdot \gamma^{\lfloor t/7 \rfloor} \tag{10}$$

Adam update (equations 11–14):

$$v_t = \beta_2 v_{t-1} + (1 - \beta_2)\,(\nabla_{\theta} L)^2$$

[Adam optimizer equations for adaptive learning-rate adjustment]

$$\theta_t = \theta_{t-1} - \eta\,\frac{m_t}{\sqrt{v_t + \epsilon}}$$

$$L_{val}(t) \geq L_{val}(t-1) \tag{15}$$
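
The step learning-rate schedule (equation 10) and the early-stopping condition (equation 15) can be sketched as two small functions. The zero-based epoch indexing, the function names, and the example loss values are assumptions made for illustration; the default values follow the settings stated in the protocol (initial rate 0.00005, decay factor 0.1 every 7 epochs, patience 2).

```python
def step_lr(epoch, base_lr=5e-5, gamma=0.1, step_size=7):
    """Learning rate at a given epoch: base_lr * gamma ** floor(epoch / step_size)."""
    return base_lr * gamma ** (epoch // step_size)

def early_stop_epoch(val_losses, patience=2):
    """Return the epoch index at which training would stop, or None.

    Training stops once the validation loss has failed to decrease for
    `patience` consecutive epochs, i.e. L_val(t) >= L_val(t-1).
    """
    bad_epochs = 0
    for t in range(1, len(val_losses)):
        if val_losses[t] >= val_losses[t - 1]:
            bad_epochs += 1
            if bad_epochs >= patience:
                return t
        else:
            bad_epochs = 0
    return None

losses = [0.90, 0.60, 0.45, 0.47, 0.50, 0.52]  # hypothetical validation losses
stop_at = early_stop_epoch(losses)
```

Note that this criterion compares each epoch with the immediately preceding one, as in equation 15; a common alternative, not described in the protocol, compares against the best loss seen so far.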

EfficientNet-B0 is well suited to breast ultrasound classification because its architecture includes several design features that support efficiency and speed. The central building block is the MBConv block, which combines depthwise separable convolutions with squeeze-and-excitation modules to lower computational complexity without loss of accuracy. The model also uses compound scaling, which scales depth, width, and resolution uniformly to maintain performance under different resource constraints, as expressed in equation 8. Because the model is initialized with weights pretrained on the ImageNet dataset, it supports transfer learning and largely avoids the need to train from scratch on small medical datasets. These architectural features, together with the customized training setup (the replaced final classification layer, the Adam optimizer, the learning-rate scheduler, and early stopping), allow the model to reach high accuracy while remaining computationally efficient. This combination of architecture and training strategy enables the model to learn effectively from limited medical imaging data, making it a useful tool for breast ultrasound classification.
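
To make the compound scaling rule of equation 8 concrete, the sketch below computes the depth, width, and resolution multipliers for a given scaling coefficient. The default constants (1.2, 1.1, 1.15) are the values reported in the original EfficientNet publication, not figures from this article, and the function name is hypothetical.

```python
def compound_scaling(phi, alpha=1.2, beta=1.1, gamma=1.15):
    """Return (depth, width, resolution) multipliers for scaling coefficient phi."""
    return alpha ** phi, beta ** phi, gamma ** phi

# phi = 0 is the unscaled baseline, i.e. EfficientNet-B0.
depth, width, resolution = compound_scaling(0)
```

With a coefficient of zero all three multipliers equal one, which is why B0 serves as the baseline from which larger variants are scaled.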

Algorithm 1: Breast Ultrasound Image Classification Using EfficientNet-B0 and Grad-CAM

Input: Breast ultrasound images.
Output: Classification (normal, benign, malignant) and Grad-CAM heatmaps.
1. Preprocess Images:
Resize to 224×224.
Normalize using ImageNet mean and std.
Augment with flipping, rotation, and color jittering.
2. Initialize Model:
Load EfficientNet-B0 (pre-trained, exclude top layer).
Add Global Average Pooling and a dense layer (3 neurons, SoftMax).
3. Train Model:
Optimizer: Adam (learning rate = 0.00005).
Loss: Cross-entropy.
Train for 18 epochs with early stopping (patience = 2).
4. Grad-CAM Visualization:
Compute gradients of the target class.
Generate and overlay heatmaps on input images.

The algorithm provides the workflow of classifying breast ultrasound images with EfficientNet-B0 and Grad-CAM, including preprocessing, training, and inference of the model, and the interpretable heatmap to validate clinically.

6. Grad-CAM as Explainable AI

Gradient-weighted Class Activation Mapping (Grad-CAM) is the tool used in the study to add interpretability to the deep learning model. Addressing the pressing need for XAI in clinical imaging, Grad-CAM produces heatmaps over the image regions that contributed most to a prediction, so that clinicians can intuitively follow the model's decision-making. This interpretability is critical to building trust in AI systems, particularly in high-stakes domains such as healthcare, where accountability and transparency are essential. Figure 5 presents Grad-CAM heatmaps highlighting the image regions that most influenced the model’s predictions for benign, malignant, and normal cases.

Figure 5: Grad-CAM activation heatmaps. (A) Benign, (B) Malignant, and (C) Normal cases. Warm colors (red) signify high diagnostic influence; cool colors (blue) indicate lower attention.

All Grad-CAM images are shown alongside the input image that was fed to the model. Figure 5 displays the original ultrasound image and the model-input image so that the highlighted regions can be related directly to the processed input used for the prediction. To obtain channel-wise importance, Grad-CAM computes the gradients of the predicted class score with respect to the feature maps of the final convolutional layer. These gradients are globally averaged to obtain the weighting coefficients, which are then multiplied with the respective feature maps to produce a localization heatmap (equations 16 and 17). The heatmap is then upsampled to the input resolution and superimposed on the ultrasound image to indicate areas of diagnostic interest, such as tumor margins and texture changes. This visualization offers a spatial understanding of the areas the model uses to make its prediction. The study assessed interpretability through visual analysis of attention localization. Quantitative localization measures and clinician-based assessment were not included and remain valuable avenues for future validation of clinical relevance.

$$\alpha_k^c = \frac{1}{Z} \sum_{i} \sum_{j} \frac{\partial y^c}{\partial A_{ij}^k} \quad (16)$$

$$L_{\text{Grad-CAM}}^c = \mathrm{ReLU}\left( \sum_{k} \alpha_k^c A^k \right) \quad (17)$$
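As a minimal sketch of equations 16 and 17, the NumPy snippet below computes the channel weights and the heatmap from two arrays that are assumed to have been extracted already: the final-layer feature maps and the gradients of the predicted class score with respect to them. The function name `grad_cam`, the array shapes, and the toy numbers are illustrative assumptions, not the authors' implementation. Extracting these arrays from EfficientNet-B0 and upsampling the heatmap to the input resolution are not shown.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Sketch of Grad-CAM for one image.

    activations, gradients: arrays of shape (K, H, W) holding the feature
    maps of the final convolutional layer and the gradients of the predicted
    class score with respect to them (both assumed to be given).
    """
    # Equation 16: average the gradients over the spatial axes,
    # giving one weight per channel.
    weights = gradients.mean(axis=(1, 2))              # shape (K,)
    # Equation 17: weighted sum of the feature maps, followed by ReLU.
    cam = np.tensordot(weights, activations, axes=1)   # shape (H, W)
    cam = np.maximum(cam, 0.0)
    # Scale to [0, 1] for display; an all-zero map is returned unchanged.
    peak = cam.max()
    return cam / peak if peak > 0 else cam

# Toy example with made-up numbers: 2 channels on a 2 x 2 grid.
A = np.array([[[1.0, 0.0], [0.0, 2.0]],
              [[0.0, 3.0], [1.0, 0.0]]])
G = np.array([[[0.4, 0.4], [0.4, 0.4]],        # channel weight 0.4
              [[-0.2, -0.2], [-0.2, -0.2]]])   # channel weight -0.2
heatmap = grad_cam(A, G)
```

In this toy case the second channel receives a negative weight, so the locations where only that channel is active are set to zero by the ReLU and do not appear in the heatmap.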

Grad-CAM provides a visual explanation of how the model works by highlighting the areas of an image that contributed to the prediction. The heatmaps can be used to confirm that the network concentrates on clinically relevant structures, including tumor margins and acoustic patterns. In false-positive cases, the visualizations can help reveal attention on irrelevant areas, assisting in model assessment and optimization. Such correspondence between medical features and the model's attention enhances the interpretability of AI-assisted diagnosis. Figure 6 shows the Grad-CAM visualization.

Figure 6: Grad-CAM visualization. Grad-CAM heatmap showing important regions (red = high, blue = low) for classification; color intensity indicates feature importance.

Grad-CAM is added to increase the interpretability of the EfficientNet-B0 model by highlighting the image regions that contribute most to its classification decisions. The resulting heatmaps indicate clinically significant structures, such as tumor margins and texture changes, which facilitates clear analysis of model predictions. The combination of high classification accuracy and visual interpretability strengthens the reliability of the proposed framework for analyzing breast ultrasound images.

7. Statistical Analysis

The model is evaluated with a set of standard metrics: accuracy, the proportion of all predictions that are correct; precision, a measure of the accuracy of positive predictions; recall, a measure of how effectively the model identifies all relevant cases; and the F1 score, the harmonic mean of precision and recall, which provides a balanced measure of overall performance. These metrics are calculated as per equations 18–21.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad (18)$$

$$\text{Precision} = \frac{TP}{TP + FP} \quad (19)$$

$$\text{Recall} = \frac{TP}{TP + FN} \quad (20)$$

$$\text{F1-Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \quad (21)$$
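The following sketch shows one way equations 18–21 could be evaluated per class for a three-class problem, treating each class in turn as the positive class (one-vs-rest). The function name `per_class_metrics` and the confusion-matrix counts are hypothetical and are not the study's results. For the multi-class case, accuracy is taken as the number of correctly classified samples divided by the total.

```python
def per_class_metrics(cm, labels):
    """cm[i][j] = number of samples of true class i predicted as class j."""
    n_classes = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(n_classes))
    report = {}
    for k, name in enumerate(labels):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n_classes)) - tp  # column minus TP
        fn = sum(cm[k]) - tp                               # row minus TP
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[name] = {"precision": precision, "recall": recall, "f1": f1}
    return correct / total, report

# Hypothetical counts for illustration only.
cm = [[48, 2, 0],    # true benign
      [1, 19, 0],    # true malignant
      [0, 0, 10]]    # true normal
accuracy, report = per_class_metrics(cm, ["benign", "malignant", "normal"])
```

With these made-up counts, one malignant sample out of 20 is missed, so the malignant recall is 19/20, while two benign samples predicted as malignant lower the malignant precision to 19/21.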

Mean Squared Error (MSE) quantifies the average squared difference between predictions and true values, calculated as per equation 22.

$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \quad (22)$$

Root Mean Squared Error (RMSE) is the square root of the MSE and expresses the typical deviation of predictions from true values in the same units as the target, calculated as per equation 23.

$$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \quad (23)$$

Mean Absolute Error (MAE) quantifies the average absolute difference between predictions and true values, mathematically calculated as per equation 24.

$$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \quad (24)$$
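A short sketch of equations 22–24 follows. The article does not state whether the errors are computed on integer class indices or on predicted probabilities; integer class indices are assumed here purely for illustration, and the label lists are made up.

```python
import math

def error_metrics(y_true, y_pred):
    """Return (MSE, RMSE, MAE) for two equal-length sequences of numbers."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n   # equation 22
    rmse = math.sqrt(mse)                  # equation 23
    mae = sum(abs(e) for e in errors) / n  # equation 24
    return mse, rmse, mae

# Hypothetical class indices (0 = benign, 1 = malignant, 2 = normal).
y_true = [0, 0, 1, 1, 2, 2, 0, 1]
y_pred = [0, 0, 1, 2, 2, 2, 0, 1]
mse, rmse, mae = error_metrics(y_true, y_pred)
```

Because RMSE is by definition the square root of MSE, the two reported values can be checked against each other: the square root of 0.0513 is approximately 0.2265, which matches the pair of values given in the Results.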

A confusion matrix is used to provide a detailed breakdown of true positives, true negatives, false positives, and false negatives by class, giving further insight into how the model classifies³⁰. Taken together, these evaluation measures provide an overall assessment of the model's error, robustness, and generalization to unseen data, and therefore support its reliability as a tool for classifying breast ultrasound images. Because lesion masks are available in the BUSI dataset, quantitative localization measures such as intersection-over-union can be introduced in the future to objectively assess the accuracy of the explanations. Stratified hold-out validation was chosen to keep the training and evaluation subsets independent while retaining the class distribution. Cross-validation was not used, in order to prevent repeated exposure of validation samples to the optimization process; future studies will use repeated or nested cross-validation to further characterize the variation in performance.
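The stratified hold-out idea can be sketched with the standard library alone: indices are grouped by class, shuffled, and split so that each class contributes the same fraction to the held-out subset. The function name `stratified_holdout`, the 20% hold-out fraction, the seed, and the class counts are assumptions for illustration; the article does not state the split ratio in this section.

```python
import random
from collections import defaultdict

def stratified_holdout(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with per-class proportions preserved."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test_idx.extend(members[:n_test])    # held out, never trained on
        train_idx.extend(members[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Hypothetical label list: 50 benign, 30 malignant, 20 normal.
labels = ["benign"] * 50 + ["malignant"] * 30 + ["normal"] * 20
train_idx, test_idx = stratified_holdout(labels, test_fraction=0.2, seed=42)
test_counts = {c: sum(labels[i] == c for i in test_idx) for c in set(labels)}
```

Each index lands in exactly one subset, so the evaluation images stay independent of training while the 5:3:2 class ratio of the toy list is kept in both subsets.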

Results

Standardization of input dimensions to 224 × 224 pixels and ImageNet-based normalization facilitated stable training behavior and smooth convergence of the loss curves. The application of the mask-overlay operation on training samples significantly increased the visibility of lesion boundaries compared to raw ultrasound images. This observation supported the hypothesis that anatomically guided enhancement improves feature separability. While the training phase utilized these overlays, the validation and test sets remained as controls without mask enhancements; the consistent performance across these sets indicated that the model learned intrinsic anatomical markers rather than relying on external mask artifacts. These results confirmed that preprocessing effectively minimized background variability, particularly in visually subtle lesions.
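The ImageNet-based normalization step can be illustrated as follows. The per-channel mean and standard deviation are the commonly used ImageNet statistics for RGB inputs scaled to [0, 1]. The function name `normalize_imagenet` is hypothetical, the image is assumed to have been resized to 224 × 224 already, and the mask-overlay operation is not reproduced here.

```python
import numpy as np

# Commonly used ImageNet channel statistics (RGB, inputs scaled to [0, 1]).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])

def normalize_imagenet(image_uint8):
    """image_uint8: (224, 224, 3) RGB array with values in 0..255."""
    if image_uint8.shape != (224, 224, 3):
        raise ValueError("expected a 224 x 224 RGB image")
    scaled = image_uint8.astype(np.float32) / 255.0
    normalized = (scaled - IMAGENET_MEAN) / IMAGENET_STD
    # Channels-first layout (3, 224, 224), as PyTorch models expect.
    return normalized.transpose(2, 0, 1).astype(np.float32)

# Uniform mid-gray test image standing in for a resized ultrasound frame.
gray = np.full((224, 224, 3), 128, dtype=np.uint8)
x = normalize_imagenet(gray)
```

Applying the same statistics that the pretrained backbone saw during ImageNet training keeps the input distribution close to what the transferred filters expect.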

Targeted augmentation applied to the malignant and normal minority classes resulted in a balanced class distribution and mitigated prediction bias toward the dominant benign class. The system recorded a high recall of 0.96 for the malignant class, a significant improvement over baseline control sessions where no augmentation was applied. These findings demonstrated that horizontal flipping and rotation provided orientation invariance, while color jittering enhanced resistance to imaging variations. The empirical data supported the hypothesis that class-specific augmentation is essential for maintaining high sensitivity in diagnostic categories with limited samples.
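A rough sketch of how class-specific augmentation could be planned is given below: the number of extra samples per class is the gap to the largest class, and a random horizontal flip stands in for the full augmentation set. The function names, the training-set counts, and the flip probability are assumptions for illustration; rotation and color jittering are omitted for brevity.

```python
import numpy as np

def augmentation_plan(class_counts):
    """Number of augmented samples needed per class to match the largest."""
    target = max(class_counts.values())
    return {name: target - n for name, n in class_counts.items()}

def random_hflip(image, rng, p=0.5):
    """Flip an (H, W) or (H, W, C) image left-right with probability p."""
    return image[:, ::-1].copy() if rng.random() < p else image

# Hypothetical training-set counts, not the study's split.
counts = {"benign": 300, "malignant": 150, "normal": 90}
plan = augmentation_plan(counts)

rng = np.random.default_rng(0)
image = np.arange(6).reshape(2, 3)
flipped = random_hflip(image, rng, p=1.0)  # p=1.0 forces the flip
```

Only the minority classes receive a non-zero quota in `plan`, which mirrors the targeted augmentation of the malignant and normal classes described above.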

The fine-tuned EfficientNet-B0 model attained an overall classification accuracy of 99% on the BUSI dataset. Specific performance metrics included precision values of 1.00 for benign, 0.96 for malignant, and 1.00 for normal classes. The transition from ImageNet features to ultrasound texture patterns was highly efficient, requiring minimal examples for multi-class discrimination. Table 2 provides the detailed architecture parameters, while Figure 7 illustrates the classification report heatmap and final accuracy scores. The low error metrics—MSE = 0.0513, RMSE = 0.2265, and MAE = 0.0342—further validated the stability and optimization reproducibility of the proposed framework, as compared in Figure 8.

DATA AVAILABILITY:

The dataset analyzed in this study is publicly available and can be accessed through the Kaggle repository at: https://www.kaggle.com/datasets/sabahesaraki/breast-ultrasound-images-dataset.

Figure 7: Classification performance report. Heatmap summarizing precision, recall, and F1-scores across diagnostic categories.

Figure 8: Statistical error metrics. Bar chart comparing MSE, RMSE, and MAE to illustrate predictive stability and model convergence.

Grad-CAM visualizations identified diagnostically critical regions, such as lesion borders and internal echo patterns, which consistently aligned with the areas of interest indicated by radiologists. Figure 9 demonstrates that heatmaps were strictly focused on tumor regions rather than normal tissue, providing direct evidence for the hypothesis that the model utilized clinically significant structures for decision-making. In correctly classified cases, activation maps were concentrated on lesion margins, whereas misclassified samples exhibited dispersed attention patterns. Table 3 compares the proposed model with existing techniques.

Study	Technique	Accuracy
Sathiyabhama Balasubramaniam et al. (2023)³¹	LeNet	89.91%
Kushangi Atrey et al. (2023)³²	Support Vector Machine (SVM)	93.41%
Nicolle Vigil et al. (2022)³³	Convolutional deep autoencoder	78.50%
Fatih Uysal & Mehmet Murat Köse (2022)³⁴	ResNet50	85.4%
Alexandre Boulenger et al. (2022)³⁵	CNN	85%
Gültekin Işık & İshak Paçal (2024)³⁶	ResNet50	0.83%
Haixia Liu et al. (2022)³⁷	CNN	97.18%
Mobarak Zourhri et al. (2023)³⁸	VGG19	98.44%
Hamidreza Taleghamar et al. (2022)³⁹	DCNN	88%
Proposed Model	EfficientNet-B0	99%

Table 3: Comparison of the proposed model with existing techniques.

Discussion

The proposed framework utilizes a fine-tuned EfficientNet-B0 architecture integrated with Gradient-weighted Class Activation Mapping (Grad-CAM) to provide a high-performance, interpretable system for breast ultrasound classification. This research advances the scientific field by bridging the gap between "black-box" deep learning models and clinical transparency, demonstrating that computational efficiency does not have to be sacrificed for explainability⁴⁰. By implementing lesion-conscious preprocessing and targeted augmentation, the study provides a robust methodology for handling class imbalances inherent in medical datasets, thereby enhancing the reliability of AI-assisted diagnostics in oncology.

Despite the high accuracy achieved, the study faces several limitations. The BUSI dataset represents a single-source clinical acquisition scenario, which may limit the cross-domain robustness of the model when exposed to different scanner properties or diverse population distributions⁴¹. Furthermore, the sample size, while representative, lacks the scale of multi-center cohorts. Alternative ways of studying the research hypothesis include the implementation of Vision Transformers (ViT) or Self-Supervised Learning (SSL), which could potentially capture more global context in ultrasound textures without the need for extensive labeled masks. Additionally, quantitative localization metrics, such as the Intersection-over-Union (IoU) of Grad-CAM heatmaps against radiologist-annotated masks, could provide a more objective measure of interpretability compared to visual assessment alone⁴².
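As an illustration of the quantitative localization metric mentioned above, the sketch below binarizes a heatmap scaled to [0, 1] and computes its Intersection-over-Union with a binary lesion mask. The function name `heatmap_iou`, the 0.5 threshold, and the toy arrays are assumptions; this analysis was not part of the present study.

```python
import numpy as np

def heatmap_iou(heatmap, mask, threshold=0.5):
    """IoU between a thresholded heatmap in [0, 1] and a binary lesion mask."""
    pred = heatmap >= threshold
    truth = mask.astype(bool)
    union = np.logical_or(pred, truth).sum()
    if union == 0:
        return 1.0  # both empty: treated as perfect agreement
    return float(np.logical_and(pred, truth).sum() / union)

# Toy 3 x 3 example with made-up values.
heatmap = np.array([[0.9, 0.8, 0.1],
                    [0.6, 0.2, 0.0],
                    [0.0, 0.0, 0.0]])
mask = np.array([[1, 1, 0],
                 [0, 1, 0],
                 [0, 1, 0]])
iou = heatmap_iou(heatmap, mask)
```

Here three pixels exceed the threshold, four belong to the mask, and two are shared, giving an IoU of 2/5. The result depends on the chosen threshold, so any reported IoU would need to state it.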

The importance and potential applications of this method are significant for clinical decision-support systems. In specific research areas like telemedicine and low-resource healthcare settings, the low computational footprint of EfficientNet-B0 allows for deployment on edge devices, aiding radiologists as a "second reader" to reduce diagnostic uncertainty and human error⁴³. Future directions will focus on validating the model using heterogeneous, multi-center datasets to ensure generalizability. Moreover, exploring federated learning frameworks could enable privacy-preserving collaborations between institutions, while the integration of multi-modal data—combining ultrasound with patient clinical history—may further refine the predictive precision of the diagnostic framework⁴⁴.

Conclusion:

This study contributes to medical imaging by proposing a deep learning model that classifies breast ultrasound images using EfficientNet-B0 and Grad-CAM, achieving high accuracy together with interpretability. The proposed model attains an accuracy of 99%, with high precision, recall, and F1 values across all diagnostic classes, indicating stable and robust classification of breast ultrasound images. Grad-CAM integration offers clear visual representations of model predictions, and its focus on areas of clinical significance, including tumor margins and texture patterns, is consistent with medical knowledge and helps establish trust in the AI system. The results highlight the advantage of combining state-of-the-art performance with Explainable AI (XAI) to develop tools that are accurate, interpretable, and actionable for clinicians. Subsequent studies will need to expand the dataset with multi-center, heterogeneous patient cohorts so that the model generalizes as far as possible to different clinical environments.

To verify clinical-grade generalizability, external validation on datasets from other institutions, imaging equipment, and patient populations is required. The reported performance indicates competence on the BUSI distribution, and larger-scale validation is necessary before actual deployment. Establishing the value and utility of the model in real healthcare settings will require real-time testing and implementation in hospital clinical workflows. Such an approach may transform breast cancer diagnosis by acting as a second-opinion reader for radiologists, reducing variability in diagnosis and supporting better patient care. The research sets a reference point for AI-powered diagnostic platforms, lays the groundwork for their introduction into clinical routine, and encourages further use of AI in medicine. In conclusion, the combination of EfficientNet-B0 transfer learning with mask-guided preprocessing and Grad-CAM interpretability allows breast ultrasound images to be classified accurately while keeping the reasoning behind each decision transparent. The alignment between activation regions and clinically significant structures supports the use of explainable deep learning models as assistive diagnostic tools in medical imaging workflows.

Disclosures

The authors declare that they have no conflicts of interest.

Acknowledgements

The authors would like to express their sincere gratitude to their respective institutions and colleagues for their continuous support and encouragement throughout this research. The authors also acknowledge the creators of the BUSI Breast Ultrasound Dataset for providing the publicly available dataset that enabled this study. This work was supported by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2026R432), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia. The authors extend their appreciation to the Deanship of Research and Graduate Studies at King Khalid University for funding this work through Large Group Research under grant number RGP2/749/46.

Materials

List of materials used in this article
Name	Company	Catalog Number	Comments
Breast Ultrasound Images (BUSI Dataset)		N/A	780 PNG images (437 Benign, 210 Malignant, 133 Normal)
Matplotlib	Matplotlib Developers	N/A	Version 3.7.1
NumPy	NumPy Developers	N/A	Version 1.23.5
NVIDIA Tesla T4 GPU	NVIDIA	N/A	16GB VRAM
OpenCV	OpenCV.org	N/A	Version 4.7.0
Pillow (PIL)	Python Imaging Library	N/A	Version 9.4.0
Python	Python Software Foundation	N/A	Version 3.9.12
PyTorch	Meta AI	N/A	Version 2.0.1+cu117
Scikit-learn	Scikit-learn Developers	N/A	Version 1.2.2
Seaborn	Seaborn Developers	N/A	Version 0.12.2
Ubuntu OS	Canonical	N/A	64-bit

References

Wilkinson, L., Gathani, T. Understanding breast cancer as a global health concern. Br J Radiol. 95 (1130), 20211033(2021).
Ud din, N. M., Dar, R. A., Rasool, M., Assad, A. Breast cancer detection using deep learning: datasets, methods, and challenges ahead. Comput Biol Med. 149, 106073(2022).
Jin, W., Li, X., Fatehi, M., Hamarneh, G. Guidelines and evaluation of clinical explainable AI in medical image analysis. Med Image Anal. 84, 102684(2023).
Ghasemi, A., Hashtarkhani, S., Schwartz, D. L., Shaban-Nejad, A. Explainable artificial intelligence in breast cancer detection and risk prediction: a systematic scoping review. Cancer Innov. 3 (5), (2024).
Sharafaddini, A. M., Esfahani, K. K., Mansouri, N. Deep learning approaches to detect breast cancer: a comprehensive review. Multimedia tools and applications. 84 (21), 24079-24190 (2025).
Singh, V. K., et al. Breast tumor segmentation in ultrasound images using contextual-information-aware deep adversarial learning framework. Expert Systems with Applications. 162, 113870(2020).
Wani, N. A., Kumar, R., Bedi, J. Harnessing fusion modeling for enhanced breast cancer classification through interpretable artificial intelligence and in-depth explanations. Engineering Applications of Artificial Intelligence. 136, 108939(2024).
Kikani, K., Desai, M., Desai, B. Early Detection and Personalized Risk Assessment of Breast Cancer: An Integrative Review and Comparative Evaluation of AI Models Using Clinical and Imaging Data. Archives of Computational Methods in Engineering. , 1-29 (2025).
Singh, R., Gupta, S., Yamsani, N., Manwal, M., Bansal, G. Automated Breast Cancer Classification using a Custom CNN Architecture on the BUSI Dataset. IEEE Xplore. , 1185-1190 (2025).
Ali, A., et al. Exploring AI approaches for breast cancer detection and diagnosis: A review Article. Breast Cancer: Targets and Therapy. , 927-947 (2025).
Sharafaddini, A. M., Esfahani, K. K., Mansouri, N. Deep learning approaches to detect breast cancer: a comprehensive review. Multimed Tools Appl. , (2024).
Sienicki, K. Comment on the paper titled ‘The origin of quantum mechanical statistics: insights from research on human language’. Preprints. , (2024).
Ahn, J. S., et al. Artificial intelligence in breast cancer diagnosis and personalized medicine. Journal of breast cancer. 26 (5), 405(2023).
Qureshi, S. A., et al. Breast cancer detection using mammography: image processing to deep learning. IEEE Access. , (2024).
Dihmani, H., Bousselham, A., Bouattane, O. A new computer-aided diagnosis system for breast cancer detection from thermograms using metaheuristic algorithms and explainable AI. Algorithms. 17 (10), 462(2024).
Nakach, F. Z., Idri, A., Goceri, E. A comprehensive investigation of multimodal deep learning fusion strategies for breast cancer classification. Artif Intell Rev. 57 (12), 327(2024).
Mondol, R. K. Deep learning based prognosis and explainability for breast cancer. , UNSW Sydney. Doctoral dissertation (2025).
Ayana, G., Ryu, J., Choe, S. Ultrasound-responsive nanocarriers for breast cancer chemotherapy. Micromachines. 13 (9), 1508(2022).
Afrin, H., Larson, N. B., Fatemi, M., Alizad, A. Deep Learning in Different Ultrasound Methods for Breast Cancer, from Diagnosis to Prognosis: Current Trends, Challenges, and an Analysis. Cancers. 15 (12), 3139(2023).
Ayana, G., Ryu, J., Choe, S. Ultrasound-responsive nanocarriers for breast cancer chemotherapy. Micromachines. 13 (9), 1508(2022).
Di Paola, V., et al. Beyond N staging in breast cancer: importance of MRI and ultrasound-based imaging. Cancers. 14 (17), 4270(2022).
Sahu, A., Das, P. K., Meher, S. Efficient deep learning scheme for breast cancer detection using mammogram and ultrasound images. Biomed Signal Process Control. 87, 105377(2024).
Rezazadeh, A., Jafarian, Y., Kord, A. Explainable ensemble machine learning for breast cancer diagnosis using ultrasound texture features. Forecasting. 4 (1), 262-274 (2022).
Raza, A., et al. Deep Breast Cancer Net: a novel deep learning model for ultrasound breast cancer detection. Appl Sci. 13 (4), 2082(2023).
Jabeen, K., et al. Breast cancer classification from ultrasound images using optimal deep learning feature fusion. Sensors. 22 (3), 807(2022).
Ayana, G., Park, J., Jeong, J. W., Choe, S. Multistage transfer learning for ultrasound breast cancer image classification. Diagnostics. 12 (1), 135(2022).
Gupta, S., Agrawal, S., Singh, S. K., Kumar, S. Transfer learning-based model for ultrasound breast cancer classification. Comput Vis Bio-Inspired Comput. , 511-523 (2023).
Kratkiewicz, K., Pattyn, A., Alijabbari, N., Mehrmohammadi, M. Ultrasound and photoacoustic imaging of breast cancer: clinical systems and challenges. J Clin Med. 11 (5), 1165(2022).
Al-Dhabyani, W., Gomaa, M., Khaled, H., Fahmy, A. Dataset of breast ultrasound images. Data Brief. 28, 104863(2020).
R, M. T. Enhancing Diagnostic Accuracy in Breast Ultrasound Imaging through Deep Learning and Image Augmentation Techniques. IEEE ICERCS, , 1-6 (2024).
Balasubramaniam, S., Velmurugan, Y., Jaganathan, D., Dhanasekaran, S. A Modified LeNet CNN for Breast Cancer Diagnosis in Ultrasound Images. Diagnostics. 13 (17), 2746(2023).
Atrey, K., Singh, B. K., Bodhey, N. K. Multimodal classification of breast cancer using feature level fusion of mammogram and ultrasound images in machine learning paradigm. Multimedia Tools and Applications. 83 (7), 21347-21368 (2023).
Vigil, N., et al. Dual-Intended Deep Learning Model for Breast Cancer Diagnosis in Ultrasound Imaging. Cancers. 14 (11), 2663(2022).
Uysal, F., Köse, M. M. Classification of Breast Cancer Ultrasound Images with Deep Learning-Based Models. ASEC 2022, , 8(2022).
Boulenger, A., et al. Deep learning-based system for automatic prediction of triple-negative breast cancer from ultrasound images. Medical & Biological Engineering & Computing. 61 (2), 567-578 (2022).
Işık, G., Paçal, İ Few-shot classification of ultrasound breast cancer images using meta-learning algorithms. Neural Computing and Applications. 36 (20), 12047-12059 (2024).
Liu, H., et al. Artificial Intelligence-Based Breast Cancer Diagnosis Using Ultrasound Images and Grid-Based Deep Feature Generator. International Journal of General Medicine. 15, 2271-2282 (2022).
Zourhri, M., et al. Deep Learning Technique for Classification of Breast Cancer using Ultrasound Images. IEEE IRASET, , (2023).
Taleghamar, H., Jalalifar, S. A., Czarnota, G. J., Sadeghi-Naini, A. Deep learning of quantitative ultrasound multi-parametric images at pre-treatment to predict breast cancer response to chemotherapy. Scientific Reports. 12 (1), (2022).
Sharafaddini, A. M., Esfahani, K. K., Mansouri, N. Deep learning approaches to detect breast cancer: a comprehensive review. Multimed Tools Appl. , (2024).
Al-Dhabyani, W., Gomaa, M., Khaled, H., Fahmy, A. Dataset of breast ultrasound images. Data Brief. 28, 104863(2020).
Mondol, S. S., Hasan, M. K. Enhancing B-mode-based breast cancer diagnosis via cross-attention fusion of H-scan and Nakagami imaging with multi-CAM-QUS-Driven XAI. Physics in Medicine & Biology. 70 (17), 175011(2025).
Shifa, N. Explainable breast cancer detection in mammograms using lightweight EfficientNet-B0 with Grad-CAM and LIME. , Qatar University. (2025).
Haghighat, F., Nemati, Z., Rambodrad, A., Negareshifard, P., Jafari, E. Multimodal deep learning and data fusion in precision breast oncology: clinical Applications, fusion Strategies, and future directions. InfoScience Trends. 2 (10), 81-115 (2025).
