Research Article
Pai H. Aditya1, T. R. Mahesh2, J. V. Muruga Lal Jeyan1, Surbhi Bhatia Khan3,4, Shakila Basheer5, Ali Algarni6,7
1Department of CSE, MIT School of Computing, MIT Art, Design and Technology University, 2Department of Computer Science & Engineering, Faculty of Engineering and Technology, JAIN (Deemed-to-be University), 3School of Science, Engineering and Environment, University of Salford, 4Centre for Research Impact and Outcome, Chitkara University, 5Department of Information Systems, College of Computer and Information Science, Princess Nourah bint Abdulrahman University, 6Department of Informatics and Computer Systems, College of Computer Science, King Khalid University, 7Center for Artificial Intelligence, King Khalid University
Erratum Notice
Important: There has been an erratum issued for this article. View Erratum Notice
Here, we introduce a deep learning system built on the EfficientNetB7 model for the precise classification of lung and colon cancer histopathological images. The model achieved 96% accuracy through preprocessing, data augmentation, and transfer learning. The method holds strong promise for aiding clinical cancer diagnosis.
Early diagnosis of lung cancer plays a pivotal role in ensuring improved treatment and survival of patients, and remains a major focus in clinical research. Artificial intelligence (AI) has transformed pathology by significantly improving diagnostic accuracy and efficiency. This study presents a robust deep learning model based on the pretrained EfficientNetB7 architecture to classify colon and lung tissue histopathological images with a high accuracy of 96%. The model's performance was optimized using advanced preprocessing methods, fine-tuning, and domain-specific data augmentation techniques. These strategies help reduce problems such as class imbalance and subtle histological variations. To address the issue of overfitting, multiple data augmentation techniques were combined, and an early stopping criterion was incorporated. This approach enabled efficient and cost-effective training. Robust validation of the model demonstrates high utility for clinical applications and enables pathologists to deliver timely and accurate diagnoses. Integrating advanced deep learning models into medical imaging workflows holds great promise for early and accurate cancer diagnosis, ultimately improving patient outcomes.
Lung and colon cancer are among the most common and deadly cancers worldwide. Lung cancer is the leading cause of cancer death, with over 1.8 million deaths annually, while colon cancer is the third most common malignancy and the second leading cause of cancer mortality, based on global health statistics. Accurate and early diagnosis is crucial for effective treatment and improved survival in both cancers. Histopathological examination, the microscopic evaluation of tissue samples by pathologists, remains one of the most common methods of detecting cancer1. Figure 1 shows sample histopathological images of several types of lung and colon tissue2.

Figure 1: Sample images from the dataset. This figure shows representative examples from each class in the LC25000 dataset, highlighting the visual diversity among benign and malignant lung and colon tissue images. Please click here to view a larger version of this figure.
Digital pathology has transformed the field by enabling the digitization of histology slides, which can now be analyzed using sophisticated deep learning algorithms3. Deep learning models excel at recognizing subtle patterns within large datasets, significantly boosting the accuracy and efficiency of diagnosis. Table 1 shows the description of each tissue class.
| Class Name | Description |
| --- | --- |
| Lung Benign Tissue | Non-cancerous tissue in the lungs, not involved in cancerous growth. |
| Lung Adenocarcinoma | A type of lung cancer arising from glandular epithelial tissue, known for its malignant growth patterns. |
| Lung Squamous Cell Carcinoma | A type of lung cancer with a distinctive cellular morphology that arises from the squamous cells lining the airways. |
| Colon Adenocarcinoma | The most common type of colon cancer, which begins in the glandular cells of the colon lining. |
| Colon Benign Tissue | Healthy tissue in the colon that does not display any signs of cancerous or pre-cancerous conditions. |
Table 1: An overview of the five tissue and cancer classes in the dataset, including class names, characteristics, and the number of images in each class.
EfficientNetB7, a cutting-edge convolutional neural network architecture, has been identified as particularly effective for image classification tasks due to its balanced scaling in depth, width, and resolution. Compared to widely used architectures such as ResNet and DenseNet, EfficientNetB7 achieves a superior balance between classification accuracy and computational efficiency through compound scaling, which proportionally scales network depth, width, and resolution. While ResNet and DenseNet have demonstrated robust performance in medical image classification tasks, they typically require a higher parameter count and greater computational resources to reach comparable accuracy, making EfficientNetB7 more suitable for large-scale histopathological analysis in resource-constrained clinical settings4. The application of advanced computational models, including AI tools such as EfficientNetB7, offers new possibilities for overcoming traditional diagnostic challenges, providing tools that support pathologists and potentially improve diagnostic outcomes. Integrating AI into pathology practice could democratize access to quality diagnostic services, making it feasible for regions with limited medical expertise to perform advanced diagnostics.
Recent benchmark studies and comprehensive reviews in medical image analysis have consistently emphasized the growing impact of deep learning architectures in histopathology5. Large-scale analyses have shown that convolutional neural networks and their modern variants provide state-of-the-art performance in identifying subtle morphological features across multiple tissue types, often surpassing traditional diagnostic methods. Comparative evaluations highlight that advanced architectures, including EfficientNet and other scalable models, not only achieve superior classification accuracy but also demonstrate improved computational efficiency6. In addition, systematic reviews of digital pathology underscore its transformative role in routine diagnostics by enabling reproducibility, reducing inter-observer variability, and allowing integration with automated pipelines. These collective findings affirm the relevance of adopting advanced deep learning models for histopathological cancer classification and provide strong evidence that the application of scalable and efficient architectures can address clinical challenges such as class imbalance, histological variability, and limited availability of expert pathologists.
The objectives of this research are (i) creating and assessing a deep learning model for the categorization of lung and colon cancer histopathology images based on the EfficientNetB7 architecture; (ii) demonstrating the model's ability to achieve high classification accuracy, potentially surpassing traditional diagnostic methods; and (iii) exploring the impact of data augmentation, preprocessing techniques, and transfer learning on model performance in a domain-specific context.
This study not only demonstrates the high diagnostic accuracy achievable with EfficientNetB7 on histopathological images but also rigorously addresses key challenges such as class imbalance and overfitting through targeted data augmentation, robust validation strategies, and early stopping techniques. The methodology is specifically designed to reflect real-world clinical settings by utilizing large-scale, augmented datasets and benchmarking the model's performance on clinically relevant metrics. The workflow's automation and efficiency ensure that the system is practical for integration into digital pathology labs, offering the potential to enhance pathologist productivity, improve diagnostic consistency, and enable broader access to expert-level analysis in both high-resource and resource-limited clinical environments.
Even with remarkable progress in deep learning for medical imaging, several challenges remain when classifying lung and colon cancer histopathological images7. Current models often struggle with class imbalance, wherein some types of cancer are underrepresented within datasets, producing biased predictions. Further, subtle histological differences between cancer subtypes, such as adenocarcinoma and squamous cell carcinoma of the lung, present a formidable challenge to accurate classification. Most previous work has addressed single-cancer classification or relied on small datasets, limiting wide applicability and utility in a clinical setting. Traditional approaches are also less computationally efficient, and hence less suitable for large datasets or real-time clinical applications8.
The present study bridges these gaps by leveraging the EfficientNetB7 architecture, a state-of-the-art deep learning model with proven scalability and effectiveness. Random zooms, flips, rotations, and shifts are utilized as advanced data augmentation methods to maximize the model's robustness and generalization capability across different histopathological patterns. Transfer learning is used to fine-tune the pretrained EfficientNetB7 model so that it adapts specifically to lung and colon cancer datasets while minimizing training time and computational cost9. The use of global average pooling and early stopping further improves performance by ensuring stable training and avoiding overfitting.
Application of deep learning in medical imaging, especially for the diagnosis of cancer, has made huge progress over the last few years. The conventional diagnosis of lung and colon cancer depends mainly on histopathological examination, wherein pathologists inspect tissue samples under a microscope10. While effective, it is labor-intensive, open to inter-observer variability, and usually restricted by the limited supply of expert pathologists, especially in resource-challenged settings. These limitations have led to the development of computerized AI- and deep learning-based diagnostic systems.
Deep learning models, particularly convolutional neural networks (CNNs), have been impressively successful at image analysis of histopathological specimens11. They are especially good at identifying intricate patterns and subtle tissue morphology variations, tasks that are suitable for cancer classification. Despite the progress, there remain deficiencies such as class imbalance, limited generalization over multi-dataset scenarios, and high computational costs in the current approaches. For instance, most of the work addresses single-cancer classification, which limits its use to multi-cancer diagnosis scenarios. Also, poor preprocessing and data augmentation techniques render it susceptible to overfitting, particularly under small or imbalanced dataset scenarios.
Recent developments in transfer learning have partially overcome these limitations by capitalizing on models pretrained on large-scale datasets such as ImageNet. EfficientNet, ResNet, and DenseNet are among the models that have attracted attention in medical image analysis because they can learn high-level features from limited amounts of training data12. EfficientNet in particular has seen widespread adoption for its scalability and efficiency, achieving state-of-the-art accuracy on several image classification benchmarks. However, EfficientNet's application to combined lung and colon cancer classification has not been extensively investigated, especially concerning the handling of class imbalance and subtle histological variations13.
Data augmentation has been a crucial strategy for model robustness and generalization improvement14. Through incorporating variations such as rotation, shifting, zooming, and flipping, data augmentation techniques simulate variability that exists in actual histopathological images, enabling models to learn invariant features. Early stopping and regularization techniques further improve model performance by avoiding overfitting and maximizing training efficiency. Despite all these advancements, there is still a need for end-to-end frameworks that combine advanced preprocessing, augmentation, and transfer learning to achieve high accuracy and computational efficiency for multi-cancer classification. Table 215,16,17,18,19,20,21,22,23,24 summarizes related work from the existing literature.
| Study | Objective |
| --- | --- |
| Talukder, M. A. et al.15 (2022) | Present a hybrid ensemble feature extraction model combining deep learning and machine learning to identify colon and lung cancer. |
| Attallah, O. et al.16 (2022) | Provide a lightweight deep learning framework that combines a variety of models and transformation techniques to help detect lung and colon cancers early on. |
| Hage Chehade, A. et al.17 (2022) | Create a machine learning-based computer-aided diagnostic system that can identify different kinds of lung and colon tissues. |
| Wahid, R. R. et al.18 (2023) | Use a computer-aided diagnosis system with CNNs to detect lung and colon cancers. |
| Kumar, N. et al.19 (2022) | Compare and contrast the feature extraction techniques used to categorize colon and lung cancer. |
| Mehmood, S. et al.20 (2022) | Build an accurate and efficient lung and colon cancer diagnosis model using a pretrained neural network with modified layers. |
| Zhou, L. et al.21 (2024) | Predict new targets for Wogonin (WOG) in treating lung, bladder, and colon cancer using bioinformatics methods. |
| Reddy, K. R. et al.22 (2022) | Provide a hybrid ensemble feature extraction method based on machine learning for lung and colon cancer (LCC) detection. |
| Shandilya, S., Nayak, S. R.23 (2022) | Use CNNs and a vision transformer design to classify lung and colon cancer histopathology images. |
| Hasan, M. et al.24 (2023) | Classify images of lung and colon cancer using multiple CNN models to enhance diagnostic procedures. |
Table 2: A comparison of relevant previous studies on histopathological image classification, detailing datasets used, model types, and reported outcomes.
This paper builds on these contributions by incorporating a deep learning model utilizing the EfficientNetB7 architecture, complemented by state-of-the-art data augmentation and transfer learning. The current approach overcomes the flaws of previous solutions through enhanced classification accuracy, better class balance, and lower computational cost, making it well suited for use in the clinical setting.
This study did not involve any direct experimentation on human participants or animals. All work was conducted using the publicly available, anonymized LC25000 dataset of histopathological images, which contained no identifiable patient information or direct handling of human tissue. Institutional Review Board (IRB) or Institutional Animal Care and Use Committee (IACUC) approval was not required. All procedures complied with ethical standards and adhered to the dataset's terms of use for academic research. Figure 2 shows the steps of the workflow diagram.

Figure 2: Workflow of the proposed method. The workflow includes data preprocessing, augmentation, model training, and evaluation. Please click here to view a larger version of this figure.
Dataset description
The LC25000 dataset was utilized for this study. It consisted of 25,000 histopathological images, all uniformly formatted as JPEG files with a resolution of 768 × 768 pixels. The dataset originated from an initial set of 1,250 original images, including 250 benign lung tissue images, 250 lung adenocarcinomas, 250 lung squamous cell carcinomas, 250 benign colon tissue images, and 250 colon adenocarcinomas. The remaining images were generated through data augmentation to expand the dataset, ensuring a diverse and extensive collection for robust model training and validation.
The high-resolution images allowed for detailed examination of cellular structures, which was essential for accurate cancer classification. Data augmentation was necessary due to the limited number of original images; techniques such as rotations, flips, zooms, and shifts were employed to synthetically increase the dataset size and diversity. This process generated an additional 23,750 synthetic images in a class-balanced manner, resulting in a final dataset of 25,000 images, with each class containing 5,000 images post-augmentation.
Data preprocessing
The raw histopathological images within the dataset had a resolution of 768 × 768 pixels. To ensure compatibility with the EfficientNetB7 architecture and to maximize computational efficiency, each image was resized from 768 × 768 to 224 × 224 pixels. This resizing process represented a trade-off between preserving essential histological features and reducing computational overhead. The resizing operation was mathematically represented as shown in Equation 1:

X_resized = resize(X, (h, w)) (1)
where X denoted the original image, X_resized was the resized image, and h and w were the desired height and width, respectively. All pixel values were normalized to a range of 0 to 1, which stabilized and accelerated training by ensuring a consistent input distribution across images. This normalization was mathematically calculated as shown in Equation 2. This step was essential to improve model convergence during training.
X_norm = X / 255 (2)
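The two preprocessing steps can be sketched in NumPy as follows. This is a minimal illustration, not the authors' exact code: the nearest-neighbour resampling here stands in for whatever library resize was actually used, and only the input (768 × 768) and output (224 × 224) sizes are taken from the text.

```python
import numpy as np

def preprocess(image: np.ndarray, h: int = 224, w: int = 224) -> np.ndarray:
    """Resize an (H, W, C) image by nearest-neighbour sampling (Equation 1),
    then normalize pixel values to the range [0, 1] (Equation 2)."""
    H, W = image.shape[:2]
    rows = np.arange(h) * H // h          # source row for each output row
    cols = np.arange(w) * W // w          # source column for each output column
    resized = image[rows[:, None], cols]  # X_resized = resize(X, (h, w))
    return resized.astype(np.float32) / 255.0  # X_norm = X / 255

# A random 768 x 768 RGB image, matching the LC25000 resolution
x = np.random.randint(0, 256, (768, 768, 3), dtype=np.uint8)
y = preprocess(x)
print(y.shape)  # (224, 224, 3)
```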
Data augmentation was employed as a critical technique to enhance the robustness and applicability of deep learning models, especially in the context of medical imaging with limited and imbalanced datasets. In this study, a variety of augmentation techniques were applied using the ImageDataGenerator module from TensorFlow to simulate real-world variation in histopathological images. These techniques included random rotations of up to 20 degrees to simulate tissue orientation variability, horizontal and vertical shifts of up to 20% of the image dimensions to mimic position variability during slide preparation, and random zooms of up to 20% to reflect variability in magnification levels.
Additionally, horizontal flipping was performed to introduce mirror-image variations, thereby further enriching the training dataset. To manage newly synthesized pixels generated during these transformations, a nearest-neighbor filling strategy was adopted to ensure smooth transitions and preserve the integrity of key histological features. The 2D rotation matrix used for augmentation is represented in Equation 3, and Equation 4 defines a function with an incremental change:
R(θ) = [cos θ  −sin θ; sin θ  cos θ] (3)
(x', y') = (x + Δx, y + Δy) (4)
These augmentation techniques compensated for challenges such as class imbalance, slight histological variations, and dataset limitations by enabling the model to learn invariant features and reducing the risk of overfitting. By synthetically expanding the dataset and incorporating controlled variability, data augmentation significantly enhanced the model's ability to generalize to new, unseen data, thereby producing a stronger and more robust system for clinical deployment in lung and colon cancer diagnosis.
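The geometric core of these augmentations is the 2D rotation of Equation 3. The NumPy sketch below applies it by inverse mapping with nearest-neighbour filling; the actual pipeline used TensorFlow's ImageDataGenerator, so this is only an illustration of the underlying transform, under the assumption of rotation about the image centre.

```python
import numpy as np

def rotate_nn(image: np.ndarray, degrees: float) -> np.ndarray:
    """Rotate an (H, W, C) image about its centre using the 2D rotation
    matrix of Equation 3, with nearest-neighbour filling for new pixels."""
    t = np.deg2rad(degrees)
    R = np.array([[np.cos(t), -np.sin(t)],
                  [np.sin(t),  np.cos(t)]])
    H, W = image.shape[:2]
    cy, cx = (H - 1) / 2.0, (W - 1) / 2.0
    ys, xs = np.mgrid[0:H, 0:W]
    grid = np.stack([ys.ravel() - cy, xs.ravel() - cx])  # centred output coords
    src = R.T @ grid                                     # inverse-map each output pixel
    sy = np.clip(np.rint(src[0] + cy), 0, H - 1).astype(int)
    sx = np.clip(np.rint(src[1] + cx), 0, W - 1).astype(int)
    return image[sy, sx].reshape(image.shape)

img = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
aug = rotate_nn(img, 20)  # rotations of up to 20 degrees were used in the study
```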
Model architecture
The model was developed based on the EfficientNetB7 architecture, a state-of-the-art deep learning network renowned for its performance, scalability, and effectiveness in image classification tasks. EfficientNetB7 was constructed using a series of Mobile Inverted Bottleneck Convolution (MBConv) blocks, which incorporated depthwise separable convolutions. This structural design effectively compressed the model, resulting in significant reductions in the number of parameters with minimal loss in performance. Such innovation enabled the model to achieve high accuracy with lower computational costs, making it particularly suitable for computationally intensive applications such as histopathological image analysis. Figure 3 shows the model architecture of the pretrained EfficientNetB7.

Figure 3: Model architecture of EfficientNetB7. This figure shows the detailed structure of the EfficientNetB7 model, illustrating its main layers and functional blocks used for image classification. Please click here to view a larger version of this figure.
The model accepted preprocessed histopathological images that had been resized from their original dimensions of 768 × 768 pixels down to 224 × 224 pixels to match the input requirements of EfficientNetB7. Each image underwent several processes within the network, including batch normalization, Swish activation functions, and convolution operations. Batch normalization was calculated as shown in Equation 5. The Swish activation function introduced non-linearity and enabled faster, more efficient training compared to traditional functions like ReLU. These layers were designed to extract and enhance hierarchical features, allowing the model to learn progressively from low-level to high-level patterns as the images propagated through the network.
x̂ = (x − μ) / √(σ² + ε) (5)
The convolutional backbone of EfficientNetB7, pretrained on the ImageNet dataset, provided a highly generalizable set of feature detectors effective across a broad range of image domains. The pretrained base played a key role in capturing subtle yet informative patterns in histopathological images, which are often characterized by high intra-class variability and fine-grained features. The output from the convolutional layers was passed to a global average pooling layer, which reduced the spatial dimensions to a single feature vector for each image. This vector, encapsulating the most relevant features learned by the convolutional layers, was then flattened into a one-dimensional array for compatibility with fully connected layers. The algorithm for enhanced detection of lung and colon cancer is provided in Supplementary File 1.
The subsequent network architecture included two dense (fully connected) layers with 128 and 64 units, respectively. Both layers utilized the Rectified Linear Unit (ReLU) activation function to introduce additional non-linearity and to fine-tune the extracted features, enabling the model to recognize more precise patterns and relationships within the data. The final layer of the model was a SoftMax output layer with five neurons, each representing one of the five classes in the dataset. The SoftMax function produced a probability distribution across the classes, which reflected the model's prediction for each input image, as described in Equation 6:
softmax(z)_i = exp(z_i) / Σ_j exp(z_j), i = 1, …, 5 (6)
The combination of state-of-the-art convolutional techniques, robust regularization, and effective optimization strategies resulted in a deep learning model well-suited for high-stakes medical image classification. The model achieved precision, scalability, and operational efficiency, demonstrating strong potential for clinical application in lung and colon cancer diagnosis. Within this framework, regularization was primarily accomplished through advanced data augmentation and the use of early stopping; no explicit dropout or weight decay was applied to the dense layers.
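The architecture described above can be assembled in Keras roughly as follows. This is a sketch consistent with the text (EfficientNetB7 backbone, global average pooling, flattening, dense layers of 128 and 64 units, five-way softmax); any detail not stated in the text, such as the use of `Sequential`, is an assumption.

```python
import tensorflow as tf

def build_model(num_classes: int = 5, weights: str = "imagenet") -> tf.keras.Model:
    """EfficientNetB7 backbone + GAP + Flatten + Dense(128) + Dense(64) + softmax."""
    base = tf.keras.applications.EfficientNetB7(
        include_top=False,
        weights=weights,             # the study used the ImageNet-pretrained backbone
        input_shape=(224, 224, 3),
    )
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),  # spatial dims -> one feature vector
        tf.keras.layers.Flatten(),                 # already 1D; kept to mirror the ablation study
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(num_classes, activation="softmax"),  # Equation 6
    ])
    return model
```

Calling `build_model()` downloads the ImageNet weights on first use; pass `weights=None` for a randomly initialized backbone.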
Training and validation
The training and validation pipeline of the convolutional neural network (CNN) for LC25000 histopathological image classification was designed to optimize model performance and ensure good generalization. The pipeline began with comprehensive preprocessing, which included reducing the original 768 × 768-pixel images to 224 × 224 pixels to match the input requirements of the EfficientNetB7 architecture. Pixel intensities were normalized to the range [0, 1] to stabilize and accelerate training, and data augmentation techniques -- including random rotations, shifts, zooms, and horizontal flips -- were applied to introduce variability and mitigate overfitting. These preprocessing operations ensured that the model learned invariant features and generalized well to new data.
The EfficientNetB7 model, pretrained on the ImageNet dataset, was employed as the base, providing a robust feature detector set capable of identifying subtle histological patterns. The model was subsequently fine-tuned with two dense layers (128 and 64 units) activated by the Rectified Linear Unit (ReLU) function, followed by a SoftMax output layer for five-class classification. This architecture was optimized for both precision and computational efficiency, making it well-suited for medical image analysis. The Adam optimizer update rule for θ is shown in Equation 7.
θ_{t+1} = θ_t − η · m̂_t / (√v̂_t + ε) (7)
Model training was performed in batches of 128 images using the Adam optimizer, which adaptively scaled the learning rate to promote effective convergence. An early stopping criterion was implemented to monitor validation loss and halt training if no improvement was observed, thus preventing overfitting and optimizing computational resources. Throughout training, key performance metrics -- including accuracy, loss, precision, and recall -- were tracked using a hold-out validation set comprising 20% of the data. The hold-out set provided critical insights into the model's generalization to unseen data and mitigated the risk of overfitting.
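The Adam update of Equation 7 can be written out directly. A minimal NumPy version follows, as a sanity check rather than the study's implementation; the default coefficients β₁ = 0.9, β₂ = 0.999, and ε = 1e-8 are assumptions, since the text does not list them.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update (Equation 7) with bias-corrected moment estimates."""
    m = b1 * m + (1 - b1) * grad           # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2      # second-moment (variance) estimate
    m_hat = m / (1 - b1 ** t)              # bias correction for early steps
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize f(theta) = theta^2 as a sanity check; the gradient is 2 * theta.
theta, m, v = 5.0, 0.0, 0.0
for t in range(1, 2001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, lr=0.05)
```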
The integration of advanced preprocessing, data augmentation, and optimization strategies resulted in excellent model accuracy and robustness, demonstrating strong potential for clinical application in lung and colon cancer detection. All experiments were conducted on an NVIDIA Tesla P100 GPU with 16 GB memory and 64 GB system RAM. The full training pipeline -- including preprocessing, augmentation, and model optimization -- required approximately three hours to complete ten epochs with a batch size of 128. This computational infrastructure was selected to efficiently manage the large, augmented dataset while maintaining reasonable training times appropriate for clinical research environments.
For reproducibility, all critical hyperparameters were explicitly specified. The model was trained using the Adam optimizer with a constant learning rate of 0.001. Training was carried out for a maximum of 10 epochs, with early stopping based on validation loss and a patience of three epochs to prevent overfitting. A batch size of 128 was utilized for both training and validation phases. To ensure consistent results, the random seed was set to 42 at all stages of data loading and model training. No learning rate decay schedule was applied during training.
The protocol combined systematic preprocessing, well-calibrated hyperparameters, rigorous validation schemes, and an open computational environment to promote both accuracy and reproducibility. The workflow, by integrating robust deep learning architecture, effective optimization, and reproducible experimental design, established a solid foundation for histopathological image classification. In addition to demonstrating strong performance on lung and colon cancer detection, the protocol defined a scalable framework that could be extended to similar biomedical imaging challenges in future research.
During model development, several practical adjustments were implemented to achieve reliable and reproducible results. Early stopping and advanced data augmentation were used to address class imbalance and overfitting. High memory usage during training was mitigated by resizing images to 224 × 224 pixels and using a batch size of 128. When slow convergence was encountered, the learning rate was empirically set to 0.001. Manual verification of the image directory structure was performed to prevent data loading errors, and random seeds were fixed to ensure reproducibility. These strategies can serve as practical guidance for researchers facing analogous challenges in training deep learning models with large histopathological image datasets.
Statistical analysis
Statistical modeling of the model's performance was conducted using a comprehensive set of metrics to assess accuracy, stability, and generalizability. Key evaluation measures included precision, recall, F1-score, and accuracy to quantify classification performance. The confusion matrix was utilized to report class-specific error metrics, calculated according to Equations 8-10:
Precision = TP / (TP + FP) (8)
Recall = TP / (TP + FN) (9)
F1-score = 2 × Precision × Recall / (Precision + Recall) (10)
Additionally, measures of error such as Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE) were evaluated to assess prediction accuracy, calculated using Equations 11-13:
MSE = (1/n) Σ (y_i − ŷ_i)² (11)
RMSE = √MSE (12)
MAE = (1/n) Σ |y_i − ŷ_i| (13)
The Receiver Operating Characteristic (ROC) curve and Area Under the Curve (AUC) were employed to evaluate the model's ability to discriminate between classes at various thresholds. The AUC was calculated as shown in Equation 14:
AUC = ∫₀¹ TPR d(FPR) (14)
These statistical analyses provided a thorough assessment of the model's classification performance, robustness, and generalization to unseen data.
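A minimal NumPy implementation of Equations 8-13 (per-class precision, recall, and F1-score, plus the three error measures) might look like the following; library routines such as those in scikit-learn would normally be used in practice.

```python
import numpy as np

def class_metrics(y_true, y_pred, cls):
    """Precision (Eq. 8), recall (Eq. 9), and F1-score (Eq. 10) for one class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == cls) & (y_true == cls))   # true positives
    fp = np.sum((y_pred == cls) & (y_true != cls))   # false positives
    fn = np.sum((y_pred != cls) & (y_true == cls))   # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def error_measures(y_true, y_pred):
    """MSE (Eq. 11), RMSE (Eq. 12), and MAE (Eq. 13) on label vectors."""
    d = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    mse = float(np.mean(d ** 2))
    return mse, float(np.sqrt(mse)), float(np.mean(np.abs(d)))
```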
Ablation study
To further assess the contribution of each component within the classification framework, an ablation study was conducted by selectively omitting or modifying specific architectural elements of the final model. Specifically, two alternative experimental configurations were evaluated: one configuration included both the Global Average Pooling and Flatten layers, followed by two dense layers (128 and 64 units, respectively); the other configuration omitted both the Flatten layer and the 128-unit dense layer, leaving only the Global Average Pooling layer and a single 64-unit dense layer.
The complete architecture, as reported in the main results, achieved a validation accuracy of approximately 96%. When the Flatten layer was excluded, and only the 128- and 64-unit dense layers were used after Global Average Pooling, the validation accuracy dropped to 81.6%. Conversely, when the 128-unit dense layer was excluded, and the Flatten layer was retained, the validation accuracy decreased to 88%. These results indicated the necessity of including both the extra dense layer and the appropriate architectural components to achieve optimal classification performance. The findings from the ablation experiments consistently demonstrated that the combination of Global Average Pooling, Flattening, and two fully connected layers enabled a more expressive feature representation, resulting in more robust and accurate histopathological image classification.
Figure 4 presents the training and validation accuracy. Figure 5 presents the training and validation loss.

Figure 4: Training and validation accuracy over epochs. This figure depicts the progression of accuracy for both training and validation sets across all epochs, demonstrating how model performance evolves during training. Please click here to view a larger version of this figure.

Figure 5: Loss curves. Loss curves for both training and validation data throughout the training process, helping to visualize convergence and potential overfitting. Please click here to view a larger version of this figure.
Model performance was evaluated with an extensive suite of metrics covering accuracy, stability, and generalization potential. Accuracy, the ratio of correct predictions to total predictions, was 96% over 5,000 samples. Precision, the proportion of positive predictions that are correct, and recall (sensitivity), the proportion of actual positive cases the model identifies, were computed for all classes. The F1-score, the harmonic mean of precision and recall, provided a single measure of overall performance. The confusion matrix was used to visualize class-specific errors, and the AUC-ROC curve measured the model's capacity to distinguish between classes at different thresholds. Figure 6 shows the confusion matrix.

Figure 6: Confusion matrix. Confusion matrix for the classification results, summarizing the proportion of correct and incorrect predictions for each class. Please click here to view a larger version of this figure.
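The metrics above can be computed with scikit-learn. The labels below are illustrative placeholders, not the study's predictions, which come from the trained EfficientNetB7 model over the 5,000-image validation split:

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)

# Placeholder labels for five tissue classes (0-4); replace with real
# validation labels and model predictions in practice.
y_true = np.array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4])
y_pred = np.array([0, 0, 1, 1, 2, 4, 3, 3, 4, 4])

acc = accuracy_score(y_true, y_pred)
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)
cm = confusion_matrix(y_true, y_pred)  # rows: true class, columns: predicted class

print(f"accuracy={acc:.2f}  macro-precision={prec:.2f}  "
      f"macro-recall={rec:.2f}  macro-F1={f1:.2f}")
print(cm)
```

The diagonal of `cm` counts correct predictions per class; off-diagonal entries reveal which class pairs the model confuses, which is exactly what Figure 6 visualizes.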
Error measures, namely Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE), were calculated as 0.01098, 0.1048, and 0.02252, respectively. These low error values reflect the reliability and consistency of the model's predictions. Figure 7 shows the per-class precision, recall, and F1-scores.

Figure 7: Precision, recall, and F1-score. Precision, recall, and F1-score for each class, providing a comparative view of model performance across different categories. Please click here to view a larger version of this figure.
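The reported error measures can be reproduced with a few lines of NumPy. The text does not specify the exact target encoding used, so a one-hot-versus-probability formulation is assumed here:

```python
import numpy as np


def regression_style_errors(y_true, y_prob):
    """MSE, RMSE, and MAE between one-hot true labels and predicted
    class probabilities (assumed encoding, mirroring the reported metrics)."""
    err = y_true - y_prob
    mse = float(np.mean(err ** 2))
    rmse = float(np.sqrt(mse))  # RMSE is by definition the square root of MSE
    mae = float(np.mean(np.abs(err)))
    return mse, rmse, mae


# Toy example: one-hot ground truth for 3 samples over 5 classes
y_true = np.eye(5)[[0, 1, 2]]
y_prob = np.array([[0.9, 0.05, 0.03, 0.01, 0.01],
                   [0.1, 0.85, 0.03, 0.01, 0.01],
                   [0.0, 0.02, 0.93, 0.03, 0.02]])
mse, rmse, mae = regression_style_errors(y_true, y_prob)
```

Note the consistency check built into the definitions: the reported RMSE (0.1048) is indeed the square root of the reported MSE (0.01098).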
Class-specific performance measures further demonstrate the model's effectiveness. For Class 0 (lung benign tissue), the model achieved perfect precision (1.00) and high recall (0.98), yielding an F1-score of 0.99. Class 1 (lung adenocarcinoma) was equally strong, with precision, recall, and F1-score all at 0.99. Class 2 (lung squamous cell carcinoma) recorded a precision of 0.86, recall of 0.99, and F1-score of 0.92, while Class 3 (colon adenocarcinoma) performed perfectly, with precision, recall, and F1-score all equal to 1.00. For Class 4 (colon benign tissue), precision was 0.99, recall was 0.85, and the F1-score was 0.92. The macro-average and weighted-average precision, recall, and F1-scores were around 0.97, 0.96, and 0.96, respectively, emphasizing the balanced performance of the model across all classes. The classification-report plot (Figure 8) gives an intuitive, straightforward visualization of model performance, facilitating further refinement and analysis.

Figure 8: Heatmap of the classification report. This figure offers a heatmap representation of the classification report, highlighting the strengths and weaknesses in the model's predictions for each class. Please click here to view a larger version of this figure.
While classical machine learning (ML) approaches utilizing handcrafted feature extraction offer advantages in interpretability and reproducibility, they are often limited in their ability to capture the complex and subtle patterns present in histopathological images. Deep learning models, particularly convolutional neural networks (CNNs) such as EfficientNetB7, have demonstrated superior performance in a wide range of medical image analysis tasks by automatically learning relevant features directly from data22. However, this increase in accuracy and robustness comes with a trade-off: deep learning models are typically more computationally intensive and are sometimes regarded as "black boxes" due to their limited interpretability compared to traditional ML methods. The present study prioritized classification accuracy and clinical applicability, while acknowledging that further research into deep interpretable learning techniques may help bridge the gap between model performance and transparency in clinical settings25,26.
The strong performance for lung adenocarcinoma and colon adenocarcinoma can be attributed to their distinctive histopathological features, which are readily captured by the EfficientNetB7-based model. In contrast, the lower precision for lung squamous cell carcinoma (0.86) and the lower recall for colon benign tissue (0.85) are consistent with the well-recognized challenge of differentiating these classes, owing to their subtle morphological similarities with other tissue types. These modest differences highlight the inherent complexity of certain histopathological categories and underscore the need for ongoing research to further improve class discrimination. Despite these nuances, the model's consistently high F1-scores and low error metrics across all categories confirm its balanced, reliable performance and reinforce its potential utility in real-world clinical workflows.
In the context of existing deep learning methods for histopathological image classification, notable advances address imbalanced classes and slight variations in histological structures27. Conventional methods tend to fail under high inter-class similarity and intra-class variation, which the proposed model addresses through sophisticated preprocessing and augmentation methods28. The use of a robust, well-trained architecture such as EfficientNetB7 gives the model a distinct advantage over competing approaches.
The experiments and findings reaffirm the robustness and accuracy of the model, confirming its relevance for medical diagnosis in real-world cases29,30. This consistent performance lays the foundation for further development and clinical application, making it a viable tool for pathologists and supporting improved patient diagnosis31. Table 3 shows the accuracy comparisons13,18,32,33,34,35,36,37,38, which indicate that the proposed model outperforms most traditional CNN models and is thus more appropriate for real-world medical diagnostic applications.
Compared to existing deep learning approaches for histopathological image classification, the EfficientNetB7-based framework demonstrates clear advantages in both predictive performance and practical deployment. While previous methods have often struggled to balance high accuracy with class-wise consistency or required extensive computational resources, the proposed framework achieves higher overall accuracy, more balanced class-specific metrics, and faster inference. Advanced data augmentation and transfer learning not only mitigate issues of class imbalance and overfitting but also enable the model to perform robustly across challenging tissue subtypes. Additionally, the relatively compact model size and efficient computational requirements make this solution well-suited for integration into real-world digital pathology workflows, setting it apart from more resource-intensive architectures.
| Study | Technique | Accuracy |
|---|---|---|
| Wahid, R. R. et al.18 (2023) | Convolutional Neural Network (CNN) | 93.02% (lung), 88.26% (colon) |
| Singh, O. et al.32 (2024) | EfficientNetB6 | 93.12% |
| Maheshwari, U. et al.33 (2022) | CNN | 93% |
| Masud, M. et al.34 (2021) | Deep learning (DL) | 96.33% |
| Hossain et al.35 (2022) | CNN | 94% |
| Swarna, I. J. et al.36 (2023) | CNN | 78.18% |
| Reis, H. C. et al.37 (2023) | DenseNet169 | 94.29% |
| Laxmikant, K. et al.38 (2024) | Deep ConvNets (CNNs) | 92.54% |
| Proposed model | EfficientNetB7 with a series of image augmentation techniques | 96% |
Table 3: The quantitative performance comparison between the proposed model and existing approaches, summarizing accuracy and other key evaluation metrics.
This section compares the performance of the proposed EfficientNetB7-based model with existing deep learning models for histopathological image classification. It highlights the model's superior handling of class imbalance and its resilience, achieved through advanced preprocessing and augmentation. Figure 9 displays misclassified instances from the validation dataset, tagged with predicted and actual labels to examine errors.

Figure 9: Images misclassified as class 0, illustrating common errors made by the model in this category. Please click here to view a larger version of this figure.
DATA AVAILABILITY:
The data used for the findings are publicly available at https://www.kaggle.com/datasets/andrewmvd/lung-and-colon-cancer-histopathological-images. The complete code for this study is available as Supplementary File 2.
Supplementary File 1: The algorithm for enhanced detection of lung and colon cancer. Please click here to download this File.
Supplementary File 2: The complete code used for this study. Please click here to download this File.
In the critical review of misclassified instances under the EfficientNetB7 deep learning architecture, cases where model predictions do not match the true labels in the validation dataset are examined. This analysis is essential for understanding specific classification errors, particularly when the model confuses histopathological features of lung and colon tissues11. The procedure is to generate class predictions for all images in the validation set and compare them with the true classifications; misclassified images are identified by the indices at which predicted and actual labels do not coincide29. A small subset of the misclassified images, typically five, is displayed visually, labeled with both predicted and actual labels. This graphic display gives a clear picture of the kinds of errors the model commits and suggests why the misclassifications occur. Such focused inspection is necessary when determining where the model might need adjustment, whether in preprocessing techniques, data augmentation processes, or model architecture39. Improving these factors has the potential to significantly enhance the accuracy and reliability of the model as a diagnostic tool, which is central to its use in clinical environments where accurate medical diagnosis is essential.
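The indexing procedure described above can be sketched as follows. Synthetic labels stand in for the real validation predictions so the sketch is runnable; in practice `pred_labels` would come from `np.argmax(model.predict(val_images), axis=1)`:

```python
import numpy as np

# Synthetic stand-ins for validation ground truth and model predictions
rng = np.random.default_rng(0)
val_labels = rng.integers(0, 5, size=100)          # true classes (0-4)
pred_labels = val_labels.copy()
pred_labels[:7] = (pred_labels[:7] + 1) % 5        # inject a few errors for illustration

# Indices where prediction and ground truth disagree
mis_idx = np.flatnonzero(pred_labels != val_labels)

# Inspect the first five misclassifications, as described in the text;
# in practice each index would also be used to display the image itself.
for i in mis_idx[:5]:
    print(f"index {i}: predicted {pred_labels[i]}, actual {val_labels[i]}")
```

The resulting indices can be passed to any plotting routine (e.g., Matplotlib `imshow`) to render the misclassified images with their predicted and actual labels, as in Figure 9.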
The EfficientNetB7 architecture performed strongly in this research, especially for major diagnostic classes such as lung adenocarcinoma and colon adenocarcinoma, reaching precision values of up to 1.00. This reflects its capacity to substantially assist pathologists in correctly diagnosing cancerous tissue7. Intra-class variability was addressed with advanced data augmentation and early stopping, boosting the stability and generalizability of the model. The model, however, performed less well for classes such as lung squamous cell carcinoma, reflecting the persistent problem of morphological similarity between tissue states and limitations in capturing fine-grained distinctions. The "black box" nature of EfficientNetB7 also contributes to the interpretability problem, which poses an obstacle to clinical acceptance and trust30. Although the model could in principle shorten diagnosis and reduce pathologists' workload, further validation in clinical use and integration of explainable AI methods are needed to bridge AI technology and clinical acceptance40. Future efforts should expand datasets to include greater pathological heterogeneity and improve interpretability, so that the model's reliability and conformity to medical standards are assured.
Several protocol steps emerged as especially important for guaranteeing the efficacy of the proposed approach. The preprocessing phase, particularly resizing and normalization, is crucial for stabilizing training and allowing the model to extract histological features at a uniform scale41. Data augmentation is similarly crucial, as it introduces variability into the training pipeline, greatly reduces the risk of overfitting, and hence enhances generalization to novel samples. The selection of EfficientNetB7 as the backbone is another essential consideration, as its compound scaling approach balances accuracy and computational efficiency for high-resolution histopathological images. Lastly, optimization hyperparameters such as the learning rate schedule, batch size, and early stopping threshold directly affect convergence stability and training efficiency. Careful control of these steps ensures that the protocol can be reliably reproduced and extended to related medical imaging tasks.

The EfficientNetB7-based architecture also offers significant computational efficiency, which is essential for real-world deployment in clinical and automated pipelines. With a small model size of around 80 MB and quick inference times, under 0.15 s per image on an NVIDIA Tesla P100 GPU and roughly 1.2 s on a standard CPU, the architecture enables fast diagnostic turnaround even in high-throughput pathology environments. This efficiency not only reduces hardware and energy costs but also allows the model to be incorporated into existing digital pathology platforms for real-time or batch processing42. The automated preprocessing and augmentation pipeline further simplifies adaptation to new datasets and clinical settings with minimal human intervention, making the solution both scalable and feasible for routine use in diverse healthcare settings.
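The preprocessing, augmentation, and training-control steps highlighted above can be sketched in TensorFlow. The image size, augmentation ranges, and callback thresholds below are illustrative assumptions rather than the study's exact settings, and the layer paths assume a recent TensorFlow version:

```python
import tensorflow as tf

IMG_SIZE = 224  # assumed input resolution; the protocol resizes images to a uniform scale

# Augmentation layers that inject variability into the training pipeline
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal_and_vertical"),
    tf.keras.layers.RandomRotation(0.1),
    tf.keras.layers.RandomZoom(0.1),
])

# Training controls named in the text: learning-rate scheduling and early stopping
callbacks = [
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=2),
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                     restore_best_weights=True),
]


def preprocess(image):
    """Resize to a uniform scale and normalize pixel values to [0, 1]."""
    image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
    return tf.cast(image, tf.float32) / 255.0
```

In a full pipeline, `preprocess` and `augment` would be mapped over a `tf.data.Dataset`, and `callbacks` passed to `model.fit`, so that the learning rate halves when validation loss plateaus and training stops once it no longer improves.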
A key limitation of this study is the exclusive reliance on the publicly available LC25000 dataset, which is restricted to histopathological images from lung and colon cancers. While extensive augmentation increased the dataset size to 25,000 images and improved model robustness, the evaluation was necessarily confined to these two cancer types. As a result, the generalizability of the proposed framework to other cancer subtypes, tissue sources, and real-world clinical cohorts remains untested. Future work should validate the model using larger, multi-institutional datasets encompassing a broader spectrum of cancer types and patient populations to fully establish its clinical applicability43. The framework was not evaluated on an independent external dataset in this study, which limits the evidence for its robustness and clinical applicability beyond the LC25000 data. Future work should include external validation to further assess real-world performance.
This study developed and tested a deep learning model based on the EfficientNetB7 architecture, fine-tuned for histopathological image classification of lung and colon tissue. The large set of 25,000 augmented images supported robust model training, and the model addressed realistic diagnostic problems with an average accuracy of 96% and very low MSE, RMSE, and MAE. While the model performed well, future projects could pursue several improvements to make it more applicable and efficient. Adding richer and more diverse datasets, with samples spanning diverse demographics and mixed pathological stages, could make the model more generalizable to larger populations. Exploring alternative deep learning architectures, or combinations of models, may provide insight into reducing computational cost without sacrificing accuracy. For practical deployment, the proposed framework could be integrated into digital pathology workflows through a user-friendly interface that allows pathologists to upload images, visualize classification results, and review key metrics44; developing and evaluating such an interface in collaboration with clinical experts, and running extensive validation trials to the standard required in clinical environments, are natural next steps. A further promising direction is incorporating explainability methods such as Grad-CAM, so that users can understand the reasoning behind a specific diagnostic prediction, which is crucial for building trust and actionable knowledge in medical practice.
The authors declare that there is no conflict of interest regarding the publication of this manuscript. No financial or personal affiliations have influenced the research, results, or conclusions presented in this work.
This research is supported by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2026R195), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia. The authors extend their appreciation to the Deanship of Research and Graduate Studies at King Khalid University for funding this work through Large group research under grant number RGP2/749/46.
| Name | Company | Version | Comments |
|---|---|---|---|
| A100 GPU (CUDA) | NVIDIA | CUDA Version 11.0 | GPU acceleration for model training and evaluation. |
| Kaggle Platform | N/A | N/A | Cloud-based notebook environment for machine learning model development. |
| Keras | TensorFlow (Google) | Version 2.6.0 | Deep learning API running on top of TensorFlow. |
| LC25000 | Borkowski AA, Bui MM, Thomas LB, Wilson CP, DeLand LA, Mastorides SM. Lung and Colon Cancer Histopathological Image Dataset (LC25000) | N/A | This dataset contains 25,000 histopathological images in 5 classes. All images are 768 x 768 pixels in JPEG format. |
| Matplotlib | Python Software Foundation | Version 3.5.0 | Visualization library for plotting results. |
| NumPy | Python Software Foundation | Version 1.19.5 | Numerical computing library. |
| OpenCV | Open Source | Version 4.5.4 | Image processing and computer vision library. |
| Pandas | Python Software Foundation | Version 1.3.4 | Data analysis and manipulation tool. |
| Python (Anaconda Distribution) | Anaconda Inc | Version 3.7.12 | Includes pre-installed packages and environment management tools. |
| Scikit-learn | Python Software Foundation | Version 0.23.2 | Machine learning tools for performance evaluation. |
| TensorFlow | Google | Version 2.6.2 | Deep learning framework used for model development and training. |