Method Article

Identification of Hub Genes, Single-Nucleotide Polymorphisms, and Potential Drug Targets In Breast Cancer Using Transcriptomic Analysis

DOI:

10.3791/70964

June 9th, 2026

In This Article

Summary

Loading...
$$\rightleftharpoonup{xx}$$ $$\longleftharp{xx}$$, $$\longrightharp{xx}$$,

This study investigated mitochondrial oxidative stress–related genes associated with therapeutic resistance in HER2-positive breast cancer. Using transcriptomic datasets, integrated bioinformatics, and clinical data (n = 4,929), the authors identified MTHFD2 and PRDX3 as key differentially expressed genes with potential prognostic value. In silico. analyses suggest functional effects of genetic variants.

Abstract

Loading...
$$\rightleftharpoonup{xx}$$ $$\longleftharp{xx}$$, $$\longrightharp{xx}$$,

HER2-positive (HER2+) breast cancer often develops resistance to therapies like Lapatinib, potentially involving mitochondrial metabolic and redox reprogramming. This study aimed to identify mitochondrial oxidative stress–related genes (MOS-DEGs) as biomarkers of resistance and to characterize their functional nsSNPs and structural impacts. RNA-seq datasets (GSE231524, GSE231525) were analyzed using DESeq2 to identify DEGs, which were intersected with mitochondrial oxidative stress genes to obtain MOS-DEGs. Functional enrichment, PPI analysis, ROC analysis, expression profiling, and survival analysis were performed. nsSNPs were evaluated using multiple predictive tools, and structural impacts were assessed through secondary and 3D modeling. Integrated transcriptomic and clinical analysis identified MTHFD2 and PRDX3 as central MOS-DEGs displaying opposing regulatory roles in mitochondrial redox metabolism. MTHFD2 was significantly upregulated and associated with poor prognosis (HR = 1.53, p = 1.1 × 10⁻16), while PRDX3 exhibited protective expression patterns (HR = 0.73, p = 7.7×10⁻10). ROC analysis suggested their potential as predictors of therapeutic response. nsSNP analysis revealed five deleterious variants in MTHFD2 and four deleterious variants in PRDX3, with rs1471336772 (MTHFD2) and rs747786383 (PRDX3.) identified as the most pathogenic. These variants were predicted to be damaging based on multiple computational scoring systems, including SIFT ≤ 0.05, PolyPhen-2 ≥ 0.85, deleterious CADD scores, and high REVEL scores, and were located within catalytic or redox-active domains. Structural modeling suggested that these substitutions may destabilize conformation, disrupt metal- and cofactor-binding sites, and affect NADPH regeneration or thioredoxin-dependent peroxidase activity. Molecular dynamics predictions indicated potential loss of structural stability and altered flexibility, suggesting possible functional impairment of mitochondrial redox control. This study identifies MTHFD2 and PRDX3 as regulators of mitochondrial oxidative stress in HER2+ breast cancer. Deleterious nsSNPs in these genes may contribute to altered redox balance, potentially influencing metabolic adaptation (MTHFD2) and antioxidant defense (PRDX3).

Introduction

Loading...
$$\rightleftharpoonup{xx}$$ $$\longleftharp{xx}$$, $$\longrightharp{xx}$$,

Breast cancer is the most commonly diagnosed cancer in women worldwide and a major cause of cancer-related death despite substantial progress in early detection and targeted therapeutics1,2,3. Molecular heterogeneity within breast tumors underlies diverse clinical outcomes and treatment responses, posing persistent challenges for precision oncology4,5. Among the recognized molecular subtypes, human epidermal growth factor receptor 2 (HER2)-positive breast cancer accounts for approximately 15–20% of all breast cancers and is characterized by HER2 gene amplification and overexpression of the HER2 receptor tyrosine kinase6,7. Although HER2-targeted therapies, such as trastuzumab and lapatinib, have markedly improved patient prognosis, primary and acquired resistance to these agents frequently emerges, leading to tumor relapse and disease progression8,9. Understanding the molecular mechanisms driving HER2 therapy resistance, therefore, remains a major unmet need in clinical oncology.

Emerging evidence suggests that mitochondrial dysfunction and oxidative stress play critical roles in mediating drug resistance in breast cancer. Mitochondria, beyond their canonical role in ATP production, are key regulators of apoptosis, redox signaling, and metabolic plasticity—all processes that cancer cells exploit to survive therapeutic pressure. In particular, resistant HER2+ cancer cells demonstrate a metabolic shift toward oxidative phosphorylation (OXPHOS), enhanced reactive oxygen species (ROS) buffering, and altered mitochondrial biogenesis. These adaptations enable cells to maintain survival signaling and evade apoptosis despite sustained inhibition of HER2 signaling pathways. Genes encoding mitochondrial complex subunits and redox-regulating enzymes are therefore potential biomarkers and therapeutic targets for overcoming resistance.

In parallel, non-synonymous single-nucleotide polymorphisms (nsSNPs) in mitochondrial or oxidative stress-related genes can modify protein structure and function, influencing enzyme activity, drug response, and disease susceptibility10. Computational prediction of the deleterious effects of nsSNPs using in silico.tools such as SIFT, PolyPhen-2, and CADD provides a rapid means of identifying functional variants that may contribute to cancer heterogeneity and therapeutic resistance. Integrating transcriptomic and mutational data thus offers a multidimensional view of gene dysregulation and structural alteration at both transcriptional and genomic levels.

Recent advances in bioinformatics and systems biology have enabled the high-throughput identification of potential driver genes through large-scale expression profiling and network-based analysis. Public repositories such as the Gene Expression Omnibus (GEO) provide valuable RNA-sequencing datasets that capture molecular changes across experimental models and patient cohorts. Coupled with analytical tools such as DESeq2, clusterProfiler, and STRING–Cytoscape network analysis, these resources enable comprehensive characterization of differentially expressed genes (DEGs), pathway enrichment, and hub gene discovery. Importantly, integrating mitochondrial oxidative stress gene signatures with DEGs can illuminate novel molecular mechanisms underlying HER2-targeted therapy resistance and identify candidate biomarkers for therapeutic intervention.

The HER3-driven resistance axis has recently gained attention as a major compensatory pathway following HER2 inhibition. HER3 activation restores downstream PI3K/AKT signaling, promoting cellular survival and drug tolerance (Figure 1). The DUSP6 (Dual Specificity Phosphatase 6), a negative regulator of ERK signaling, has been implicated in modulating this adaptive response. Inhibition of DUSP6 reactivates ERK signaling and may counteract HER3-mediated resistance, but its interplay with mitochondrial oxidative stress and metabolic adaptation remains insufficiently understood. Therefore, comparative transcriptomic analyses of HER2+ cell lines (BT474 and MDA-MB-453) under DUSP6 inhibition and chronic Lapatinib exposure offer an ideal model to dissect the molecular determinants of drug resistance.

HER2+ cancer diagram, oxidative stress, ROS impact, OXPHOS dysfunction, drug resistance pathway.
Figure 1. Schematic representation of the molecular landscape linking mitochondrial dysfunction, oxidative stress, and breast cancer progression. These integrated molecular mechanisms underlying mitochondrial oxidative stress–mediated oncogenesis in breast cancer. Dysregulated mitochondrial electron transport chain complexes particularly involving DUFS3, UQCRC1, COX4I1, SDHA, and ATP5PO. lead to excessive generation of reactive oxygen species (ROS), resulting in oxidative DNA damage, lipid peroxidation, and impaired mitochondrial membrane potential. These oxidative perturbations activate oncogenic signaling pathways such as PI3K/AKT, MAPK, and NF-κB, while suppressing apoptotic regulators including p53 and BAX, thereby promoting tumor cell survival, proliferation, and immune evasion. The diagram also highlights the dual impact of mitochondrial oxidative stress on cancer metabolism and the tumor microenvironment, emphasizing its contribution to therapy resistance and immune modulation under HER2+ breast cancer conditions. This integrative model forms the mechanistic basis for subsequent transcriptomic and SNP-based analyses aimed at identifying mitochondrial driver genes and potential therapeutic targets. Please click here to view a larger version of this figure.

The present study was designed to systematically identify hub genes, functional nsSNPs, and potential drug targets associated with mitochondrial oxidative stress in HER2+ breast cancer. By integrating transcriptomic data from GSE231524 and GSE231525 with curated mitochondrial and oxidative stress-related gene sets, the current study aimed to define mitochondrial oxidative stress-related differentially expressed genes (MOS-DEGs) that contribute to therapy resistance. Downstream analyses included gene ontology (GO) and KEGG pathway enrichment, protein–protein interaction (PPI) network construction, hub gene prioritization, and survival correlation analysis using the Kaplan–Meier Immunotherapy Database. Moreover, key genes were further investigated for nsSNP variation and potential structural impact to delineate mutation-driven functional consequences.

By combining transcriptomic profiling, network biology, and in silico. mutational analysis, this study provides an integrated framework for discovering potential mitochondrial biomarkers and therapeutic targets in HER2+ breast cancer. The identification of oxidative phosphorylation-linked hub genes and their functional SNPs may pave the way for the development of precision therapeutic strategies to overcome HER2-targeted drug resistance and improve patient outcomes.

Protocol

Loading...
$$\rightleftharpoonup{xx}$$ $$\longleftharp{xx}$$, $$\longrightharp{xx}$$,

Data retrieval and preprocessing

This study was conducted using publicly available transcriptomic and genetic data; no human or animal subjects were directly involved. Transcriptomic datasets relevant to HER2+ breast cancer therapy resistance were retrieved from the NCBI Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/)11. Two RNA-seq datasets, GSE231524 and GSE231525, were selected because of their specific focus on HER3-driven resistance and DUSP6 inhibition in HER2+ breast cancer cell lines (BT474 and MDA-MB-453). These datasets included parental, drug-tolerant, and drug-resistant phenotypes derived from Lapatinib (1 µM) exposure and DUSP6 knockdown. Raw count matrices and corresponding metadata files were accessed using the GEOquery (v2.70.0) and Biobase (v2.62.0) packages in RStudio (v4.3.2)12. The metadata were curated to define two main contrasts for each dataset: GSE231524 compared the control (BT474 parental, Day 0) with drug-tolerant and drug-resistant samples (Day 9–Month 9), while GSE231525 compared the control (Scrambled siRNA) with DUSP6 knockdown (DUSP6-KD). Quality control and data normalization were performed using the DESeq2 (v1.42.0) framework, which applies a variance-stabilizing transformation (VST) to reduce heteroscedasticity and ensure comparability across samples. Data distribution and clustering patterns were visually assessed using ggplot2 (v3.5.0) and pheatmap (v1.0.12) to confirm data uniformity and identify potential outliers prior to differential expression analysis13,14.

In this study, a clear distinction was maintained between findings derived from cell line transcriptomic datasets and those obtained from patient-derived clinical datasets. Cell line data were primarily used for exploratory analysis, including the identification of differentially expressed genes and the generation of preliminary mechanistic insights in controlled experimental models. In contrast, patient-derived datasets were employed for external validation of gene expression patterns and assessment of clinical relevance, including prognostic evaluation. Accordingly, results from cell line models and clinical cohorts are interpreted separately to avoid overgeneralization and ensure appropriate translational context for all findings.

Differential gene expression analysis

Differential expression analysis was performed to identify genes that were significantly modulated between the control and treatment conditions. Normalized counts were processed using the adjusted regression model (ARM) integrated into DESeq2 to accurately estimate log₂ fold changes and statistical significance. The design formula was defined as ~condition, representing the control versus treated groups. Genes with an adjusted p-value (FDR) < 0.05 and absolute log₂ fold change ≥ 1 were considered significantly differentially expressed. Log₂ fold change shrinkage was performed using the apeglm method to enhance robustness in effect size estimation. The results of the analysis were visualized using EnhancedVolcano (v1.22.0)15 and ggplot216, which generated volcano plots and MA plots displaying the relationship between expression magnitude and statistical confidence. Dispersion estimates were also assessed within DESeq2 to ensure accurate variance modeling and consistent normalization across biological replicates17.

Retrieval and identification of mitochondrial oxidative stress-related differentially expressed genes (MOS-DEGs)

To investigate the link between energy metabolism, oxidative stress, and drug resistance, a comprehensive list of mitochondrial and oxidative stress-associated genes was compiled from multiple databases, including Human MitoCarta3.018 (https://personal.broadinstitute.org/scalvo/MitoCarta3.0/human.mitocarta3.0.html), Gene Ontology (GO:0006979, response to oxidative stress) (http://geneontology.org/), the Kyoto Encyclopedia of Genes and Genomes (KEGG) oxidative phosphorylation pathway (https://www.genome.jp/kegg/), and the Human Oxidative Stress Gene Database (HOSGDB) (http://hosgdb.com/). All retrieved genes were standardized to HGNC-approved gene symbols using org.Hs.eg.db (v3.18.0) and AnnotationDbi (v1.64.0), while duplicate entries, pseudogenes, and non-coding RNAs were removed to ensure annotation accuracy. The resulting curated mitochondrial oxidative stress gene panel (MOS genes) was subsequently used as a reference set for integration with the differentially expressed genes identified from both transcriptomic datasets.

The intersection of the curated MOS gene list with the DEGs obtained from GSE231524 and GSE231525 was performed in R using dplyr (v1.1.3)19 and base R intersect() functions. This integrative approach enabled the identification of MOS-DEGs, representing genes functionally linked to mitochondrial metabolism, redox regulation, and oxidative stress adaptation. The overlap among datasets was visualized using the VennDiagram (v1.7.3) package in R to illustrate shared and unique genes across the experimental models20. The refined list of MOS-DEGs was used for downstream analyses, providing mechanistic insight into the transcriptional and metabolic reprogramming underlying HER2-targeted therapy resistance.

Expression profiling and visualization of MOS-DEGs

Expression profiling of the identified MOS-DEGs was conducted using ComplexHeatmap (v2.18.0)21 and pheatmap (v1.0.12) packages in RStudio to visualize global expression patterns across parental, drug-tolerant, and resistant conditions. Normalized count data were transformed using z-score scaling to standardize the gene expression matrix across samples. Clustering was performed using Euclidean distance and complete linkage methods to detect co-expression patterns and to distinguish condition-specific transcriptional profiles. Heatmaps and clustering plots were generated using ggplot2 to ensure clear visual differentiation among conditions. This visualization approach facilitated the identification of gene groups associated with mitochondrial activity, oxidative stress modulation, and metabolic reprogramming under drug-resistant states.

Functional enrichment and pathway annotation

To explore the biological significance and regulatory mechanisms of the identified MOS-DEGs, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were conducted using R Studio (version 4.3.1). The analyses were performed in the tidyverse environment utilizing multiple Bioconductor packages for reproducible computation and visualization. Gene annotation and identifier mapping were performed using the org.Hs.eg.db database (https://bioconductor.org/packages/org.Hs.eg.db/) based on the Homo sapiens reference genome (GRCh38). GO enrichment analysis was carried out using the clusterProfiler package (version 4.8.1; https://bioconductor.org/packages/clusterProfiler/), which categorizes genes into three major ontologies—Biological Process (BP), Cellular Component (CC), and Molecular Function (MF). The enrichGO22 function was used with parameters set to p-value < 0.05 and adjusted p-value. (FDR) < 0.05, applying the Benjamini–Hochberg correction method. Visualizations, including bar plots, dot plots, and chord diagrams, were generated using enrichplot (https://bioconductor.org/packages/enrichplot/), ggplot213 (https://cran.r-project.org/web/packages/ggplot2/), and GOplot (https://cran.r-project.org/web/packages/GOplot/). These tools provided a structured view of the enriched GO terms and their gene associations.

KEGG pathway enrichment was performed using the enrichKEGG() function within the clusterProfiler package, referencing the human KEGG database (https://www.genome.jp/kegg/). The KEGGREST package (https://bioconductor.org/packages/KEGGREST/) was used for pathway data retrieval and annotation. Pathways with an adjusted p-value. (q-value) < 0.05 were considered significant. Visualization and pathway mapping were conducted using pathview (https://bioconductor.org/packages/pathview/), ggplot2, and enrichplot, while igraph and ggraph were used for network representation23. All enrichment analyses and visualizations were implemented in R Studio (v4.3.1) using reproducible code and standardized Bioconductor workflows, ensuring reliable identification of enriched functional categories and biological pathways associated with the MOS-DEGs.

ROC-based validation of predictive biomarkers in breast cancer

To validate the clinical predictive power of the MOS-DEGs, receiver operating characteristic (ROC) curve analysis was performed using the ROCplotter online tool (https://www.rocplot.org/)24. ROCplotter is an integrated web-based platform that combines gene expression data with clinically annotated treatment response datasets from 3,104 breast cancer patients, including those treated with chemotherapy, hormonal therapy, or anti-HER2 agents.

The analysis was conducted using the parameters “pathological complete response” as the outcome variable and “any chemotherapy” as the treatment category. Gene expression values derived from Affymetrix microarray datasets were automatically stratified into responder and non-responder groups based on clinical annotations within the platform.

The receiver operating characteristic (ROC) curve (AUC), Mann–Whitney U test, fold change, and chi-square test were applied to assess each gene’s ability to distinguish responders from non-responders. The area under the curve (AUC) was used as the primary metric to evaluate discriminative performance. AUC values greater than 0.55 with ROC p-values < 0.05 were considered significant, representing moderate predictive performance typical of transcriptomic biomarkers, while false discovery rate (FDR) correction was applied to maintain analytical rigor.

All selected MOS-DEGs were queried using their corresponding Affymetrix probe IDs. The discriminative potential of each gene was evaluated across clinical breast cancer cohorts, where expression data were stratified into responder and non-responder groups. ROC curves, boxplots, and associated statistical outputs were generated directly by the ROCplotter platform and exported for downstream visualization and comparison. The analysis quantified the predictive value of redox and metabolic regulators involved in mitochondrial oxidative stress. Genes that exhibited consistent predictive significance across clinical samples were retained for inclusion in the final predictive panel.

Differential expression analysis in tumor, normal, and metastatic tissues (TNMplot analysis)

The expression patterns of the top MOS-DEGs were analyzed across normal, tumor, and metastatic breast tissues using the TNMplot web tool v2  (https://tnmplot.com/analysis/)25. Both RNA-Seq (TCGA + GTEx + MET500) and gene chip datasets were examined to ensure cross-platform validation. The “Multiple Gene Analysis” module was used with Breast Invasive Carcinoma as the selected tissue type26. Expression values were log₂-transformed and compared across Tumor vs. Normal (TvsN), Metastatic vs. Tumor (MvsT), and Metastatic vs. Normal (MvsN) groups. TNMplot automatically computed fold-change (FC) and p-values using the Mann–Whitney U test to assess statistical significance. Expression distributions were visualized as boxplots and density plots directly generated from the TNMplot interface, with green, red, and gray representing normal, tumor, and metastatic tissues, respectively. All figures were exported in high resolution for integration into the results section. This dual-platform analysis enabled robust identification and validation of key mitochondrial redox–metabolic regulators associated with breast cancer progression27.

Survival and prognostic analysis using Kaplan–Meier plotter

To evaluate the prognostic relevance of MOS-DEGs in breast cancer, survival analysis was conducted using the Kaplan–Meier Plotter online tool (https://kmplot.com/analysis/)28. This database integrates gene expression and survival data from more than 4,900 breast cancer patients, derived from multiple GEO, EGA, and TCGA datasets. The analysis was performed for recurrence-free survival (RFS) using individual Affymetrix probe IDs corresponding to the prioritized genes: 225609_at (GSR), 201761_at (MTHFD2), 201619_at (PRDX3/AOP1), and 201128_s_at (ACLY). Patients were divided into high- and low-expression groups based on the median expression cutoff, and survival probabilities were estimated using the Kaplan–Meier method. The log-rank test was used to assess statistical significance between survival curves, and hazard ratios (HRs) with 95% confidence intervals (CIs) were automatically calculated by the tool. All analyses were performed using the RFS endpoint, with no restriction based on hormone receptor or HER2 status (ER, PR, HER2 = all). Redundant samples were removed, and proportional hazards assumptions were verified to ensure statistical robustness. Quality control filters excluded biased microarrays. No manual probe selection or p-value correction for multiple testing was applied, in accordance with the KM Plotter default settings. Statistical significance was defined as p. < 0.05. Survival plots were visualized and downloaded in high resolution for further interpretation, comparing outcomes between high and low expressers of each candidate MOS gene29,30.

The present analysis was initiated using datasets exclusively comprising HER2+ breast cancer samples for the identification of differentially expressed genes (DEGs) and hub genes. Subsequently, survival analysis was performed without restriction to HER2 status (ER, PR, HER2 = all) to assess the broader prognostic relevance and generalizability of the identified genes. This approach was employed as a secondary validation step rather than to redefine the study focus. Therefore, the prognostic implications of the identified hub genes are interpreted cautiously, with primary conclusions remaining specific to HER2+ breast cancer.

The canonical transcript sequences of MTHFD2-201 (ENST00000394053.7) and PRDX3-201 (ENST00000298510.4) were retrieved from the Ensembl Genome Browser (https://www.ensembl.org)31,32. Variant annotation and classification were performed using the Ensembl Variant Effect Predictor (VEP) (https://www.ensembl.org/vep), which provided detailed genomic context, codon alterations, and amino acid substitutions for each identified variant. Only missense variants (non-synonymous SNPs) were selected for downstream analysis.

Pathogenicity prediction and variant prioritization

The functional consequences of each nsSNP were evaluated using a combination of computational prediction tools. SIFT (https://sift.bii.a-star.edu.sg) was applied to assess amino acid conservation, classifying variants with a score ≤ 0.05 as deleterious33. PolyPhen-2 (http://genetics.bwh.harvard.edu/pph2) estimated the structural and evolutionary impact of substitutions, where scores ≥ 0.85 indicated probable damage34. CADD (https://cadd.gs.washington.edu) provided a composite deleteriousness score integrating multiple annotations, with values ≥ 20 denoting high pathogenic potential35. Complementary metrics from MetaLR36, Mutation Assessor, and REVEL were integrated from the VEP interface for enhanced prediction reliability37. Variants meeting the thresholds MetaLR ≥ 0.70, Mutation Assessor ≥ 3.5, and REVEL ≥ 0.75 were prioritized as likely pathogenic.

Structural and mechanistic impact prediction

To evaluate how amino acid substitutions influence structural integrity and biochemical function, each top-ranked nsSNP was further analyzed using MutPred2 (http://mutpred.mutdb.org)38 and DynaMut (http://biosig.unimelb.edu.au/dynamut)39. MutPred2 estimated the probability of functional disruption, including altered catalytic activity, gain or loss of metal-binding residues, changes in solvent accessibility, and allosteric modulation, with scores ≥ 0.80 classified as highly pathogenic. DynaMut computed the Gibbs free energy change (ΔΔG) between wild-type and mutant proteins, assessing the direction and magnitude of the stability alteration, and generated visualizations of atomic displacements and hydrogen bond rearrangements.

Secondary and 3D structural modeling and solvent accessibility profiling

The experimentally resolved crystal structures of MTHFD2 and PRDX3 were obtained from the Protein Data Bank (PDB) and processed using PyMOL V:3.1 (https://pymol.org)40 to visualize the spatial distribution of deleterious residues. Mutant models were created by introducing the corresponding amino acid substitutions, followed by structural refinement and energy minimization. Comparative 3D inspection highlighted shifts in secondary elements, altered interatomic contacts, and the spatial proximity of nsSNPs to catalytic and cofactor-binding domains, revealing potential disruptions in redox and metabolic functions.

Secondary structure and solvent exposure analyses were performed using PSIPRED V: 3.2 (http://bioinf.cs.ucl.ac.uk/psipred)41,42 and NetSurfP 3.0 (https://services.healthtech.dtu.dk/service.php?NetSurfP-2.0)43. These tools predicted α-helices, β-strands, coils, and disordered regions along with relative solvent accessibility (RSA) scores. Residues exhibiting moderate-to-high RSA values and structural order were mapped to identify solvent-exposed and functionally critical positions. The affected sites were visualized in 2D topology diagrams to determine whether deleterious mutations occurred in rigid catalytic cores or flexible loop regions, thereby predicting their probable effects on protein folding dynamics and enzymatic efficiency.

Database cross-validation, functional integration, and stability validation

Each prioritized nsSNP was cross-referenced against population-level genomic databases including dbSNP, 1000 Genomes, ExAC, and gnomAD to confirm variant frequency, global allelic distribution, and previously reported clinical associations. The integration of evolutionary conservation, structural modeling, and machine-learning–based functional prediction enabled the identification of high-confidence deleterious variants in MTHFD2 and PRDX3. These high-impact mutations were subsequently mapped to functional domains to elucidate their potential role in mitochondrial oxidative stress imbalance, altered metabolic signaling, and therapeutic resistance in breast cancer. To further confirm the thermodynamic consequences of each deleterious substitution, iMutant 3.0 (https://folding.biofold.org/i-mutant/i-mutant3.0.html)44 was used to predict the effects of mutations on protein stability using sequence and structural data. The analysis computed ΔΔG values (kcal/mol) representing the change in free energy between wild-type and mutant proteins. Variants showing negative ΔΔG values were classified as destabilizing mutations, indicating reduced protein stability and increased unfolding probability. Integration of iMutant predictions with DynaMut and MutPred2 results provided cross-validation for identifying structurally critical residues likely to affect redox function, catalytic integrity, and overall protein conformational stability.

Integrated functional interpretation and therapeutic relevance

All identified deleterious nsSNPs were validated through cross-referencing with dbSNP, gnomAD, and ExAC population databases to verify minor allele frequencies and previously reported associations with cancer phenotypes. Integrative interpretation of evolutionary conservation, structural modeling, and stability data indicated that the high-impact mutations rs1471336772 (MTHFD2) and rs747786383 (PRDX3) exert the strongest deleterious effects on protein conformation and catalytic efficiency. The computational results collectively suggest that mutations in MTHFD2 destabilize NADPH-dependent redox metabolism, while mutations in PRDX3 impair peroxidase-mediated oxidative stress defense, contributing to mitochondrial dysfunction and tumor aggressiveness. This nsSNP-based structural and functional analysis provides a computational foundation for future therapeutic screening and mutational validation, highlighting MTHFD2 and PRDX3 as precision biomarkers for redox-targeted breast cancer therapy. To improve clarity and provide a comprehensive overview of the analytical strategy, a schematic workflow summarizing the study's major steps is presented in Figure 2. The workflow integrates differential gene expression analysis, mitochondrial gene filtering, protein–protein interaction network construction, clinical validation using ROC analysis, and nsSNP-based structural characterization. This stepwise framework highlights the logical progression from transcriptomic data processing to biomarker identification and functional interpretation.

Differential expression analysis, RNA-Seq, PPI network, ROC validation, SNP structural analysis diagram.
Figure 2. Integrative multi-step workflow for identification and validation of mitochondrial oxidative stress–related biomarkers in HER2+ breast cancer. This schematic summarizes the analytical pipeline employed in the study. First, differential gene expression (DEG) analysis was performed on RNA-seq datasets (GSE231524 and GSE231525) to identify significantly altered genes. These DEGs were intersected with curated mitochondrial oxidative stress–related genes to obtain MOS-DEGs. Next, protein–protein interaction (PPI) network analysis was conducted using STRING and Cytoscape to identify hub genes and functional modules. Subsequently, receiver operating characteristic (ROC) curve analysis was applied using the ROCplotter platform to evaluate the predictive performance of selected genes in clinical cohorts. Finally, non-synonymous SNP (nsSNP) analysis and structural modeling were performed to assess the potential functional and structural impact of key variants in prioritized genes (MTHFD2 and PRDX3). This integrative workflow links transcriptomic, network, clinical, and structural analyses to identify potential biomarkers and therapeutic targets. Please click here to view a larger version of this figure.

Results

Loading...
$$\rightleftharpoonup{xx}$$ $$\longleftharp{xx}$$, $$\longrightharp{xx}$$,

Differential gene expression analysis

Differential expression profiling was performed to investigate transcriptional alterations associated with HER2+ breast cancer progression and therapy resistance using two independent RNA-seq datasets, GSE231524 and GSE231525. In the first dataset (GSE231524), a total of 19,727 genes were initially quantified. Following normalization, filtering, and application of the adjusted regression model (ARM) for differential expression, 6,604 genes were retained as significantly expressed (Supplemental Table S1). After gene symbol standardization, duplicate removal, and annotation refinement, 6,300 unique genes were finalized for downstream analyses. The volcano plots, MA plots, and dispersion estimates (Figure 3A-C) clearly illustrate the distribution of significantly upregulated and downregulated genes (adjusted p < 0.05, |log₂FC| ≥ 1). The dispersion estimates show well-fitted variance patterns across mean normalized counts, indicating robust normalization and appropriate model fitting in the DESeq2 workflow45.

Similarly, for the second dataset (GSE231525), an initial set of 18,372 genes was obtained. After applying the same ARM-based differential expression analysis pipeline, 9,508 genes were identified as significantly expressed (Supplemental Table S2). Upon renaming, duplicate removal, and quality control filtering, 8,510 unique genes were retained for further comparative analysis. The visualizations (Figure 3D-F) demonstrate consistent patterns of gene expression variation across samples, with clearly defined clusters of upregulated and downregulated genes under experimental conditions. The MA plots and dispersion estimates show that the data exhibit well-stabilized variance and no major outlier influence, validating the reliability of downstream integrative analyses46.

Together, these analyses highlight the reproducibility and integrity of gene expression patterns between the two datasets. Both GSE231524 and GSE231525 exhibited distinct transcriptional profiles consistent with HER2-associated signaling perturbations and therapy-induced metabolic stress. The processed datasets, now harmonized and quality-controlled, form the foundation for identifying mitochondrial oxidative stress-related genes and subsequent network-based enrichment analyses47.

Breast cancer gene expression analysis, logFC vs p-value volcano and MA plots, data visualization.
Figure 3. Differential gene expression analysis of HER2+ breast cancer datasets. Results for (A-C) GSE231524 (control vs. breast cancer), and (D-F) for GSE231525 (control vs. DUSP6 knockdown). (A,D) Volcano plots display significantly upregulated (red) and downregulated (blue) genes (padj. < 0.05, |log₂FC| ≥ 1). (B,E) MA plots illustrate log₂ fold change versus mean normalized counts, confirming stable variance across samples. (C,F) Dispersion plots show fitted variance trends, validating proper normalization in DESeq2. Together, the datasets demonstrate consistent transcriptional alterations and robust model fitting suitable for downstream analysis. Please click here to view a larger version of this figure.

Identification of mitochondrial oxidative stress-related differentially expressed genes (MOS-DEGs)

Following differential expression analysis, the two datasets (GSE231524 and GSE231525) were cross-compared to identify common transcriptional alterations associated with HER2+ breast cancer therapy resistance. To specifically focus on metabolic and oxidative stress-related pathways, both DEG lists were overlapped with a curated mitochondrial gene (MG) dataset comprising 1,137 genes obtained from MitoCarta3.0, KEGG Oxidative Phosphorylation, and GO:0006979 (response to oxidative stress) databases.

The Venn diagram (Figure 4A) illustrates the overlap among DEGs from both datasets and the mitochondrial gene set. From this intersection, a total of 262 overlapping genes designated as Mitochondrial Oxidative Stress-Related Differentially Expressed Genes (MOS-DEGs)-were identified. These overlapping genes represent a convergent mitochondrial response to HER2-targeted therapy stress, integrating oxidative stress adaptation, metabolic reprogramming, and redox homeostasis. Specifically, 2,706 unique DEGs were exclusive to GSE231524, 4,870 were unique to GSE231525, while 3,167 genes were shared between the two datasets. The overlapping set of 262 genes with mitochondrial origin underscores the biological convergence between redox signaling, energy metabolism, and mitochondrial function in resistant breast cancer phenotypes (Supplemental Table S3). To further characterize these MOS-DEGs, an expression heatmap was generated to visualize transcriptional variation across experimental conditions (Figure 4B). Using RStudio (v4.3.2), hierarchical clustering was performed via the ComplexHeatmap (v2.18.0) and pheatmap (v1.0.12) packages, while data normalization and scaling were achieved through DESeq2 (v1.42.0) and ggplot2 (v3.5.0). The heatmap displays the z-score-scaled expression profiles of the 262 MOS-DEGs across all samples, revealing distinct clustering patterns that differentiate control, drug-tolerant, and resistant phenotypes.

Notably, several genes associated with mitochondrial respiration and oxidative stress defense including PRDX3, SOD2, NDUFA1, COX6C, ATP5ME, and GPX1 exhibited substantial upregulation in resistant and DUSP6-knockdown samples, whereas metabolic regulators such as IDH2, ACLY, and SDHA. showed variable modulation, indicating dynamic shifts in mitochondrial energy metabolism. This transcriptional reprogramming supports the hypothesis that HER2+ resistant cells rely on mitochondrial oxidative stress adaptation and metabolic plasticity for survival under therapeutic pressure. Overall, the integration of mitochondrial gene sets with differential expression results provided a targeted subset of functionally relevant genes MOS-DEGs that are central to redox balance, oxidative phosphorylation, and metabolic resilience in resistant breast cancer cells. The expression heatmap confirms that these genes form a coherent transcriptional signature associated with therapy adaptation and mitochondrial functional remodeling.

Venn diagram gene expression datasets overlap; heatmap expression analysis, biological research.
Figure 4. Identification and expression profiling of MOS-DEGs. (A) Venn diagram showing the overlap between DEGs from GSE231524, GSE231525, and 1,137 mitochondrial genes. The intersection identifies 262 shared MOS-DEGs involved in oxidative stress and mitochondrial metabolism. (B) Heatmap displaying z-score-scaled expression profiles of the 262 MOS-DEGs across control, drug-tolerant, and resistant samples. The heatmap was generated in RStudio using the ComplexHeatmap, pheatmap, ggplot2, and DESeq2 packages, demonstrating clear clustering between experimental conditions and highlighting upregulation of key mitochondrial oxidative stress regulators in resistant phenotypes. Please click here to view a larger version of this figure.

Protein-protein interaction (PPI) network and pathway enrichment analysis

Subsequent protein-protein interaction (PPI) analysis using the STRING database revealed a densely interconnected mitochondrial oxidative stress-related DEG (MOS-DEG) network consisting of 210 nodes and 951 edges, with an average node degree of 9.06 and a local clustering coefficient of 0.457 (Figure 5A). The PPI enrichment p-value (< 1.0 × 10⁻16) indicated that the observed connectivity was significantly greater than expected by chance, reflecting a functionally coherent mitochondrial interaction landscape. Core hub proteins such as COX6C, NDUFA10, ATP5F1, PRDX3, and SOD2 emerged as central mediators of redox homeostasis and mitochondrial metabolic regulation, underscoring their importance in maintaining oxidative balance and energy stability during therapy-induced stress.

To refine the network and identify the most influential regulatory elements, the MOS-DEG interactome was imported into Cytoscape and analyzed using the CytoHubba MCC (Maximal Clique Centrality) algorithm (Figure 5B). The ranking revealed the top 50 hub MOS-DEGs, which were reanalyzed using STRING for functional enrichment and network clustering (Figure 5C,D). The resulting subnetwork demonstrated exceptionally high functional cohesion, comprising 369 observed interactions (expected = 39), an average node degree of 14.76, a local clustering coefficient of 0.698, and an enrichment p-value of 0.0. These metrics confirmed that the network's organization was not random but represented a tightly regulated molecular architecture integrating mitochondrial metabolism and oxidative stress signaling. The MCL clustering analysis revealed three highly connected functional modules: one corresponding to central energy metabolism, including key TCA cycle and oxidative phosphorylation enzymes (SDHA, IDH2, ACO2, DLAT, UQCRC1, COX10); a second representing the mitochondrial translation machinery (MRPL16, MRPL45, DAP3, LYRM7), which maintains mitochondrial protein synthesis and respiratory complex integrity; and a third involving amino acid catabolism and redox regulation (GLUD1, GLS2, PRODH, L2HGDH, ALDH18A1), responsible for replenishing TCA intermediates and sustaining the NADH/FADH₂ balance required for cellular redox control. Functional enrichment indicated a strong overrepresentation of mitochondrial localization (FDR = 3.39 × 10⁻50), carboxylic acid metabolism (FDR = 6.33 × 10⁻23), and cellular respiration (FDR = 9.18 × 10⁻14), indicating that these genes form an integrated bioenergetic-oxidative stress axis linking oxidative phosphorylation, antioxidant defense, and amino acid metabolism.

Based on network topology and functional enrichment, ten master regulatory MOS genes were prioritized for further characterization: PRDX3, MTHFD2, SDHA, NDUFV1, PDK1, ACLY, GLS2, GSR, DAP3, and LYRM7 (Figure 5E). STRING-based analysis of this curated subset revealed a highly cohesive interaction core comprising 13 direct associations, with an average node degree of 2.6, a clustering coefficient of 0.843, and a network enrichment p-value of 3.3 × 10⁻8. Nearly all proteins were localized to the mitochondrion (9 of 10; FDR = 2.27 × 10⁻7) or mitochondrial matrix (7 of 10; FDR = 2.27 × 10⁻7). The principal pathways represented by these proteins included dicarboxylic acid metabolism (SDHA, GLS2, MTHFD2, ACLY; FDR = 1.6 × 10⁻3), the pyruvate dehydrogenase complex (PDK1, ACLY; FDR = 3.9 × 10⁻3), and the TCA cycle (SDHA, ACLY; FDR = 1.68 × 10⁻2). In parallel, redox-related functions such as oxidoreductase activity (GSR, SDHA, PRDX3, MTHFD2, NDUFV1; FDR = 1 × 10⁻3) and flavoprotein motif enrichment (GSR, SDHA, NDUFV1; FDR = 3.9 × 10⁻3) were significantly enriched, confirming the network's role in ROS detoxification and mitochondrial redox regulation. The strongest interactions, including SDHA-NDUFV1 (score = 0.999), SDHA-ACLY (0.964), DAP3-LYRM7 (0.678), and GSR-PRDX3 (0.661), established two interlinked subnetworks: an energy-redox module (SDHA-NDUFV1-ACLY-GSR-PRDX3) that bridges oxidative phosphorylation with antioxidant capacity, and a translation-assembly module (DAP3-LYRM7) that integrates mitochondrial ribosome function with respiratory complex III biogenesis.

Together, these ten genes form a cohesive mitochondrial core that integrates energy metabolism, redox buffering, and biosynthetic adaptation. PRDX3 and GSR constitute the antioxidant arm that safeguards the mitochondrial environment, while MTHFD2 supports NADPH regeneration and one-carbon metabolism essential for proliferative survival. SDHA and NDUFV1 anchor the electron transport system, linking TCA flux to ATP production, whereas PDK1 and ACLY orchestrate metabolic rewiring that sustains anabolic growth under stress. DAP3 and LYRM7 represent structural and translational stabilizers that preserve mitochondrial integrity. This multi-tiered molecular framework defines a bioenergetic-redox regulatory core fundamental to HER2-targeted therapy resistance. The integrative analysis underscores how oxidative metabolism and redox adaptation are co-opted to preserve mitochondrial functionality and cellular viability in resistant breast cancer phenotypes, revealing potential targets for metabolic intervention and precision therapy (Table 1).

Protein interaction network analysis diagrams A-E; visualizing molecular connections and pathways.
Figure 5. Network analysis and prioritization of mitochondrial oxidative stress-related genes. (A) STRING-based PPI network of the top 210 MOS-DEGs illustrating overall interaction density. (B) Import of the PPI network into Cytoscape for visualization and topological analysis. (C) Extraction of the top 50 hub MOS-DEGs using the CytoHubba MCC algorithm. (D) STRING reanalysis of the top 50 hub genes reveals three major functional modules: central energy metabolism, mitochondrial translation, and amino acid catabolism/redox regulation. (E) STRING-based visualization of the top 10 prioritized MOS-DEGs depicting a highly connected oxidative-metabolic network (average node degree = 2.6; clustering coefficient = 0.843; PPI enrichment p = 3.3 × 10⁻8). Please click here to view a larger version of this figure.

Table 1: Final prioritized gene panel (Top 10 MOS master regulators). Please click here to download this Table.

Functional enrichment analysis of top 10 MOS-DEGs

To elucidate the biological relevance and molecular mechanisms associated with the top 10 prioritized MOS-DEGs, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were performed. The GO enrichment was categorized into three major ontologies-Biological Process (BP), Cellular Component (CC), and Molecular Function (MF) to uncover their biological roles in oxidative metabolism and mitochondrial regulation.

GO Biological Process (BP) analysis revealed significant enrichment in metabolic and redox-related pathways, including dicarboxylic acid metabolic process (GO:0043648; enrichment score = 6.87; p = 1.33 × 10⁻7), acetyl-CoA biosynthetic process (GO:0006085; enrichment = 4.40), generation of precursor metabolites and energy (GO:0006091; enrichment = 4.06), and electron transport chain (GO:0022900; enrichment = 4.03). Other enriched biological processes included cell redox homeostasis and cellular response to oxidative stress, underscoring the tight link between oxidative phosphorylation, redox equilibrium, and metabolic reprogramming in tumor progression. GO Cellular Component (CC) enrichment indicated a predominant localization of these MOS-DEGs within mitochondrial structures, particularly the mitochondrial matrix (GO:0005759; p = 5.79 × 10⁻10), mitochondrial inner membrane, oxidoreductase complex, and respiratory chain complex.

The presence of genes such as MTHFD2, NDUFV1, PRDX3, GSR, and SDHA reflects their direct role in electron transport and oxidative phosphorylation systems, supporting their importance in mitochondrial energy conversion and redox signaling. GO Molecular Function (MF) analysis highlighted key catalytic and redox functions, including oxidoreductase activity, acting on sulfur group of donors, NAD(P) as acceptor (GO:0016668; enrichment = 4.68; p = 2.07 × 10⁻5), electron transfer activity (GO:0009055; enrichment = 4.43), antioxidant activity (GO:0016209; enrichment = 3.02), and flavin adenine dinucleotide (FAD) binding (GO:0050660; enrichment = 3.03). The consistent enrichment of PRDX3, GSR, MTHFD2, and SDHA. across multiple redox-related functions further supports their involvement in maintaining oxidative balance in mitochondrial compartments. Collectively, these GO findings highlight the central involvement of MOS-DEGs in mitochondrial oxidative phosphorylation, NAD(P)H-mediated redox cycling, and antioxidant defense mechanisms-processes essential for sustaining tumor metabolism and adaptation to oxidative stress (Figure 6A-D).

Gene ontology analysis; diagrams A-C, networks; D, bar chart; enrichment scores, biological pathways.
Figure 6. Gene Ontology enrichment analysis of top 10 MOS-DEGs. (A) Biological Process terms showing significant enrichment in oxidative metabolism and energy generation. (B) Cellular Component enrichment indicating mitochondrial localization and respiratory chain association. (C) Molecular Function enrichment highlighting oxidoreductase, electron transfer, and antioxidant activity. (D) Bar plot summarizing top GO terms across BP, CC, and MF categories based on enrichment scores and -log₁₀(p-values). Please click here to view a larger version of this figure.

KEGG pathway enrichment analysis

The KEGG pathway analysis provided a complementary systems-level perspective, revealing that the top 10 MOS-DEGs were significantly enriched in several oxidative metabolism and disease-associated pathways (Figure 7A-E). The most significantly enriched pathways included the Citrate cycle (TCA cycle) (hsa00020; p = 0.0002; enrichment score = 3.69), Diabetic cardiomyopathy (hsa05415; p = 0.0003), Central carbon metabolism in cancer (hsa05230; p = 0.0011), and Oxidative phosphorylation (hsa00190; p = 0.0042). Additionally, enrichment in pathways such as Non-alcoholic fatty liver disease (NAFLD), Chemical carcinogenesis-reactive oxygen species, Parkinson's disease, and Prion disease reflects the broad association of these genes with oxidative damage and mitochondrial dysfunction, common hallmarks of cancer and neurodegeneration. Among the genes driving these enrichments, SDHA and NDUFV1 were consistently represented across oxidative phosphorylation, ROS detoxification, and mitochondrial complex I/II-linked pathways. MTHFD2 and GLS2 contributed to one-carbon metabolism and glutamine-derived anaplerotic flux in the TCA cycle, while GSR and PRDX3. participated in glutathione redox cycling. These findings demonstrate that metabolic-oxidative crosstalk, mediated by these MOS-DEGs, underpins both oncogenic energy reprogramming and mitochondrial resilience to stress.

Oxidative phosphorylation processes; includes diagram, network, and gene ratio analysis chart.
Figure 7. KEGG pathway enrichment analysis of MOS-DEGs. (A) Sankey diagram linking MOS-DEGs to their enriched KEGG pathways, showing overlaps in oxidative phosphorylation, TCA cycle, and metabolic reprogramming. (B-E) Network and dot plots illustrating gene-pathway relationships and enrichment levels across metabolic and oxidative stress-related signaling pathways. Highlighted pathways include the TCA cycle, oxidative phosphorylation, and neurodegeneration-related oxidative damage, confirming mitochondrial oxidative stress regulation as the core functional axis of these genes. Please click here to view a larger version of this figure.

ROC-based clinical validation of MOS in breast cancer

Receiver operating characteristic (ROC) analysis was performed to evaluate the ability of the prioritized mitochondrial oxidative stress-related genes to discriminate between chemotherapy responders and non-responders. As shown in Figure 8A-J, all evaluated genes demonstrated limited discriminative performance, with AUC values ranging approximately from 0.50 to 0.66.

Specifically, PRDX3, MTHFD2, SDHA, and ACLY showed relatively higher but still modest discriminatory ability (AUC ~0.57-0.66), whereas genes such as GSR, GLS2, and several electron transport-related genes exhibited AUC values close to 0.50, indicating near-random classification performance. Among the panel, ACLY demonstrated the highest AUC (~0.65), followed by SDHA (~0.59) and MTHFD2 (~0.58), suggesting only weak separation between responder and non-responder groups.

While significant differences were observed for several genes, the effect sizes were small, and the ROC performance indicates limited predictive accuracy when considered individually. These findings suggest that mitochondrial oxidative stress-related genes alone are insufficient as standalone predictive biomarkers for chemotherapy response. Instead, they may contribute incremental information within a broader multi-gene or multi-omics predictive framework. Consequently, these results should be interpreted as exploratory and hypothesis-generating rather than indicative of immediate clinical utility (Figure 8).

Box plots, ROC curves, and bar charts illustrating gene expression analysis and prediction accuracy.
Figure 8. ROCplotter-based validation of mitochondrial oxidative stress-related master regulators in breast cancer. (A-H) Receiver operating characteristic analyses of the 10 evaluated MOS master regulators: (A) PRDX3, (B) MTHFD2, (C) SDHA, (D) NDUFV1, (E) PDK1, (F) ACLY, (G) GLS2, (H) GSR, (I) LYRM7, and (J) DAP3. Each panel shows the AUC value and corresponding statistical metrics (ROC p-value and Mann-Whitney p.-value) indicating the discriminative ability of each gene to differentiate between chemotherapy responders and non-responders. Elevated AUC values (>0.55) indicate weak to modest predictive associations between gene expression and therapeutic response. The results highlight ACLY and PDK1 as the moderate predictive contributions, followed by PRDX3, MTHFD2, and SDHA, while GSR and GLS2 exhibit moderate predictive contributions to mitochondrial redox homeostasis during chemotherapy adaptation. Please click here to view a larger version of this figure.

Differential expression profiling of top MOS-DEGs

To investigate the transcriptional behavior of MOS-DEGs across breast cancer progression, comparative expression analysis was conducted, which integrates both RNA-Seq and gene chip data derived from The Cancer Genome Atlas (TCGA), Genotype-Tissue Expression (GTEx), and MET500 datasets. Eight key genes were analyzed across normal, tumor, and metastatic breast tissues to evaluate consistent transcriptional trends and identify those most relevant to tumorigenesis and metastasis.

A stringent selection criterion was applied, defining significant upregulation as fold-change (FC) ≥ 1.3 for Tumor vs Normal (TvsN) and FC ≥ 1.5 for Metastatic vs Normal (MvsN) across both platforms. Genes meeting these thresholds in a consistent direction were considered biologically relevant and prioritized for functional interpretation. As shown in Figure 9A,B, the RNA-Seq-based expression analysis revealed that GSR and MTHFD2 were markedly upregulated in both tumor and metastatic tissues compared to normal controls. GSR demonstrated the most prominent increase (TvsN = 2.73; MvsN = 4.83), while MTHFD2 exhibited substantial and consistent induction (TvsN = 1.88; MvsN = 1.96). These results were validated in the gene chip dataset (Figure 9C,D), where GSR again showed robust expression elevation (TvsN = 1.76; MvsN = 2.19) and MTHFD2 maintained high transcriptional activation (TvsN = 2.03; MvsN = 2.55). The strong and reproducible upregulation of these two genes across independent platforms underscores their central roles in mitochondrial redox regulation-GSR as the primary enzyme maintaining glutathione equilibrium and MTHFD2 as a metabolic catalyst for NADPH production and one-carbon metabolism.

Moderate yet consistent elevation was observed for PRDX3, which fulfilled the RNA-Seq inclusion threshold (TvsN = 1.39; MvsN = 1.58) and showed stable expression in the gene chip dataset (TvsN = 1.02; MvsN = 1.12). Functionally, PRDX3 supports mitochondrial ROS detoxification and complements GSR in sustaining oxidative balance. ACLY demonstrated near-threshold upregulation (RNA-Seq: TvsN = 1.24; MvsN = 1.28; Gene Chip: TvsN = 1.11; MvsN = 0.89), reflecting its metabolic coupling role between oxidative phosphorylation and anabolic lipid biosynthesis. In contrast, PDK1, NDUFV1, and SDHA displayed minimal or inconsistent fold-change variations, suggesting stable mitochondrial baseline activity with limited transcriptional responsiveness to oncogenic stress. GLS2, on the other hand, exhibited a consistent and strong downregulation pattern across both platforms (RNA-Seq: TvsN = 0.32; MvsN = 0.45; Gene Chip: TvsN = 0.66; MvsN = 0.38), confirming its potential role as a metabolic suppressor whose loss may facilitate tumor progression through impaired glutamine-mediated redox homeostasis (Table 2).

Applying the established thresholds allowed the prioritization of four genes GSR, MTHFD2, PRDX3, and ACLY as the most consistent and biologically relevant regulators of mitochondrial oxidative stress adaptation in breast cancer. Among these, GSR and MTHFD2 emerged as the top-tier master regulators, exhibiting high expression reproducibility and functional importance in redox defense and metabolic reprogramming. PRDX3 and ACLY. formed the secondary regulatory layer, bridging antioxidant protection with biosynthetic adaptation. Together, these genes delineate a hierarchical mitochondrial redox-metabolic regulatory axis, essential for sustaining cancer cell survival, oxidative resilience, and metabolic flexibility during tumor progression and metastasis (Figure 9A-D).

Table 2: Cross-platform expression summary and threshold-based prioritization of MOS master regulators in breast cancer. Please click here to download this Table.

Gene expression analysis; boxplot and density plots comparing normal, tumor, metastatic samples.
Figure 9. Comparative expression profiling of MOS-DEGs in breast cancer. (A,B) RNA-Seq-based expression analysis of eight key MOS genes SDHA, PRDX3, PDK1, NDUFV1, MTHFD2, GSR, GLS2, and ACLY across normal (green), tumor (red), and metastatic (gray) breast tissues. Boxplots (A) display the log₂-transformed gene expression levels, while density plots (B) illustrate the probability distributions across sample categories. (C,D) The corresponding gene chip-based expression data, confirming consistent expression trends observed in RNA-Seq analysis. MTHFD2 and GSR exhibit strong tumor-specific upregulation, PRDX3 shows moderate elevation, PDK1 and ACLY reflect metabolic activation, and GLS2. demonstrates consistent suppression, highlighting the coordinated redox-metabolic reprogramming of mitochondrial pathways during breast cancer progression. Please click here to view a larger version of this figure.

Prognostic evaluation of prioritized MOS-DEGs

Because survival analysis was performed in a broader breast cancer cohort without restriction to HER2 status, these prognostic findings should be interpreted as supportive rather than HER2-specific. To evaluate the prognostic significance of the identified MOS-DEGs Kaplan-Meier survival curves were generated to assess the association between MOS gene expression and recurrence-free survival (RFS) in breast cancer patients (Figure 10A-D). The analysis included 4,929 breast cancer patients, assessing recurrence-free survival (RFS) based on the median expression levels of four selected MOS-DEGs: MTHFD2 (201761_at), PRDX3 (209766_at), GSR (225609_at), and ACLY (201128_s_at).

As illustrated in Figure 10A-D, distinct prognostic patterns were observed among these MOS-DEGs. MTHFD2 exhibited a strong negative prognostic association (HR = 1.53; 95% CI: 1.39-1.70; log-rank p = 1.1×10⁻16), where elevated expression levels were significantly correlated with shorter RFS, underscoring its role as an oncogenic metabolic regulator involved in one-carbon metabolism and redox adaptation (Figure 10B). Conversely, PRDX3 (209766_at) demonstrated a significant protective effect (HR = 0.73; 95% CI: 0.66-0.81; log-rank p = 7.7×10⁻10), where higher expression was linked to prolonged RFS, emphasizing its function as a mitochondrial antioxidant enzyme mitigating oxidative stress (Figure 10C). In contrast, GSR (HR = 1.05; p = 0.50) and ACLY (HR = 1.02; p = 0.77.) showed no significant association with recurrence-free survival (Figure 10A,D), suggesting a secondary, supportive role in maintaining redox equilibrium and metabolic homeostasis rather than directly influencing clinical outcomes. Overall, these findings identify MTHFD2 as a key poor-prognostic MOS-DEG driving metabolic proliferation and redox imbalance, while PRDX3 emerges as a protective MOS-DEG contributing to redox stability and a favorable prognosis. Together, these two genes represent opposing molecular forces within the mitochondrial oxidative stress framework-MTHFD2 promoting metabolic aggressiveness and PRDX3 reinforcing antioxidant defense mechanisms in breast cancer (Figure 10A-D).

Kaplan-Meier survival curves for gene expression in cancer analysis; probability vs. time charts.
Figure 10. Kaplan-Meier survival analysis of MOS-DEGs in breast cancer. (A) GSR (225609_at) shows no significant correlation with recurrence-free survival (HR = 1.05; p = 0.50). (B) MTHFD2 (201761_at) demonstrates a strong association with poor prognosis, with high expression predicting reduced RFS (HR = 1.53; p = 1.1×10⁻16). (C) PRDX3 (209766_at) indicates a protective association, where higher expression corresponds to longer RFS (HR = 0.73; p = 7.7×10⁻10). (D) ACLY (201128_s_at) exhibits no significant prognostic effect (HR = 1.02; p = 0.77.). High-expression groups are shown in red, while low-expression groups are shown in black. Cross marks represent censored samples. Please click here to view a larger version of this figure.

nsSNP characterization and prioritization of MTHFD2 and PRDX3

The Ensembl Variant Effect Predictor (VEP) analysis of the canonical transcript MTHFD2-201 (ENST00000394053.7) revealed a total of 40 non-synonymous variants. Among these, five variants-rs1471336772, rs2103841302, rs1428567735, rs1694270227, and rs1694375712 displayed strong evidence of deleterious functional impact across multiple predictive tools. These substitutions were consistently predicted to affect protein structure and function across multiple computational tools, including SIFT ≤ 0.05, PolyPhen-2 ≥ 0.85, CADD values ranging from 29 to 36, MetaLR ranging from 0.73 to 0.86, Mutation Assessor > 3.5, and REVEL scores approaching 0.99. The most deleterious variant, rs1471336772 (T>G), exhibited a CADD score of 33, PolyPhen-2 score of 0.99, and REVEL score of 0.997, suggesting potential structural destabilization and possible impairment of the enzyme's NADPH-dependent catalytic activity (Supplemental Table S4). Most high-scoring variants were located within the NADP⁺-binding and methylenetetrahydrofolate dehydrogenase catalytic domains, which are critical for redox regulation and one-carbon metabolism. These findings suggest that such substitutions could disrupt metabolic flux and redox balance, and may be associated with oncogenic reprogramming in breast cancer.

For PRDX3-201 (ENST00000298510.4), 28 non-synonymous SNPs were identified, of which five variants-rs747786383, rs2133647474, rs761292643, rs377652956, and rs1847844616-demonstrated consistently damaging profiles. These variants met multiple deleterious thresholds: SIFT ≤ 0.05, PolyPhen-2 ≥ 0.85, CADD scores between 28 and 32, MetaLR ranging from 0.43 to 0.99, and REVEL ≥ 0.9. The highest-impact variant, rs747786383 (A>G), presented with CADD = 28, MetaLR = 0.993, and REVEL = 0.97, indicating a strong pathogenic potential (Supplemental Table S5). This mutation occurs within the thioredoxin-dependent peroxidase domain, a region essential for hydrogen peroxide reduction and maintenance of mitochondrial oxidative stress homeostasis. Alterations in this domain may hinder peroxidase catalytic efficiency, compromise antioxidant defenses and increasing susceptibility to oxidative damage. A comparative summary of both genes indicated that MTHFD2 and PRDX3 harbor distinct but complementary functional vulnerabilities in mitochondrial oxidative metabolism. MTHFD2 variants primarily affect enzymatic redox cycling and NADPH regeneration, promoting metabolic aggressiveness, whereas PRDX3 variants compromise peroxidase-mediated detoxification, reducing antioxidant resilience. Among all mutations, rs1471336772 in MTHFD2 and rs747786383 in PRDX3 were identified as the most pathogenic nsSNPs based on their cumulative high scores across CADD, REVEL, and Mutation Assessor predictions. Figure 11 provides a graphical overview of the comparative pathogenicity profiles for the two proteins, highlighting the consistently high scores across SIFT, PolyPhen-2, CADD, MetaLR, REVEL, and Mutation Assessor. Both genes exhibited strong deleterious potential, with MTHFD2 (CADD = 33, REVEL = 0.997) indicating maximal structural disruption and PRDX3 (MetaLR = 0.99, REVEL = 0.97) displaying comparable functional impact on antioxidant domain integrity (Figure 11). Together, these findings reveal that MTHFD2 and PRDX3 harbor highly damaging nsSNPs that could critically impair mitochondrial redox homeostasis and energy metabolism, supporting their potential relevance as oxidative stress-related genes and candidate therapeutic targets in breast cancer pathogenesis.

A systematic re-evaluation of the nsSNP annotation framework was performed to ensure consistency between quantitative pathogenicity scores and categorical classifications across all reported variants. In particular, discrepancies between CADD scores and their qualitative labels were corrected by applying standardized interpretation thresholds (CADD ≥20 as deleterious and ≥30 as highly deleterious), ensuring that all prioritized variants reflect true high-impact substitutions. Furthermore, variant identifiers were harmonized with those reported in the primary results to maintain consistency between the tabulated data and the narrative interpretation. This refinement strengthens the study's analytical rigor and ensures that the reported nsSNPs constitute a coherent, high-confidence set of functionally relevant variants suitable for downstream structural and translational investigations (Table 3).

Functional prediction tools bar chart; CADD highest impact score for MTHFD2, PRDX3 genes.
Figure 11. Comparative nsSNP pathogenicity scores for MTHFD2 and PRDX3. Bar plot showing the aggregated predictive scores of deleteriousness for MTHFD2 (ENST00000394053.7) and PRDX3. (ENST00000298510.4) across six computational tools: SIFT, PolyPhen-2, CADD, MetaLR, REVEL, and Mutation Assessor. Higher scores correspond to increased probability of functional disruption, indicating that both genes possess highly damaging nsSNPs that may compromise mitochondrial redox and metabolic integrity. Please click here to view a larger version of this figure.

Table 3: Top predicted deleterious nsSNPs in MTHFD2 and PRDX3. PolyPhen Class was probably damaging for all nsSNPs. Please click here to download this Table.

Secondary structure and 2D modeling analysis of MTHFD2 and PRDX3 proteins

To further interpret the structural and positional context of deleterious nsSNPs, 2D secondary structure modeling of MTHFD2 (MTDC_HUMAN) and PRDX3 (PRDX3_HUMAN) proteins was performed using PSIPRED and NetSurfP 2.0. These analyses provided detailed insights into the secondary structure elements, residue disorder propensity, and relative solvent accessibility (RSA) of each protein, highlighting the effect of pathogenic mutations on local folding and solvent exposure.

In the MTHFD2 protein (Figure 12A), the 2D model revealed an alternating pattern of α-helices (orange) and β-strands (purple) forming the compact core of the enzyme's catalytic and NADP⁺-binding regions. The deleterious residues identified from nsSNP screening, P208R, A247E, D248G, G291V, and G313D, were predominantly located in or near structured regions (α-helices and β-sheets) with moderate-to-high solvent accessibility. These residues were mapped within well-ordered, low-disorder segments, indicating structural rigidity essential for maintaining enzymatic stability and cofactor binding. Notably, A247E and D248G, positioned within a short α-helical turn adjacent to the active site, displayed high relative surface accessibility (RSA ≥ 0.25), suggesting that these substitutions could directly affect catalytic surface interactions. Similarly, G291V and G313D, located within the β-sheet region near the NADP⁺ pocket, exhibited local distortion potential due to loss of flexibility and increased side-chain volume, as reflected in the thicker disorder lines in these regions. The P208R mutation occurred within a helical-coil transition zone, a region that typically requires precise backbone rigidity, implying that replacement of proline with arginine may introduce local instability and alter folding dynamics. Collectively, these data indicate that the deleterious residues in MTHFD2 reside in structurally constrained, solvent-exposed sites, where even minor substitutions can compromise catalytic geometry and redox coupling.

In the PRDX3 protein (Figure 12B), the secondary structure map exhibited a conserved thioredoxin fold composed of alternating α-helices and β-strands connected by flexible coil loops. The critical mutations P70R, P101R, A218V, and P230S were found in solvent-accessible loop regions or adjacent to secondary structural elements, reflecting their potential influence on redox-active cysteine accessibility and dimer stabilization. The P70R and P101R variants occurred near disordered and solvent-exposed coils, which may disrupt the dynamic flexibility required for peroxidase catalysis and thioredoxin interaction. On the other hand, A218V and P230S mapped within helical loop transitions, regions vital for conformational switching during the catalytic cycle. These residues demonstrated increased exposure and reduced disorder, suggesting that mutations at these positions could rigidify local structural elements, limiting redox conformational adaptability. Overall, the 2D topology indicated that PRDX3 deleterious variants primarily alter loop-to-helix stability, solvent exposure, and local disorder, consistent with the predicted impairment of catalytic and antioxidant functions. The combined analysis of MTHFD2 and PRDX3 2D models revealed that pathogenic nsSNPs cluster in regions balancing rigidity and flexibility, where perturbations in hydrogen bonding or polarity can have amplified effects on protein function. The affected residues occupy structurally conserved, functionally exposed sites, confirming their potential to alter enzymatic conformation and mitochondrial redox regulation.

Protein sequence analysis diagram; secondary structure, helix, strand for MTDC and PRDX3 human FASTA.
Figure 12. Secondary structure and relative surface accessibility of deleterious nsSNP sites. (A) The 2D topology of MTHFD2 (MTDC_HUMAN) showing predicted α-helices (orange), β-strands (purple), and disordered regions (black line thickness). Red vertical boxes mark deleterious residues (P208R, A247E, D248G, G291V, and G313D), which localize within structured and moderately solvent-exposed regions near the catalytic site. (B) The PRDX3 (PRDX3_HUMAN) 2D secondary structure representation highlighting α-helices, β-sheets, and coil regions. Deleterious residues (P70R, P101R, A218V, and P230S) are boxed in red, primarily occupying solvent-accessible loops and helical interfaces critical for redox activity. Relative surface accessibility is indicated in red (exposed) and black (buried), while disorder propensity is depicted by line thickness. Together, the models reveal that disease-associated nsSNPs are positioned in functionally sensitive and structurally constrained domains essential for enzymatic activity and mitochondrial redox balance. Please click here to view a larger version of this figure.

Structural and functional impact of deleterious nsSNPs predicted

To further validate the structural and mechanistic implications of the prioritized nsSNPs identified in MTHFD2 and PRDX3, molecular dynamics-based computational analysis was performed using DynaMut and MutPred2 servers. These tools provided complementary insights into mutation-induced alterations in protein flexibility, stability, solvent accessibility, and catalytic site conformation.

For MTHFD2, five high-confidence substitutions (A247E, D248G, P208R, G291V, and G313D) exhibited MutPred2 scores ≥ 0.90, signifying strong pathogenic potential. Functional motif mapping revealed that these variants localized within or adjacent to PROSITE motifs (PS00006, PS00008, PS00767) and ELM functional regions (ELME000233, ELME000335) associated with NADP⁺ binding and catalytic function. The mutations A247E and D248G, situated near the enzyme's catalytic pocket, demonstrated profound disruption of metal-binding sites (p = 3.0×10⁻4-4.4×10⁻5) and altered ordered interface interactions, which are essential for enzyme-substrate recognition and cofactor stability. These substitutions were predicted to gain new allosteric and catalytic sites (at D248 and I251) while simultaneously losing native catalytic functionality, implying a destabilizing conformational shift. The G291V and G313D variants displayed gain of allosteric sites (F295, G313) and altered transmembrane propensity, indicating possible misfolding and subcellular localization defects. Importantly, both variants were associated with reduced methylation and increased loop flexibility, potentially affecting the enzyme's structural integrity. Similarly, P208R (MutPred2 = 0.916) resulted in altered DNA-binding affinity and gain of a helical structure, suggesting disruption of inter-domain stability and electrostatic coordination within the NADP⁺-binding groove. Collectively, the MTHFD2 nsSNPs converge on mechanisms involving metal-binding perturbation, allosteric regulation gain, and catalytic site disruption, suggesting that they may impair enzymatic activity and redox regulation and metabolic roles critical for cancer cell proliferation and resistance.

For PRDX3, four major substitutions P230S, A218V, P101R, and P70R were predicted to be functionally damaging (MutPred2 ≥ 0.80). These variants mapped within conserved motifs (ELME000053, ELME000085, ELME000146, ELME000137) corresponding to the thioredoxin-dependent peroxidase domain. The P230S mutation (MutPred2 = 0.836) was associated with altered metal binding (p = 1.2 × 10⁻3), gain of disulfide linkage at C229, and loss of an allosteric site at W233, suggesting interference with redox-active cysteine residues crucial for peroxidase catalysis. The A218V variant led to loss of solvent accessibility and gain of an allosteric site at R214, indicative of reduced catalytic exposure and potential redox inactivation. Moreover, P101R and P70R exhibited high MutPred2 scores (0.924-0.945) and introduced structural perturbations in the N-terminal domain. These mutations caused altered transmembrane potential, gain of β-strand conformations, and loss of post-translational modification sites (sulfation and pyrrolidone formation), collectively suggesting impaired folding and redox stability. Overall, PRDX3 variants displayed a consistent pattern of altered metal coordination, increased disorder, and gain/loss of catalytic sites, which could compromise peroxidase efficiency and weaken mitochondrial antioxidant defense mechanisms. These structural perturbations, combined with the deleterious predictions from VEP, CADD, and REVEL analyses, indicate that both MTHFD2 and PRDX3 harbor high-impact nsSNPs that can destabilize protein conformation, disrupt catalytic architecture, and alter molecular interactions essential for maintaining redox equilibrium. The convergence of these damaging mutations on metal-binding and catalytic residues underscores their potential pathogenic role in compromising mitochondrial oxidative stress regulation in breast cancer. It is important to note that these findings are based on computational predictions and require experimental validation to confirm their biological and functional relevance (Table 4).

Table 4: Predicted functional and structural consequences of deleterious nsSNPs in MTHFD2 and PRDX3 based on DynaMut and MutPred2 analyses. Please click here to download this Table.

Structural visualization and atomic-level interpretation of deleterious nsSNPs in MTHFD2 and PRDX3

To elucidate the atomic-scale structural consequences of the most deleterious non-synonymous variants, molecular visualization and 3D structural modeling were performed for MTHFD2 and PRDX3 using PyMOL and DynaMut simulations. These analyses aimed to identify conformational perturbations, residue interaction shifts, and catalytic site rearrangements resulting from high-impact substitutions. For PRDX3, the predicted deleterious mutations P230S, A218V, P101R, and P70R were modeled within the thioredoxin-dependent peroxidase domain. The wild-type protein exhibited a compact α/β-fold characterized by an extensive hydrogen bonding network stabilizing the catalytic cysteine residues. However, the introduction of mutations induced noticeable structural deviations localized near the catalytic and substrate recognition pockets.

Specifically, the P230S substitution (Figure 13A,B) replaced a rigid, nonpolar proline with a polar serine residue. This substitution increased side-chain flexibility, introducing a new hydrogen bonding potential while disrupting the native loop rigidity required for peroxidase domain stability. The result was an increased surface solvent accessibility and partial unfolding near the redox-active site. Similarly, A218V (Figure 13B) substituted a small alanine residue with a bulkier valine, creating steric hindrance in the hydrophobic core and potentially disturbing cofactor accommodation. The P101R and P70R mutations (Figure 13C,D) replaced proline with arginine residues, introducing positively charged side chains in regions originally dominated by nonpolar contacts. These alterations are predicted to introduce electrostatic changes and potential loop distortions near the N-terminal domain, which likely interfere with thioredoxin docking and dimerization stability. Collectively, these structural predictions suggest that PRDX3 variants may exhibit altered redox interface geometry, increased disorder, and reduced peroxidase efficiency, compromising the enzyme's ability to neutralize hydrogen peroxide and maintain mitochondrial oxidative balance.

For MTHFD2, the modeled variants A247E, D248G, P208R, G291V, and G313D were distributed across the NADP-binding and catalytic domains, which are critical for one-carbon metabolism and NADPH regeneration. The wild-type structure revealed a tightly packed catalytic pocket, coordinated by divalent metal ions and hydrogen-bonding interactions, essential for folate substrate binding. In contrast, all modeled substitutions introduced charge, polarity, or size imbalances that destabilized these regions. The A247E mutation (Figure 14B) introduced a negatively charged glutamic acid residue, disrupting metal-binding coordination and electrostatic balance within the catalytic cavity. Similarly, D248G (Figure 14F) resulted in the loss of an aspartic acid carboxyl group, abolishing a key catalytic contact and destabilizing the local secondary structure. The G291V substitution (Figure 14A) introduced a bulkier valine residue, causing steric clashes and partial distortion of the loop adjacent to the NADP pocket.

The G313D variant (Figure 14C) introduced a negatively charged residue in a region typically occupied by a small glycine, resulting in altered hydrogen bonding patterns and reduced flexibility at the catalytic interface. Finally, P208R (Figure 14E) replaced a rigid proline with a positively charged arginine, potentially disturbing interdomain interactions and increasing disorder in the protein's central helical structure. These combined alterations are predicted to significantly reduce enzyme stability, impair metal ion coordination, and decrease catalytic efficiency. Collectively, both MTHFD2 and PRDX3 mutants show localized destabilization within functionally conserved catalytic and redox-active sites. MTHFD2 variants predominantly affect NADPH-binding and catalytic residues, impairing metabolic redox cycling, while PRDX3 mutations interfere with thioredoxin-binding and disulfide formation, weakening mitochondrial antioxidant capacity. These findings underscore that deleterious nsSNPs contribute to the molecular destabilization of mitochondrial redox networks, potentially driving tumor metabolic reprogramming and therapy resistance.

Protein structure analysis; genetic mutations; diagrams; amino acid substitution; molecular biology.
Figure 13. Structural modeling of deleterious nsSNPs in PRDX3 (ENST00000298510.4). (A) Ribbon diagram of the PRDX3 tertiary structure highlighting mutation sites (green) within the thioredoxin-dependent peroxidase domain. (B) Substitution of proline with serine at position 230 and alanine with valine at position 218 induces conformational relaxation and local unfolding in the catalytic loop. (C,D) Replacement of proline with arginine at positions 101 and 70 introduces positive charges and disrupts N-terminal helical packing. Each mutation schematic shows the chemical transition of the affected residues and the altered side-chain orientation within the 3D structure. The predicted effects include increased solvent exposure, hydrogen-bonding rearrangement, and reduced peroxidase stability.Please click here to view a larger version of this figure.

Protein mutagenesis diagram; glycine to valine, proline to arginine, highlighting position changes.
Figure 14. Structural visualization of deleterious nsSNPs in MTHFD2 (ENST00000394053.7). (A) The MTHFD2 monomer structure illustrating high-impact nsSNP positions

Supplemental Table S1. List of genes along with their corresponding normalized expression values and associated sample information derived from the GSE231524 dataset, used for downstream analysis. Please click here to download this File.

Supplemental Table S2. List of genes along with their corresponding normalized expression values and associated sample information derived from the GSE231525 dataset, used for downstream analysis. Please click here to download this File.

Supplemental Table S3. List of differentially expressed genes (DEGs) identified from the 3-MOS-DEGs, including gene names, fold change values, and statistical significance used for downstream interpretation. Please click here to download this File.

Supplemental Table S4. Variants associated with the MTHFD2-201 transcript, including variant type, genomic position, and related annotation information used for downstream analysis. Please click here to download this File.

Supplemental Table S5. Variants associated with the PRDX3-201 transcript, including variant type, genomic position, and related annotation information used for downstream analysis. Please click here to download this File.

Discussion

Loading...
$$\rightleftharpoonup{xx}$$ $$\longleftharp{xx}$$, $$\longrightharp{xx}$$,

Mitochondrial oxidative stress is a defining hallmark of tumor adaptation, governing cellular survival, metabolic plasticity, and therapy resistance. The present study systematically explored mitochondrial oxidative stress–related differentially expressed genes (MOS-DEGs) in HER2+ breast cancer by integrating transcriptomic data, functional enrichment, clinical validation, and detailed in silico mutational and structural analyses. The integration of differential expression with mutation profiling provided a multi-dimensional understanding of how redox-metabolic regulation intersects with genomic instability. Among the identified MOS-DEGs, MTHFD2 and PRDX3 emerged as pivotal mitochondrial determinants displaying distinct yet complementary functions in oxidative stress adaptation, metabolic regulation, and therapeutic response48. The identification of multiple deleterious non-synonymous SNPs (nsSNPs) within these genes further revealed how single amino acid substitutions could modulate enzyme activity, protein folding, and mitochondrial resilience in breast cancer cells. It is important to emphasize that the findings presented in this study are derived from integrative computational and in silico. analyses and, therefore, require further experimental validation to confirm their biological and clinical relevance.

MTHFD2 as a metabolic oncogene and prognostic driver

MTHFD2 (Methylenetetrahydrofolate dehydrogenase 2) encodes a bifunctional mitochondrial enzyme central to the one-carbon metabolism (OCM) pathway. By generating formate and NADPH, MTHFD2 supports nucleotide biosynthesis and redox homeostasis—functions that become hyperactivated in rapidly proliferating tumor cells. While largely silent in normal adult tissues, MTHFD2 is aberrantly re-expressed in over 80% of malignancies, including breast, renal, colorectal, and hepatocellular cancers. Elevated expression strongly correlates with poor clinical outcomes and increased metastatic potential49.

In this study, MTHFD2 was consistently upregulated in both tumor and metastatic tissues across RNA-seq and microarray datasets, with Kaplan–Meier analysis confirming its adverse prognostic association (HR = 1.53, p = 1.1×10⁻16). Functionally, this aligns with its role in NADPH regeneration, essential for replenishing reduced glutathione and thioredoxin molecules that neutralize reactive oxygen species (ROS)50. Such metabolic reprogramming enhances tolerance to oxidative stress and sustains anabolic growth, particularly under therapeutic pressure such as HER2 inhibition51. Although this study primarily focuses on HER2-targeted therapy resistance, particularly in relation to agents such as lapatinib, the ROC analysis was conducted using datasets associated with response to “any chemotherapy” to assess the broader predictive performance of the identified hub genes. This approach was employed as a secondary validation strategy to evaluate their general clinical applicability. However, it is important to note that these results do not exclusively represent HER2-targeted therapy response, and thus, conclusions regarding therapy-specific resistance are interpreted with caution. Future studies incorporating datasets specifically linked to HER2-targeted treatments are required to further validate these findings.

At the genomic level, nsSNP analysis revealed several highly deleterious variants—notably rs1471336772 (G291V), rs2103841302 (G313D), rs1694270227 (P208R), rs1428567735 (A247E), and rs1694375712 (D248G), localized within the NADP⁺-binding and catalytic domains. These mutations consistently showed SIFT ≤ 0.05, PolyPhen-2 ≥ 0.99, and CADD values ranging from 29 to 36, suggesting severe structural perturbation. Secondary and 3D Structural modeling suggested that these residues are located in α-helical or β-strand regions critical for catalytic coordination and cofactor affinity52. DynaMut and MutPred2 analyses further indicated loss of metal-binding potential, altered hydrogen bonding, and reduced protein stability changes that may impair enzymatic activity and cofactor binding.

Interestingly, while such variants are predicted to destabilize enzyme structure, they may paradoxically activate compensatory redox circuits, enabling cancer cells to reroute metabolic flux through parallel pathways such as serine biosynthesis or mitochondrial folate cycling. This mutational adaptability underscores the dynamic interplay between enzyme instability and metabolic resilience in tumor mitochondria53. Collectively, MTHFD2 functions as a metabolic oncogene whose overexpression and pathogenic variants reinforce redox plasticity and drug resistance, making it a compelling therapeutic target for redox-disruptive therapy.

PRDX3 as a mitochondrial antioxidant and protective biomarker

PRDX3 (Peroxiredoxin 3) serves as a thioredoxin-dependent peroxidase responsible for detoxifying hydrogen peroxide within the mitochondrial matrix. It constitutes a vital defense mechanism maintaining the redox balance of oxidative phosphorylation (OXPHOS) and protecting respiratory complexes from oxidative injury. In cancer, PRDX3’s function is dual-faceted: while overexpression supports antioxidant defense and survival under oxidative stress, its loss enhances mitochondrial dysfunction and apoptosis.

Our data revealed that PRDX3 is moderately upregulated in tumors and metastatic tissues and significantly associated with favorable prognosis (HR = 0.73, p = 7.7×10⁻10). This suggests a protective role where enhanced PRDX3 expression buffers mitochondrial ROS accumulation, preserves redox homeostasis, and mitigates oxidative damage during oncogenic stress. Mechanistically, PRDX3 interacts with signaling nodes such as NF-κB and HIF-1α and modulates mitochondrial apoptosis regulators including BCL2 and AIFM1, linking ROS regulation to survival signaling54,55.

Notably, nsSNP profiling identified several deleterious variants including rs747786383 (P230S), rs2133647474 (A218V), rs761292643 (P101R), and rs377652956 (P70R) within the conserved thioredoxin peroxidase domain. These substitutions were predicted to be damaging by SIFT (≤0.05), PolyPhen-2 (≥0.99), and CADD (≥28). Structural modeling and DynaMut simulations demonstrated that these mutations disrupt local hydrogen bonding, increase solvent exposure, and distort the α/β-fold essential for catalytic cysteine accessibility. In particular, P230S and A218V induce loop-to-helix instability and steric hindrance, whereas P70R and P101R introduce electrostatic repulsion and destabilize N-terminal helical packing. These alterations likely compromise peroxidase efficiency and thioredoxin binding, attenuating the antioxidant capacity of PRDX356.

Such variants could diminish mitochondrial defense, promoting ROS accumulation and potentially affecting therapy response in redox-dependent tumor environments. Previous studies have linked deleterious PRDX3 variants to hepatocellular and brain cancers, supporting their broader role in oxidative disease vulnerability. In the context of breast cancer, these findings highlight PRDX3 as both a protective biomarker and a mutation-sensitive antioxidant enzyme, whose dysfunction may predispose cells to oxidative injury and metabolic stress.

Integrative interpretation: The mitochondrial redox–metabolic axis

Collectively, MTHFD2 and PRDX3 define two interconnected arms of the mitochondrial redox–metabolic axis. MTHFD2 supports NADPH production and biosynthetic metabolism, fueling oxidative stress resistance, while PRDX3 detoxifies ROS and safeguards mitochondrial components. Their expression and mutational patterns suggest a feedback mechanism MTHFD2 upregulation regenerates redox cofactors, whereas PRDX3 neutralizes resultant ROS, forming a synchronized defense network that enables sustained proliferation under stress57,58.

The discovery of deleterious nsSNPs within both genes introduces a genetic layer to this regulatory system. Mutations in catalytic or redox-active residues could shift redox equilibrium by altering enzyme stability and cofactor interaction, influencing how cancer cells adapt to therapy-induced oxidative stress. This interplay between transcriptional dysregulation and structural vulnerability emphasizes the need for dual-parameter biomarker assessment—integrating expression profiling with mutational analysis—to refine therapeutic predictions and patient stratification.

Clinical and therapeutic implications

From a clinical perspective, MTHFD2 and PRDX3 represent complementary therapeutic targets within the mitochondrial oxidative stress network. Targeting MTHFD2 with small-molecule inhibitors (e.g., DS18561882, LY345899) has shown efficacy in preclinical cancer models by suppressing NADPH generation, inducing oxidative stress, and synergizing with HER2 inhibitors like lapatinib and trastuzumab. Conversely, modulating PRDX3 expression or activity could be leveraged to fine-tune redox balance either to sensitize tumor cells to ROS-inducing agents or to protect normal tissues from collateral oxidative injury.

To maintain analytical accuracy, we clarified that survival analyses were conducted in an unstratified breast cancer cohort and ROC-based validation reflected general chemotherapy response rather than HER2-targeted therapy.

This dual-targeting approach MTHFD2 inhibition coupled with PRDX3 modulation may represent a potential strategy to disrupt redox-driven resistance mechanisms while maintaining mitochondrial function in non-malignant cells. Moreover, screening for deleterious nsSNPs in these genes may guide genotype-informed therapeutic design, predicting which tumors are most vulnerable to redox perturbation–based interventions.

To ensure consistency and reproducibility of variant interpretation, the prioritized nsSNPs identified in MTHFD2 and PRDX3 were systematically harmonized across all analytical levels, including transcriptomic filtering, pathogenicity scoring, and structural modeling. The final variant sets rs1471336772, rs2103841302, rs1428567735, rs1694270227, and rs1694375712 for MTHFD2, and rs747786383, rs2133647474, rs761292643, and rs377652956 for PRDX3 represent high-confidence deleterious substitutions consistently supported by multiple predictive frameworks. Aligning rsID-based reporting with corresponding amino acid substitutions further ensures interpretative clarity between genomic and structural analyses. This harmonization strengthens the internal validity of the study and facilitates future experimental validation by providing a unified and traceable variant framework.

All functional and mechanistic interpretations in this study are based on computational predictions and should therefore be considered putative. While integrative bioinformatics and structural modeling provide strong evidence for potential effects of identified nsSNPs in MTHFD2 and PRDX3, these findings do not represent experimentally validated mechanisms. Accordingly, all conclusions regarding protein dysfunction, redox disruption, and therapeutic relevance are presented as predictive hypotheses that require further experimental validation.

Limitations and future directions

Despite robust computational and multi-omics integration, several limitations must be acknowledged. The analyses in this study are entirely based on in silico and computational approaches without direct experimental validation. Therefore, the findings should be interpreted as predictive and hypothesis-generating rather than definitive. Functional assays, such as enzymatic activity measurements, mitochondrial redox flux, and structural crystallography, as well as in vivo.models, are necessary to confirm predicted deleterious effects of nsSNPs. Additionally, clinical datasets exhibit inherent heterogeneity in subtype distribution and treatment exposure, which may influence gene expression profiles. Furthermore, the present study is based on a limited number of transcriptomic datasets, which may affect the generalizability of the findings. Although two independent RNA-seq datasets were analyzed and supported by large-scale clinical validation, inclusion of additional independent cohorts would further strengthen the robustness and reproducibility of the results.

Post-translational modifications critical for both MTHFD2 and PRDX3 regulation were not assessed, and dynamic redox interactions in living systems remain to be experimentally verified. Importantly, the key genes identified in this study, particularly MTHFD2 and PRDX3, have been reported in previous studies to play significant roles in cancer metabolism and redox regulation, supporting the biological consistency of our findings across different studies. Future research should employ multi-omics validation (proteomics, metabolomics, and single-cell transcriptomics) and molecular dynamics simulations to capture the full impact of these variants. Establishing causal links between genotype, enzyme function, and therapy response will enhance the translational potential of these findings. Integrating mutational insights with experimental validation may ultimately enable genotype-driven redox therapies, where mitochondrial metabolic inhibitors and redox modulators are co-optimized to overcome resistance and improve clinical outcomes in HER2+ breast cancer.

Disclosures

Loading...
$$\rightleftharpoonup{xx}$$ $$\longleftharp{xx}$$, $$\longrightharp{xx}$$,

The authors have no conflicts of interest to declare. Artificial intelligence tools, including OpenAI’s ChatGPT (GPT-5)., were used to improve the manuscript’s grammar, clarity, and scientific phrasing. All analyses, interpretations, and conclusions were conceived and verified by the authors.

Author contributions:

Xiaobo Jia conceptualized and supervised the study, contributed to study design, data interpretation, manuscript drafting, and provided overall project leadership. Hui Su contributed to data acquisition, bioinformatics analysis, and interpretation of transcriptomic results. Jiaxin Zhang assisted in data processing, nsSNP analysis, structural modeling, and figure preparation. Zhao Liu contributed to statistical analysis, validation of results, and manuscript revision. All authors reviewed and approved the final version of the manuscript.

Materials

List of materials used in this article
NameCompanyCatalog NumberComments
AnnotationDbiBioconductorv1.64.0R annotation package used for HGNC gene symbol standardization and annotation.
apeglmBioconductormethod in DESeq2Method used for log2 fold-change shrinkage in differential expression analysis.
BiobaseBioconductorv2.62.0R package used to access and manage GEO-derived expression and metadata objects.
CADDUniversity of Washington / Kircher Labweb toolCombined Annotation Dependent Depletion score used for nsSNP pathogenicity prediction.
clusterProfilerBioconductorv4.8.1R package used for GO and KEGG enrichment analysis.
ComplexHeatmapBioconductorv2.18.0R package used for heatmap generation and expression clustering.
CytoscapeCytoscape ConsortiumsoftwareNetwork visualization platform used with STRING-derived PPI networks and CytoHubba prioritization.
dbSNPNCBIdatabasePopulation variant database used for cross-validation of prioritized nsSNPs.
DESeq2Bioconductorv1.42.0R package used for normalization, variance modeling, and differential expression analysis.
dplyrCRAN / tidyversev1.1.3R package used for data manipulation and intersection of gene sets.
DynaMutUniversity of Melbourne / BioSigweb serverTool used to estimate mutation-associated stability and flexibility changes in proteins.
EnhancedVolcanoBioconductorv1.22.0R package used to visualize differential expression results as volcano plots.
Ensembl Genome BrowserEMBL-EBI / EnsembldatabaseSource of canonical transcript sequences for MTHFD2 and PRDX3.
Ensembl Variant Effect Predictor (VEP)EMBL-EBI / Ensemblweb toolTool used to annotate missense variants and integrate predictive scores.
ExACBroad InstitutedatabasePopulation-level exome database used for nsSNP cross-validation.
Gene Expression Omnibus (GEO)NCBIdatabaseRepository used to retrieve transcriptomic datasets for HER2-positive breast cancer analysis.
Gene OntologyGene Ontology ConsortiumGO:0006979Ontology resource used to retrieve oxidative stress-related genes and for enrichment analysis.
GEOqueryBioconductorv2.70.0R package used to access GEO count matrices and metadata.
ggplot2CRAN / tidyversev3.5.0R package used for data visualization, clustering plots, and graphical outputs.
ggraphCRANR packageR package used for graph and network visualization during enrichment and pathway representation.
gnomADBroad InstitutedatabasePopulation variant database used to check frequency and distribution of prioritized nsSNPs.
GOplotCRANR packageR package used for GO enrichment visualization including chord and summary plots.
GSE231524NCBI GEOaccessionRNA-seq dataset used for analysis of parental, drug-tolerant, and resistant BT474 phenotypes.
GSE231525NCBI GEOaccessionRNA-seq dataset used for analysis of DUSP6 knockdown in HER2-positive breast cancer cells.
Human MitoCarta3.0Broad InstitutedatabaseCurated mitochondrial gene resource used to define mitochondrial gene sets.
Human Oxidative Stress Gene Database (HOSGDB)HOSGDBdatabaseDatabase used to retrieve oxidative stress-related genes.
igraphCRANR packageR package used for network representation and pathway visualization.
iMutant 3.0University of Bologna / Biofoldweb serverTool used to predict mutation effects on protein stability.
Kaplan–Meier PlotterKMplotweb toolOnline platform used for recurrence-free survival analysis in breast cancer cohorts.
KEGGKyoto UniversitydatabasePathway database used for oxidative phosphorylation and pathway enrichment analysis.
KEGGRESTBioconductorR packagePackage used to retrieve and annotate KEGG pathway information.
LapatinibNot specified in manuscript1 μM treatment conditionHER2-targeted inhibitor used in the source experimental datasets to derive drug-tolerant/resistant phenotypes.
MetaLRIntegrated via Ensembl VEPscoreComputational pathogenicity metric used for variant prioritization.
MutPred2MutPredweb serverTool used to predict functional consequences of amino acid substitutions.
NetSurfP 3.0Technical University of Denmarkweb serverTool used for secondary structure and solvent accessibility prediction.
org.Hs.eg.dbBioconductorv3.18.0Human genome annotation package used for gene identifier mapping.
pathviewBioconductorR packagePackage used to map enriched genes onto KEGG pathways.
pheatmapCRANv1.0.12R package used for heatmap plotting and clustering visualization.
PolyPhen-2Harvard / Brigham and Women's Hospitalweb toolComputational predictor used to estimate structural/functional impact of amino acid substitutions.
Protein Data Bank (PDB)RCSB PDBdatabaseSource of experimentally resolved protein structures for MTHFD2 and PRDX3.
PSIPREDUniversity College Londonv3.2Protein secondary structure prediction server used for 2D topology analysis.
PyMOLSchrödinger / PyMOLv3.1Molecular graphics software used to visualize protein structures and mutation sites.
REVELIntegrated via Ensembl VEPscoreEnsemble pathogenicity score used for missense variant prioritization.
ROCplotterROCplot.orgweb toolOnline platform used for ROC-based validation of gene expression in breast cancer response cohorts.
RStudioPositv4.3.1/v4.3.2 environmentStatistical computing environment used for transcriptomic, enrichment, and visualization workflows.
SIFTA*STARweb toolPredictive tool used to classify amino acid substitutions as tolerated or deleterious.
STRINGSTRING ConsortiumdatabaseProtein–protein interaction database used for network analysis and hub gene identification.
TNMplotTNMplot.comweb toolPlatform used to compare gene expression across normal, tumor, and metastatic breast tissues.
VennDiagramCRANv1.7.3R package used to visualize overlap among DEGs and curated mitochondrial gene sets.

Reprints and Permissions

Request permission to reuse the text or figures of this JoVE article

Request Permission

Tags

Breast CancerHER2 PositiveMitochondrial Oxidative StressTranscriptomic AnalysisHub GenesSingle Nucleotide PolymorphismsDrug TargetsMTHFD2PRDX3RNA Sequencing

Related Articles