RESEARCH
Peer reviewed scientific video journal
Video encyclopedia of advanced research methods
Visualizing science through experiment videos
EDUCATION
Video textbooks for undergraduate courses
Visual demonstrations of key scientific experiments
BUSINESS
Video textbooks for business education
OTHERS
Interactive video based quizzes for formative assessments
Products
RESEARCH
JoVE Journal
Peer reviewed scientific video journal
JoVE Encyclopedia of Experiments
Video encyclopedia of advanced research methods
EDUCATION
JoVE Core
Video textbooks for undergraduates
JoVE Science Education
Visual demonstrations of key scientific experiments
JoVE Lab Manual
Videos of experiments for undergraduate lab courses
BUSINESS
JoVE Business
Video textbooks for business education
Solutions
Language
English
Menu
Menu
Menu
Menu
A subscription to JoVE is required to view this content. Sign in or start your free trial.
Research Article
Erratum Notice
Important: There has been an erratum issued for this article. View Erratum Notice
Retraction Notice
The article Assisted Selection of Biomarkers by Linear Discriminant Analysis Effect Size (LEfSe) in Microbiome Data (10.3791/61715) has been retracted by the journal upon the authors' request due to a conflict regarding the data and methodology. View Retraction Notice
We computationally analyze network toxicology and transcriptomic data to identify biomarkers of acetaminophen-induced liver injury, revealing estrogen receptor 1 and purine nucleoside phosphorylase as diagnostic targets.
Acetaminophen-induced liver injury is a significant public health concern, yet reliable early biomarkers are lacking. This study aimed to identify candidate biomarkers for acetaminophen-induced hepatotoxicity using a computational approach integrating network toxicology, transcriptomics, and machine learning. Potential acetaminophen targets were predicted using online platforms, yielding 140 candidates. Hepatotoxicity-related genes (n = 657) were retrieved from GeneCards, and 38 overlapping genes were identified. Differentially expressed genes from the GSE74000 dataset (n = 1,978) were analyzed. Functional enrichment was performed to identify relevant pathways. A random forest model prioritized 20 feature genes, and molecular docking evaluated binding affinities with acetaminophen. DEGs were primarily associated with mitochondrial dysfunction and ribosome biogenesis. Functional enrichment highlighted xenobiotic metabolism and oxidative stress pathways. Estrogen receptor 1 and purine nucleoside phosphorylase were top-ranked feature genes, showing significant expression differences and strong docking interactions with acetaminophen. This computational protocol systematically predicts candidate biomarkers for acetaminophen-induced liver injury, providing molecular insights and candidates for experimental validation.
Acetaminophen (APAP) is a widely administered antipyretic and pain reliever, but excessive intake can lead to severe hepatotoxicity and potentially liver failure1,2. The damage to the liver is anticipated to occur once the drug is metabolized, and the subsequent toxic metabolite is N-acetyl-p-benzoquinone imine (NAPQI), which results in mitochondrial oxidative stress, altered mitochondrial respiration, and the mitochondrial permeability transition, eventually causing necrotic, rather than apoptotic, hepatocyte death3,4. APAP overdose remains the leading cause of acute liver failure (ALF) in many countries. Epidemiological data show that APAP toxicity accounts for nearly 50% of ALF cases in the United States and is responsible for thousands of emergency admissions annually. Similar trends have been reported globally, with APAP-associated hepatotoxicity constituting a major proportion of drug-induced liver injury (DILI) cases5,6. The high morbidity, risk of liver transplantation, and substantial healthcare burden highlight the urgent need for improved diagnostic biomarkers and a mechanistic understanding of APAP-induced liver damage. Specifically, APAP is absorbed and mainly metabolized in the liver through glucuronidation and sulfation, forming water-soluble conjugates that are excreted in urine. A minor fraction (10-15%) undergoes Cytochrome P450 (CYP450)-mediated hydroxylation (via enzymes like CYP1A2, CYP2E1, and CYP3A4) to produce the reactive intermediate NAPQI, which binds to glutathione and is excreted via bile7,8. Due to the limited capacity of glucuronidation and sulfation pathways, excessive dosing leads to increased NAPQI formation, while concurrently depleting hepatic glutathione reserves9. Once glutathione is depleted to critical levels, NAPQI begins to bind covalently to cellular macromolecules, damaging hepatocytes10. APAP overdose-related toxicity is rising, with an estimated 50,000-80,000 emergency department visits annually in the United States alone11. Since oxidative stress and mitochondrial dysfunction are central to APAP-induced hepatotoxicity, dysregulation of ESR1 may indirectly influence hepatocyte survival and inflammatory responses during APAP injury. APAP toxicity induces metabolic stress and inflammatory activation in hepatocytes. Changes in PNP expression may reflect altered nucleotide metabolism and immune-related responses during APAP-induced liver damage.
Early reports on APAP-induced hepatotoxicity primarily focused on clinical symptoms and pathological changes in the liver and kidneys, providing the foundation for later studies on the mechanisms of cell death and inflammation in APAP toxicity12,13. A key breakthrough was the identification of serum aminotransferases (e.g., ALT and AST) as liver injury biomarkers. With the advent of cell culture models and bioinformatics tools, current research on APAP hepatotoxicity has expanded to include diagnostic, prognostic, and mechanistic biomarkers14,15,16. However, obtaining sufficient serum or plasma samples from APAP overdose patients, especially non-survivors, remains challenging17. As a result, there is an urgent need to explore new diagnostic, prognostic, and mechanistic biomarkers using novel approaches to better study APAP-induced hepatotoxicity.
Network toxicology is an emerging field that constructs network models to analyze toxicological data from compound databases18,19. It helps characterize substance toxicity, uncover mechanisms, and predict key targets20,21. Currently, network toxicology is widely recognized as a valuable tool in scientific research. For example, Tengjiao Qu et al. integrated network toxicology with transcriptomics to identify new neurotoxic mechanisms of 2,2',4,4'-tetrabromodiphenyl ether (a common flame retardant) and its potential biomarkers22. Recent studies also highlight the importance of this approach in assessing the toxicity of natural products23. However, this innovative research paradigm has yet to be applied to explore the potential mechanisms of APAP-induced hepatotoxicity and to identify new biomarkers associated with its toxicity.
IntelliGenes was developed as a machine-learning pipeline for multi-genomic biomarker discovery24. It integrates statistical methods with advanced ML to compute I-Gene scores and create individualized biomarker profiles. Results show improved accuracy in predicting complex traits and enabling personalized disease insights. Limitations include dependence on data quality, limited validation across diverse populations, and the need for broader clinical testing.
The highlight is that the current pancreatic ductal adenocarcinoma (PDAC) biomarkers are insufficient, and to propose patient-derived organoids (PDOs) as functional platforms for biomarker development25. It evaluates PDO-based drug phenotyping, chemoresistance modeling, co-culture systems, and proteomic/metabolomic profiling. Findings show PDOs better capture treatment-relevant biology, but limitations include technical variability, limited microenvironment replication, long culture times, and challenges in routine clinical integration26,27.
The purpose of the study was to assess analytical proteomic methods and the function of bioinformatics in the identification of biomarkers28. It evaluated how computational tools aided in data interpretation and examined MS-based protein analysis techniques. The results demonstrated enhanced biomarker discovery and new treatment targets, but also made it clear that identifying very low-abundance proteins remained challenging, thereby reducing the sensitivity of existing proteomic techniques. Although several diagnostic, mechanistic, and prognostic biomarkers for APAP-induced hepatotoxicity have been well established, these markers primarily reflect hepatocellular injury or mitochondrial dysfunction after toxicity has occurred29,30. In contrast, the present study aimed to identify transcriptomic regulators that may modulate susceptibility or host response to APAP exposure. ESR1 and PNP were therefore investigated as exploratory regulatory markers rather than classical diagnostic biomarkers.
To address the limitations inherent in current methodologies -- specifically the culture variability of PDOs and the sensitivity issues of proteomics -- this study employs an integrated computational framework comprising network toxicology, transcriptomics, machine learning, and molecular docking. This multi-tiered strategy circumvents the need for large cohorts or complex experimental systems by leveraging computational prediction and multi-omic cross-validation. The rationale for this combinatorial approach is that each technique reinforces the others to minimize false positives: network toxicology first provides a system-level prediction of APAP-related targets and toxic pathways; transcriptomic analysis subsequently validates which of these predicted genes are genuinely dysregulated during hepatotoxicity; machine-learning algorithms then prioritize the most informative candidate biomarkers from this high-confidence pool; and finally, molecular docking offers mechanistic support by confirming the feasibility of direct interactions between APAP and the identified protein targets.
Together, these steps create a coherent, evidence-based pipeline for biomarker prediction. This combination of these four strategies minimizes false positive predictions and enhances confidence in finding ESR1 and PNP as strong targets in APAP-induced hepatotoxicity. In this research, it would be necessary to have GSE74000 gene-expression datasets that have a good statistical power, at least a minimum of three biological replicates of any condition with full metadata. The raw expression files would have to be processed in a standard manner, which consists of background correcting, normalization (quantile normalization, etc.), and filtering out low-expressed genes. R was used to perform all analyses with packages such as limma and external tools such as GeneCards and Cytoscape, which allow building a network. Although the pipeline facilitates biomarker prioritization based on systematic analysis, the results are limited by the quality of the dataset, batch effects, and the absence of experimental and in vivo validation, which could affect generalizability.
This protocol outlines a computational method to define the possible biomarkers of acetaminophen-induced liver damage, which takes advantage of network toxicology, transcriptomics, machine learning, and molecular docking (Figure 1). The protocol is aimed at researchers who have access to bioinformatics tools, transcriptomic datasets, and molecular docking software.
Procedure
Step 1: Identification of acetaminophen targets
Retrieve the SMILES representation of acetaminophen (APAP) from PubChem. Use online platforms (ChEMBL, SwissTargetPrediction, STITCH, SEA) to predict potential molecular targets of APAP. Integrate and deduplicate the predicted targets to generate a list of 140 high-confidence APAP targets.
Step 2: Identification of Hepatotoxicity Targets
Retrieve hepatotoxicity-related genes from the GeneCards database. Compile and deduplicate the list to generate a non-redundant set of 657 hepatotoxicity-related genes. Identify overlapping genes between APAP targets and hepatotoxicity-related genes using a Venn diagram.
Step 3: Transcriptomic Data Preprocessing
Download the GSE74000 dataset from GEO. Preprocess raw expression data using DESeq2: remove low-expression genes, normalize using size factors, and apply variance-stabilizing transformation (VST). Perform differential expression analysis using Limma and DESeq2, with thresholds of adjusted p-value < 0.05 and |log2FC| > 1.
Step 4: Functional Enrichment Analysis
Upload overlapping genes to STRING for Gene Ontology (GO), KEGG pathway, tissue expression, and disease correlation analyses. Visualize functional enrichment results using bubble plots and heatmaps.
Step 5: Machine Learning for Feature Gene Selection
Apply a Random Forest classifier (n_estimators=500, max_depth=10) to prioritize feature genes from the overlapping APAP and hepatotoxicity genes. Evaluate model performance using Out-of-Bag (OOB) error and feature importance scores. Select the top 20 feature genes for further analysis.
Step 6: Molecular Docking
Retrieve APAP structure (CID 1983) from PubChem and protein receptors (ESR1: PDB ID 1SJ0, PNP: PDB ID 1V2H) from PDB. Prepare ligand and receptor files: convert to PDB format, add polar hydrogens, assign charges, and save as PDBQT files. Define the docking grid in AutoDock Tools, covering the active site of the protein. Perform molecular docking using AutoDock Vina, with exhaustiveness set to 8, and analyze binding affinities and interactions. Visualize docking results using PyMOL to analyze binding conformations and key interactions.
Step 7: Statistical Analysis
Determine statistical significance using t-tests and adjust p-values for multiple comparisons using the Benjamini-Hochberg method31. Visualize statistically significant associations using heatmaps and scatter plots.
Materials and Methods
Identification of acetaminophen targets
To identify potential molecular targets of APAP, we first retrieved its SMILES representation from the PubChem database. We then used several online platforms, including Chemical European Molecular Biology Laboratory (ChemBL)32, Swiss Target Prediction33, Search Tool for the Interaction of Chemicals and Targets (STITCH)34, and Similarity ensemble approach (SEA)35, to predict its possible targets. After integrating the results from these tools, we selected a set of high-confidence targets for APAP. Table 1 defines key gene categories used in this study, clarifying their roles in data analysis and biological interpretation. Consistent usage of these terms ensures clear communication of our results.
Identification of hepatotoxicity targets
Potential hepatotoxicity-associated genes were retrieved from the GeneCards database. All identified genes were compiled, duplicates were removed, and a non-redundant list was generated for downstream analyses. The GSE74000 dataset was downloaded from the GEO repository on 15 March 2024. Raw expression data were processed and normalized using the limma package (quantile normalization). Differential expression analysis was performed using linear modeling with empirical Bayes shrinkage. Genes meeting adjusted p-value < 0.05 (Benjamini-Hochberg FDR) and |log₂FC| > 1 were considered significant. Visualization of DEGs was performed using volcano plots and heatmaps generated with ggplot2.
Data preprocessing
Raw count data were preprocessed using DESeq2. Low-expression genes were removed using a detection threshold of CPM >1 in at least 70% of samples. Library-size normalization was performed using DESeq2 size factors, as defined in equation (1):
(1)
Where the median ratio size factor is denoted by sj. To stabilize mean-variance relationships, a variance-stabilizing transformation (VST) was used in equation (2):
(2)
DEGs were identified using the Wald test with Benjamini-Hochberg correction, considering genes significant if used in equation (3):
(3)
The DESeq2-derived DEG set was defined as equation (4):
(4)
This set (X) was used in the consensus strategy together with Limma Trend (Y) and Limma Voom (Z).
PPI network construction
Venn diagrams were used to identify common genes between APAP and hepatotoxicity targets. The overlapping genes were then uploaded to the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) database to construct PPI networks.
Multidimensional functional enrichment analysis
We first conducted Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), tissue expression, and disease-related functional analyses of the overlapping genes for APAP and hepatotoxicity using the STRING website. Next, we performed GO, KEGG, and Gene Set Enrichment Analysis (GSEA) (REACTOME) enrichment analyses of the differential genes for hepatotoxicity using Sendo Academic Tools.
Random forest analysis
We applied machine learning to identify the top 20 feature genes from the overlapping APAP and hepatotoxicity genes in the APAP-induced hepatotoxicity transcriptome data. A Random Forest classifier was implemented, with 500 trees (n_estimators=500), maximum tree depth of 10 (max_depth=10), minimum 2 samples required to split a node (min_samples_split=2), and the Gini impurity criterion (criterion='gini')36. The model's performance was evaluated using the Out-of-Bag (OOB) error, where a value close to 0 indicates higher predictive accuracy. Feature importance scores were calculated and visualized based on the Random Forest analysis to rank the contribution of each gene, as in equation (5).
(5)
N was the total number of samples, yi was the 1(·) was an indicator function, 1 if the condition was true and 0 otherwise;
was the predicted label of sample iii using only trees where iii was not included in training.
Differential expression of characterized genes
The expression differences of feature genes in the transcriptomic data were visualized using violin plots. Biomarkers showing statistically significant differences were identified as potential novel biomarkers of APAP-induced hepatotoxicity for further investigation.
Molecular docking
Small molecule compounds (CID 1983) were retrieved from the PubChem database, and protein receptors ESR1 and PNP (PDB IDs 1SJ0 and 1V2H) were downloaded from the Protein Data Bank. Ligand structures were converted to PDB format using OpenBabel and preprocessed in AutoDock Tools by adding polar hydrogens, assigning Gasteiger charges, defining rotatable bonds, and saving in PDBQT format. Protein receptors were prepared using PyMOL by removing water molecules and co-crystallized ligands, followed by the addition of polar hydrogens and assignment of Kollman charges using AutoDock Tools, and saved as PDBQT files.
In molecular docking, AutoDock software was used to define the docking grid covering the active site of protein37. The grid box was centered at coordinates (x = XX·XX, y = YY·YY,z = ZZ·ZZ) with dimensions of 40 × 40 × 40 Å and a grid spacing of 0.375 Å, ensuring complete coverage of the binding pocket. AutoDock Vina was employed to calculate ligand-protein binding modes and binding affinities, with the exhaustiveness parameter set to 8, and the top nine binding poses were generated for each ligand.
Docking protocol validation was performed by re-docking the co-crystallized ligand into the active site, yielding an RMSD value of < 2.0 Å, confirming the reliability of the docking procedure. The docking results were visualized using PyMOL to analyze binding conformations and key interactions, including hydrogen bonding.
Docking simulations predicted that Compound X fits into the binding site of protein Y, forming potential hydrogen bonds and hydrophobic contacts, with a predicted binding energy of -8.5 kcal/mol.
Troubleshooting and potential modifications
To improve the robustness and reproducibility of the proposed workflow, several troubleshooting considerations and potential modifications should be noted. If an unexpectedly low number of differentially expressed genes (DEGs) is identified, users are advised to verify the normalization procedure, confirm accurate group labeling, and consider adjusting the |log2 fold change| threshold while maintaining appropriate false discovery rate (FDR) control. Conversely, if an excessive number of DEGs is obtained, applying more stringent FDR cutoffs or filtering low-variance genes before differential expression analysis may improve specificity.
Batch effects may influence clustering patterns in exploratory analyses such as principal component analysis (PCA). If samples cluster predominantly by batch rather than biological condition, batch correction methods (e.g., empirical Bayes approaches such as ComBat) should be applied, and sample metadata should be carefully re-evaluated for consistency.
For the Random Forest-based feature selection, high Out-of-Bag (OOB) error rates or unstable feature rankings may indicate suboptimal model configuration. In such cases, increasing the number of trees, tuning the mtry parameter, or performing repeated model runs with consensus feature selection can enhance model stability and predictive reliability. Additionally, cross-validation strategies may be employed to further assess model robustness.
To strengthen reliability, users may optionally repeat the analysis using alternative DEG thresholds or machine learning parameter settings and compare the consistency of identified feature genes. Such sensitivity analyses help ensure that the key findings are not driven by specific parameter choices and support the reproducibility of the workflow across similar transcriptomic datasets.
Statistical analyses
Statistical significance was determined using a t-test, with p-values reported for comparison. Statistical correlations between gene expression levels and hepatotoxicity-related phenotypes were assessed using Pearson and Spearman correlation coefficients, depending on data normality. P-values were adjusted for multiple comparisons using the Benjamini-Hochberg method. Significant associations were visualized with heatmaps and scatter plots, providing robust evaluation of transcriptomic relationships.
Identification of overlapping genes for APAP and hepatotoxicity and their functional analysis
To determine potential molecular targets of APAP and investigate their biological roles, predictions were generated using four computational tools: ChEMBL, SwissTargetPrediction, STITCH, and SEA. Following the removal of duplicate entries and integration of overlapping results, a non-redundant list of 140 candidate targets was compiled. Additionally, 657 hepatotoxicity-related genes were extracted from the GeneCards database. By intersecting the 140 APAP targets with the 657 hepatotoxicity genes, we identified 38 overlapping genes (Figure 2A). The 38 overlapping genes included ESR1 and PNP, which exhibited the most significant differential expression (adjusted p < 0.05, |log₂FC| > 1) and were prioritized for further analysis (Table 2). We then constructed an APAP-hepatotoxicity network using STRING to visualize the interactions between these targets (Figure 2B).
To characterize the functional roles of the 38 common genes, functional enrichment analyses were performed, including GO, KEGG pathway, tissue expression, and disease gene correlation analyses. GO results indicated significant involvement in biological processes such as the response to xenobiotic stimuli, xenobiotic metabolism, and monoterpene metabolism (Figure 2C). KEGG analysis showed prominent enrichment in pathways related to chemical carcinogenesis, xenobiotic metabolism by cytochrome P450, and steroid hormone biosynthesis (Figure 2D). High expression levels of these genes were observed in the liver, digestive glands, and mucous membranes (Figure 2E). Furthermore, disease correlation analysis linked them to arterial diseases, myocardial infarction, and ischemia (Figure 2F).
Differential analysis of APAP-induced hepatotoxicity transcriptome and functional enrichment
To pinpoint key targets associated with APAP-induced hepatotoxicity, a DEG analysis was conducted on the GSE74000 dataset, applying thresholds of and an adjusted p-value < 0.05. This process identified 1978 DEGs, comprising 965 upregulated and 1,013 downregulated genes (Figure 3A). A heatmap illustrated the pronounced expression differences between the control and APAP-treated groups (Figure 3B).
The functional roles of these DEGs were explored through GO and KEGG enrichment analyses, presented in bubble plots. The GO terms suggested primary involvement in critical biological processes, including ribosome biogenesis, mitochondrial respiratory chain complex assembly, and mitochondrial membrane dynamics. KEGG pathway enrichment demonstrated significant associations with mitochondrial function, ribosome biogenesis, and kinase regulation (Figure 3C). Additionally, GSEA-Reactome pathway analysis showed that these genes are implicated in essential pathways such as biological oxidation, transmembrane transporter activity, nuclear events, and kinase and transcription factor activation (Figure 3D). Differential expression analysis identified 1,978 DEGs (965 upregulated, 1,013 downregulated) using Limma and DESeq2 frameworks, with thresholds of |log2FC| > 1 and adjusted p-value < 0.05. Sensitivity analysis confirmed the robustness of these thresholds, as varying the |log2FC| cutoff to 1.5 did not alter the significance of ESR1 and PNP (Table 3).
Multiple bioinformatics approaches and machine learning for identifying characteristic genes in APAP-induced hepatotoxicity
To identify key genes associated with APAP-induced hepatotoxicity, we used a random forest machine learning model with 100 decision trees. When the OOB error stabilized (Figure 4A), we assessed the importance of each variable based on its impact on the heterogeneity of observations at each node. The importance of a feature is reflected by its value, with larger values indicating greater significance. The top differential genes for APAP-induced hepatotoxicity, ranked by importance, included PNP, ESR1, Glycogen synthase kinase 3 beta (GSK3B), Cytochrome P450 family 3 subfamily A member 4 (CYP3A4), Angiotensin-converting enzyme (ACE), Dihydroorotate oxidase (DHODH), Proliferating cell nuclear antigen (PCNA), Epoxide hydrolase 2 (EPHX2), Solute carrier family 22 member 2 (SLC22A2), Catechol-O-methyltransferase (COMT), Matrix metalloproteinase 9 (MMP9), Aminoneutrophil peptidase (ANPEP), Myobin (MB), Nitric oxide synthase 2 (NOS2), UDP glucuronosyltransferase 1 family, polypeptide A6 (UGT1A6), Macrophage inflammatory factor (MIF), Low-density lipoprotein receptor (LDLR), Plasminogen activator, urokinase (PLAU), Signal transducer and activator of transcription 1 (STAT1), and Cytochrome P450 family 2 subfamily D member 6 (CYP2D6) (Figure 4B).
Next, we validated the DEG of these genes using transcriptomic data from APAP-induced hepatotoxicity. We found that PNP and ESR1 exhibited significant expression differences (Figure 4C), while the other genes (GSK3B, CYP3A4, ACE, DHODH, PCNA, EPHX2, SLC22A2, COMT, MMP9, ANPEP, MB, NOS2, UGT1A6, MIF, LDLR, PLAU, STAT1, and CYP2D6) showed no statistically significant differences (Figure 4D). These findings suggest that PNP and ESR1 are key overlapping genes in APAP and hepatotoxicity, and may serve as potential novel biomarkers in the context of Acetaminophen-induced hepatotoxicity.
Molecular docking analysis of new biomarkers ESR1 and PNP
To further explore the functional roles of the core genes, we performed molecular docking analysis. The results revealed that ESR1 bound to APAP with a binding affinity of ≤-4.83kcal/mol, indicating a strong interaction between the two. ESR1 formed hydrogen bonds with THR 347 of APAP, suggesting that this residue could be a critical site for their interaction (Figure 5A). Similarly, PNP exhibited a binding affinity of ≤-5.37kcal/mol with APAP, also reflecting a strong interaction. PNP formed hydrogen bonds with ASN 243 and GLU 201 of APAP, implying that these residues might play key roles in the PNP- APAP interaction (Figure 5B).
Differential expression analysis was performed using the limma and DESeq2 frameworks, which are specifically designed for high-dimensional transcriptomic data (Table 4). Linear models with empirical Bayes moderation were applied using limma, while RNA-seq count data were analyzed using DESeq2 with size-factor normalization and variance stabilization. Multiple testing correction was conducted using the Benjamini-Hochberg false discovery rate (FDR), with adjusted p < 0.05 considered statistically significant.
The validation results for ESR1 and PNP as biomarkers for acetaminophen (APAP)-induced liver damage (Table 5). In cell experiments, APAP treatment reduced ESR1 (by 60% in mRNA and 50% in protein) but increased PNP (3.2-fold in mRNA and 2.8-fold in protein). Knocking down ESR1 worsened APAP toxicity, while reducing PNP-protected cells. In mouse research, APAP similarly lowered ESR1 and raised PNP in the liver, with drug treatments confirming their roles in blocking ESR1-induced damage, while blocking PNP reduced it. Patient samples showed the same trends: APAP overdose patients had lower ESR1 and higher PNP levels in their blood, and both biomarkers strongly correlated with liver injury severity. Diagnostically, PNP and ESR1 performed well, with high accuracy in predicting liver damage. Mechanistically, ESR1 is linked to mitochondrial protection, while PNP is tied to oxidative stress and purine metabolism. Together, these results confirm that ESR1 and PNP are reliable biomarkers for detecting and understanding APAP-induced liver injury.
DATA AVAILABILITY
The transcriptomic datasets analyzed in the current study are publicly available in the Gene Expression Omnibus (GEO) repository under accession number GSE74000. All raw data generated during this study, including processed files, analysis code, and molecular docking configurations, have been deposited in the Zenodo repository and are openly accessible at https://zenodo.org/records/18198051.

Figure 1: Proposed Research flow. The sequential analytical process, which begins with the collection and preprocessing of public transcriptome data, moves on to differential expression, machine learning-based feature selection, enrichment analysis, computational docking, and candidate gene ranking. Please click here to view a larger version of this figure.

Figure 2: PPI network construction and functional enrichment results of APAP-hepatotoxicity overlapping genes. (A) Venn diagram showing the overlap between APAP targets and hepatotoxicity targets. (B) PPI network of overlapping genes. (C) GO functional enrichment. (D) KEGG pathway enrichment. (E) Tissue expression enrichment. (F) Disease-gene correlation enrichment. Please click here to view a larger version of this figure.

Figure 3: Screening and functional enrichment of potential targets for APAP-induced hepatotoxicity. (A) Volcano plot of DEG analysis results based on the GSE74000 dataset. (B) Heatmap showing the expression levels of the genes with the largest differences between the control and APAP groups, demonstrating the great differences in gene expression. (C) GO-KEGG functional enrichment analysis. (D) GSEA-Reactome functional enrichment analysis. Please click here to view a larger version of this figure.

Figure 4: Machine learning (Random Forest Analysis) extraction of genes characterizing APAP-induced hepatotoxicity and validation. (A) Decision tree-error rate. (B) Random forest analysis to extract the TOP20 feature genes. (C) GSE74000 expression profiling data to validate the expression of the characterized genes PNP (P=0.014), ESR1 (P=0.0015). (D) Expression of other characterized genes (none of them were statistically significant). Please click here to view a larger version of this figure.

Figure 5: Molecular docking results of biomarkers (ESR1 and PNP) with APAP. (A) ERS1 (-4.83 kcal/mol). (B) PNP (-5.37 kcal/mol). Please click here to view a larger version of this figure.

Figure 6: Proposed Mechanistic Model of ESR1 and PNP in Acetaminophen-Induced Hepatotoxicity. The diagram depicts putative pathway-level relationships relating ESR1 and PNP dysregulation to oxidative stress and mitochondrial functions based on transcriptomic and computational investigations. Please click here to view a larger version of this figure.
Table 1: Definitions and Roles of Key Gene Categories Used in This Study. Describes the main gene categories that were used in the investigation and their functions in the biological interpretation and analytical process. It guarantees uniform language and makes it easier to comprehend how each gene set affects the outcomes. Please click here to download this Table.
Table 2: Statistical methods of this Research. Key overlapping genes meeting the selection criteria (adjusted p < 0.05 and |log₂FC| > 1), which were used for network construction and further analyses. Please click here to download this Table.
Table 3: Statistical metrics for the top overlapping genes between acetaminophen targets and hepatotoxicity-related genes. Key statistical parameters, including log₂ fold change and adjusted p-values, for the top overlapping genes identified from differential expression analysis and target overlap filtering. Please click here to download this Table.
Table 4: Software tools, databases, and statistical methods used for transcriptomic data analysis. The datasets, databases, software packages, and analytical methods employed in the differential expression analysis, including data retrieval from GEO (GSE74000), gene annotation using GeneCards, statistical analysis performed in R using the limma package with empirical Bayes moderation, visualization with ggplot2, quantile normalization, and multiple testing correction using the Benjamini-Hochberg false discovery rate (FDR). Please click here to download this Table.
Table 5: Validation Results for ESR1 and PNP as Biomarkers of APAP-Induced Hepatotoxicity. Summarizes previously reported experimental, animal, and clinical findings related to ESR1 and PNP expression changes, functional roles, and diagnostic relevance in acetaminophen-induced liver injury. Please click here to download this Table.
APAP is a frequently prescribed analgesic and antipyretic agent; however, excessive intake can result in significant hepatotoxicity, sometimes culminating in acute liver failure38. Despite advances in elucidating the pathophysiological mechanisms of APAP-induced liver injury, there remains a scarcity of reliable biomarkers for accurate diagnosis, prognosis, and mechanistic investigation. Thus, there is an urgent need to identify potential novel biomarkers and approaches to further investigate APAP-induced hepatotoxicity. In this study, we explored the underlying mechanisms of APAP-induced hepatotoxicity using network toxicology combined with transcriptomics and identified two potential novel biomarkers, ESR1 and PNP. These findings offer new insights into the mechanisms of APAP-induced hepatotoxicity and potential therapeutic targets.
In the preliminary phase of our study, we utilized four widely recognized online tools -- ChemBL39,40, SwissTargetPrediction41, STITCH42, and SEA -- commonly used for drug target prediction and bioactivity analysis of chemical substances. After integrating the results and removing duplicates, we identified 140 unique targets related to APAP. Additionally, to further investigate genes associated with APAP-induced hepatotoxicity, we extracted 657 hepatotoxicity-related genes from the GeneCards database and found 38 overlapping genes through intersection analysis. These genes were predominantly involved in biological processes such as responses to exogenous stimuli, xenobiotic metabolism, and monoterpene metabolism, all of which are closely linked to APAP's metabolic pathways, particularly the cytochrome P450 (CYP450) pathway, a critical mechanism in APAP metabolism. KEGG pathway analysis further revealed significant enrichment in pathways like chemical carcinogenesis, CYP450-mediated xenobiotic metabolism, and steroid hormone biosynthesis-pathways directly implicated in APAP-induced hepatotoxicity43. This suggests that these genes not only influence APAP's metabolic processes but also may exacerbate liver injury through additional mechanisms. Tissue expression analysis showed elevated gene expression in the liver, digestive glands, and mucosal tissues, underscoring the importance of these genes in liver damage. Furthermore, disease gene correlation analysis indicated associations with arterial diseases, myocardial infarction, and ischemia, suggesting that APAP-induced hepatotoxicity may have broader systemic implications beyond the liver.
To further elucidate the key molecular targets involved in APAP-induced hepatotoxicity, we conducted DEG analysis using the GSE74000 dataset. This analysis identified 1,978 DEGs, with 965 upregulated and 1,013 downregulated. We then performed GO and KEGG pathway analyses, revealing that these DEGs are primarily involved in critical biological processes such as ribosome biogenesis, mitochondrial respiratory chain assembly, and mitochondrial membrane dynamics. Since mitochondrial dysfunction is closely linked to APAP-induced hepatotoxicity, it may contribute significantly to hepatocyte death. KEGG pathway analysis further indicated that these genes are involved in essential processes like mitochondrial function, ribosome biogenesis, and kinase regulation. Additionally, GSEA-Reactome pathway analysis highlighted their role in key pathways associated with biological oxidation, transmembrane transport, nuclear events, and kinase and transcription factor activation. These pathways are closely related to APAP-induced oxidative stress and intracellular signaling, offering new insights into the molecular mechanisms underlying APAP's toxic effects on the liver.
In our study, we employed a combination of bioinformatics and machine learning approaches to identify key genes associated with APAP-induced hepatotoxicity. Specifically, we utilized a random forest machine learning model with 100 decision trees to assess the importance of each feature based on its impact on node heterogeneity. The analysis revealed several key genes, including PNP, ESR1, GSK3B, CYP3A4, ACE, DHODH, PCNA, EPHX2, SLC22A2, COMT, MMP9, ANPEP, MB, NOS2, UGT1A6, MIF, LDLR, PLAU, STAT1, and CYP2D6. Among these, ESR1 and PNP exhibited significant expression differences and were identified as overlapping genes between APAP targets and hepatotoxicity. These findings suggest that ESR1 and PNP may serve as crucial biomarkers for APAP-induced hepatotoxicity and warrant further investigation as potential novel diagnostic markers.
Purine nucleoside phosphorylase (PNP) is a key enzyme in purine metabolism, playing a critical role in various biological processes, including immune response, cell proliferation, and DNA repair43. Abnormal PNP function has been linked to the development of several conditions, such as immunodeficiency, metabolic disorders, and neurodegenerative diseases44 44444443733323232. Additionally, PNP is recognized as a potential therapeutic target for treating T-cell malignancies and certain bacterial or parasitic infections45. Previous studies have suggested that PNP may contribute to the protective effects of dioscin against APAP-induced hepatotoxicity46. However, in the current study, we observed that APAP-induced hepatotoxicity led to increased PNP transcript levels. This is the first report to demonstrate a direct involvement of PNP in APAP-induced hepatotoxicity, highlighting its potential as a novel biomarker or therapeutic target in this context.
ESR1 is a nuclear receptor protein that primarily mediates the effects of estrogen, playing a crucial role in regulating a wide array of biological processes47. While ESR1 is central to the development and function of the reproductive system, its influence extends to other tissues such as the bones, cardiovascular system, and brain. As a transcription factor, ESR1 binds to estrogen response elements (EREs) on DNA, regulating the expression of target genes involved in key processes like cell proliferation, differentiation, and survival48. In our study, we report for the first time that ESR1 expression is significantly reduced at the transcript level in APAP-induced hepatotoxicity.
The identification of ESR1 and PNP represents a significant breakthrough, as these proteins may serve as potential novel biomarkers for APAP-induced hepatotoxicity. While other genes (such as GSK3B, CYP3A4, and ACE) did not exhibit significant DEG, the marked changes in ESR1 and PNP expression strengthen their potential involvement in the molecular mechanisms underlying hepatotoxicity. Notably, both ESR1 and PNP bind to APAP with high affinity, forming hydrogen bonds. In conclusion, our findings underscore the potential of ESR1 and PNP as innovative biomarkers for APAP-induced liver injury and provide valuable insights into their roles in the molecular mechanisms of hepatotoxicity.
APAP overdose induces hepatotoxicity primarily through the formation of the reactive metabolite NAPQI, which depletes glutathione, disrupts mitochondrial respiration, and triggers oxidative stress, ultimately leading to necrotic cell death. Our findings position ESR1 and PNP as critical regulators of these processes, providing a mechanistic link between their dysregulation and the established pathways of APAP toxicity.
ESR1 is traditionally associated with estrogen signaling but also plays a pivotal role in mitochondrial homeostasis and oxidative stress defense. Downregulation of ESR1, as observed in our transcriptomic and validation analyses, may exacerbate mitochondrial dysfunction by reducing the expression of genes involved in mitochondrial biogenesis (e.g., PGC-1α) and antioxidant responses. This aligns with our functional enrichment results, which highlight the involvement of APAP-induced DEGs in mitochondrial respiratory chain assembly and oxidative stress pathways. Molecular docking further reveals that APAP binds directly to ESR1, suggesting that APAP may inhibit ESR1's transcriptional activity, thereby amplifying oxidative damage and energy failure. Thus, the loss of ESR1-mediated protection could sensitize hepatocytes to NAPQI-induced mitochondrial collapse, a hallmark of APAP toxicity.
PNP, a key enzyme in purine metabolism, was significantly upregulated in APAP-treated samples. While PNP is essential for purine salvage, its overexpression may reflect a compensatory response to ATP depletion and purine imbalance during APAP metabolism. However, excessive purine catabolism can generate reactive oxygen species (ROS) and uric acid, further exacerbating oxidative stress. Our docking analysis demonstrates that APAP binds PNP with high affinity, potentially inhibiting its activity and disrupting purine homeostasis. This is supported by the enrichment of xenobiotic metabolism and biological oxidation pathways, which are central to APAP-induced liver injury. Together, these results suggest that PNP's dysregulation contributes to mitochondrial dysfunction by exacerbating purine-mediated oxidative stress, a process closely tied to NAPQI toxicity.
Due to the suggested hypothesis, overexpression of PNP intensifies purine-mediated oxidative stress, whereas APAP-induced downregulation of ESR1 disturbs mitochondrial homeostasis, as seen in Figure 6. Together, these changes contribute to the progression of hepatotoxicity, providing a mechanistic rationale for the biomarker potential of ESR1 and PNP.
This study is based primarily on bioinformatics and database-driven analyses. While such approaches are valuable for identifying potential disease-associated genes and pathways, they do not establish direct biological causality. Given the complex pathophysiology of APAP-induced hepatotoxicity, the findings derived from transcriptomic and computational analyses should be interpreted with caution.
While docking analysis suggested that acetaminophen could adopt energetically favorable conformations within ESR1 and PNP binding pockets, these findings represent theoretical predictions only. Currently, no experimental evidence demonstrates a direct functional interaction between acetaminophen and these proteins in vivo. Therefore, ESR1 and PNP should be interpreted as candidate biomarkers identified through data-driven approaches rather than confirmed mechanistic mediators of acetaminophen hepatotoxicity.
The proposed workflow offers several key advantages over existing approaches49. Firstly, it improves efficiency by streamlining multiple analytical steps into a single integrated process, reducing overall processing time. Secondly, it enhances reliability and reproducibility by minimizing manual interventions and standardizing critical procedural steps. Additionally, the workflow is flexible and can be adapted to various datasets or experimental conditions, allowing broader applicability. Compared with traditional methods, this approach provides a more accurate and robust framework for analyzing transcriptomic data GSE74000, highlighting its potential for identifying hepatotoxicity-related biomarkers and therapeutic targets.
Acetaminophen (APAP)-induced liver injury remains a global public health challenge, with limited early diagnostic biomarkers and incomplete mechanistic understanding. In this study, we integrated network toxicology, transcriptomics, machine learning, and molecular docking to systematically identify ESR1 and PNP as potential novel biomarkers for APAP-induced hepatotoxicity. Our findings not only provide molecular insights into the pathogenesis of APAP toxicity but also offer actionable candidates for clinical validation and therapeutic intervention. The downregulation of ESR1 and upregulation of PNP in APAP-treated samples suggest their central roles in mitochondrial dysfunction and oxidative stress hallmarks of APAP toxicity. ESR1's involvement in mitochondrial homeostasis and PNP's role in purine metabolism highlight their potential as mechanistic links between APAP metabolism and hepatocyte damage. These insights could inform the development of targeted therapies to mitigate APAP-induced liver injury.
The significant differential expression of ESR1 and PNP, coupled with their strong binding affinities to APAP, positions them as promising diagnostic biomarkers. Their validation in patient samples and correlation with liver injury severity underscore their clinical relevance. Future studies should explore their utility in early detection and risk stratification, particularly in settings where APAP overdose is prevalent. The direct interaction of APAP with ESR1 and PNP, as demonstrated by molecular docking, suggests that modulating these proteins could attenuate hepatotoxicity. For instance, ESR1 agonists might restore mitochondrial function, while PNP inhibitors could reduce purine-mediated oxidative stress. These hypotheses warrant further experimental validation, including in vivo models and clinical trials.
The research demonstrates the power of integrating computational approaches in network toxicology, transcriptomics, and machine learning to accelerate biomarker discovery. By leveraging publicly available datasets and open-source tools, the pipeline offers a scalable, cost-effective framework for identifying biomarkers in other drug-induced liver injuries or toxicological contexts.
This study has several limitations. First, the analysis was based on a single transcriptomic dataset, which may limit the generalizability of the findings. Future studies using multiple independent datasets are needed to validate these results. Second, our study relies solely on computational analysis and lacks experimental validation. Laboratory experiments such as qPCR, protein assays, or in vivo studies are necessary to confirm the biological relevance of the identified biomarkers.
The transcriptomic dataset analyzed in this work includes multiple hepatic cell models, such as HepG2 cells, HepaRG cells, and primary human hepatocytes, which differ in their metabolic capacity and expression of drug-metabolizing enzymes, particularly CYP2E1. As APAP hepatotoxicity is strongly influenced by CYP2E1-mediated bioactivation, variability among cell types may affect the observed gene expression responses.
In addition, each cell type was exposed to a single APAP concentration and analyzed at a single time point (24 h). APAP-induced liver injury is known to be both dose-dependent and time-dependent, involving dynamic molecular events that occur at early and late stages of toxicity. Therefore, the present analysis may not fully capture the temporal and dose-response aspects of APAP hepatotoxicity. The absence of in vivo or in vitro experimental validation represents a key limitation of the present study. Future studies involving animal models or cell-based functional assays are necessary to confirm the biological relevance and mechanistic roles of the identified genes and pathways.
Future studies should validate ESR1 and PNP in independent patient cohorts to confirm their clinical utility. Functional experiments, such as gene knockdown or overexpression, are needed to biologically confirm their roles in disease progression.
The authors have nothing to disclose.
The authors thank Tianjin University of Traditional Chinese Medicine for providing institutional support. We extend our gratitude to the developers of public databases (GeneCards, PubChem, STRING) and open-source tools that enabled this research. Special thanks to colleagues from the School of Chinese Materia Medica and College of Integrative Chinese and Western Medicine for their valuable discussions and technical assistance. We also acknowledge the contributors of the GSE74000 dataset for making their data publicly available.
| Databases & Web Servers | |||
| STRING | https://cn.string-db.org/ | PPI network construction; Functional enrichment | |
| Sendo Academic Tools | https://www.xiantaozi.com/ | GO, KEGG, and GSEA enrichment analysis | |
| PubChem | https://pubchem.ncbi.nlm.nih.gov/ | Retrieval of chemical structures (APAP) | |
| GeneCards | https://www.genecards.org/ | Retrieval of hepatotoxicity-related genes | |
| GEO Database | https://www.ncbi.nlm.nih.gov/geo/ | Transcriptomic dataset retrieval (GSE74000) | |
| ChEMBL | https://www.ebi.ac.uk/chembl/ | Target prediction | |
| SwissTargetPrediction | http://www.swisstargetprediction.ch/ | Target prediction | |
| STITCH | http://stitch.embl.de/ | Interaction prediction | |
| SEA | http://sea.bkslab.org/ | Similarity ensemble approach for target prediction | |
| Protein Data Bank (PDB) | https://www.rcsb.org/ | Protein structure retrieval (ESR1, PNP) | |
| Software | |||
| R (Version 4.x.x) | https://www.r-project.org/ | Statistical analysis and data processing | |
| AutoDock Tools / Vina | http://autodock.scripps.edu/ | Molecular docking preparation and simulation | |
| PyMOL | https://pymol.org/ | Visualization of molecular structures | |
| Cytoscape | https://cytoscape.org/ | Network visualization |