Method Article

Transcriptomic Analysis Based on Bulk RNA-seq Data

DOI: 10.3791/69611

January 16th, 2026

Summary

The present protocol establishes a complete pipeline for bulk RNA-seq analysis, from raw data to functional enrichment analysis.

Abstract

Non-alcoholic fatty liver (NAFL) is usually considered a benign condition; however, once it progresses to non-alcoholic steatohepatitis (NASH), patients face a significantly elevated risk of developing end-stage liver disease. Many studies are attempting to elucidate the molecular mechanisms underlying the transition from NAFL to NASH. High-throughput sequencing technologies (such as bulk RNA-seq) give researchers a deeper view of the transcriptome, revealing the molecule expression, signaling pathway activation, and other factors associated with disease progression. A wealth of open-source data is available for researchers to analyze in order to identify potential targets for disease treatment. However, related research is limited by the lack of an efficient and reliable process for upstream analysis of the transcriptome. Here, a highly reproducible and user-friendly pipeline for upstream analysis and subsequent differential gene analysis is provided to achieve standardized processing and in-depth analysis of private or public data. The pipeline is divided into four steps: (1) quality control of data; (2) gene mapping; (3) differential gene analysis; and (4) functional analysis. This process aims to uncover the molecular mechanisms of disease transformation and to assist researchers in screening potential drug targets and therapeutic approaches through the analysis of bulk RNA-seq data.

Introduction

Non-alcoholic fatty liver disease (NAFLD) is the most prevalent chronic liver disease globally, affecting more than a quarter of the population. Its incidence has increased dramatically in recent decades [1,2,3]. The growing burden of the disease, especially of its more advanced form, non-alcoholic steatohepatitis (NASH), poses a major global health challenge and a heavy economic burden [4]. The first stage of NAFLD is non-alcoholic fatty liver (NAFL), which is accompanied by inflammation and fibrosis that can progress to NASH. The latter significantly increases the r....

Protocol

For demonstration purposes, the publicly available dataset PRJNA1023502, generated by Lan Bai et al., was used to illustrate each step of both the upstream and downstream analyses [20]. As this dataset originates from the open-access NCBI SRA database, no additional permissions or ethical approvals are required. See the Table of Materials to verify all required software and R-package versions. The dataset comprises 6 not-NASH, 6 NAFL, and 6 NASH liver RNA-seq samples. In this protocol, the dataset was used to demonstrate all steps of the bulk RNA-seq workflow, including data retrieval from the SRA databas....
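
As a minimal sketch of the data-retrieval step, the Python helper below assembles SRA Toolkit commands (`prefetch` followed by `fasterq-dump`) for a single run. The run accessions are hypothetical placeholders, not the real run IDs of the BioProject, and the output directory name, thread count, and the choice of `--split-files` (which presumes paired-end reads) are assumptions for illustration rather than settings taken from the protocol.

```python
import shlex
from typing import Dict, List

# Hypothetical placeholder accessions mapped to their group label.
# The real run IDs must be looked up under the BioProject in the SRA database.
EXAMPLE_RUNS: Dict[str, str] = {
    "SRR_PLACEHOLDER_1": "not-NASH",
    "SRR_PLACEHOLDER_2": "NAFL",
    "SRR_PLACEHOLDER_3": "NASH",
}


def build_download_commands(
    run_id: str, out_dir: str = "raw_fastq", threads: int = 4
) -> List[List[str]]:
    """Return the two SRA Toolkit commands that fetch one run as FASTQ.

    The commands are only built, not executed, so they can be inspected
    or written to a script before anything is downloaded.
    """
    return [
        # Download the run into the current working directory.
        ["prefetch", run_id],
        # Convert it to FASTQ; --split-files assumes paired-end reads.
        [
            "fasterq-dump", run_id,
            "--split-files",
            "--threads", str(threads),
            "--outdir", out_dir,
        ],
    ]


if __name__ == "__main__":
    for run, group in EXAMPLE_RUNS.items():
        print(f"# {run} ({group})")
        for cmd in build_download_commands(run):
            print(shlex.join(cmd))
```

Each command list can be passed to `subprocess.run(cmd, check=True)` once the placeholder accessions have been replaced with real ones.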

Results

The upstream analysis workflow for bulk RNA-seq is illustrated in Figure 1A. This workflow sequentially executes the following key steps on a Linux platform: first, rigorous quality control of raw sequencing data is performed using fastp to remove low-quality reads and adapter sequences; subsequently, HISAT2 aligns high-quality reads to the reference genome, with Samtools converting and sorting the alignment files; finally, FeatureCounts performs gene-level quantification to generate a gene .......
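
The order of the upstream steps described here (fastp, HISAT2, Samtools, then FeatureCounts) can be sketched as a small Python command builder. This is an illustrative outline, not the authors' script: the directory layout, file naming, thread count, and the paired-end flags (`-I`/`-O`, `-1`/`-2`, `-p`) are assumptions, and the index and annotation paths are hypothetical.

```python
import shlex
from typing import List


def build_upstream_commands(
    sample: str, hisat2_index: str, threads: int = 8
) -> List[List[str]]:
    """Build the per-sample QC, alignment, and sorting commands (paired-end assumed)."""
    r1, r2 = f"raw_fastq/{sample}_1.fastq", f"raw_fastq/{sample}_2.fastq"
    c1, c2 = f"clean/{sample}_1.clean.fastq.gz", f"clean/{sample}_2.clean.fastq.gz"
    sam, bam = f"align/{sample}.sam", f"align/{sample}.sorted.bam"
    return [
        # 1. Quality control: trim adapters and drop low-quality reads.
        ["fastp", "-i", r1, "-I", r2, "-o", c1, "-O", c2,
         "-j", f"qc/{sample}.json", "-h", f"qc/{sample}.html", "-w", str(threads)],
        # 2. Align the filtered reads to the reference genome index.
        ["hisat2", "-p", str(threads), "-x", hisat2_index,
         "-1", c1, "-2", c2, "-S", sam],
        # 3. Convert SAM to a coordinate-sorted BAM, then index it.
        ["samtools", "sort", "-@", str(threads), "-o", bam, sam],
        ["samtools", "index", bam],
    ]


def build_count_command(
    bams: List[str], gtf: str, out: str = "counts/gene_counts.txt", threads: int = 8
) -> List[str]:
    """Build one featureCounts call that quantifies all BAM files at the gene level."""
    # -p marks the input as paired-end; newer featureCounts releases also need
    # --countReadPairs to count fragments rather than reads.
    return ["featureCounts", "-T", str(threads), "-p",
            "-t", "exon", "-g", "gene_id", "-a", gtf, "-o", out, *bams]


if __name__ == "__main__":
    # "index/genome" and "annotation.gtf" are hypothetical paths.
    for cmd in build_upstream_commands("sample1", "index/genome"):
        print(shlex.join(cmd))
    print(shlex.join(build_count_command(["align/sample1.sorted.bam"], "annotation.gtf")))
```

Building the commands as lists keeps the per-sample steps separate from the single quantification call, which takes all sorted BAM files at once to produce one count matrix.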

Discussion

Bulk RNA-seq data analysis is an interdisciplinary task that integrates genomics, bioinformatics, statistics, and computer science. A complete analytical workflow encompasses multiple upstream and downstream steps, including raw data preprocessing, quality control, sequence alignment, gene-level quantification, data normalization, differential expression analysis, and biological interpretation. Among these steps, accurately converting raw sequencing reads into a high-quality gene expression matrix is par.......
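
To make the data normalization step concrete, the sketch below computes median-of-ratios size factors, the idea behind the default normalization in DESeq2, in plain Python. It is an illustration of the principle on made-up toy counts, not the DESeq2 implementation and not part of the protocol; the function names and the gene labels are hypothetical.

```python
import math
from statistics import median
from typing import Dict, List


def size_factors(counts: Dict[str, List[int]]) -> List[float]:
    """Median-of-ratios size factors; counts maps gene -> raw counts per sample."""
    n_samples = len(next(iter(counts.values())))
    log_ratios: List[List[float]] = [[] for _ in range(n_samples)]
    for gene_counts in counts.values():
        # Genes with a zero in any sample have no finite log geometric mean.
        if any(c == 0 for c in gene_counts):
            continue
        log_geo_mean = sum(math.log(c) for c in gene_counts) / n_samples
        for j, c in enumerate(gene_counts):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # The per-sample median ratio is robust to a few strongly changed genes.
    return [math.exp(median(r)) for r in log_ratios]


def normalize(
    counts: Dict[str, List[int]], factors: List[float]
) -> Dict[str, List[float]]:
    """Divide each sample's counts by that sample's size factor."""
    return {g: [c / f for c, f in zip(v, factors)] for g, v in counts.items()}


if __name__ == "__main__":
    # Toy data: the second sample was sequenced twice as deeply as the first.
    toy = {"g1": [10, 20], "g2": [100, 200], "g3": [5, 10], "g4": [0, 3]}
    factors = size_factors(toy)
    print(factors)
    print(normalize(toy, factors))
```

In the toy data the second sample has twice the depth of the first, so its size factor is twice as large and the normalized counts of the unchanged genes coincide across samples.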

Disclosures

The authors declare that they have no conflicts of interest.

Acknowledgements

The authors would like to thank the maintainers of the publicly available databases used in this study.

....

Materials

List of materials used in this article
| Name | Company | Catalog Number (version) | Comments |
| --- | --- | --- | --- |
| biomaRt | Bioconductor | 2.64.0 | Gene annotation from Ensembl |
| clusterProfiler | Bioconductor | 4.16.0 | Functional enrichment analysis |
| DESeq2 | Bioconductor | 1.48.1 | Differential expression analysis |
| FactoMineR | AgroParisTech | 2.11.0 | PCA and multivariate analysis |
| fastp | OpenGene | 1.0.1 | Quality control and filtering of FASTQ data |
| FeatureCounts | Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research | 2.0.0 | Count the number of reads mapped to each gene for gene expression quantification |
| ggplot2 | Posit | 3.5.2 | Data visualization |
| ggrepel | Kamil Slowikowski | 0.9.6 | Non-overlapping text labels |
| ggridges | Claus O. Wilke | 0.5.6 | Create ridgeline plots |
| HISAT2 | Johns Hopkins University | 2.2.1 | Align the filtered high-quality reads to the reference genome |
| R | R Core Team | 4.5.0 | An environment for data computation, analysis, and visualization |
| RColorBrewer | Erich Neuwirth | 1.1.3 | Color palettes for plotting |
| samtools | Large Scale Genomics work stream | 1.22.0 | Convert and process SAM files for efficient retrieval and access |
| SRA Toolkit | National Center for Biotechnology Information | 3.2.1 | Obtain and preprocess raw sequencing data from the NCBI SRA database |

References

  1. Asrani, S. K., Devarbhavi, H., Eaton, J., Kamath, P. S. Burden of liver diseases in the world. J Hepatol. 70 (1), 151-171 (2019).
  2. Friedman, S. L., Neuschwander-Tetri, B. A., Rinella, M., Sanyal, A. J. Mechanisms of NAFLD development and therapeutic strategies. Nat Med. 24 (7), 908-922 (2018).
  3. <....

Tags

Bulk RNA Seq, Transcriptomic Analysis, Differential Gene Analysis, Functional Analysis, Quality Control, Gene Mapping, Nonalcoholic Fatty Liver, Steatohepatitis Progression, Molecular Mechanisms, Disease Biomarkers
