Research Article
This protocol provides a streamlined computational pipeline for quantifying nascent enhancer transcripts. By integrating chromatin accessibility, chromatin feature, and transcriptional data, it enables accurate detection and strand-specific analysis of enhancer activity in complex intragenic regions, while remaining accessible to researchers without extensive bioinformatics training.
Core cis-regulatory elements known as enhancers play a central role in enabling precise transcriptional regulation of target genes that control diverse cellular functions and developmental processes. These enhancers are often transcribed in both directions, producing long non-coding transcripts referred to as enhancer RNAs (eRNAs). The expression of eRNAs is closely linked to active chromatin features, such as H3K27ac and co-activator recruitment, and functionally contributes to the transcriptional activation of target genes. Nevertheless, the detection and quantification of eRNAs remain challenging, especially when they overlap with host-gene transcription. To address this, we present a standardized, user-friendly computational workflow for analyzing enhancer transcription from nascent RNA sequencing data. The protocol guides users through data preprocessing, read mapping, and quality control, followed by strand-specific quantification of enhancer-associated transcription, with dedicated procedures for intragenic enhancers where signal assignment is complex. Visualization modules enable clear inspection of enhancer activity across genomic contexts, and built-in options support analyses of both intergenic and intragenic enhancers. Designed for researchers with limited bioinformatics expertise, this workflow provides a practical framework for consistent, reproducible, and scalable studies of enhancer transcription, facilitating broader application of enhancer biology across diverse systems.
Enhancers are cis-regulatory DNA elements that control target gene transcription by organizing chromatin looping and recruiting the transcriptional machinery1,2,3. Their tissue-specific activity enables precise regulation during development and lineage commitment4,5,6,7,8. Active enhancers show characteristic chromatin features such as H3K4me1 (histone H3 Lysine 4 mono-methylation) and H3K27ac (histone H3 Lysine 27 acetylation) and are usually found in DNase I hypersensitive regions that mark open chromatin9,10,11,12. These features enable transcription factors and RNA polymerase II to access the DNA, initiating nascent transcription at enhancer loci13,14,15,16.
This sequential biological process produces enhancer-derived transcripts, called eRNAs, which are bidirectional, noncoding, and typically non-polyadenylated RNAs13,14,15,16. eRNAs serve as markers of enhancer activity and function as effectors in their own right16,17,18,19,20,21,22,23,24. They promote productive elongation by releasing Negative Elongation Factor (NELF) from paused RNA polymerase II16,19, and help stabilize enhancer-promoter loops17,18,20. They also support the formation of transcriptional condensates, potentially via m6A (N6-Methyladenosine) modification21,22,23.
Still, the function of intragenic enhancer transcription, initiated from regulatory elements within gene bodies, remains controversial. Some studies report that eRNAs from intragenic enhancers augment host gene expression25, potentially by promoting NELF release and stimulus-dependent productive elongation26,27. In contrast, other work suggests that this transcription can impede host genes via RNA polymerase II collisions or transcriptional interference, leading to attenuation or premature termination28,29. These conflicting observations, together with the dual role of eRNAs as markers and regulators, highlight the need for careful quantification and functional dissection. Yet, measuring intragenic eRNAs is difficult because they often overlap sense-strand host transcripts13,25,26,30. The challenge is amplified when enhancers reside in regions with nested genes or overlapping transcription on both strands, which obscures enhancer-specific signals.
To overcome these challenges, we developed a bioinformatics pipeline to detect, quantify, and visualize enhancer-associated transcripts, with a particular focus on intragenic regions. The pipeline integrates the Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq), Chromatin Immunoprecipitation Sequencing (ChIP-seq), Global Run-on Sequencing (GRO-seq), and genomic annotations to achieve enhancer-level resolution even in complex genomic contexts.
The pipeline comprises four major steps: (i) preprocessing, alignment, peak calling, and signal generation31; (ii) enhancer identification using chromatin features; (iii) strand orientation assignment, particularly within gene bodies; and (iv) quantification and visualization of nascent enhancer transcripts. This framework is particularly useful for systems with high-resolution sequencing data, such as the mouse embryonic stem cells analyzed in this study, and it can be extended to other organisms when suitable datasets are available. By enabling enhancer-specific quantification where existing pipelines fall short, this workflow offers a practical tool for benchmarking and studying intragenic eRNA transcription across diverse genomic contexts.
NOTE: All raw datasets used in the workflow are listed in Table 1. Details of the bioinformatics tools are provided in the Table of Materials. The number of threads used by the pipeline can be adjusted by modifying the THREADS variable defined at the top of each script; increase it to accelerate the analysis if the available CPU resources allow.
A log file is generated after each step. For a quick failure check, inspect the log with cat StepXX_log.txt and then test for error markers with grep -qF "ERROR" StepXX_log.txt && echo "ERROR found. Fix before next step." || echo "OK: no ERROR markers". If any ERROR appears, treat the step as failed and resolve it before continuing.
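The same check can be run over all step logs at once. The following is a minimal sketch, assuming the logs follow the StepXX_log.txt naming shown above and sit in one directory; the function name find_failed_steps is hypothetical and is not part of the published pipeline.

```python
from pathlib import Path


def find_failed_steps(log_dir, marker="ERROR"):
    """Return the names of step logs in log_dir that contain the marker string.

    Assumes logs are named like Step01_log.txt (hypothetical layout based on
    the StepXX_log.txt pattern mentioned in the protocol).
    """
    failed = []
    for log_path in sorted(Path(log_dir).glob("Step*_log.txt")):
        # errors="replace" avoids crashing on stray non-UTF-8 bytes in tool output.
        text = log_path.read_text(errors="replace")
        if marker in text:
            failed.append(log_path.name)
    return failed


if __name__ == "__main__":
    failed = find_failed_steps(".")
    if failed:
        print("ERROR found. Fix before next step:", ", ".join(failed))
    else:
        print("OK: no ERROR markers")
```

Like grep -F, this matches the literal string only, so a tool that reports failures with different wording would still need to be checked by reading the log.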
1. Download the full analysis pipeline from the GitHub repository (https://github.com/myunggeunO/Enhancer-transcript-identification-from-read-to-visualization/)
2. Set up the mamba/conda environment for the analysis pipeline
3. Download publicly available ChIP-seq, ATAC-seq, and GRO-seq datasets from SRA (Sequence Read Archive)
4. Perform quality control and trimming of raw reads
5. Prepare Bowtie2 reference index
6. Align trimmed reads to mm10 reference genome
7. Merge technical replicates of H3K27ac ChIP-seq data
8. Remove duplicates and non-essential chromosomes
9. Perform peak calling for each dataset
10. Merge biological replicates of ChIP-seq and ATAC-seq BAM files for downstream signal analysis
11. Generate tag directories and signal bigWig files from mapped reads
12. Prepare files for enhancer identification
13. Identify promoter candidates and gene bodies from annotation
14. Process peaks for enhancer identification
15. Identify and classify enhancers
16. Assign temporary strand information to intragenic enhancers
17. Prioritize strand assignment for enhancers overlapping both-strand genes
18. Calculate strand-specific Reads Per Kilobase per Million mapped reads (RPKM) values for genes overlapping PCG-prioritized intragenic enhancers
19. Final strand assignment based on gene expression (RPKM) of overlapping genes
20. Assign strand information to intragenic enhancers and enhancer summits
21. Prepare input files for enhancer validation, eRNA quantification, and visualization
22. Generate aggregation plots for enhancer validation
23. Quantify and visualize enhancer RNA expression
Schematic workflow for enhancer transcript quantification pipeline
Publicly available ChIP-seq (H3K27ac, H3K4me1), ATAC-seq, and GRO-seq datasets (Table 1) were processed with a standardized pipeline designed primarily for validation. Adapter trimming and quality filtering were performed with Trim Galore and Cutadapt, followed by alignment to the mm10 reference genome using Bowtie2 (detailed in protocol Step 6). For ChIP-seq and ATAC-seq, peaks were identified with MACS3, and signal intensity tracks (bigWig files) were generated using HOMER and ucsc-bedgraphtobigwig for downstream visualization and quantitative analysis (Figure 1A). This consistent preprocessing ensures data compatibility, minimizes assay-specific biases, and provides a reliable framework for downstream enhancer quantification.
Enhancer identification followed previously described approaches8,50,51,52. Open chromatin regions derived from called ATAC-seq peaks were filtered to remove promoter-proximal and ENCODE blacklist regions and then intersected with histone modification peaks to define total, active, and non-active enhancers. Based on their genomic context, enhancers were further categorized as intergenic or intragenic. For intragenic enhancers, strand orientation was initially determined by the strand of the overlapping gene (temporary strand) and subsequently refined by selecting the strand of the most highly expressed protein-coding gene (detailed in protocol Steps 16-20). The resulting strand-assigned enhancer sets provide a reproducible and robust basis for downstream analyses, including chromatin accessibility/histone modification profiling, eRNA quantification, and transcriptional signal visualization (Figure 1B), enabling evaluation of whether eRNA quantification is consistent with the other data types.
Quality control and adapter filtering improve data reliability and inform optimal preprocessing strategies across sequencing platforms
Rigorous QC and adapter trimming were applied to all datasets to ensure high-quality input for downstream analysis. For the GRO-seq data, FastQC analysis revealed the presence of poly(A) and poly(G) tails, which were successfully removed using the --nextseq 20 option in Trim Galore and the -a "A{15}" parameter in Cutadapt (Figure 2A and Figure 3A). In paired-end ATAC-seq datasets, Nextera adapter sequences were detected in both read pairs and successfully trimmed using the --nextera option in Trim Galore (Figure 2B and Figure 3B). In contrast, the ChIP-seq datasets for H3K27ac and H3K4me1 exhibited minimal adapter contamination; therefore, only general quality filtering was applied, as illustrated by the H3K27ac data (Figure 2C and Figure 3C).
Integrated enhancer classification based on chromatin accessibility, histone modifications, and genomic context
After aligning each dataset to the mm10 reference genome (detailed in protocol Step 6), we performed peak calling on ATAC-seq data and removed regions overlapping the mm10 blacklist and promoter-proximal regions (4 kb window from TSS, based on GENCODE M23 annotations). This filtering yielded 71,165 high-confidence accessible chromatin regions (Figure 1B and Figure 4A, green circle). For enhancer annotation25, total enhancers were defined as ATAC-seq peaks overlapping the processed H3K4me1 regions (n = 47,317). Of these, regions also overlapping H3K27ac peaks were further classified as active enhancers (n = 23,147), while the remainder were designated non-active enhancers (n = 24,170) (Figure 1B and Figure 4A, yellow and red circles).
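The classification rule described above can be sketched in a few lines. This is an illustrative sketch only, assuming peaks are half-open (chrom, start, end) tuples that have already been filtered for blacklist and promoter-proximal regions; the function names are hypothetical, and the pipeline itself performs these intersections with its own scripts rather than with this quadratic pure-Python loop.

```python
def overlaps(a, b):
    """True if two half-open (chrom, start, end) intervals share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def classify_enhancers(atac_peaks, h3k4me1_regions, h3k27ac_peaks):
    """Split filtered ATAC-seq peaks into total, active, and non-active enhancers.

    total      = ATAC-seq peaks overlapping an H3K4me1 region
    active     = total enhancers that also overlap an H3K27ac peak
    non-active = the remaining total enhancers
    """
    total, active, non_active = [], [], []
    for peak in atac_peaks:
        if not any(overlaps(peak, region) for region in h3k4me1_regions):
            continue  # accessible, but no H3K4me1 support: not called an enhancer
        total.append(peak)
        if any(overlaps(peak, region) for region in h3k27ac_peaks):
            active.append(peak)
        else:
            non_active.append(peak)
    return total, active, non_active


# Toy coordinates, invented for illustration only.
atac = [("chr1", 100, 300), ("chr1", 1000, 1200), ("chr2", 50, 150)]
h3k4me1 = [("chr1", 0, 400), ("chr1", 900, 1300)]
h3k27ac = [("chr1", 250, 500)]
total, active, non_active = classify_enhancers(atac, h3k4me1, h3k27ac)
print(len(total), len(active), len(non_active))
```

By construction, the active and non-active sets partition the total set, which matches the reported counts (23,147 + 24,170 = 47,317).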
Genomic annotation showed similar intergenic and intragenic distributions across all enhancer categories. Among all enhancers, 45.2% (n = 21,406) were intergenic and 54.8% (n = 25,911) intragenic (Figure 4B, top). Non-active enhancers showed a similar distribution, comprising 45.5% (n = 10,997) intergenic and 54.5% (n = 13,173) intragenic regions. Active enhancers followed a comparable pattern, with 45.0% (n = 10,409) located in intergenic and 55.0% (n = 12,738) in intragenic regions (Figure 4B, bottom). Genomic regions were classified based on overlap with annotated gene bodies in the GENCODE M23 annotation, and only enhancer regions within these gene bodies were retained to ensure accurate intragenic classification (detailed in protocol Step 13).
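The reported proportions follow directly from the enhancer counts. The short check below recomputes them from the numbers given in the text; the dictionary layout and the function name are illustrative choices, not pipeline outputs.

```python
# (intergenic, intragenic) counts as reported in the text.
counts = {
    "all": (21406, 25911),
    "non-active": (10997, 13173),
    "active": (10409, 12738),
}


def context_percentages(intergenic, intragenic):
    """Return (intergenic %, intragenic %) rounded to one decimal place."""
    total = intergenic + intragenic
    return round(100 * intergenic / total, 1), round(100 * intragenic / total, 1)


for category, (intergenic, intragenic) in counts.items():
    pct_inter, pct_intra = context_percentages(intergenic, intragenic)
    print(f"{category}: n = {intergenic + intragenic}, "
          f"{pct_inter}% intergenic, {pct_intra}% intragenic")
```

The active and non-active counts also sum to the totals within each context (10,409 + 10,997 = 21,406 intergenic; 12,738 + 13,173 = 25,911 intragenic).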
Chromatin profiling validates functional classification of enhancer sets through activity-specific signatures
To further validate the identified active enhancers, we visualized representative genomic loci using the UCSC Genome Browser by uploading the generated bigWig signal tracks and enhancer BED files. Intergenic enhancers upstream of the Nanog gene53 (-45 kb and -5 kb from the Nanog TSS) exhibited strong enrichment for ATAC-seq, H3K4me1, and H3K27ac signals, alongside clear transcriptional activity detected by GRO-seq (Figure 5A). Likewise, an intragenic active enhancer within the Chd2 gene29, near its TSS, displayed similar chromatin and transcriptional features (Figure 5B).
To further validate the chromatin characteristics of the classified enhancer sets, we profiled ATAC-seq and histone modification signals across both intergenic and intragenic enhancers, stratified by activity status (detailed in protocol Step 22). Aggregation plots centered on ATAC-seq peak summits revealed distinct enrichment patterns: active enhancers exhibited strong, coordinated enrichment of ATAC-seq, H3K4me1, and H3K27ac signals, while non-active enhancers showed robust ATAC-seq and H3K4me1 signals but markedly reduced H3K27ac enrichment (Figure 5C).
Strand-aware quantification of enhancer RNA transcription enables functional classification of enhancer activity states
To quantify transcriptional differences between active and non-active enhancers, nascent reads were strand-assigned and normalized for each enhancer, followed by visualization using violin plots (Figure 6A). Mapped GRO-seq reads were counted per enhancer using a strand-aware approach. For intergenic enhancers, reads from both strands were included due to the absence of a defined transcriptional orientation. In contrast, for intragenic enhancers, only reads mapped to the antisense strand relative to the overlapping gene were considered, based on a stepwise strand-assignment method (detailed in protocol Steps 16-20 and 23). Active enhancers exhibited significantly higher transcriptional output than non-active enhancers (Wilcoxon rank-sum tests, p < 2.2 × 10⁻¹⁶) (Figure 6B). These results demonstrate the pipeline's reliability in quantifying enhancer transcription and its utility in classifying enhancer activity states.
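The strand-aware counting rule can be illustrated as follows. This is a minimal sketch under stated assumptions: reads are reduced to (chrom, position, strand) tuples whose strand already reflects the strand of the nascent transcript, enhancers carry the strand of their host gene (None for intergenic enhancers), and normalization is omitted. The function name and data layout are hypothetical; the pipeline performs this step with its own counting scripts.

```python
def count_enhancer_reads(enhancer, reads):
    """Count reads for one enhancer using the strand rule described in the text.

    enhancer: (chrom, start, end, host_strand); host_strand is None for
              intergenic enhancers, otherwise "+" or "-" (strand of host gene).
    reads:    iterable of (chrom, position, strand) tuples.
    """
    chrom, start, end, host_strand = enhancer
    n = 0
    for read_chrom, read_pos, read_strand in reads:
        if read_chrom != chrom or not (start <= read_pos < end):
            continue
        # Intergenic: keep both strands. Intragenic: keep antisense reads only.
        if host_strand is None or read_strand != host_strand:
            n += 1
    return n


# Toy reads, invented for illustration only.
reads = [
    ("chr1", 120, "+"), ("chr1", 150, "-"), ("chr1", 180, "-"),
    ("chr1", 950, "+"), ("chr2", 120, "+"),
]
print(count_enhancer_reads(("chr1", 100, 200, None), reads))  # intergenic
print(count_enhancer_reads(("chr1", 100, 200, "+"), reads))   # intragenic, host on +
```

Because sense-strand reads are discarded at intragenic enhancers, their counts are not directly comparable to intergenic counts, which include both strands.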

Figure 1: Stepwise computational framework for identifying enhancers and quantifying eRNA transcripts. (A) Overall workflow summarizing preprocessing of public sequencing datasets. (B) Downstream analyses including enhancer identification from chromatin accessibility and histone modifications, strand assignment of intragenic enhancers, and quantification/visualization of enhancer transcription.

Figure 2: Quality assessment of raw and preprocessed sequencing reads from public datasets. Quality assessment of (A) GRO-seq, (B) ATAC-seq, and (C) H3K27ac ChIP-seq data shows substantial improvements in read quality after adapter and artifact removal.

Figure 3: Evaluation of adapter contamination before and after trimming across datasets. Across (A) GRO-seq, (B) ATAC-seq, and (C) H3K27ac ChIP-seq datasets, trimming effectively removed adapter sequences and artifacts, resulting in decreased contamination levels and improved data quality.

Figure 4: Classification of enhancers by activity status and genomic context. (A) Active and non-active enhancers defined by chromatin accessibility and histone modification overlap. (B) Genomic context of enhancer sets showing comparable proportions of intergenic and intragenic enhancers across activity states.

Figure 5: Representative genomic loci and chromatin feature profiles of enhancer sets. (A, B) Representative examples of intergenic (Nanog) and intragenic (Chd2) active enhancers with characteristic chromatin and transcriptional signatures. (C) Aggregated chromatin profiles confirm distinct signal enrichment patterns between active and non-active enhancers.

Figure 6: Quantitative analysis of enhancer transcriptional activity by region and activation state. (A) Distributions of nascent transcript levels highlight stronger transcription at active enhancers compared to non-active enhancers across genomic contexts. (B) Statistical comparisons confirm significant transcriptional differences between enhancer groups. Significance levels: p < 0.05 (*), p < 0.01 (**), p < 0.001 (***).
| Name of Sample | DOI | Accession Number | Comments/Description |
| ATAC-seq in E14Tg2a | doi: https://doi.org/10.1038/s44318-024-00086-5 | GSM7072960, GSM7072961 | Serum/LIF condition, 129/Ola derived |
| GRO-seq in E14Tg2a | doi: https://doi.org/10.1016/j.scr.2017.11.012 | GSM2651156 | Serum/LIF condition, 129/Ola derived |
| H3K27ac ChIP-seq in E14 | doi: https://doi.org/10.1186/s13059-019-1860-7 | GSM3399478 | Serum/LIF condition, 129/Ola derived |
| H3K4me1 ChIP-seq in E14Tg2a | doi: https://doi.org/10.17989/ENCSR032JUI | GSM4050822, GSM4050823 | Serum/LIF condition, 129/Ola derived |
| Input of H3K27ac ChIP-seq in E14 | doi: https://doi.org/10.1186/s13059-019-1860-7 | GSM3399484 | Serum/LIF condition, 129/Ola derived |
| Input of H3K4me1 ChIP-seq in E14Tg2a | doi: https://doi.org/10.17989/ENCSR326ULS | GSM4051038, GSM4051039 | Serum/LIF condition, 129/Ola derived |
Table 1: All raw datasets used in the workflow. Publicly available ChIP-seq (H3K27ac, H3K4me1), ATAC-seq, and GRO-seq datasets are listed.
Following the discovery of enhancer-derived transcripts13,14,15, accurately quantifying eRNAs has remained a major challenge, particularly in intragenic contexts where eRNAs often overlap with host gene transcripts. This overlap complicates strand assignment and signal attribution, making it difficult to distinguish genuine enhancer transcription from background gene expression13,25,26,30. Although eRNAs are increasingly recognized as functional indicators of enhancer activity, the field has lacked a standardized, accessible framework for enhancer transcript quantification. To address this need, we developed a modular and strand-aware pipeline that integrates chromatin accessibility and histone modification data for enhancer identification and leverages nascent RNA sequencing to quantify enhancer-derived transcription with high precision. Designed with usability in mind, the pipeline allows researchers without extensive computational training to perform reproducible analyses of enhancer activity, from raw sequencing data to functional interpretation.
Relative to methods that focus on candidate identification or are tied to specific assay modalities, this pipeline differs by providing an end-to-end, within-condition framework. It (i) performs context-aware enhancer calling and classification prior to quantification, applying antisense-focused counting at intragenic loci to reduce misattribution from host-gene transcription, and (ii) integrates chromatin evidence (ATAC with H3K27ac/H3K4me1) and promoter exclusion upstream of nascent-RNA quantification to improve specificity and detection accuracy. The resulting strand-aware, locus-level outputs support transparent, within-condition ranking and interpretation, while remaining compatible with downstream comparative analyses.
To ensure reproducibility and ease of use, the pipeline begins by creating a dedicated mamba/conda environment that contains all required tools. By default, the scripts consolidate Python/CLI utilities and R packages into a single environment for convenience, minimizing setup friction and path conflicts. In rare cases where dependency conflicts arise (e.g., BLAS/LAPACK or C/C++ runtime/ABI mismatches), we recommend splitting environments by keeping all analysis tools in enhancer-env and running the final visualization script (Step24_visualization_of_enhancer_transcript.R) in a lightweight R-only environment. With the environment in place, the workflow proceeds with quality control and adapter trimming, which are essential for ensuring accurate downstream analysis. Using Trim Galore and Cutadapt, we removed adapter sequences and low-quality bases and assessed the resulting improvements with FastQC (Figure 2 and Figure 3). Sequencing reads were aligned to the mm10 reference genome using Bowtie2 with sequencing-specific parameters. Given the lack of input controls for ATAC-seq and GRO-seq, we applied a stringent alignment strategy (Bowtie2's --very-sensitive mode and MAPQ ≥ 30 filtering) to reduce noise from ambiguous or low-confidence reads. This approach helps mitigate technical artifacts such as transposase bias and repetitive region mapping, enhancing signal reliability. To further refine ATAC-seq data, mitochondrial reads (chrM) were removed, and MACS3 peak calling was performed with parameters designed to minimize transposase-associated bias (detailed in protocol Step 9). These preprocessing steps establish a robust foundation for enhancer identification.
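To make the MAPQ and mitochondrial filters concrete, the sketch below applies them to plain-text SAM records. It is an illustration only, assuming uncompressed SAM lines and the standard SAM column order (RNAME in column 3, MAPQ in column 5); the function name is hypothetical, and the pipeline itself relies on dedicated BAM-processing tools rather than this code.

```python
def keep_alignment(sam_line, min_mapq=30, excluded_chroms=("chrM",)):
    """Decide whether a SAM line passes the MAPQ and chromosome filters.

    Header lines (starting with "@") are always kept. Note that this sketch
    does not prune @SQ header entries for excluded chromosomes.
    """
    if sam_line.startswith("@"):
        return True
    fields = sam_line.rstrip("\n").split("\t")
    rname, mapq = fields[2], int(fields[4])
    return rname not in excluded_chroms and mapq >= min_mapq


# Toy SAM records, invented for illustration only.
sam_lines = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t42\t50M\t*\t0\t0\t*\t*",  # kept
    "r2\t0\tchr1\t200\t12\t50M\t*\t0\t0\t*\t*",  # dropped: MAPQ below 30
    "r3\t0\tchrM\t300\t42\t50M\t*\t0\t0\t*\t*",  # dropped: mitochondrial
]
filtered = [line for line in sam_lines if keep_alignment(line)]
print(len(filtered))
```

The threshold and the excluded chromosome list are parameters here so that the same function can express the ATAC-seq case (chrM removed) and a MAPQ-only filter by passing an empty tuple.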
Enhancer regions were identified by intersecting ATAC-seq peaks with H3K4me1 and H3K27ac signals, following the removal of blacklist regions and promoter-proximal windows (4 kb window from GENCODE M23-annotated TSSs). To improve specificity, histone mark peaks were extended (slopped) by 1 kb on both sides and filtered to exclude promoter-like regions, thereby minimizing false-positive enhancer calls (Figure 1B and protocol Step 14). Identified enhancers were then classified as intergenic or intragenic based on the position of the ATAC-seq summit. This classification not only reflects the genomic context in which enhancer activity occurs but also provides the structural basis for downstream strand assignment and eRNA quantification. Context-aware categorization is particularly critical for analyzing intragenic enhancers, which overlap with host gene bodies and require careful resolution of transcript origin. Altogether, this strategy enables robust identification of active enhancer elements and lays a foundation for precise assessment of their transcriptional output across distinct genomic compartments.
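Two of the operations in this paragraph, extending histone mark peaks by 1 kb and classifying enhancers by the position of the ATAC-seq summit, can be sketched as below. This is an illustrative sketch that assumes half-open (chrom, start, end) coordinates and a known chromosome length for clipping; the function names are hypothetical and do not correspond to pipeline scripts.

```python
def slop(region, size, chrom_length):
    """Extend a (chrom, start, end) region by `size` bp on both sides,
    clipping the result to the chromosome bounds [0, chrom_length]."""
    chrom, start, end = region
    return (chrom, max(0, start - size), min(chrom_length, end + size))


def classify_by_summit(summit, gene_bodies):
    """Return "intragenic" if the summit (chrom, position) lies inside any
    annotated gene body (chrom, start, end), otherwise "intergenic"."""
    chrom, pos = summit
    for gene_chrom, gene_start, gene_end in gene_bodies:
        if chrom == gene_chrom and gene_start <= pos < gene_end:
            return "intragenic"
    return "intergenic"


# Toy coordinates, invented for illustration only.
print(slop(("chr1", 500, 900), 1000, 2500))
gene_bodies = [("chr1", 1000, 5000)]
print(classify_by_summit(("chr1", 1200), gene_bodies))
print(classify_by_summit(("chr1", 800), gene_bodies))
```

Using the summit rather than the full peak interval means that a peak straddling a gene boundary is assigned to exactly one context.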
To enable strand-specific quantification of eRNAs from intragenic enhancers, we implemented a three-step strand assignment strategy: (i) initial assignment based on the strand of overlapping gene features, (ii) prioritization of protein-coding genes, and (iii) refinement using RPKM values to identify the most highly expressed gene. This stepwise approach improves the accuracy of host gene assignment and enables reliable quantification of antisense-derived eRNAs, effectively addressing the challenge of transcript overlap with host gene expression. Given that nascent transcription data reflect relative gene expression within a sample, RPKM normalization is appropriate for this purpose, although TPM (Transcripts Per Million) normalization may also be suitable depending on the analytical framework54. In the final quantification step, we used only antisense reads to evaluate intragenic enhancer activity, explicitly excluding sense-strand reads to avoid confounding by overlapping transcripts. This approach revealed significant differences in eRNA expression between active and non-active enhancers in both intergenic and intragenic regions.
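The three-step assignment and the RPKM calculation can be written out as follows. This is a minimal sketch, assuming each overlapping gene is described by a dictionary with strand, biotype, strand-specific read count, and length; the field names, the function names, and the fallback to all genes when no protein-coding gene overlaps are assumptions made for illustration and are not taken from the pipeline scripts.

```python
def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads Per Kilobase per Million mapped reads."""
    return read_count * 1e9 / (length_bp * total_mapped_reads)


def assign_host_strand(genes, total_mapped_reads):
    """Pick the host-gene strand for an intragenic enhancer.

    genes: list of dicts with keys "strand", "biotype", "reads", "length".
    """
    if not genes:
        raise ValueError("an intragenic enhancer needs at least one overlapping gene")
    # (i) Temporary assignment: unambiguous if all overlapping genes agree.
    strands = {g["strand"] for g in genes}
    if len(strands) == 1:
        return strands.pop()
    # (ii) Prioritize protein-coding genes (fall back to all genes if none).
    coding = [g for g in genes if g["biotype"] == "protein_coding"]
    candidates = coding if coding else genes
    strands = {g["strand"] for g in candidates}
    if len(strands) == 1:
        return strands.pop()
    # (iii) Still ambiguous: take the strand of the most highly expressed gene.
    best = max(candidates,
               key=lambda g: rpkm(g["reads"], g["length"], total_mapped_reads))
    return best["strand"]


def antisense(strand):
    """Strand on which intragenic eRNA reads are counted."""
    return "-" if strand == "+" else "+"


# Toy genes, invented for illustration only.
genes = [
    {"strand": "+", "biotype": "protein_coding", "reads": 500, "length": 2000},
    {"strand": "-", "biotype": "protein_coding", "reads": 100, "length": 1000},
]
host = assign_host_strand(genes, total_mapped_reads=10_000_000)
print(host, antisense(host))
```

In the toy example, the plus-strand gene has an RPKM of 25 and the minus-strand gene an RPKM of 10, so the enhancer inherits the plus strand and its eRNA signal is read from the minus strand.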
When adapting this workflow to other datasets or reference assemblies, several practical considerations warrant attention. This pipeline was originally designed for mm10 with GENCODE M23 annotation, but newer assemblies, such as mm39/GRCm39 with GENCODE M38, can be substituted as long as all associated resources are replaced consistently, including the Bowtie2 genome index, annotation (GTF), blacklist, and chrom.sizes files. Version mismatches between the reference genome and annotation, or inconsistent chromosome naming conventions (e.g., UCSC chr1/chrM vs. Ensembl 1/MT), may result in coordinate errors and spuriously empty overlaps in bedtools analyses. To prevent such issues, it is essential to ensure that every reference file originates from the same assembly and follows a uniform naming scheme. In addition, nascent RNA library orientation differs by protocol; for example, PRO-seq frequently generates reverse-stranded libraries. In these cases, featureCounts -s should be configured to reflect the correct library type, and strandedness should be confirmed by a brief QC check before proceeding with final quantification.
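A quick consistency check on chromosome names can catch the naming mismatch described above before any overlaps are computed. The sketch below is one hedged way to do this, assuming tab-delimited reference files (chrom.sizes, BED, GTF) whose first column holds the chromosome name; the function names are hypothetical.

```python
def chromosome_names(lines):
    """Collect first-column chromosome names from tab-delimited lines,
    skipping blank lines, comments, and UCSC track/browser lines."""
    names = set()
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        names.add(line.split("\t", 1)[0].strip())
    return names


def unmatched_names(reference_names, other_names):
    """Names used in another file that are absent from the reference set."""
    return sorted(other_names - reference_names)


# Toy file contents, invented for illustration only.
chrom_sizes = ["chr1\t195471971", "chrM\t16299"]
gtf_ensembl_style = [
    "#!genome-build example",
    "1\tsource\tgene\t100\t200\t.\t+\t.\tgene_id \"g1\";",
    "MT\tsource\tgene\t100\t200\t.\t+\t.\tgene_id \"g2\";",
]
reference = chromosome_names(chrom_sizes)
print(unmatched_names(reference, chromosome_names(gtf_ensembl_style)))
```

An empty list means every chromosome name in the second file is also present in chrom.sizes; any names returned point to a mixed UCSC/Ensembl convention or to files from different assemblies.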
There are still unresolved limitations. Some intragenic enhancers reside within gene bodies that overlap on both strands, making it difficult to disentangle genuine enhancer transcription from overlapping gene activity, even when focusing on antisense signals. In such cases, measured transcripts may reflect background transcription rather than bona fide enhancer activity. Additionally, strand-specific quantification of sense-derived eRNAs remains technically challenging, precluding accurate direct comparison between intergenic and intragenic enhancer activity. While subtracting transcriptional signals from non-enhancer regions within the same gene may help isolate sense transcription from intragenic enhancers, this approach is limited by sequencing noise and biological variability, such as polymerase collision and transcriptional interference. These limitations highlight the need for further methodological development to improve the resolution of enhancer-derived transcript quantification, particularly in complex intragenic contexts.
The pipeline is implemented as a modular suite of shell scripts designed for accessibility, efficiency, and reproducibility. Each step executes as a single documented command with sensible defaults, while logs capture versions, parameters, and input checksums to ensure consistent execution across machines and time. The modular design facilitates adaptation by allowing users to substitute references, adjust trimming/alignment parameters, or integrate additional assays (e.g., ChIP-seq variants or nascent RNA protocols) without disrupting downstream steps. These features underscore the practical importance of the workflow and provide a foundation for future applications.
The workflow yields strand-aware, annotation-linked enhancer candidates and normalized locus-level outputs that are directly usable for ranking, filtering, and comparative analyses. Such outputs map naturally to experimental follow-up, including reporter assays to test regulatory activity and targeted enhancer perturbation to validate functional impact. For routine use, the combination of stepwise documentation, deterministic preprocessing, and clear handoffs lowers the barrier for researchers with limited computational expertise while remaining flexible for advanced users. Together, these qualities position the pipeline as a practical screening and target-designation framework, particularly suited for experimental biology laboratories.
The authors have no conflicts of interest to disclose.
This study was supported by the research fund of Chungnam National University [2022-0582-01 (S.-K.K.) and 2023-0545-01 (S.-K.K.)], South Korea. Figure 1 was created using BioRender (https://biorender.com/).
| Name | Company | Version | Comments/Description |
| bedtools | Quinlan Lab, University of Utah | v2.31.1 | Utilities for editing BED files |
| bowtie2 | Langmead Lab, Johns Hopkins University | v2.5.4 | Multi-threaded aligner for mapping reads to a reference genome |
| cowplot | Wilke Lab, University of Texas | v1.2.0 | Tools for combining and aligning ggplot2-based figures |
| cutadapt | Science For Life Laboratory, Stockholm University | v5.1 | Adaptor and poly-A/G tail trimmer |
| deeptools | Bioinformatics Facility, Max Planck Institute | v3.5.6 | Read counting tool for quantifying reads in defined genomic regions |
| fastqc | Babraham Bioinformatics, Babraham Institute | v0.12.1 | Quality control for sequencing reads |
| featureCounts (subread) | Shi Lab, Monash University | v2.1.1 | Raw read counting tools for specified genomic regions |
| homer | Benner Lab, University of California San Diego (UCSD) | v5.1 | Toolkit for ChIP-seq, ATAC-seq, and nascent RNA analysis; includes tag directory creation and signal profiling |
| macs3 | Chan Zuckerberg Initiative | v3.0.3 | Peak calling for ChIP-seq and ATAC-seq datasets |
| pigz | . | v2.8 | Multi-threaded compression tool for generating gzip-compressed files |
| sambamba | Petersburg State University | v1.0.1 | Multi-threaded SAM/BAM file processing toolkit |
| samtools | Wellcome Trust Sanger Institute | v1.22.1 | Tools for processing and manipulating SAM/BAM files |
| sra-tools | National Center for Biotechnology Information (NCBI) | v3.2.0 | For downloading SRR files from the NCBI SRA database |
| tidyverse | Posit PBC | v2.0.0 | Collection of R packages for data manipulation and visualization |
| trim-galore | Altos Labs, Cambridge Institute of Science | v0.6.10 | Adapter and low-quality base trimming using multi-threading |
| Ubuntu 20.04 | | | Developing and testing the pipeline |
| ucsc-bedgraphtobigwig | Kent Lab, University of California Santa Cruz | v482 | Tools for generating bigWig signal tracks |