High-precision Detection of RNA Editing Sites using Calibrated Differential RNA Editing Scanner

Jun Sun; Ziwei Du; Chi Zhang

doi:10.3791/71148

Method Article

June 23rd, 2026

Jun Sun¹,², Ziwei Du¹,², Chi Zhang¹,²

¹Shanghai Institute of Biological Products, ²State Key Laboratory of Novel Vaccines for Emerging Infectious Diseases, China National Biotec Group Company Limited

Summary

This protocol describes the use of the Calibrated Differential RNA Editing Scanner (CADRES), a computational workflow that integrates DNA–RNA joint variant calling, signal‑optimized recalibration, and replicate‑aware statistical modeling to identify differential RNA editing sites with high precision.

Abstract

Accurate delineation of RNA editing remains technically challenging because genuine post‑transcriptional alterations must be distinguished from genomic variants and sequencing artifacts. This difficulty is especially marked for cytidine‑to‑uridine editing catalyzed by APOBEC enzymes, where intermixed DNA and RNA changes obscure the true editing signal. The Calibrated Differential RNA Editing Scanner (CADRES) provides a structured computational framework to address these limitations through integrated DNA–RNA variant interrogation and targeted preservation of authentic editing signatures. This protocol presents the CADRES workflow, including data preparation, joint DNA–RNA variant calling, signal-preserving base-quality recalibration, artifact filtering, and differential assessment of RNA editing between experimental conditions. CADRES supports paired RNA‑seq and whole‑genome or whole‑exome sequencing with biological replication. A multi-stage filtering strategy, including homopolymer removal and PBLAT-based paralogue screening, systematically reduces false positives while preserving low-frequency editing events. By combining calibration with replicate-aware modeling, CADRES increases the precision and reproducibility of RNA editing analysis, enabling interrogation of editing dynamics across diverse biological contexts. Compared with established methods, CADRES is designed to improve precision in RNA editing detection, particularly for APOBEC-mediated C-to-U events.

Introduction

RNA editing constitutes a dynamic layer of post‑transcriptional regulation that enables site‑specific nucleotide substitutions within RNA transcripts without altering the underlying DNA sequence. In metazoans, adenosine‑to‑inosine (A>I) deamination mediated by ADAR enzymes is the predominant form and contributes to transcript diversification, mRNA stability, innate immune modulation, and neuronal function¹,². Cytidine‑to‑uridine (C>U) deamination (hereafter "C>U" in the biological context and "C>T" in the sequencing context) is catalyzed by members of the APOBEC family. It operates alongside these pathways and is implicated in lipid metabolism, viral restriction, mutagenesis, and emerging regulatory roles in immune and cancer biology³,⁴,⁵,⁶,⁷. Recent work has demonstrated that several APOBEC enzymes, including APOBEC1, APOBEC3A, and APOBEC3B (A3B), catalyze RNA editing across physiological and pathological contexts³,⁴,⁷,⁸,⁹,¹⁰. APOBEC3 enzymes also edit DNA, producing overlapping mutational signatures that complicate discrimination between RNA editing and genomic variation⁸,¹⁰,¹¹,¹².

Next‑generation sequencing has enabled transcriptome‑wide identification of potential RNA editing sites, yet distinguishing true edits from genomic SNVs or technical noise remains difficult. A>I and C>U events appear as A>G and C>T substitutions in cDNA libraries. They can be confounded by mispriming, polymerase errors, mapping artifacts, and context‑specific expression changes. Public resources such as REDIportal¹³ catalog millions of A>I sites, whereas C>U annotations remain sparse, reflecting both biological and analytical constraints. Reliable identification of C>U editing, especially of changes across conditions, therefore remains an unmet analytical need.

The overarching goal of the method presented here, the Calibrated Differential RNA Editing Scanner (CADRES), is the precise identification of Differential Variants on RNA (DVRs): editing sites whose editing level changes significantly between two or more defined conditions¹⁴. In developing this protocol, we sought to address two persistent obstacles. First, bona fide RNA edits must be distinguished from DNA‑encoded variants. Second, editing differences must be quantified in a statistically robust manner across biologically replicated RNA‑seq datasets. The central innovation of CADRES lies in its integration of joint DNA/RNA variant calling with a calibrated treatment of RNA variants during base‑quality score recalibration (BQSR). Standard BQSR treats every mismatch at a position absent from its known‑sites resources as a sequencing error. Genuine editing sites missing from those resources therefore lower the recalibrated base qualities. This “boost recalibration” strategy adds newly discovered RNA editing candidates to the known sites during BQSR. It thereby prevents the systematic quality downgrading that commonly erodes sensitivity for low‑frequency edits¹⁵,¹⁶,¹⁷. Compared with pipelines that rely solely on incomplete RNA editing databases, this approach reduces false negatives and improves specificity.

CADRES sits within a landscape of methods that each address different aspects of RNA editing analysis. SNPiR¹⁸ and RVboost¹⁹ filter artifacts from RNA‑only variant sets. VaDiR²⁰ incorporates DNA–RNA comparisons but does not model replicate structure. rMATS‑DVR²¹ performs GLMM‑based differential testing but relies exclusively on RNA‑seq. JACUSA/JACUSA2²²,²³ support replicate‑aware detection but do not incorporate joint DNA–RNA interrogation or recalibration strategies. CADRES unifies replicate‑aware statistical modeling, joint DNA/RNA variant calling, and recalibration enriched for de novo editing sites. It provides a single workflow optimized for detecting condition‑dependent RNA editing, including C>U events linked to APOBEC activity¹⁰,¹¹,¹².

In this context, users may consider CADRES appropriate when their experimental system fulfills the following criteria. First, paired RNA‑seq and whole‑genome or whole‑exome sequencing from the same samples is available, enabling rigorous partitioning of RNA‑derived events from DNA‑encoded variants. Second, the biological question concerns changes in RNA editing across conditions (such as enzyme induction, environmental stress, developmental stages, or disease states), where statistical modeling of allele‑specific depth across replicates is essential. Third, the investigator seeks enhanced specificity in C>U editing detection, where distinguishing RNA events from APOBEC‑driven DNA mutagenesis is indispensable. CADRES is particularly valuable in systems where APOBEC activity induces both RNA and DNA edits, as demonstrated in inducible A3B models¹⁰,¹¹,¹². It is also valuable where conventional RNA‑only methods exhibit inflated false‑positive rates due to confounding SNVs or repetitive‑sequence artifacts.

CADRES offers several practical advantages. Its joint DNA/RNA variant calling reduces SNV‑driven false positives. Boost recalibration preserves true editing signals, including novel events absent from reference databases. The rMATS‑derived GLMM provides a statistically principled framework for differential editing analysis across replicates. Together, these features yield a calibrated, high‑precision platform for studying dynamic RNA editing across experimental and disease settings. In our previous study¹⁴, CADRES was rigorously benchmarked against established RNA editing detection methods using both in silico simulated datasets and real-world inducible A3B cell models. In the in silico evaluation, CADRES consistently achieved precision scores of 0.85–0.95 and accuracy scores of 0.92–0.98 across replicate numbers. The overall CADRES workflow is illustrated in Figure 1.
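For reference, the two benchmark metrics quoted above are conventionally defined from the counts of true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN) among candidate sites. The cited study may define its negative set in its own way, so these are the standard definitions only:

$$
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
$$

Precision measures what fraction of reported sites are genuine. Accuracy also credits correctly rejected sites, so it can remain high whenever true negatives greatly outnumber true edits.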

Protocol

This protocol describes a purely computational bioinformatics workflow for identifying C>U RNA editing events using the CADRES framework. All steps are performed within a Linux environment using the command line. Only publicly available sequencing datasets are used, and no human or vertebrate subjects are involved.

1. Environment setup and software installation

NOTE: Minimum computational requirements for the CADRES workflow are as follows: CPU ≥ 8 cores (16 cores recommended), RAM ≥ 32 GB (64 GB recommended for whole-genome datasets), and disk space ≥ 100 GB.

Confirm that a Linux operating system is available. Open a terminal window and ensure that you have permission to install software in the current user environment.
Install a Conda package manager if it is not already present on the system. Download an installer for a minimal Conda distribution from its official website. Execute the installation script following the on‑screen instructions.
Verify that Conda is available by entering the following command; it should print a valid version number.
$ conda --version
Create a working directory for the CADRES workflow and navigate into it:
$ mkdir -p /path/to/working_directory
$ cd /path/to/working_directory
Download the CADRES source code by executing:
$ git clone --branch v1.0.0 https://github.com/junsun-hash/CADRES
Enter the cloned directory by running:
$ cd CADRES
Create a dedicated Conda environment using the environment.yml file provided within the CADRES repository. Execute the following command and allow the installation process to complete without interruption.
$ conda env create -f environment.yml
Activate the newly created environment by entering the following command. Confirm that the environment has been activated by checking that the terminal prompt now displays its name.
$ conda activate CADRES
Verify that the required command‑line tools have been installed correctly. Execute each command below and confirm that it prints version or usage information rather than a "command not found" error:
$ python --version
$ samtools --version
$ gatk --help
$ bedtools --version
$ pblat
NOTE: The exact set of tools included in the CADRES environment may differ slightly depending on updates to the environment.yml file. If a tool is missing, recreate the environment or update the dependency list as required.
Ensure that sufficient disk space is available. Confirm that at least 100 GB of free space exists for reference genomes, alignment indices, and intermediate BAM files by entering:
$ df -h .
Confirm that write permissions are available in all working, output, and temporary directories by creating a test file:
$ touch test_file.txt
Delete the file afterwards by entering:
$ rm test_file.txt
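The manual checks above can also be scripted. The following is a minimal, hypothetical pre-flight sketch (not part of the CADRES repository) that uses only the Python standard library.

- The tool list is an assumption. It adds bwa, STAR and bcftools because later steps call them, although they may not be installed by environment.yml.
- The 100 GB threshold is the protocol's stated requirement, computed here in decimal gigabytes.

```python
"""Hypothetical pre-flight check for the CADRES environment (not part of CADRES itself)."""
import shutil

# Assumed tool list: the commands verified above plus bwa, STAR and bcftools, which are
# used in Section 2 but may not be provided by environment.yml.
REQUIRED_TOOLS = ("python", "samtools", "gatk", "bedtools", "pblat", "bwa", "STAR", "bcftools")
MIN_FREE_GB = 100  # free-space requirement from the protocol (decimal gigabytes)


def check_environment(path=".", tools=REQUIRED_TOOLS, min_free_gb=MIN_FREE_GB):
    """Return (missing_tools, free_gb, enough_space) for the directory at `path`."""
    missing = [tool for tool in tools if shutil.which(tool) is None]
    free_gb = shutil.disk_usage(path).free / 1e9
    return missing, free_gb, free_gb >= min_free_gb


def main():
    """Print a short report; does not exit with an error code."""
    missing, free_gb, enough = check_environment()
    status = "OK" if enough else f"below {MIN_FREE_GB} GB"
    print(f"Free disk space: {free_gb:.1f} GB ({status})")
    print("Missing from PATH:", ", ".join(missing) if missing else "none")
    return missing


if __name__ == "__main__":
    main()
```

Run it from the working directory with the CADRES environment activated. Any tool reported as missing should be installed or added to environment.yml before continuing.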

2. Data preparation

NOTE: The representative dataset used in this protocol consists of: HEK293T cells with doxycycline-inducible A3B–GFP; WGS at 33×; strand-specific paired-end RNA-seq (2×150 bp, ≥60 M reads/sample); n = 3 biological replicates per condition (DMSO vs. doxycycline 72 h). Full data: SRA PRJNA1211186. A chr22 demonstration subset is provided in the CADRES repository.

CADRES requires: (i) WGS (≥33×) or WES (≥33×); (ii) strand-specific, paired-end RNA-seq (≥60 million reads per sample); (iii) two experimental conditions with ≥2 biological replicates each.
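The depth requirement can be translated into a read count with the usual estimate: mean coverage ≈ sequenced bases ÷ genome size. The sketch below also checks the replicate design stated above. It is illustrative only.

- It assumes a GRCh38 size of about 3.1 Gb.
- It assumes fully mappable, duplicate-free reads, so real projects need a safety margin.
- The function names are hypothetical.

```python
import math

GENOME_SIZE_BP = 3.1e9  # approximate GRCh38 size; an assumption for this estimate


def read_pairs_for_depth(target_depth, read_length=150, genome_size=GENOME_SIZE_BP):
    """Paired-end read pairs needed for a target mean depth (ideal, duplicate-free reads)."""
    bases_per_pair = 2 * read_length
    return math.ceil(target_depth * genome_size / bases_per_pair)


def design_meets_requirements(replicates_per_condition, min_replicates=2):
    """True if there are exactly two conditions, each with >= min_replicates replicates."""
    return (len(replicates_per_condition) == 2
            and all(n >= min_replicates for n in replicates_per_condition.values()))


print(read_pairs_for_depth(33))  # 2x150 bp WGS at 33x
print(design_meets_requirements({"DMSO": 3, "DOX": 3}))
```

Under these assumptions, 33× WGS with 2×150 bp reads corresponds to roughly 341 million read pairs per DNA sample before duplicate removal.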

Prepare the reference genome and annotation.
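1. Download the reference genome (FASTA) and GTF annotation file from Ensembl or a comparable repository. The recommended reference genome is the Ensembl GRCh38 primary assembly: https://ftp.ensembl.org/pub/release-111/fasta/homo_sapiens/dna/ and the recommended GTF annotation is GENCODE release 45 (equivalent to Ensembl release 111):
  https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_45/.
  Ensembl FASTA files name chromosomes “1”, “2”, and so on, whereas GENCODE files use “chr1”, “chr2”, and so on. Rename the contigs in one of the two files (or use the Ensembl release 111 GTF) so that the names agree.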
2. Index the reference FASTA:
  $ samtools faidx Homo_sapiens.GRCh38.dna.primary_assembly.fa
  Ensure that chromosome naming conventions (for example, “chr1” versus “1”) are consistent across all reference materials.
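Naming mismatches are easiest to catch by comparing contig names directly. The helper names below are hypothetical.

- `first_column_names` collects the first tab-separated column from a samtools .fai index or a GTF, which is where both formats store the contig name.
- `naming_mismatch` lists contigs that another resource uses but the FASTA index lacks.

For very large VCFs, reading the `##contig` header lines would be faster than scanning the body.

```python
import gzip


def _open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def first_column_names(path, comment="#"):
    """Unique first-column values (contig names) of a .fai or GTF file."""
    names = set()
    with _open_text(path) as handle:
        for line in handle:
            if line.startswith(comment) or not line.strip():
                continue
            names.add(line.split("\t", 1)[0])
    return names


def naming_mismatch(reference_names, other_names):
    """Contig names used by another resource but absent from the reference index."""
    return sorted(set(other_names) - set(reference_names))
```

For example, `naming_mismatch(first_column_names("Homo_sapiens.GRCh38.dna.primary_assembly.fa.fai"), first_column_names("gencode.v45.annotation.gtf.gz"))` would return every “chr”-prefixed GENCODE contig when the FASTA uses Ensembl names. An empty list means the two resources agree.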
Obtain the sequencing data.
1. Obtain DNA‑seq FASTQ files (WGS or WES) with ≥33× depth.
2. Obtain strand‑specific, paired‑end RNA‑seq FASTQ files with ≥60 million reads per sample, across two biological conditions and at least two biological replicates per group.
Align the DNA sequencing reads using BWA‑MEM.
1. Build a BWA index:
  $ bwa index -p bwaindex -a bwtsw Homo_sapiens.GRCh38.dna.primary_assembly.fa
2. Align and convert to BAM:
  $ bwa mem -R '@RG\tID:ID\tPL:platform\tLB:library\tSM:sample_name' bwaindex wgs_R1.fq.gz wgs_R2.fq.gz | samtools view -b > wgs.bam
Align the RNA sequencing reads using STAR.
1. Generate STAR genome index:
  $ STAR --runMode genomeGenerate \
  --genomeFastaFiles Homo_sapiens.GRCh38.dna.primary_assembly.fa \
  --genomeDir STAR_index \
  --sjdbGTFfile Homo_sapiens.GRCh38.gtf
2. Align the RNA reads (first pass). Supply read 1 and read 2 of one paired-end sample to --readFilesIn:
  $ STAR \
  --genomeDir STAR_index \
  --readFilesCommand zcat \
  --readFilesIn rna_sample1_R1.fq.gz rna_sample1_R2.fq.gz \
  --sjdbGTFfile Homo_sapiens.GRCh38.gtf \
  --outSAMtype BAM Unsorted \
  --outSAMmapqUnique 60 \
  --outFileNamePrefix pass1_
  NOTE: This produces a splice-junction file (pass1_SJ.out.tab) containing both annotated and novel junctions.
3. Re-generate the STAR genome index incorporating novel junctions. When several samples are aligned, run the first pass for each sample and concatenate all first-pass SJ.out.tab files into SJ_all.tab:
  $ cat pass1_SJ.out.tab > SJ_all.tab
  $ STAR --runMode genomeGenerate \
  --genomeFastaFiles Homo_sapiens.GRCh38.dna.primary_assembly.fa \
  --genomeDir STAR_index_2pass \
  --sjdbGTFfile Homo_sapiens.GRCh38.gtf \
  --sjdbFileChrStartEnd SJ_all.tab
4. Perform the second-pass alignment using the updated index:
  $ STAR \
  --genomeDir STAR_index_2pass \
  --readFilesCommand zcat \
  --readFilesIn rna_sample1_R1.fq.gz rna_sample1_R2.fq.gz \
  --sjdbGTFfile Homo_sapiens.GRCh38.gtf \
  --outSAMtype BAM SortedByCoordinate \
  --outSAMmapqUnique 60 \
  --outFileNamePrefix output_name
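Concatenating raw SJ.out.tab files from many samples can insert many weakly supported or non-canonical junctions into the second-pass index. A common refinement is to keep only novel, canonical junctions with enough uniquely mapped reads. The sketch below assumes the standard nine-column SJ.out.tab layout. The thresholds (3 unique reads, mitochondrial contigs skipped) and function names are hypothetical choices, not CADRES defaults. It writes the first four columns unchanged, as the protocol already does by passing SJ.out.tab directly to --sjdbFileChrStartEnd.

```python
# STAR SJ.out.tab columns (1-based): 1 chrom, 2 intron start, 3 intron end,
# 4 strand (0 undefined, 1 +, 2 -), 5 intron motif (0 = non-canonical),
# 6 annotated (0/1), 7 uniquely mapped reads, 8 multi-mapped reads, 9 max overhang.


def merge_novel_junctions(sj_tables, min_unique_reads=3, skip_contigs=("chrM", "MT")):
    """Merge novel canonical junctions across samples.

    sj_tables: iterables of SJ.out.tab lines (for example, open file handles), one per sample.
    A junction is kept if any sample supports it with >= min_unique_reads unique reads.
    """
    kept = set()
    for table in sj_tables:
        for line in table:
            fields = line.rstrip("\n").split("\t")
            chrom, start, end, strand, motif, annotated, unique = fields[:7]
            # Annotated junctions already come from the GTF; drop non-canonical motifs.
            if chrom in skip_contigs or motif == "0" or annotated == "1":
                continue
            if int(unique) >= min_unique_reads:
                kept.add((chrom, int(start), int(end), strand))
    return sorted(kept)


def write_sjdb(junctions, out_path="SJ_all.tab"):
    """Write chrom/start/end/strand lines for --sjdbFileChrStartEnd."""
    with open(out_path, "w") as out:
        for chrom, start, end, strand in junctions:
            out.write(f"{chrom}\t{start}\t{end}\t{strand}\n")
```

In practice, open each sample's first-pass SJ.out.tab, pass the handles to `merge_novel_junctions`, and write the result with `write_sjdb` before regenerating the two-pass index.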
Prepare auxiliary reference resources.
1. (Recommended) Obtain dbSNP VCF
  Download a human GRCh38 dbSNP VCF. This protocol uses build 150 as an example; the current build is available from the NCBI FTP server at https://ftp.ncbi.nih.gov/snp/latest_release/VCF/. Place the downloaded file (for example, dbsnp_150.vcf.gz) into the working directory.
  NOTE: dbSNP entries derived from cDNA (molType="cDNA") may reflect RNA editing rather than genomic polymorphism and can therefore mask true RNA editing sites. Exclude them using:
  $ bcftools view -i 'INFO/molType!="cDNA"' dbsnp_150.vcf.gz -Oz -o dbsnp_150.no_cDNA.vcf.gz
2. (Recommended) Sort the dbSNP VCF
  Sort the VCF so that its contig order matches the reference sequence dictionary used by GATK. Use the cDNA-filtered file, or dbsnp_150.vcf.gz if the exclusion was skipped:
  $ gatk SortVcf \
  -I dbsnp_150.no_cDNA.vcf.gz \
  -O dbsnp_150.sorted.vcf.gz \
  -SD Homo_sapiens.GRCh38.dna.primary_assembly.dict
3. (Recommended) Index the sorted dbSNP VCF
  Create an index for the sorted dbSNP VCF:
  $ gatk IndexFeatureFile -I dbsnp_150.sorted.vcf.gz
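  NOTE: GATK requires a sequence dictionary that matches the reference FASTA and shares its basename (Homo_sapiens.GRCh38.dna.primary_assembly.dict). If it is missing, generate it as follows before running SortVcf and supply it as the sequence dictionary: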
  $ gatk CreateSequenceDictionary \
  -R Homo_sapiens.GRCh38.dna.primary_assembly.fa \
  -O Homo_sapiens.GRCh38.dna.primary_assembly.dict
4. (Recommended) Obtain gnomAD germline VCF
  Download the GRCh38 gnomAD germline variant sites VCF from https://gnomad.broadinstitute.org/downloads. Use the genome sites VCF appropriate for the pipeline (for example, gnomad.genomes.vX.X.sites.vcf.gz).
5. (Recommended) Sort the gnomAD VCF
  Sort the gnomAD VCF using the same reference dictionary to ensure compatibility:
  $ gatk SortVcf \
  -I gnomad.vcf.gz \
  -O gnomad.sorted.vcf.gz \
  -SD Homo_sapiens.GRCh38.dna.primary_assembly.dict
6. (Recommended) Index the sorted gnomAD VCF
  Create an index for the sorted gnomAD VCF:
  $ gatk IndexFeatureFile -I gnomad.sorted.vcf.gz
  NOTE: Ensure that chromosome naming (for example, “chr1” vs “1”) matches the reference FASTA before running SortVcf.
Obtain a known RNA editing reference.
NOTE: A compatible REDIportal reference file (rediportal.txt) suitable for CADRES is curated within the CADRES repository and may be downloaded directly from:
https://github.com/junsun-hash/CADRES/releases/tag/v1.0.0. This file is used to annotate known editing events in Step 3.3.
Prepare gene annotations in RefGene format. Download the RefGene annotation (for example, refGene.txt.gz from UCSC). Decompress if necessary and ensure that chromosome names correspond to those in the reference genome.
NOTE: A RefGene annotation file compatible with CADRES is provided within the CADRES repository and may be downloaded directly from:
https://github.com/junsun-hash/CADRES/releases/tag/v1.0.0

3. Execution of the CADRES Analytical Workflow

NOTE: Section 3 is executed in the Linux terminal with the CADRES Conda environment activated.

Calibration and Boost Recalibration
NOTE: Step 3.1 standardizes BAM files and performs boost recalibration. This is an enhanced BQSR that incorporates dbSNP, gnomAD, and a preliminary set of RNA editing candidates to preserve genuine editing signals. List all RNA BAM files separated by spaces. Output: recalibrated BAMs (suffix: _recalibration.bam) and boost candidate sites.
1. To execute Step 3.1:
  $ python pipeline_step1_calibration.py \
  --rna_bams /path/to/rna_sample1.bam /path/to/rna_sample2.bam ... \
  --dna_bam /path/to/wgs_normal.bam \
  --genome /path/to/hg38.fa \
  --known_snv /path/to/dbsnp.sorted.vcf.gz \
  --output_dir ./output/step1_calibration \
  --prefix project_demo
  NOTE: Boost candidates are built from a preliminary joint DNA-RNA Mutect2 call (--max-events-in-region 4, PASS filter only; no additional AF/quality thresholds). Homopolymer and repeat filtering are handled in Step 3.2.
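Why adding editing candidates to the known sites matters can be seen with a deliberately simplified model of BQSR. In this model, BQSR estimates an empirical quality for each reported-quality bin from the mismatch rate at positions that are not listed as known sites.

This is a hypothetical illustration, not GATK's or CADRES's implementation:

- It ignores GATK's other covariates (read group, machine cycle, sequence context).
- It uses a Laplace-smoothed error estimate.
- All names and data are invented.

```python
import math
from collections import defaultdict


def empirical_quality(observations, known_sites):
    """Toy per-bin empirical quality.

    observations: iterable of (position, reported_q, is_mismatch) for aligned bases.
    Bases at positions in known_sites are skipped (masked); every other mismatch
    is counted as a sequencing error.
    """
    counts = defaultdict(lambda: [0, 0])  # reported_q -> [bases, mismatches]
    for position, reported_q, is_mismatch in observations:
        if position in known_sites:
            continue
        counts[reported_q][0] += 1
        counts[reported_q][1] += int(is_mismatch)
    # Laplace-smoothed error rate converted back to the Phred scale.
    return {q: -10 * math.log10((mm + 1) / (n + 2)) for q, (n, mm) in counts.items()}


# Hypothetical data: 10,000 Q30 bases with 10 true errors, plus one editing
# site covered by 200 reads, 60 of which carry the edited base.
EDIT_SITE = 20_000
bases = [(i, 30, i % 1000 == 0) for i in range(10_000)]
bases += [(EDIT_SITE, 30, k < 60) for k in range(200)]

unmasked = empirical_quality(bases, known_sites=set())
masked = empirical_quality(bases, known_sites={EDIT_SITE})
print(f"Q30 bin without masking: Q{unmasked[30]:.1f}; with the site masked: Q{masked[30]:.1f}")
```

In this toy case, a single unmasked editing site pulls the empirical quality of the whole Q30 bin down by several Phred units. That downgrade is applied to every base in the bin, so later base-quality thresholds discard more low-frequency evidence.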
Variant calling, contamination estimation, and filtering
NOTE: Step 3.2 performs joint DNA-RNA variant calling with contamination estimation (via gnomAD), then filters candidates by homopolymer context and PBLAT re-alignment. Output: {prefix}.final.vcf.
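The homopolymer criterion can be sketched as a check on the reference sequence around each candidate. A site is flagged when a run of identical bases of at least a minimum length overlaps it or directly abuts it. This is a hypothetical illustration; the run length (5) and flank (5) are assumed values, and the thresholds CADRES actually uses are pre-configured in its pipeline scripts. The reference string would in practice be fetched from the FASTA around the variant position.

```python
from itertools import groupby


def in_homopolymer(reference, pos, min_run=5, flank=5):
    """True if a run of >= min_run identical bases overlaps or directly abuts 0-based pos.

    Only the window reference[pos - flank : pos + flank + 1] is examined; with
    flank >= min_run, any qualifying run that touches pos is fully detectable.
    """
    start = max(0, pos - flank)
    window = reference[start:pos + flank + 1].upper()
    offset = start
    for _base, group in groupby(window):
        run_len = len(list(group))
        run_start, run_end = offset, offset + run_len - 1  # inclusive coordinates
        offset += run_len
        if run_len >= min_run and run_start - 1 <= pos <= run_end + 1:
            return True
    return False


ref = "ACGTAAAAAGCT"  # five A's at 0-based positions 4-8
print([p for p in range(len(ref)) if in_homopolymer(ref, p)])
```

Sites flanking the run are flagged as well as sites inside it. Mismatches immediately next to a homopolymer are a common source of polymerase slippage and alignment errors.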
1. To execute Step 3.2:
  $ python pipeline_step2_variant_calling.py \
  --rna_bams ./output/step1_calibration/rna_sample1_split_recalibration.bam ... \
  --dna_bam ./output/step1_calibration/DNA_processed_recalibration.bam \
  --genome /path/to/hg38.fa \
  --gnomad /path/to/gnomad.sorted.vcf.gz \
  --output_dir ./output/step2_variant_calling \
  --prefix project_demo
  NOTE: This step produces the final, stringently filtered variant call set (project_demo.final.vcf), representing high‑confidence RNA-DNA differences across all samples. Key parameters: Mutect2 --min-median-base-quality 12, --max-events-in-region 4; PBLAT minbasequal 5; all are pre-configured in the pipeline script.
  NOTE: The PBLAT re-alignment filter removes candidates mapping to multiple genomic loci, enhancing specificity in repetitive regions. Its internal THREAD_COUNT is controlled by the --threads flag in pipeline_step2_variant_calling.py.
Statistical testing and functional annotation
NOTE: Step 3.3 quantifies differential RNA editing using a GLMM adapted from rMATS, with Benjamini-Hochberg FDR correction. It also annotates each site with gene region, gene symbol, and known editing status. Output: {prefix}_Result.txt (DVRs with P-values and FDR).
1. To execute Step 3.3:
  $ python pipeline_step3_statistical_test.py \
  --group1_rna_bams ./output/step1_calibration/control_rep1.bam ... \
  --group2_rna_bams ./output/step1_calibration/treated_rep1.bam ... \
  --final_vcf ./output/step2_variant_calling/project_demo.final.vcf \
  --genome /path/to/hg38.fa \
  --known_snv /path/to/dbsnp.sorted.vcf.gz \
  --known_editing /path/to/rediportal.txt \
  --gene_anno /path/to/refGene.txt \
  --output_dir ./output/step3_statistical_test \
  --labels Control Treated
  NOTE: Key parameters, all pre-configured in the pipeline script:
  - samtools mpileup -q 30 (minimum mapping quality) and -Q 17 (minimum base quality).
  - rMATS-GLMM likelihood-ratio test with Δψ cutoff = 0.0001.
  - Binomial logit link with a replicate-aware multivariate normal penalty (rho = 0.9).
  - Benjamini-Hochberg FDR correction.
  All three pipeline steps support multi-threaded execution via the --threads flag (default: 4 per step). Step 2 additionally accepts --contamination_threads (default: 2).
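The allele-depth acquisition in Step 3.3 starts from samtools mpileup output. The following is a minimal, hypothetical parser for the pileup base column (the fifth column). It counts reference and alternative observations per read-alignment strand.

It follows the documented mpileup encoding:

- `.`/`,` mark reference matches on the forward/reverse strand.
- Upper/lower case letters mark mismatches on the forward/reverse strand.
- `^` plus one mapping-quality character marks a read start, and `$` marks a read end.
- `+n`/`-n` introduce an indel of n bases.
- `*`, `#`, `<` and `>` mark deletions and reference skips.

Converting read strand to transcript strand depends on the library chemistry and is not handled here. For example, in dUTP-based stranded libraries read 1 aligns to the reverse of the transcript.

```python
def count_pileup_bases(bases, alt_base):
    """Count ref/alt observations per read strand in one mpileup base column (run with -f)."""
    counts = {"ref_fwd": 0, "ref_rev": 0, "alt_fwd": 0, "alt_rev": 0}
    i = 0
    while i < len(bases):
        ch = bases[i]
        if ch == "^":            # read start; the next character is its mapping quality
            i += 2
            continue
        if ch == "$":            # read end
            i += 1
            continue
        if ch in "+-":           # indel: sign, length digits, then that many bases
            j = i + 1
            while j < len(bases) and bases[j].isdigit():
                j += 1
            i = j + int(bases[i + 1:j])
            continue
        if ch == ".":
            counts["ref_fwd"] += 1
        elif ch == ",":
            counts["ref_rev"] += 1
        elif ch == alt_base.upper():
            counts["alt_fwd"] += 1
        elif ch == alt_base.lower():
            counts["alt_rev"] += 1
        # '*', '#', '<', '>' and other alleles are ignored
        i += 1
    return counts


print(count_pileup_bases("..,,TtT^].$+2AG,", "T"))
```

The alternative-allele fraction used for differential testing is then (alt_fwd + alt_rev) / (ref_fwd + ref_rev + alt_fwd + alt_rev) for each replicate.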

4. Results inspection and visualization

After the CADRES workflow completes, navigate to the output directory. The main results file, {prefix}_Result.txt, lists all detected Differential Variants on RNA (DVRs), including genomic coordinates, alleles, replicate‑level allele counts, editing fractions, between‑group differences, and associated statistical metrics (P value and FDR). Gene‑level annotations (gene symbol, region, strand, variant type, known SNP/editing status) are also included. The accompanying summary file, {prefix}_Result_summary.txt, provides counts of each substitution type and their classification into SNP DVRs, known RNA‑editing DVRs, and novel DVRs.
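The summary counts depend on two small pieces of logic: expressing each genomic substitution on the transcript strand, and labeling a site by database membership. A genomic G>A change in a minus-strand gene, for example, is a C>U event in the RNA.

The sketch below is hypothetical and does not reproduce CADRES's own code:

- The function names are illustrative.
- Sites are represented as (chrom, pos) tuples.
- Giving SNP membership precedence over known-editing membership is an assumption.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
# Sequencing-level changes mapped to the RNA-level names used in this protocol.
RNA_NAMES = {"A>G": "A>I", "C>T": "C>U"}


def transcript_substitution(ref, alt, gene_strand):
    """Express a genomic ref>alt change on the transcript strand ('+' or '-')."""
    if gene_strand == "-":
        ref, alt = ref.translate(COMPLEMENT), alt.translate(COMPLEMENT)
    change = f"{ref}>{alt}"
    return RNA_NAMES.get(change, change)


def classify_dvr(site, known_snps, known_editing):
    """Label a DVR as 'SNP', 'known editing' or 'novel' (SNP checked first, by assumption)."""
    if site in known_snps:
        return "SNP"
    if site in known_editing:
        return "known editing"
    return "novel"


print(transcript_substitution("G", "A", "-"))  # minus-strand G>A seen on the transcript
print(classify_dvr(("chr22", 100), set(), set()))
```

Counting the (substitution, label) pairs with `collections.Counter` over all significant sites would give a table shaped like {prefix}_Result_summary.txt.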
(Optional) Generate standard visualizations by running the post-analysis script. Open an R session in the directory containing Post-analysis.R and enter:
R console:
source("Post-analysis.R")
The script Post-analysis.R is included in https://github.com/junsun-hash/CADRES/.
A file-selection dialog will appear; choose {prefix}_Result.txt. The script produces six PNG figures.
NOTE: The file-selection dialog requires a desktop R session. On headless servers, edit the input_file variable directly (line 10 of Post-analysis.R) and run Rscript Post-analysis.R.

Results

To evaluate CADRES under realistic experimental conditions, we used an inducible APOBEC3B (A3B) system in 293T cells. A doxycycline‑responsive lentiviral construct expressing A3B-GFP was introduced into 293T cells, and stable integrants were selected with puromycin. Induction with doxycycline for 72 h produced robust A3B-GFP expression, confirmed by GFP fluorescence and increased A3B mRNA levels. Matched induced and non‑induced samples were then subjected to uniform DNA and RNA extraction, library preparation, and sequencing to ensure that observed RNA-DNA differences reflected true A3B‑dependent editing rather than technical noise. The complete raw dataset is available under SRA accession PRJNA1211186, with a chromosome‑22 segment (chr22:28,000–30,000 kb) subset provided in the CADRES GitHub repository (https://github.com/junsun-hash/CADRES/releases/tag/v1.0.0) for demonstration and testing purposes.

After execution, CADRES generates {prefix}_Result.txt with DVR coordinates, editing fractions, P-values, and FDR. A successful run should show:
(i) DVR counts consistent with the expected biological editing activity;
(ii) a progressive reduction in candidate numbers through the filtering funnel (Figure 2);
(iii) consistent replicate-level editing fractions.
Unexpectedly low DVR counts or a funnel with little reduction between stages warrants review of alignment statistics and input data quality. The Post-analysis.R script produces six standard PNG outputs:
- Plot_Volcano_CU_AI.png (volcano plot of significant editing changes)
- Plot_Correlation_DMSO_DOX.png (inter-condition editing-fraction correlation)
- Plot_LocationDistribution_AllSig.png (genomic distribution of DVRs)
- Plot_Location_Comparison_CU_AI.png (positional comparison by substitution class)
- Plot_EditingShift_CU_AI.png (magnitude and direction of editing shifts)
- Plot_TopGenes_CU_AI.png (genes with the highest DVR counts)

Figure 2 shows how CADRES progressively refines candidate RNA-DNA differences. Nearly one million initial Mutect2 calls were reduced by PASS filtering, restriction to single-nucleotide variants, homopolymer filtering, and PBLAT realignment. Each step removed substantial artifactual signal, particularly in repetitive or ambiguous genomic regions. Ultimately, 348 sites passed the rMATS‑DVR significance threshold (FDR < 0.05), representing a final set of high‑confidence differential editing events.

As shown in Figure 3, DVRs were dominated by deamination‑associated substitution classes. A>G and C>U variants represented the vast majority of calls, consistent with A>I and C>U editing. Of these, 110 A>G and 233 C>U sites were significantly altered between conditions, with C>U events comprising the largest DVR class. Most A>G DVRs were novel rather than annotated in SNP or RNA‑editing databases, and C>U DVRs showed a similarly high proportion of novel sites (207 of 233). Other substitution types yielded only isolated DVRs, consistent with their lower biological relevance in A3B‑driven systems.

Quality-control assessments confirmed robust editing quantification. Figure 4 (generated by the CADRES pipeline, Plot_Correlation_DMSO_DOX.png) shows class-specific comparisons of alternative-allele fractions between DMSO and doxycycline samples for A>I, C>U, and other substitution classes. The pattern reflects expected A3B-induced editing: sites near the diagonal are constitutively edited, whereas off-diagonal sites represent condition-specific changes. The displacement of many sites from the identity line is consistent with extensive A3B activity rather than technical noise.

In Figure 5 (Plot_Volcano_CU_AI.png), A>I sites exhibited larger positive editing shifts, whereas C>U sites formed a dense cluster of significant events with smaller net changes. Figure 6 (Plot_LocationDistribution_AllSig.png) shows that significant sites are localized primarily to 3′ UTRs, followed by introns and coding regions. Figure 7 (Plot_Location_Comparison_CU_AI.png) reveals clear positional divergence between substitution classes: C>U sites were enriched in 3′ UTRs and coding exons, whereas A>I sites were concentrated in introns. Figure 8 (Plot_TopGenes_CU_AI.png) lists genes with multiple DVRs; NEAT1 and MGEA5 showed the highest counts. Several others, including XBP1, GSR, and LOC101928978, also contained recurrent events. Many of these genes displayed increased edited-allele fractions following A3B induction.

Together, these results demonstrate that CADRES effectively distinguishes genuine A3B‑dependent RNA editing from genomic and technical sources of variation, recovers mutation‑class‑specific editing patterns, and produces a compact, high‑confidence set of condition‑dependent RNA editing events. This protocol uses an APOBEC3B overexpression system at supraphysiological editing levels (n = 3 biological replicates per condition). Performance may differ in endogenous editing contexts or across heterogeneous genomic backgrounds. Users applying CADRES to clinical samples should interpret DVR counts relative to their system and consider orthogonal validation for key sites.

Figure 1. Workflow and schematic of the CADRES pipeline. This figure illustrates the overall CADRES workflow, including software setup, input requirements, preprocessing steps, variant calling, filtering, and downstream differential RNA‑editing analysis. The pipeline begins with the installation of dependencies and creation of the CADRES environment, followed by alignment of RNA‑seq and DNA‑sequencing data to a reference genome to generate BAM files. External annotation resources such as REDIportal, RefGene, gnomAD, and dbSNP are incorporated. CADRES then performs calibration, variant calling, and multiple filtering steps to refine candidate RNA-DNA variants. Strand separation and allele‑depth acquisition are used to prepare inputs for rMATS‑based differential variant analysis, producing the summary and result files ({prefix}_Result_summary.txt and {prefix}_Result.txt).

Figure 2. Progressive refinement of RNA–DNA variants by CADRES. DVRs were identified from DOX‑treated versus DMSO‑treated 293T A3B‑GFP lentiviral inducible cells. This figure shows the sequential filtering steps used by CADRES to refine raw variant calls. Statistical threshold: FDR < 0.05 (Benjamini‑Hochberg). Biological replicates: n = 3 per condition.

Figure 3. Substitution-class distribution of detected variants and DVRs. Corresponds to CADRES output: {prefix}_Result_summary.txt. DVRs were obtained from DOX‑treated versus DMSO‑treated 293T A3B‑GFP lentiviral inducible cells. This figure displays the counts of each substitution type for all variants and for DVRs. DVRs defined at FDR < 0.05. Biological replicates: n = 3 per condition.

Figure 4. Correlation of editing levels between DOX‑ and DMSO‑treated 293T A3B‑GFP cells. Corresponds to CADRES output: Plot_Correlation_DMSO_DOX.png. This figure shows class-specific comparisons of editing levels between the two treatment conditions for A>I, C>U, and other substitution classes. Pearson correlation coefficients are shown in the individual panels. Biological replicates: n = 3 per condition.

Figure 5. Differential editing landscape between conditions. Corresponds to CADRES output: Plot_Volcano_CU_AI.png. This figure shows a volcano plot of effect size and statistical significance for all sites tested for differential RNA editing between DOX‑ and DMSO‑treated cells. Significance threshold: FDR < 0.05 (Benjamini‑Hochberg). Biological replicates: n = 3 per condition.

Figure 6. Genomic annotation of significant RNA editing sites. Corresponds to CADRES output: Plot_LocationDistribution_AllSig.png. This figure shows the distribution of DVRs across annotated genomic features. DVRs at FDR < 0.05. Biological replicates: n = 3 per condition.

Figure 7. Positional distribution of C>U and A>I DVRs. Corresponds to CADRES output: Plot_Location_Comparison_CU_AI.png. This figure displays the genomic location categories of DVRs separated by substitution type. DVRs at FDR < 0.05. Biological replicates: n = 3 per condition.

Figure 8. Top genes containing multiple DVRs in DOX‑ versus DMSO‑treated samples. Corresponds to CADRES output: Plot_TopGenes_CU_AI.png. This figure displays the top 20 genes ranked by the number of DVRs. Color intensity represents mean delta fraction (Δ = DOX minus DMSO alternative‑allele frequency). Positive delta fraction values indicate higher editing levels in DOX‑treated samples compared to DMSO controls, consistent with A3B induction. All top DVR-containing genes (NEAT1, MGEA5, GSR, etc.) showed elevated editing upon A3B overexpression. DVRs at FDR < 0.05. Biological replicates: n = 3 per condition.

Discussion

The CADRES workflow presented here provides a calibrated, internally consistent strategy for detecting differential RNA editing events with high specificity, particularly C>U deamination catalyzed by APOBEC enzymes. Several steps within the protocol are pivotal to its accuracy. Matched genomic and transcriptomic sequencing is essential for distinguishing genuine RNA edits from underlying DNA polymorphisms, while the boost recalibration procedure safeguards authentic RNA variants from being erroneously penalized during base‑quality score recalibration. Equally important are the sequence‑composition and mapping filters, which reduce false‑positive calls arising from homopolymer runs or multi‑mapping reads, and the use of strand‑resolved allele‑depth measurements. Reliable identification of differential editing further depends on the proper definition of replicate structure within the rMATS generalized linear mixed model.

Although CADRES is designed as an end‑to‑end pipeline, several components can be adapted to practical constraints. High‑depth exome sequencing may substitute for whole‑genome sequencing when comprehensive genomic data cannot be generated, with the caveats that poorly covered or subclonal variants may remain insufficiently resolved and that candidate sites outside the captured regions (for example, in introns or intergenic regions) lack matched DNA evidence. Users working with highly repetitive transcripts, or with biological systems that display extensive clustered editing, may need to adjust variant‑calling parameters or apply more stringent alignment filters. Troubleshooting commonly involves verifying strand annotation, ensuring that the known‑sites list for recalibration is correctly constructed, and confirming that sequencing depth is sufficient across all replicates.
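A quick depth check across replicates, of the kind suggested for troubleshooting above, might look like the following sketch. The input layout (a mapping from site ID to one read depth per replicate, in a fixed replicate order) and the 10‑read minimum are assumptions, and the function names are hypothetical. Per‑replicate median depths help reveal a single under‑sequenced library, whereas the per‑site check lists sites that fall below the threshold in any replicate.

```python
from statistics import median


def failing_sites(site_depths, min_depth=10):
    """Return sorted site IDs whose depth is below min_depth in any replicate.

    `site_depths` maps a site ID to a list of read depths, one per replicate.
    The default min_depth=10 is an assumed value, not a CADRES default.
    """
    return sorted(
        site for site, depths in site_depths.items()
        if not depths or min(depths) < min_depth
    )


def replicate_median_depth(site_depths):
    """Return the median depth of each replicate across all sites.

    Assumes every site lists the replicates in the same order.
    """
    per_replicate = zip(*site_depths.values())
    return [median(depths) for depths in per_replicate]
```

A replicate whose median depth is markedly lower than the others is a likely cause of sites failing the per‑site check.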

The statistical testing module in CADRES adapts the likelihood-ratio test framework of rMATS to differential RNA editing. For each candidate site, a null model, in which the editing fractions of the two conditions differ by less than Δψ = 0.0001 (that is, are effectively equal), is compared against an alternative in which the difference is at least Δψ, using a binomial likelihood with a logit link. Inter-replicate correlation is captured with a multivariate normal penalty (rho = 0.9) rather than with site-level random effects; this penalty also helps control overdispersion across replicates. The combined negative log-likelihood is minimized with the L-BFGS-B algorithm, and raw P-values are corrected for multiple testing using the Benjamini–Hochberg procedure.
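The core of the likelihood‑ratio test and the Benjamini–Hochberg correction can be illustrated with the simplified Python sketch below. It is deliberately not the CADRES/rMATS model: replicates are pooled into a single binomial per condition, the maximum‑likelihood estimates are obtained in closed form instead of by L‑BFGS‑B, and the logit‑link replicate model, the multivariate normal penalty (rho = 0.9), and the Δψ = 0.0001 threshold are all omitted. Because pooling ignores between‑replicate overdispersion, P‑values from this sketch tend to be smaller (anti‑conservative) than those of the full model. All function names are hypothetical.

```python
import math


def _binom_loglik(k, n, p):
    """Binomial log-likelihood without the constant binomial coefficient."""
    ll = 0.0
    if k > 0:
        ll += k * math.log(p)
    if n - k > 0:
        ll += (n - k) * math.log(1.0 - p)
    return ll


def lrt_pvalue(alt1, depth1, alt2, depth2):
    """Pooled-binomial likelihood-ratio test of equal editing fractions.

    alt*/depth* list alternative-allele and total read counts per replicate.
    Simplification: replicates are pooled, so overdispersion is ignored.
    """
    k1, n1 = sum(alt1), sum(depth1)
    k2, n2 = sum(alt2), sum(depth2)
    if n1 == 0 or n2 == 0:
        raise ValueError("each condition needs at least one read")
    p1, p2 = k1 / n1, k2 / n2            # MLEs under the alternative
    p0 = (k1 + k2) / (n1 + n2)           # shared MLE under the null
    ll_alt = _binom_loglik(k1, n1, p1) + _binom_loglik(k2, n2, p2)
    ll_null = _binom_loglik(k1, n1, p0) + _binom_loglik(k2, n2, p0)
    stat = max(0.0, 2.0 * (ll_alt - ll_null))
    # Chi-square (1 df) survival function: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for offset, i in enumerate(order):
        rank = m - offset                # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Sites whose adjusted value from `benjamini_hochberg` falls below 0.05 would correspond to the FDR < 0.05 cutoff used for the DVRs in this article.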

Despite these strengths, CADRES has several limitations. CADRES depends on the availability of paired DNA-RNA data and therefore cannot be applied to many legacy RNA‑seq datasets. As with most short‑read-based approaches, very low‑abundance transcripts and structurally complex RNA regions remain challenging to interrogate with high confidence. In samples with substantial genomic heterogeneity, low‑frequency somatic variants may persist after filtering and could represent residual confounders in downstream analyses. In addition, the incompleteness of current RNA editing reference databases, particularly for C>U editing, constrains the precision of recalibration in poorly annotated genomic regions.

A distinguishing advantage of CADRES is its deliberate prioritization of precision over sensitivity. Although the stringent filtering strategy may omit a proportion of low‑confidence editing events, the resulting call set contains less contamination from genomic variants and sequencing artifacts. This makes CADRES well suited to downstream mechanistic interpretation and to prioritizing candidate editing sites for orthogonal validation. Because the default parameters favor specificity, exploratory analyses that require greater sensitivity can modify pipeline_step2_variant_calling.py to disable the homopolymer filter or bypass PBLAT screening, lower the Mutect2 --min-median-base-quality threshold from 12 to 8, or relax the FDR threshold (e.g., to FDR < 0.10). These adjustments increase DVR yield at the cost of reduced precision, so orthogonal validation of the additional calls is recommended.
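The sensitivity trade‑offs listed above can be summarized as two parameter presets. The dataclass and its field names below are hypothetical and do not correspond to options exposed by pipeline_step2_variant_calling.py; in practice these changes are made by editing that script. For brevity, the exploratory preset relaxes all four settings at once, although each can be changed individually. The helper `passing_sites` only shows how a looser FDR threshold admits more sites.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FilterSettings:
    """Hypothetical bundle of the sensitivity-related settings named in the text."""
    homopolymer_filter: bool = True       # remove sites in homopolymer runs
    pblat_screen: bool = True             # PBLAT-based paralogue screening
    min_median_base_quality: int = 12     # Mutect2 setting cited in the text
    fdr_threshold: float = 0.05           # DVR significance cutoff


DEFAULT = FilterSettings()
# Exploratory preset: every relaxation applied at once, for illustration only.
EXPLORATORY = replace(
    DEFAULT,
    homopolymer_filter=False,
    pblat_screen=False,
    min_median_base_quality=8,
    fdr_threshold=0.10,
)


def passing_sites(fdr_by_site, settings):
    """Return site IDs whose FDR falls below the preset's threshold."""
    return [site for site, fdr in fdr_by_site.items() if fdr < settings.fdr_threshold]
```

Keeping the two presets as immutable objects makes it easy to record which configuration produced a given DVR set.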

More broadly, CADRES offers notable advantages over generalized variant‑calling or editing‑detection procedures. By integrating DNA‑aware and replicate‑aware RNA analyses within a single calibrated framework, the pipeline achieves greater specificity than such procedures, reducing the misclassification of SNVs as RNA edits and improving the fidelity of differential editing calls. The method is particularly well suited to experimental systems in which RNA editing varies across defined conditions, such as inducible deaminase models, immune stimulation, environmental stress, or developmental transitions, and provides a reliable means of attributing transcriptomic changes to RNA‑level modifications rather than to underlying genomic variation. Beyond its use as an analytical tool, CADRES may help define precise RNA editing signatures and clarify the transcriptomic contributions of dual DNA‑RNA deaminases.

In summary, CADRES affords a robust, replicable, and biologically grounded approach for quantifying differential RNA editing. When applied with appropriate experimental design—particularly matched sequencing, sufficient replication, and rigorous preprocessing—the pipeline yields high‑confidence editing profiles that can support mechanistic and computational studies across diverse biological contexts.

Disclosures


J.S., Z.D. and C.Z. are employees of Shanghai Institute of Biological Products, an entity presently engaged in the commercial development of therapeutic biologics.

Acknowledgements


This study was funded by the Science and Technology Commission of Shanghai (23S11901100).

Materials

List of materials used in this article
Name	Company	Catalog Number	Comments
BCFtools	Samtools project	N/A	Version 1.21. Variant calling and VCF manipulation. URL: https://github.com/samtools/bcftools
Bedtools	Quinlan Lab	N/A	Version 2.31.1. Genome arithmetic operations. URL: https://github.com/arq5x/bedtools2
Biopython	Biopython project	N/A	Version 1.85. Python tools for molecular biology. URL: https://www.biopython.org
BWA-MEM	GitHub (lh3/bwa)	N/A	Version 0.7.18. DNA-seq alignment. URL: https://github.com/lh3/bwa
CADRES source code	GitHub (junsun-hash/CADRES)	N/A	Version 1.0.0. CADRES pipeline scripts. URL: https://github.com/junsun-hash/CADRES
Conda or Miniconda	Anaconda Inc.	N/A	Version 23.1. Package and environment manager. URL: https://docs.conda.io/en/latest/miniconda.html
dbSNP GRCh38 VCF	NCBI	N/A	Build 155. Common germline variants database. URL: https://ftp.ncbi.nih.gov/snp/
GATK4	Broad Institute	N/A	Version 4.3.0.0. Genome Analysis Toolkit. URL: https://github.com/broadinstitute/gatk
Git	Software Freedom Conservancy	N/A	Version 2.39. Version control system. URL: https://git-scm.com
gnomAD GRCh38 VCF	Broad Institute	N/A	Version 3.1. Population allele frequencies. URL: https://gnomad.broadinstitute.org
GTF annotation file	Ensembl	N/A	Release 109. Gene annotation for GRCh38. URL: https://www.ensembl.org
Human reference genome GRCh38	Ensembl/UCSC	N/A	Release 109. Reference genome assembly. URL: https://www.ensembl.org or https://hgdownload.soe.ucsc.edu
Linux workstation or server	Various	N/A	Ubuntu 20.04. x86_64 architecture required. URL: https://ubuntu.com
pblat	UCSC Genome Browser	N/A	Version 2.5.1. Parallel BLAT realignment. URL: https://github.com/ucscGenomeBrowser/kent
Picard	Broad Institute	N/A	Version 2.20.8. NGS data manipulation. URL: https://github.com/broadinstitute/picard
Python	Python Software Foundation	N/A	Version 3.9.19. Programming language. URL: https://www.python.org
R	R Foundation	N/A	Version 4.5.2. Statistical computing. URL: https://www.r-project.org
R package: forcats	CRAN	N/A	Version 1.0.0. Factor manipulation. URL: https://cran.r-project.org/package=forcats
R package: ggplot2	CRAN	N/A	Version 4.0.1. Data visualization. URL: https://cran.r-project.org/package=ggplot2
R package: ggrepel	CRAN	N/A	Version 0.9.5. Text label repulsion. URL: https://cran.r-project.org/package=ggrepel
R package: lme4	CRAN	N/A	Version 1.1.35. Linear mixed-effects models. URL: https://cran.r-project.org/package=lme4
R package: readr	CRAN	N/A	Version 2.1.5. Fast file reading. URL: https://cran.r-project.org/package=readr
R package: stringr	CRAN	N/A	Version 1.6.0. String manipulation. URL: https://cran.r-project.org/package=stringr
REDIportal reference	University of Bologna	N/A	Version 2.0. A-to-I RNA editing sites database. URL: http://srv00.recas.ba.infn.it/atlas/
RefGene annotation	UCSC Table Browser	N/A	Release 109. Gene structure annotation. URL: https://genome.ucsc.edu/cgi-bin/hgTables
Samtools	Samtools project	N/A	Version 1.21. BAM file manipulation. URL: https://github.com/samtools/samtools
STAR aligner	GitHub (alexdobin/STAR)	N/A	Version 2.7.11b. RNA-seq alignment. URL: https://github.com/alexdobin/STAR
