This protocol describes a rapid and broadly applicable method for unbiased RNA-sequencing of viral samples from human clinical isolates.
Here we outline a next-generation RNA sequencing protocol that enables de novo assemblies and intra-host variant calls of viral genomes collected from clinical and biological sources. The method is unbiased and universal; it uses random primers for cDNA synthesis and requires no prior knowledge of the viral sequence content. Before library construction, selective RNase H-based digestion is used to deplete unwanted RNA, including poly(rA) carrier and ribosomal RNA, from the viral RNA sample. Selective depletion improves both the data quality and the number of unique reads in viral RNA sequencing libraries. Moreover, a transposase-based 'tagmentation' step is used because it reduces overall library construction time. The protocol has enabled rapid deep sequencing of over 600 Lassa and Ebola virus samples, including collections from both blood and tissue isolates, and is broadly applicable to other microbial genomics studies.
Next generation sequencing of viruses from clinical sources can inform transmission and the epidemiology of infections, as well as help support novel diagnostic, vaccine and therapeutic development. cDNA synthesis using random primers has allowed the detection and assembly of genomes from divergent, co-infecting or even novel viruses1,2. As with other unbiased methods, unwanted contaminants occupy many sequencing reads and negatively impact sequencing results. Host and poly(rA) carrier RNA are contaminants present in many existing viral sample collections.
The protocol describes an efficient and cost-effective way of deep sequencing RNA virus genomes based on unbiased total RNA-seq. The method utilizes an RNase H selective depletion step3 to remove unwanted host ribosomal and carrier RNA. Selective depletion enriches for viral content (Figure 1) and improves the overall quality of sequencing data (Figure 2) from clinical samples. Moreover, tagmentation is applied to the protocol as it significantly reduces library construction time. These methods have been used to rapidly generate large datasets of Ebola and Lassa virus genomes2,4,5 and can be used to study a wide range of RNA viruses. Lastly, the approach is not limited to human samples; the utility of selective depletion was demonstrated on tissue samples collected from Lassa-infected rodents and non-human primate disease models5,6.
Figure 1. Total RNA Content Reflects Enrichment of Lassa Virus Content Using Selective Depletion. Starting overall content (RNA input) and enrichment of unique Lassa virus (LASV) reads (Library content) upon rRNA depletion from nine different clinical isolates. This figure has been modified from reference 6.
Figure 2. Higher Quality Sequencing After Carrier RNA Depletion. Median base qualities per sequencing cycle of poly(rA)-contaminated Lassa virus libraries (red) and control (no carrier observed in library, black) from QC report 13. Both read 1 and read 2 of paired end reads are merged in the library BAM file and the quality scores are shown at each base. This figure has been modified from reference 6.
The viral RNA-seq protocol details construction of libraries directly from extracted RNA collected from clinical and biological samples. To ensure personal safety, all viral serum, plasma and tissue samples should be inactivated in appropriate buffers prior to RNA extraction. Some inactivation and extraction kits include carrier poly(rA) RNA; this carrier is removed during the initial RNase H selective depletion step. Assuming complete recovery, the expected concentration of carrier RNA is 100 ng/µl. In the protocol, 110 ng/µl oligo(dT) (1.1x the carrier concentration) is used for depletion. If poly(rA) carrier is not present in the sample, oligo(dT) should not be added prior to depletion.
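The 1.1x figure can be checked with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and it assumes the 550 ng/µl oligo(dT) stock and the 5 µl maximum RNA input listed in the Step 2.2 reaction table, together with the 100 ng/µl carrier concentration stated above.

```python
def oligo_dt_mass_ng(rna_volume_ul, carrier_ng_per_ul=100.0, excess=1.1):
    """Mass of oligo(dT) (ng) needed to cover the poly(rA) carrier in the input.

    carrier_ng_per_ul: expected carrier concentration at complete recovery.
    excess: fold excess of oligo(dT) over carrier (1.1x in the protocol).
    """
    return rna_volume_ul * carrier_ng_per_ul * excess


def oligo_dt_volume_ul(rna_volume_ul, stock_ng_per_ul=550.0):
    """Volume of oligo(dT) stock (µl) that delivers the required mass."""
    return oligo_dt_mass_ng(rna_volume_ul) / stock_ng_per_ul


if __name__ == "__main__":
    # 5 µl of RNA is the maximum input listed for the hybridization reaction.
    print(f"{oligo_dt_mass_ng(5):.0f} ng oligo(dT)")
    print(f"{oligo_dt_volume_ul(5):.2f} µl of 550 ng/µl stock")
```

Under these assumptions, 5 µl of RNA carries about 500 ng of carrier, so roughly 550 ng of oligo(dT) is needed, which corresponds to the 1 µl of 550 ng/µl stock listed in the reaction table.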
The following protocol is designed for 24 reactions in PCR plate format (up to 250 µl volume). An earlier version of this protocol was reported in Matranga, et al.6.
Ethics statement: Lassa fever patients were recruited for this study using protocols approved by human subjects committees at Tulane University, Harvard University, Broad Institute, Irrua Specialist Teaching Hospital (ISTH), Kenema Government Hospital (KGH), Oyo State Ministry of Health, Ibadan, Nigeria and Sierra Leone Ministry of Health. All patients were treated with a similar standard of care and were offered the drug Ribavirin, whether or not they decided to participate in the study. For Lassa fever (LF) patients, treatment with Ribavirin followed the currently recommended guidelines and was generally offered as soon as LF was strongly suspected.
Due to the severe outbreak of Ebola Virus Disease (EVD), patients could not be consented through our standard protocols. Instead, use of clinical excess samples from EVD patients was evaluated and approved by Institutional Review Boards in Sierra Leone and at Harvard University. The Office of the Sierra Leone Ethics and Scientific Review Committee, the Sierra Leone Ministry of Health and Sanitation, and the Harvard Committee on the Use of Human Subjects have granted a waiver of consent to sequence and make publicly available viral sequences obtained from patient and contact samples collected during the Ebola outbreak in Sierra Leone. These bodies also granted use of clinical and epidemiological data for de-identified samples collected from all suspected EVD patients receiving care during the outbreak response. The Sierra Leone Ministry of Health and Sanitation also approved shipments of non-infectious, non-biological samples from Sierra Leone to the Broad Institute and Harvard University for genomic studies of outbreak samples.
1. DNase-treatment of Sample RNA (Up to 55 µl Extracted Total RNA, ~4 hr)
2. Selective Depletion of Ribosomal and Carrier RNA from Viral RNA Sample (~4 hr)
3. cDNA Synthesis (~6 hr)
4. Library Preparation — DNA Library Construction (~4 hr)
Figure 3. Libraries Constructed from Ebola Virus Clinical Samples. Gel image of 4 representative Ebola virus (EBOV) libraries. Regions of library and primer dimers are shown.
The described protocol enables the generation of high quality sequencing reads from low-input viral RNA samples while enriching for unique viral content. As shown in Figure 1, the protocol enriched unique Lassa virus content at least five-fold in all samples (compared to non-depleted controls) with at least one million copies of 18S rRNA (~100 pg total RNA). Likewise, sequencing success also correlated with the amount of virus within a given sample. Using qRT-PCR as a surrogate for viral quantity, samples that contained ~1,000 or more viral genome copies most often created full assemblies (data not shown). Moreover, depletion of poly(rA) carrier reduces homopolymer sequences of A and T in libraries, resulting in cleaner preparations and ensuring better quality sequencing reads (Figure 2). Final libraries from low input viral clinical samples often have a broad fragment length from 150 to 1,000 bp (Figure 3).
After sequencing, to reduce sample misidentification and crosstalk between libraries within a pool12, only index reads with a base quality score of at least 25 (q25) and zero mismatches are kept during the demultiplexing process. Viral genomes are assembled using a bioinformatics pipeline specific for divergent viruses2,4-6. These tools are available at https://github.com/broadinstitute/viral-ngs or through commercial cloud platforms4.
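The index-read filter can be illustrated with a minimal sketch. This is not the viral-ngs implementation. It assumes that the q25 threshold applies to every base of the index read (the text does not say whether the cutoff is per base or an average), that Phred scores are already decoded to integers, and that the barcodes shown are hypothetical.

```python
def assign_barcode(index_seq, index_quals, expected_barcodes, min_q=25):
    """Return the matching barcode, or None if the index read is discarded.

    A read is kept only if every index base has quality >= min_q and the
    index sequence matches an expected barcode exactly (zero mismatches).
    """
    if not index_quals or min(index_quals) < min_q:
        return None
    if index_seq in expected_barcodes:
        return index_seq
    return None


if __name__ == "__main__":
    barcodes = {"TAAGGCGA", "CGTACTAG"}  # hypothetical 8-nt index sequences
    print(assign_barcode("TAAGGCGA", [30] * 8, barcodes))        # kept
    print(assign_barcode("TAAGGCGA", [30] * 7 + [20], barcodes))  # low quality
    print(assign_barcode("TAAGGCGT", [30] * 8, barcodes))        # one mismatch
```

Requiring an exact match is stricter than the one-mismatch tolerance many demultiplexers allow by default; it trades some read yield for a lower chance of assigning a read to the wrong library.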
Step 1.1: DNase reaction | |
Reagent | Volume per reaction (µl) |
10x DNase buffer | 7 |
Nuclease-free water | 6 |
Extracted viral RNA | 55 |
DNase (2 U/µl) | 2 |
Total volume | 70 |
Step 2.1: 5x Hybridization buffer | |
Reagent | Volume for 1 ml (µl) |
5 M NaCl | 200 |
1 M Tris-HCl (pH 7.4) | 500 |
Nuclease-free water | 300 |
Total volume | 1,000 |
Step 2.1: 10x RNase H reaction buffer | |
Reagent | Volume for 1 ml (µl) |
5 M NaCl | 200 |
1 M Tris-HCl (pH 7.5) | 500 |
1 M MgCl2 | 200 |
Nuclease-free water | 100 |
Total volume | 1,000 |
Step 2.1: Water with linear acrylamide | |
Reagent | Volume for 1 ml buffer (µl) |
Nuclease-free water | 992 |
Linear acrylamide (5 mg/ml) | 8 |
Total volume | 1,000 |
Step 2.2: Hybridization reaction for selective depletion | |
Reagent | Volume per reaction (µl) |
5x Hybridization Buffer | 2 |
rRNA-depletion oligo mix (100 µM) | 1.22 |
Oligo(dT) (550 ng/µl) | 1 |
DNase-treated total RNA | up to 5 |
Spike-in RNA (optional) | 0.5 |
Water (with linear acrylamide) | bring up to 10 total |
Total volume | 10 |
Step 2.3: RNase H reaction for selective depletion | |
Reagent | Volume per reaction (µl) |
10x RNase H Reaction Buffer | 2 |
Water (with linear acrylamide) | 5 |
Thermostable RNase H (5 U/µl) | 3 |
Total volume | 10 |
Step 2.4: DNase reaction post selective depletion | |
Reagent | Volume per reaction (µl) |
10x DNase Buffer | 7.5 |
Water (with linear acrylamide) | 44.5 |
RNase inhibitor (20 U/µl) | 1 |
RNase-free DNase I (2.72 U/µl) | 2 |
Total volume (with RNase H reaction) | 75 |
Step 3.1: cDNA synthesis, random primer hybridization | |
Reagent | Volume per reaction (µl) |
rRNA/carrier-depleted RNA | 10 |
3 µg random primer | 1 |
Total volume | 11 |
Step 3.2: First strand cDNA synthesis reaction | |
Reagent | Volume (µl) |
5x First-Strand Reaction Buffer | 4 |
0.1 M DTT | 2 |
10 mM dNTP mix | 1 |
RNase inhibitor (20 U/µl) | 1 |
Reverse transcriptase (add last) | 1 |
Total volume (with RNA above) | 20 |
Step 3.3: Second strand cDNA synthesis reaction | |
Reagent | Volume (µl) |
RNase-free water | 43 |
10x Second-Strand Reaction Buffer | 8 |
10 mM dNTP mix | 3 |
E. coli DNA Ligase (10 U/µl) | 1 |
E. coli DNA Polymerase I (10 U/µl) | 4 |
E. coli RNase H (2 U/µl) | 1 |
Total volume (with 1st strand reaction) | 80 |
Step 4.2: Tagmentation reaction | |
Reagent | Volume (µl) |
Amplicon Tagment Mix (ATM) | 1 |
Tagment DNA Buffer (TD) | 5 |
Total volume (with cDNA) | 10 |
Step 4.3: Library PCR reaction | |
Reagent | Volume (µl) |
PCR Master Mix (NPM) | 7.5 |
Index 1 primer (i7) | 2.5 |
Index 2 primer (i5) | 2.5 |
Total volume (with tagmented cDNA) | 25 |
Step 4.3.2: Library PCR conditions | |
72 °C, 3 min | |
95 °C, 30 sec | |
up to 18 cycles: 10 sec at 95 °C, 30 sec at 55 °C, 30 sec at 72 °C | |
72 °C, 5 min | |
10 °C, forever |
Table 1: Reaction set-up and buffers. Step-by-step tables with contents of all buffers and reaction mixes.
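Because the protocol is designed for 24 reactions, the per-reaction volumes in Table 1 are usually scaled into a master mix. The sketch below shows one way to do that for the Step 2.3 RNase H mix. The 10% pipetting overage and the function name are assumptions and are not specified in the protocol.

```python
# Per-reaction volumes (µl) copied from the Step 2.3 table.
RNASE_H_MIX_UL = {
    "10x RNase H Reaction Buffer": 2.0,
    "Water (with linear acrylamide)": 5.0,
    "Thermostable RNase H (5 U/µl)": 3.0,
}


def master_mix(per_reaction_ul, n_reactions, overage=0.10):
    """Scale per-reaction volumes to a master mix.

    overage: extra fraction prepared to cover pipetting loss (assumed 10%).
    Returns a dict of reagent -> total volume in µl, rounded to 0.01 µl.
    """
    factor = n_reactions * (1.0 + overage)
    return {name: round(vol * factor, 2) for name, vol in per_reaction_ul.items()}


if __name__ == "__main__":
    mix = master_mix(RNASE_H_MIX_UL, n_reactions=24)
    for name, vol in mix.items():
        print(f"{name}: {vol} µl")
    print(f"Dispense {sum(RNASE_H_MIX_UL.values()):.0f} µl per reaction")
```

The same function can be applied to any of the reaction tables, as long as sample-specific components (RNA, cDNA, index primers) are left out of the dictionary and added to each well individually.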
Oligo Name | Sequence (5' to 3') |
Ebola KGH FW | GTCGTTCCAACAATCGAGCG |
Ebola KGH RV | CGTCCCGTAGCTTTRGCCAT |
Ebola KULESH FW | TCTGACATGGATTACCACAAGATC |
Ebola KULESH RV | GGATGACTCTTTGCCGAACAATC |
Lassa SL FW | GTAAGCCCAGCDGYAAABCC |
Lassa SL RV | AAGCCACAGAAARCTGGSAGCA |
18S rRNA FW | TCCTTTAACGAGGATCCATTGG |
18S rRNA RV | CGAGCTTTTTAACTGCAGCAACT |
Table 2: qRT-PCR Primer Sequences. Primers used for measuring host (18S rRNA) and viral (Ebola and Lassa) content. 'KGH' is Kenema Government Hospital in Sierra Leone, where the Ebola primers were tested2. 'Kulesh' is the investigator who designed the primer set14.
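Several primers in Table 2 contain IUPAC degenerate bases (R, Y, S, B, D), so each entry represents a pool of sequences rather than a single oligo. The following sketch expands a degenerate primer into its component sequences. The function name is illustrative, and the code lookup covers the standard IUPAC nucleotide codes.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_primer(primer):
    """Return every non-degenerate sequence encoded by a degenerate primer.

    Spaces are ignored so that sequences written in triplets are accepted.
    """
    bases = primer.replace(" ", "").upper()
    return ["".join(combo) for combo in product(*(IUPAC[b] for b in bases))]


if __name__ == "__main__":
    lassa_fw = "GTAAGCCCAGCDGYAAABCC"  # Lassa SL FW from Table 2
    variants = expand_primer(lassa_fw)
    print(len(variants))  # D (3) x Y (2) x B (3) = 18 sequences
    print(variants[0])
```

Counting variants this way is a quick check on pool complexity before ordering; a higher degeneracy lowers the effective concentration of each individual sequence in the reaction.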
Table 3: Ribosomal RNA (rRNA) Depletion Oligos. 195 50-nucleotide long sequences complementary to human rRNA for selective depletion step6.
The outlined approach enables robust, universal, rapid sequencing and was used for sequencing Ebola virus during the 2014 outbreak2,4. By coupling selective depletion and cDNA synthesis with tagmentation library construction, the overall process time was reduced by ~2 days from previous adapter ligation methods. More recently, this protocol was employed by international collaborators and others with great success15,16 and will be deployed to labs in West Africa to support local genomics-based research studies and diagnostics17.
The protocol described here uses random primers to prepare cDNA for viral RNA-seq libraries. Unlike previous viral RNA-seq approaches, it requires no a priori knowledge of sequence data or elaborate and time-consuming primer design for a specific virus or clade. The method can be applied to any viral RNA sample. For example, it was used to generate viral content from both Ebola and Lassa samples6. The protocol may also be used for host transcriptomic, metagenomic and pathogen discovery sequencing projects 1.
A critical step of the protocol is targeted RNase H digestion, a high-throughput, low cost method for removing unwanted carrier and host RNA from viral samples. The selective depletion step of the protocol uses many components and requires skill and accuracy. Extra time and care should be taken during the initial setup.
Because clinical serum and plasma samples often contain very little nucleic acid material, contamination and sample loss are common. To avoid these issues, special care should be taken when using this protocol. First, RNA is highly susceptible to degradation; therefore all areas should be clean and free of nucleases. Second, to identify samples suitable for use in this protocol, qRT-PCR assays for both host RNA and virus should be used for quantification5,6. When comparing input amounts with sequencing results from the protocol, sequencing success (i.e., generation of sufficient data for full viral assembly) correlated with samples that contained at least 100 pg total RNA and 1,000 copies of virus. Third, exposure to environmental sources of nucleic acids should be avoided. The protocol outlined here is performed in a biosafety cabinet, both as a safety precaution and to limit environmental contaminants. Moreover, our group and others have noticed that commercial enzymes may be another source of contaminating bacterial nucleic acids in low input samples6,18. Use of a clean workspace (e.g., PCR hood, biosafety cabinet) and negative controls (e.g., water or buffer) will help alleviate and track contamination, respectively. For samples with <100 pg of total RNA, only poly(rA) carrier RNA, not rRNA, should be depleted to ensure high quality sequencing results while limiting loss of material. For very low input samples, cDNA-amplification methods may be more suitable19, although poly(rA) carrier should be removed prior to the cDNA synthesis.
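The input guidance above can be condensed into a small triage helper that is run on qRT-PCR results before library construction. The sketch below encodes only the thresholds stated in the text (100 pg total RNA, 1,000 viral copies, carrier present or absent). The function and field names are hypothetical, and the "full assembly" flag reflects a reported correlation, not a guarantee.

```python
def depletion_plan(total_rna_pg, viral_copies, has_polyra_carrier):
    """Suggest depletion settings for one sample from qRT-PCR estimates.

    total_rna_pg: host RNA estimate (pg), e.g. derived from 18S rRNA qRT-PCR.
    viral_copies: viral genome copies estimated by qRT-PCR.
    has_polyra_carrier: True if the extraction kit added poly(rA) carrier.
    """
    enough_rna = total_rna_pg >= 100
    return {
        # Below 100 pg, skip rRNA depletion to limit loss of material.
        "deplete_rrna": enough_rna,
        # Oligo(dT) is added only when carrier is actually present.
        "add_oligo_dt": has_polyra_carrier,
        # Sequencing success correlated with both thresholds being met.
        "full_assembly_expected": enough_rna and viral_copies >= 1000,
    }


if __name__ == "__main__":
    print(depletion_plan(total_rna_pg=250, viral_copies=5000, has_polyra_carrier=True))
    print(depletion_plan(total_rna_pg=40, viral_copies=800, has_polyra_carrier=True))
```

The text does not give a numeric cutoff for "very low input" samples that may be better served by cDNA amplification, so that decision is deliberately left out of the helper.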
The depletion of host rRNA enriches for viral content in sequencing libraries and is applicable to different sample collections including serum or plasma, and multiple types of tissues from rodents and non-human primates5,6. In non-human organisms, reads aligning to 28S rRNA remained after depletion, suggesting 28S rRNA is less conserved between humans and other species6,20. When using this method with non-human isolates, it may be necessary to supplement with DNA oligos complementary to the divergent rRNA sequences of the specific host 3,21.
Since the protocol is unbiased, viral reads may represent only a small fraction of total library content. Although rRNA is the most abundant species of host RNA and only a small percentage of rRNA reads (<1%) are found after selective depletion, all other host RNA (e.g., mRNA) will remain after depletion and may account for many sequencing reads from the sample. Therefore "oversampling" (i.e., oversequencing) individual libraries is required in order to have enough coverage for viral assembly and variant calls. For our studies, we attempt to sequence ~20 million reads per sample to have enough depth for analysis of viral genomes and associated variants as well as metagenomic content2,5. For metagenomic and pathogen discovery studies, it is important to note that contaminating host DNA is removed by DNase digestion. Therefore, viruses and other pathogens that contain DNA genomes may be lost during the process; however, RNA intermediates may still be sequenced.
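The need for oversampling follows from a simple depth estimate: mean viral coverage is roughly the number of viral bases sequenced divided by the genome length. The sketch below uses the ~20 million reads per sample mentioned above. The viral read fraction, the 100-nt read length, and the ~19 kb genome length (approximately the size of an Ebola virus genome) are illustrative assumptions and do not come from the protocol.

```python
def mean_viral_depth(total_reads, viral_fraction, read_length_nt, genome_length_nt):
    """Approximate mean depth of coverage across the viral genome.

    depth = (total reads x fraction that are viral x read length) / genome length
    Assumes reads are spread evenly; real coverage is uneven.
    """
    viral_bases = total_reads * viral_fraction * read_length_nt
    return viral_bases / genome_length_nt


if __name__ == "__main__":
    # Hypothetical viral fractions spanning three orders of magnitude.
    for fraction in (0.0001, 0.001, 0.01):
        depth = mean_viral_depth(20_000_000, fraction, 100, 19_000)
        print(f"viral fraction {fraction:.2%}: ~{depth:.0f}x mean depth")
```

When viral reads make up a very small share of the library, most of the sequencing budget goes to host and other non-viral RNA, which is why per-sample read counts well above those needed for the viral genome alone are targeted, particularly if intra-host variants are to be called.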
The authors have nothing to disclose.
This work has been funded in part with Federal funds from the National Institutes of Health, Office of Director, Innovator (No.: DP2OD06514) (PCS) and from the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services, under Contracts (No:HHSN272200900018C, HHSN272200900049C and U19AI110818).
96-Well PCR Plates | VWR | 47743-953 | |
Strips of Eight Caps | VWR | 47745-512 | |
Nuclease-free water | Ambion | AM9937 | 50 ml bottle |
TURBO DNase | Ambion | AM2238 | post RNA extraction step, 2 U/µL, buffer included |
PCR cycler | any PCR cyclers | ||
Agencourt RNAClean XP SPRI beads | Beckman Coulter Genomics | A63987 | beads for RNA cleanup |
Real Time qPCR system | any system | ||
DynaMag-96 Side Skirted Magnet | Invitrogen | 12027 | |
70% Ethanol | prepare fresh | ||
qRT-PCR primers | IDT DNA | see Table 2 | |
5M NaCl | Ambion | AM9760G | |
1M Tris-HCl pH 7.4 | Sigma | T2663-1L | |
1M Tris-HCl pH 7.5 | Invitrogen | 15567-027 | |
1M MgCl2 | Ambion | AM9530G | |
Linear acrylamide | Ambion | AM9520 | |
DNA oligos covering entire rRNA region | IDT DNA | see Table 3, order lab-ready at 100 µM | |
Oligo (dT) | IDT DNA | 40 nt long, desalted | |
Hybridase Thermostable RNase H | Epicentre | H39100 | |
RNase-free DNase Kit | Qiagen | 79254 | post selective depletion step |
SUPERase-In RNase Inhibitor | Ambion | AM2694 | |
Random Primers | Invitrogen | 48190-011 | mostly hexamers |
10 mM dNTP mix | New England Biolabs | N0447L | |
SuperScript III Reverse Transcriptase | Invitrogen | 18080-093 | with first-strand buffer, DTT |
Air Incubator | any air incubator | ||
NEBNext Second Strand Synthesis (dNTP-free) Reaction Buffer | New England Biolabs | B6117S | 10x |
E. coli DNA Ligase | New England Biolabs | M0205L | 10 U/μl |
E. coli DNA Polymerase I | New England Biolabs | M0209L | 10 U/μl |
E. coli RNase H | New England Biolabs | M0297L | 2 U/μl |
0.5M EDTA | Ambion | AM9261 | |
Agencourt AMPure XP SPRI beads | Beckman Coulter Genomics | A63881 | beads for DNA cleanup |
Elution Buffer | Qiagen | 10 mM Tris HCl, pH 8.5 | |
Quant-iT dsDNA HS Assay Kit | Invitrogen | Q32854 | |
Qubit fluorometer | Invitrogen | Q32857 | |
Nextera XT DNA Sample Prep Kit | Illumina | FC-131-1096 | |
Nextera XT DNA Index Kit | Illumina | FC-131-1001 | |
Tapestation 2200 | Agilent | G2965AA | |
High Sensitivity D1000 reagents | Agilent | 5067-5585 | |
High Sensitivity D1000 ScreenTape | Agilent | 5067-5584 | |
BioAnalyzer 2100 | Agilent | G2939AA | |
High Sensitivity DNA reagents | Agilent | 5067-4626 | |
Library Quantification Complete kit (Universal) | Kapa Biosystems | KK4824 | alternative to tapestation, bioanalyzer for library quantification |