RT Biochemistry Section, HIV Drug Resistance Program, Frederick National Laboratory for Cancer Research
High-throughput selective 2' hydroxyl acylation analyzed by primer extension (SHAPE) utilizes a novel chemical probing technology, reverse transcription, capillary electrophoresis and secondary structure prediction software to determine the structures of RNAs from several hundred to several thousand nucleotides at single nucleotide resolution.
Keywords: Genetics, Issue 75, Molecular Biology, Biochemistry, Virology, Cancer Biology, Medicine, Genomics, Nucleic Acid Probes, RNA Probes, RNA, High-throughput SHAPE, Capillary electrophoresis, RNA structure, RNA probing, RNA folding, secondary structure, DNA, nucleic acids, electropherogram, synthesis, transcription, high throughput, sequencing
Lusvarghi, S., Sztuba-Solinska, J., Purzycka, K. J., Rausch, J. W., Le Grice, S. F. J. RNA Secondary Structure Prediction Using High-throughput SHAPE. J. Vis. Exp. (75), e50243, doi:10.3791/50243 (2013).
Understanding the function of RNA involved in biological processes requires a thorough knowledge of RNA structure. Toward this end, the methodology dubbed "high-throughput selective 2' hydroxyl acylation analyzed by primer extension", or SHAPE, allows prediction of RNA secondary structure with single nucleotide resolution. This approach utilizes chemical probing agents that preferentially acylate single stranded or flexible regions of RNA in aqueous solution. Sites of chemical modification are detected by reverse transcription of the modified RNA, and the products of this reaction are fractionated by automated capillary electrophoresis (CE). Since reverse transcriptase pauses at those RNA nucleotides modified by the SHAPE reagents, the resulting cDNA library indirectly maps those ribonucleotides that are single stranded in the context of the folded RNA. Using ShapeFinder software, the electropherograms produced by automated CE are processed and converted into nucleotide reactivity tables that are themselves converted into pseudo-energy constraints used in the RNAstructure (v5.3) prediction algorithm. The two-dimensional RNA structures obtained by combining SHAPE probing with in silico RNA secondary structure prediction have been found to be far more accurate than structures obtained using either method alone.
To understand the functions of catalytic and non-coding RNAs involved in regulation of splicing, translation, virus replication and cancer, a detailed knowledge of RNA structure is required1,2. Unfortunately, accurate prediction of RNA folding presents a formidable challenge. Classical probing agents suffer from many disadvantages such as toxicity, incomplete nucleotide coverage and/or throughput limited to 100-150 nucleotides per experiment. Unaided secondary structure prediction algorithms are similarly limited, owing to inaccuracies resulting from their inability to effectively distinguish among energetically similar structures. Large RNAs in particular are also often refractory to methods of 3D structure determination such as X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy, due to their conformational flexibility and the large quantities of highly pure sample required for these techniques.
High-throughput SHAPE solves many of these problems by providing an effective, simple approach to probing the structures of large RNAs at single-nucleotide resolution. Moreover, the reagents used for SHAPE are safe, easy to handle and, in contrast to most other chemical probing reagents, react with all four ribonucleotides. These reagents can also penetrate cellular membranes, making it possible to probe RNAs in their in vivo context(s)3. Originally developed in the Weeks laboratory4, SHAPE has been used to analyze a wide variety of RNAs, the most notable example being determination of the complete secondary structure of the ~9 kb HIV-1 RNA genome5. Other notable achievements using SHAPE include elucidation of the structures of infectious viroids6, human long non-coding RNAs7, yeast ribosomes8, and riboswitches9, as well as identification of protein binding sites in virion-associated HIV-1 RNA3. While the original and high-throughput variations of the SHAPE protocol have been published elsewhere10-12, the present work provides a detailed description of RNA secondary structure determination by high-throughput SHAPE using fluorescent oligonucleotides, the Beckman Coulter CEQ 8000 Genetic Analyzer, and ShapeFinder and RNAstructure (v5.3) software. Previously unpublished technical details and troubleshooting advice are also included.
Variations of SHAPE
The essence of SHAPE and its variations is exposure of RNA in aqueous solution to electrophilic anhydrides that selectively acylate 2'-hydroxyl (2'-OH) ribose groups, producing bulky adducts at the sites of modification. This chemical reaction serves as a means of interrogating local RNA structural dynamics, as single-stranded nucleotides are more prone to adopt conformations conducive to electrophilic attack by these reagents, while base paired or architecturally constrained nucleotides are less reactive or unreactive10. Sites of adduct formation are detected by reverse transcription initiating from fluorescently or radiolabeled primers hybridized to a specific site on the modified RNA (the "(+)" primer extension reaction). When reverse transcriptase (RT) fails to traverse the acylated ribonucleotides, a pool of cDNA products is produced whose lengths coincide with sites of modification. A control "(-)" primer extension reaction utilizing RNA that has not been exposed to reagent is also performed so that premature termination of DNA synthesis (i.e. "stops") due to RNA structure, nonspecific RNA strand breakage, etc., may be distinguished from pausing produced by chemical modification. Finally, two dideoxy-sequencing reactions initiating from the same primers are used as markers to correlate reactive nucleotides with the RNA primary sequence following electrophoresis.
In the original application of SHAPE, the same 32P-end-labeled primer is utilized for the (+), (-), and two sequencing reactions. Products of these reactions are loaded into adjacent wells in a 5-8% polyacrylamide slab gel, and fractionated by denaturing polyacrylamide gel electrophoresis (PAGE; Figure 1). Quantitative analysis of the gel images produced by conventional SHAPE can be performed using SAFA, a semi-automated footprinting analysis software13.
In contrast, high-throughput SHAPE employs fluorescently labeled primers and automated capillary electrophoresis. Specifically, for each region of RNA under investigation, a set of four DNA primers having a common sequence but different 5' fluorescent labels must be synthesized or purchased. These differently-labeled oligonucleotides serve to prime two SHAPE reactions and two sequencing reactions, the products of which are pooled and fractionated/detected by automated capillary electrophoresis (CE). Whereas the reactivity profile of 100-150 nt of RNA can be obtained from a set of four reactions using the original approach, high-throughput SHAPE allows resolution of 300-600 nt from a single pooled sample3. Up to 8 sets of reactions may be fractionated simultaneously, while as many as 96 samples can be prepared for fractionation over the course of 12 consecutive CE runs (Figure 2). Moreover, the ShapeFinder software, developed to process and analyze data emerging from the CEQ and other genetic analyzers, is more automated and requires much less user intervention than SAFA13 or other gel-analysis packages.
More advanced high-throughput methodologies have recently emerged such as PARS (parallel analysis of RNA structure)14 and Frag-Seq (fragment-sequencing)15, which use structure-specific enzymes rather than alkylation reagents in conjunction with next generation sequencing techniques to obtain information about RNA structure. Despite the attractiveness of these techniques, the many limitations inherent to nuclease probing still remain16. These problems can be circumvented in the SHAPE sequencing (SHAPE-Seq)17 protocol, where next generation sequencing is preceded by chemical modification and reverse transcription of RNAs in a manner similar to that performed in conventional SHAPE. While these methods may represent the future of RNA structure determination, it is important to remember that next generation sequencing is very expensive, and remains unavailable to many laboratories.
SHAPE Data Analysis
Data produced in the genetic analyzer is presented in the form of an electropherogram, wherein the fluorescence intensity of the sample(s) flowing through the capillary detector is plotted against an index of migration time. This plot takes the form of overlapping traces corresponding to the four fluorescence channels used to detect the different fluorophores, and where each trace is comprised of peaks corresponding to individual cDNA or sequencing products. Electropherogram data is exported from the genetic analyzer as a tab-delimited text file and imported into ShapeFinder transformation and analysis software18.
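As a rough illustration of what such an export contains, the sketch below reads tab-delimited text into four per-channel traces. The column layout (one column per fluorescence channel, one row per time point, an optional header row) and the function name `read_traces` are assumptions made for illustration only; the actual layout written by a given genetic analyzer should be checked before use.

```python
import csv
import io


def read_traces(text, n_channels=4):
    """Split tab-delimited electropherogram text into per-channel traces.

    Assumed (hypothetical) layout: each row is one time point and the first
    `n_channels` columns hold the fluorescence intensity of each channel.
    Rows that cannot be parsed as numbers (e.g. a header) are skipped.
    """
    traces = [[] for _ in range(n_channels)]
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(row) < n_channels:
            continue  # blank or incomplete line
        try:
            values = [float(cell) for cell in row[:n_channels]]
        except ValueError:
            continue  # non-numeric row, most likely a header
        for trace, value in zip(traces, values):
            trace.append(value)
    return traces


example = "Cy5\tCy5.5\tD2\tD1\n1\t2\t3\t4\n5\t6\t7\t8\n"
plus, minus, ladder_1, ladder_2 = read_traces(example)
```

Each returned list is one trace indexed by time point, which is the form in which the four overlapping traces are subsequently aligned and integrated.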
ShapeFinder is initially used to perform a series of mathematical transformations on the data to ensure that migration times and peak volumes accurately reflect the identities and quantities of the reaction products, respectively. Peaks are then aligned and integrated, and the results tabulated together with the primary RNA sequence. A "reactivity profile" for the pertinent segment of RNA is obtained by subtracting control values from the (+) values associated with each RNA nucleotide, and normalizing the data as described below. This profile is imported into RNAstructure (v5.3) software19,20, which converts the normalized reactivity values into pseudo-energy constraints that are incorporated into the RNA secondary structure folding algorithm. Combining chemical probing and folding algorithms in this way significantly improves the accuracy of structure prediction compared to either method alone12,21. The output of RNAstructure (v5.3) includes images of the lowest energy RNA secondary structures color-coded with the SHAPE reactivity profile(s), as well as the same structures in textual dot-bracket notation. The latter may subsequently be exported to software dedicated to the graphical display of RNA secondary structure such as Varna22 and PseudoViewer23.
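The conversion of a normalized reactivity into a pseudo-energy term can be sketched as follows. The logarithmic form and the default slope and intercept (2.6 and -0.8 kcal/mol) are taken from published descriptions of SHAPE-directed folding and are assumptions here, not values stated in this protocol. The function name is hypothetical, and clamping negative reactivities to zero is a simplification made for this sketch.

```python
import math


def shape_pseudo_energy(reactivity, slope=2.6, intercept=-0.8):
    """Return an illustrative pseudo-free energy (kcal/mol) for one nucleotide.

    Assumed form: slope * ln(reactivity + 1) + intercept.
    `None` marks a nucleotide without data and contributes nothing.
    Negative reactivities are clamped to zero (a simplifying assumption).
    """
    if reactivity is None:
        return 0.0
    clamped = max(reactivity, 0.0)
    return slope * math.log(clamped + 1.0) + intercept


for r in (0.0, 0.3, 1.0, None):
    print(r, round(shape_pseudo_energy(r), 2))
```

Under these assumed parameters, an unreactive nucleotide contributes a favorable -0.8 kcal/mol toward pairing, whereas a nucleotide with a normalized reactivity of 1.0 incurs a penalty of roughly +1.0 kcal/mol.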
Figure 1. Flowchart of RNA structure determination via SHAPE4,10. (A) RNA may be obtained from biological samples or by in vitro transcription. (B) Depending on the source, RNA is folded or otherwise processed and modified with SHAPE reagent. (C) Reverse transcription using fluorescently or radioactively labeled primers. (D) cDNA products are fractionated via either capillary or slab gel-based electrophoresis. (E) Fragment analysis. (F) RNA structure prediction.
Figure 2. The high-throughput character of CE-based SHAPE allows rapid analysis of multiple RNAs, and/or multiple segments of the same RNA. (A) An RNA may be divided into 300-600 nt sections (color coded in green, blue and red). (B) Sections of the RNA are probed independently using different sets of fluorescent primers (black arrows). (C) Sets of reactions are pooled and loaded into wells A1, B1, C1, etc., respectively, providing complete coverage for the ~3 kb RNA 1. Reaction products from RNAs 2, 3, 4, etc. may be similarly prepared for fractionation in consecutive electrophoretic runs.
Primer design and extension of the RNA 3' terminus
To analyze long RNAs by high-throughput SHAPE, a series of primer hybridization sites should be selected such that they (i) are separated by ~300 nt, (ii) are 20-30 nt in length, and (iii) yield RNA/DNA hybrids with an expected melting temperature of >50 °C when annealed to the DNA primer. In addition, segments of RNA that are predicted to be highly structured should be avoided, although making such a determination requires some foreknowledge of the RNA structure, which is often unavailable. DNA primers that hybridize to these sites should then be designed, taking care to ensure that they would not be expected to form stable dimers or intrastrand secondary structures.
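The site-selection criteria above can be sketched as a simple tiling routine. This is a minimal illustration, not a primer design tool: the function names are hypothetical, the melting temperature uses a crude textbook approximation for DNA duplexes (an assumption that is not validated for RNA/DNA hybrids), and no check for primer dimers, hairpins or structured target regions is attempted.

```python
# DNA base complementary to each RNA base.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A"}


def primer_for_site(rna, start, length):
    """DNA primer (5'->3') complementary to rna[start:start + length] (0-based)."""
    site = rna[start:start + length]
    return "".join(COMPLEMENT[base] for base in reversed(site))


def crude_tm(primer):
    """Rough melting temperature in °C (assumed DNA-duplex approximation)."""
    gc = primer.count("G") + primer.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(primer)


def tile_primer_sites(rna, spacing=300, length=25):
    """Place hybridization sites every `spacing` nt, starting at the 3' end.

    Returns tuples of (first nt, last nt, primer, crude Tm), 1-based positions.
    """
    sites = []
    end = len(rna)
    while end - length >= 0:
        start = end - length
        primer = primer_for_site(rna, start, length)
        sites.append((start + 1, end, primer, crude_tm(primer)))
        end -= spacing
    return sites


for first, last, primer, tm in tile_primer_sites("GCAU" * 200):
    flag = "ok" if tm > 50 else "check Tm"
    print(first, last, primer, round(tm, 1), flag)
```

Candidate sites flagged by such a routine would still need to be inspected by hand, and shifted where necessary, to avoid regions expected to be highly structured.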
Once designed, primer sets must be either purchased (e.g. from Integrated DNA Technologies, Ames, Iowa) or synthesized24,25. Primers 5'-labeled with Cy5, Cy5.5, WellRed D2 (Beckman Coulter) and IRDye800 (LI-COR)/WellRed D1 (Beckman Coulter) are best suited for the Beckman Coulter CEQ 8000, providing good signal intensity while minimizing crosstalk. Labeled oligonucleotides may be stored indefinitely in small, 10 μM aliquots at -20 °C; avoid repeated freeze/thaw cycles.
By using primers designed in this manner, it is possible to obtain SHAPE data for virtually an entire RNA of any length. However, the sequence at or near the 3' terminus of an RNA is always inaccessible to SHAPE, unless the RNA is engineered to contain a 3' terminal extension (e.g. a "structure cassette") to which a primer may be hybridized4.
RNA Preparation through Capillary Electrophoresis
Although RNAs from biological samples may be utilized for high-throughput SHAPE, the protocol given here is optimized for RNA produced by in vitro transcription. Commercial transcription kits such as MegaShortScript (Ambion) used in conjunction with MegaClear RNA purification columns (Ambion) are well suited to generating large amounts of pure RNA. RNAs should be stored in TE buffer between -20 °C and -80 °C. For best results, RNAs should appear homogeneous by both denaturing and non-denaturing polyacrylamide gel electrophoresis.
1. RNA Folding
2. Chemical Modification of the RNA
Well characterized, electrophilic SHAPE reagents include isatoic anhydride (IA), N-methylisatoic anhydride (NMIA), 1-methyl-7-nitro-isatoic anhydride (1M7)26, and benzoyl cyanide (BzCN)27. Of these, the most commonly used for high-throughput SHAPE are 1M7 and NMIA, and only the latter is commercially available (Life Technologies). The final concentration of modifying reagent must be optimized for each RNA to obtain "single-hit" modification kinetics, i.e. the condition in which most RNAs in solution are modified once in the region of RNA being analyzed11. This optimum concentration can be determined by performing multiple reactions in which the concentration of reagent is varied across the range(s) indicated in the table in Section 2.1 below. Use the concentration of reagent that produces an easily detectable signal while minimizing the difference in signal intensity between long and short DNA synthesis products (e.g. Figure 3).
Figure 3. SHAPE electropherograms produced from a ~360 nt RNA treated with (A) 0 (B) 2.5 mM or (C) 10 mM 1M7. All electropherograms are displayed on the same scale. Blue, green, red and black traces correspond to (+) reaction products (Cy5), (-) reaction products (Cy5.5), and the two sequencing ladders (WellRed D2 and IRDye800), respectively. The RNA used to produce image (B) has been treated with the optimal amount of 1M7, demonstrating good peak resolution and intensity, with minimal signal decay throughout the trace (left). Read length is maximal under these conditions. In contrast, the absence of medium intensity, well resolved peaks in (A) suggests a sub-optimal concentration of 1M7. Conversely, the signal decay evident in (C) indicates that single hit kinetics is not observed, and the RNA is over-modified. In such cases, especially when RT would not be expected to encounter the 5' terminus of the RNA template, read length will be suboptimal.
|Reagent||Optimum 10X concentration (in DMSO)||Time to complete degradation of reagent27|
|NMIA||10-100 mM||~20 min|
|1M7||10-50 mM||70 sec|
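Because the table lists 10X stock concentrations, the final reagent concentration in each reaction is one tenth of the stock value. The short sketch below works through that arithmetic for a titration series; the function name and the particular stock values chosen are illustrative assumptions.

```python
def final_concentrations_mM(stock_10x_mM):
    """Final reagent concentrations (mM) after a 1:10 dilution of 10X stocks."""
    return [stock / 10.0 for stock in stock_10x_mM]


# Example NMIA titration spanning the 10-100 mM stock range in the table.
nmia_stocks = [10, 25, 50, 100]
for stock, final in zip(nmia_stocks, final_concentrations_mM(nmia_stocks)):
    print(f"{stock} mM 10X stock gives {final} mM final")
```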
3. Reverse Transcription
This step generates the fluorescently-labeled cDNA products that are used to indirectly identify the degree to which RNA nucleotides have been modified by a SHAPE reagent. For SHAPE, the performance of SuperScript III (Invitrogen) RT was superior to that of all other RTs tested, and it is the enzyme chosen for use with this protocol. Oligonucleotides labeled with Cy5 and Cy5.5 are used to prime the (+) and (-) reactions, respectively. For shorter RNAs, primers are hybridized to a 3' terminal extension of the native RNA (e.g. a "structure cassette") in order to obtain information about the 3' terminus4. Attention: From this point through CE, samples should be protected from light.
4. Preparation of Sequencing Ladder
Sequencing ladders serve as markers for determining nucleotide position during data processing. These are generated using a USB Cycle Sequencing kit (#78500), DNA having the same sequence as the RNA being studied, and primers labeled with WellRed D2 or WellRed D1/IRDye800. Typically, DNA employed in this reaction will be that used as a transcription template for the RNA in question. Although the reaction protocol presented here closely resembles that recommended by the kit manufacturer, the reaction is scaled up several fold. While ddA and ddT are used as chain terminators in the reactions described below, any pair of terminators may be used to generate the sequencing ladders.
5. Fractionation of Reaction Products by Capillary Electrophoresis
Capillary electrophoresis allows simultaneous separation of cDNA synthesis products from four reactions pooled into a single sample. Eight samples may be fractionated simultaneously, and as many as 96 samples may be fractionated over the course of 12 consecutive runs (Figure 2).
Ideally, outside of primer and strong-stop peaks, signals for each peak in all four electropherogram traces should be in the linear range; a gradual drop-off in signal is acceptable. Sometimes, however, large peaks (stops) are evident even in the control reaction, and these can interfere with subsequent data processing. Truncated cDNAs that give rise to these peaks can be the result of a natural obstacle during reverse transcription (e.g. RNA secondary structure), or RNA degradation. In the former case, additives such as betaine might improve RT processivity and reduce RT pausing/premature termination.
ShapeFinder software allows the user to visualize and transform CE traces and convert them into SHAPE reactivity profiles18. Once reactivity values are tabulated, they are normalized and imported into RNAstructure (v5.3) to generate and refine secondary structural models.
6. ShapeFinder Software
An extension of the BaseFinder trace processing platform29, the published version of ShapeFinder is freely available for non-commercial use18. Detailed instructions for data handling in ShapeFinder are provided with the software documentation.
Note: The analysis of the data is critical for the accuracy of SHAPE, and some considerations are very important during this analysis, including:
7. Data Normalization
To incorporate nucleotide reactivity profiles into the secondary structure algorithm used by RNAstructure (v5.3) software, and/or to compare profiles of closely related RNAs, SHAPE data must be normalized in a standardized fashion12. This involves (i) excluding outliers from subsequent calculations, (ii) determining the "effective maximum" reactivity (i.e., the average of the highest 8% of reactivity values, excluding outliers), and (iii) normalizing by dividing all reactivity values by the "effective maximum".
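The three normalization steps can be sketched as shown below. The outlier criterion used here (values more than 1.5 times the interquartile range above the third quartile) is an assumption based on a common box-plot rule, since the text above does not define it, and the function name is hypothetical. Outliers are excluded only from the calculation of the effective maximum; they remain in the profile and are scaled like every other value.

```python
import statistics


def normalize_shape(reactivities, top_fraction=0.08):
    """Normalize a reactivity profile by its 'effective maximum'.

    `None` entries (no data) are passed through unchanged.
    Returns (normalized profile, effective maximum).
    """
    values = sorted(r for r in reactivities if r is not None)
    # (i) Flag outliers with an assumed box-plot rule: > Q3 + 1.5 * IQR.
    q1, _, q3 = statistics.quantiles(values, n=4)
    cutoff = q3 + 1.5 * (q3 - q1)
    kept = [v for v in values if v <= cutoff]
    # (ii) Effective maximum: mean of the highest 8% of the remaining values.
    n_top = max(1, round(len(kept) * top_fraction))
    effective_max = sum(kept[-n_top:]) / n_top
    # (iii) Divide every reactivity by the effective maximum.
    normalized = [None if r is None else r / effective_max
                  for r in reactivities]
    return normalized, effective_max


profile = [i / 10 for i in range(1, 26)] + [50.0]  # last value is an outlier
normalized, effective_max = normalize_shape(profile)
```

After normalization, most values fall between 0 and roughly 1, while nucleotides flagged as outliers retain values well above 1.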
8. Data Modeling
RNAstructure (v5.3) software is used to predict experimentally-supported RNA secondary structure(s) using the pseudo-free energy constraints derived from SHAPE analysis19. The software provides graphical representations of the lowest energy 2D RNA structures as well as textual representation of these structures in dot-bracket notation. The latter can be imported in an RNA structure viewer of the user's preference, e.g. Pseudoviewer23 or Varna22, to produce publication-quality images.
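For readers who wish to process the textual output themselves, dot-bracket notation can be converted into a list of base pairs with a simple stack, as in the sketch below. The function name is hypothetical, and only the basic alphabet of `(`, `)` and `.` is handled.

```python
def dot_bracket_to_pairs(structure):
    """Return sorted 1-based (i, j) base pairs encoded by a dot-bracket string."""
    stack = []
    pairs = []
    for position, symbol in enumerate(structure, start=1):
        if symbol == "(":
            stack.append(position)  # remember the 5' partner
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {position}")
            pairs.append((stack.pop(), position))
        elif symbol != ".":
            raise ValueError(f"unsupported symbol {symbol!r}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


print(dot_bracket_to_pairs("((..))"))  # [(1, 6), (2, 5)]
```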
Note: Care must be taken when considering the structures produced by the RNAstructure (v5.3) software. For example, the software cannot resolve tertiary interactions such as pseudoknots and kissing loops, nor can it distinguish whether lack of reactivity in a certain region is due to basepairing or steric protection by bound proteins. As a consequence, these factors, along with the energies reported for the individual structures, must be considered when presenting a definitive structural model.
RNA containing the HIV-1 rev response element (RRE) and a 3' terminal structure cassette4 was prepared from a linearized plasmid by in vitro transcription, after which it was folded by heating, cooling, and incubation at 37 °C in the presence of MgCl2. RNA was exposed to NMIA and then reverse transcribed from a 5'-end-labeled DNA primer hybridized to the 3' terminal structure cassette. The resulting SHAPE cDNA library, together with control and sequencing reactions, was then fractionated using the Beckman Coulter CEQ 8000 automated capillary electrophoresis system to produce the electropherogram depicted in Figure 4. The four overlapping, color coded traces are produced by migration of the four sets of reaction products through the capillary, as follows: Blue (Cy5-labeled RT products generated from reverse transcription of NMIA-modified RNAs), green (Cy5.5-labeled RT products from folded, but otherwise unmodified RNAs), black (WellRed D2-labeled DNA sequencing ladder generated using ddG) and red (IRDye800-labeled sequencing ladder, ddT).
The raw CE traces were separated, processed, aligned and integrated using ShapeFinder software18. The region of the trace(s) corresponding to the RRE SLII region is depicted in Figure 5, the data having been processed to the point immediately prior to peak integration (i.e., traces have been aligned, subjected to background subtraction, correction for signal decay, etc.). Reactivity values for each nucleotide are calculated by integrating the corresponding peaks in the NMIA(+) and NMIA(-) reactions (in the blue and green traces, respectively), and subtracting the latter from the former. These reactivity values indicate the extent to which RT terminates at each nucleotide during reverse transcription, which is a reflection of the degree to which each nucleotide has been modified by NMIA and therefore its propensity to be single stranded in solution.
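The subtraction step can be illustrated with a short sketch that tabulates per-nucleotide reactivities alongside the RNA sequence. The function name and the example peak areas are hypothetical, and the sketch assumes that the (+) and (-) peaks have already been aligned and integrated so that each list holds one area per nucleotide.

```python
def reactivity_table(sequence, plus_areas, minus_areas, first_position=1):
    """Tabulate (position, nucleotide, (+) area minus (-) area)."""
    if not (len(sequence) == len(plus_areas) == len(minus_areas)):
        raise ValueError("sequence and peak-area lists must be the same length")
    table = []
    rows = zip(sequence, plus_areas, minus_areas)
    for offset, (nucleotide, plus, minus) in enumerate(rows):
        table.append((first_position + offset, nucleotide, plus - minus))
    return table


# Hypothetical integrated peak areas for a three-nucleotide stretch.
for row in reactivity_table("GAU", [12.0, 3.5, 8.0], [2.0, 3.0, 1.0]):
    print(row)
```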
The HIV-1 RRE SHAPE reactivity profile generated in ShapeFinder was normalized and converted into a text document suitable for import into RNAstructure (v5.3)19. In the latter software, reactivity values were incorporated into the secondary structure prediction algorithm as pseudo-energy constraints, thereby influencing which structures are predicted to have the lowest free energy. Program output consists of two dimensional RNA secondary structure maps depicting the lowest energy structures generated by the algorithm as well as text files containing these structures expressed in dot-bracket notation. The latter of these outputs may be exported into RNA visualization software such as VARNA22. Figure 6 shows the 2D structure of the HIV RRE SLII region generated in RNAstructure (v5.3) using the SHAPE-derived reactivity profiles and visualized using VARNA. Color coded SHAPE reactivity values are superimposed.
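A minimal sketch of producing such a text document is given below. It assumes the two-column layout described in the RNAstructure documentation (nucleotide number, then reactivity, separated by whitespace), with a value of -999 marking nucleotides that lack data; the function name and the number of decimal places are arbitrary choices made for illustration.

```python
def format_shape_text(normalized, first_position=1, no_data=-999):
    """Format normalized reactivities as two-column text, one line per nucleotide.

    `None` entries are written as `no_data` (assumed missing-data marker).
    """
    lines = []
    for offset, value in enumerate(normalized):
        position = first_position + offset
        if value is None:
            lines.append(f"{position}\t{no_data}")
        else:
            lines.append(f"{position}\t{value:.4f}")
    return "\n".join(lines) + "\n"


text = format_shape_text([0.5, None, 1.25])
print(text, end="")
```

The resulting string can be written to a file with the standard `open(path, "w")` idiom before being loaded into the folding software.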
Figure 4. Representative electropherogram of an RNA sample viewed in the CEQ 8000 Genetic Analysis System (Beckman). The software displays a trace where each channel is shown as a colored line: Blue and green channels represent the (+) reagent and (-) reagent experiments, and black and red represent sequencing ladders. X- and Y-axes denote resolution time and fluorescence intensity, respectively.
Figure 5. Electropherogram analysis using ShapeFinder tools18. The Data View Window (center) provides graphical feedback on each data processing step. The Tool Inspector window (bottom left) shows parameters for the tool selected in the Scripting Inspector. The Scripting Inspector (upper right) displays those tools that have been applied to the data.
Figure 6. Two-dimensional RNA structure color coded for SHAPE reactivity generated using the VARNA visualization applet22. Nucleotides are color-coded to indicate the degree(s) of SHAPE reactivity. In the spectrum shown, blue and red circles indicate nucleotides having low and high reactivity, respectively.
We present here a detailed protocol for high-throughput SHAPE, a technique that allows secondary structure determination to single-nucleotide resolution for RNAs of any size. Moreover, coupling experimental SHAPE data with secondary structure prediction algorithms facilitates generation of RNA 2D models with a higher degree of accuracy than is possible with either method alone. The combination of fluorescently-labeled primers and automated CE provides significant advantages over traditional gel-based SHAPE, facilitating resolution of long RNA sequences in a single experiment, as well as substantially higher speed and throughput for multiple experiments. The expediency of this method and the availability of suitable data analysis tools make SHAPE ideally suited for structural analysis of previously intractable viral, intact messenger, and noncoding RNAs. As the 2D structures of these intriguing RNAs become clearer, the use of hydroxyl radical probing, through-space cleavage methodologies and molecular modeling should help elucidate complex tertiary interactions and eventually allow researchers to determine the structures of these RNAs in three dimensions.
No conflicts of interest declared.
S. Lusvarghi, J. Sztuba-Solinska, K.J. Purzycka, J.W. Rausch and S.F.J. Le Grice are supported by the Intramural Research Program of the National Cancer Institute, National Institutes of Health, USA.
|N-methylisatoic anhydride (NMIA)||Life Technologies||M25||Dissolve in anhydrous DMSO|
|1-methyl-7-nitroisatoic anhydride (1M7)||see ref. 26|
|SuperScript III Reverse Transcriptase||Life Technologies||18080044||10,000 units|
|Thermo sequenase cycle sequencing kit||Affymetrix||78500|
|Materials provided by the user|
|RNA of interest||6 pmol per reaction (the limit of detection will be determined by the instrument)|
|Sets of four 5' labeled primers (Cy5, Cy5.5, WellRed D2 and WellRed D1/IRDye800)||Primers are complementary to the RNA and are used in reverse transcription and sequencing reactions. The listed fluorophores are optimal for the Beckman Coulter CEQ 8000. Primers may be purchased or synthesized in house.|
|DNA template||DNA is used for sequencing reactions, and must contain the sequence of the RNA being studied, including any 3' terminal extension, if present. Where applicable, it is often convenient to use the RNA transcription template.|
|10x RNA renaturation buffer||100 mM Tris-HCl pH 8.0, 1 M KCl, 1 mM EDTA|
|5X RNA folding buffer||200 mM Tris-HCl pH 8.0, 25 mM MgCl2, 2.5 mM EDTA, 650 mM KCl. This buffer may be adjusted depending on the case (e.g. pH, EDTA, Mg, RNase inhibitor).|
|2.5X RT mix||4 μl 5X buffer, 1 μl 100 mM DTT, 1.5 μl water, 1 μl 10 mM dNTPs, 0.5 μl SuperScript III. Note that the 5X buffer and 100 mM DTT are provided with purchase of SuperScript III (Invitrogen).|
|GenomeLab Sample Loading Solution (Beckman Coulter)||Attention: Avoid multiple freeze-thaw cycles|