Many biologically important RNA structures are conserved in evolution, leading to characteristic mutational patterns. RNAalifold is a widely used program to predict consensus secondary structures in multiple alignments by combining evolutionary information with traditional energy-based RNA folding algorithms. Here we describe the theory and applications of the RNAalifold algorithm. Consensus secondary structure prediction not only leads to significantly more accurate structure models, but also makes it possible to study the structural conservation of functional RNAs.
G-quadruplexes are abundant locally stable structural elements in nucleic acids. The combinatorial theory of RNA structures and the dynamic programming algorithms for RNA secondary structure prediction are extended here to incorporate G-quadruplexes using a simple but plausible energy model. With preliminary energy parameters, we find that the overwhelming majority of putative quadruplex-forming sequences in the human genome are likely to fold into canonical secondary structures instead. Stable G-quadruplexes are strongly enriched, however, in the 5′ UTR of protein-coding mRNAs.
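As a rough illustration of what a "putative quadruplex-forming sequence" can mean in practice, the sketch below scans a sequence for a common heuristic motif: four runs of at least three G separated by loops of 1 to 7 nucleotides. The motif definition, the loop-length bounds, and the function name `find_putative_quadruplexes` are assumptions for illustration only; the abstract does not state which definition or energy model the authors used.

```python
import re

# Assumed heuristic motif (not taken from the abstract):
# four G-runs of length >= 3, separated by loops of 1-7 nucleotides.
PQS_PATTERN = re.compile(
    r"G{3,}[ACGTU]{1,7}G{3,}[ACGTU]{1,7}G{3,}[ACGTU]{1,7}G{3,}"
)

def find_putative_quadruplexes(seq):
    """Return (start, end, match) for each non-overlapping motif hit.

    Coordinates are 0-based, with an exclusive end.
    """
    seq = seq.upper()
    return [(m.start(), m.end(), m.group(0)) for m in PQS_PATTERN.finditer(seq)]

if __name__ == "__main__":
    for hit in find_putative_quadruplexes("AAGGGAGGGAGGGAGGGAA"):
        print(hit)
```

A match only says that the sequence could in principle form a quadruplex. Whether it actually does depends on the competing canonical secondary structures, which is the question the abstract addresses.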
The overwhelming majority of small nucleolar RNAs (snoRNAs) fall into two clearly defined classes characterized by distinctive secondary structures and sequence motifs. A small group of diverse ncRNAs, however, shares the hallmarks of one or both classes of snoRNAs but differs substantially from the norm in some respects. Here, we compile the available information on these exceptional cases, conduct a thorough homology search throughout the available metazoan genomes, provide improved and expanded alignments, and investigate the evolutionary histories of these ncRNA families as well as their mutual relationships.
Secondary structure forms an important intermediate level of description of nucleic acids that encapsulates the dominating part of the folding energy, is often well conserved in evolution, and is routinely used as a basis to explain experimental findings. Based on carefully measured thermodynamic parameters, exact dynamic programming algorithms can be used to compute ground states, base pairing probabilities, as well as thermodynamic properties.
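To make the idea of exact dynamic programming over secondary structures concrete, here is a minimal sketch that maximizes the number of base pairs (a Nussinov-style recursion). This is a deliberately simplified stand-in: it counts pairs rather than using measured thermodynamic parameters, so it is not the energy model the abstract refers to. The function name `max_base_pairs` and the minimum hairpin loop size of 3 are assumptions for illustration.

```python
# Allowed pairs: Watson-Crick plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in seq.

    Simplified base-pair maximization, NOT a thermodynamic energy model.
    min_loop is the minimum number of unpaired bases enclosed by a pair.
    """
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] holds the best pair count for the subsequence seq[i..j].
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: position j stays unpaired
            # case 2: position j pairs with some k, splitting the interval
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]

if __name__ == "__main__":
    print(max_base_pairs("GGGAAACCC"))
```

Energy-based folding uses the same kind of interval decomposition, but replaces the "+1 per pair" score with loop-dependent free energy contributions.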
The prediction of RNA structure can be an important first step in the functional characterization of novel ncRNAs. For secondary structure in particular, there is a multitude of computational prediction tools. They differ not only in algorithmic details and the underlying models but also in what exactly they are trying to predict. This chapter gives an overview of different programs that aim to predict RNA secondary structure. We will introduce the ViennaRNA software package and web server as a solution that implements most of the varieties of RNA secondary structure prediction that have been published over the years. We focus on algorithms going beyond the mere prediction of a static structure.
RNA secondary structure contains many non-canonical base pairs of different pair families. Successful prediction of these structural features leads to improved secondary structures with applications in tertiary structure prediction and simultaneous folding and alignment.
The accessibility of RNA binding motifs controls the efficacy of many biological processes. Examples are the binding of miRNA, siRNA or bacterial sRNA to their respective targets. Similarly, the accessibility of the Shine-Dalgarno sequence is essential for translation to start in prokaryotes. Furthermore, many classes of RNA binding proteins require the binding site to be single-stranded.
Chinese hamster ovary (CHO) cells are the predominant cell factory for the production of recombinant therapeutic proteins. Nevertheless, the lack of publicly available sequence information is severely limiting advances in CHO cell biology, including the exploration of microRNAs (miRNA) as tools for CHO cell characterization and engineering. In an effort to identify and annotate both conserved and novel CHO miRNAs in the absence of a Chinese hamster genome, we deep-sequenced small RNA fractions of 6 biotechnologically relevant cell lines and mapped the resulting reads to an artificial reference sequence consisting of all known miRNA hairpins. Read alignment patterns and read count ratios of 5′ and 3′ mature miRNAs were obtained and used for an independent classification into miR/miR* and 5p/3p miRNA pairs and discrimination of miRNAs from other non-coding RNAs, resulting in the annotation of 387 mature CHO miRNAs. The quantitative content of next-generation sequencing data was analyzed and confirmed using qPCR, to find that miRNAs are markers of cell status. Finally, cDNA sequencing of 26 validated targets of miR-17-92 suggests conserved functions for miRNAs in CHO cells, which together with the now publicly available sequence information sets the stage for developing novel RNAi tools for CHO cell engineering.
Retroviruses and many retrotransposons are flanked by sequence repeats called long terminal repeats (LTRs). These sequences contain a promoter region, which is active in the 5′ LTR, and transcription termination signals, which are active in the LTR copy present at the 3′ end. A section in the middle of the LTR, called Redundancy region, occurs at both ends of the mRNA. Here we show that in the copia-type retrotransposon Tto1, the promoter and terminator functions of the LTR can be supplied by heterologous sequences, thereby converting the LTR into a significantly shorter sub-terminal repeat. An engineered Tto1 element with 125 instead of the usual 574 base pairs repeated in the 5′ and 3′ region can still promote strand transfer during cDNA synthesis, defining a minimal Redundancy region for this element. Based on this finding, we propose a model for first strand transfer of Tto1.
Stem-bulge RNAs (sbRNAs) are a group of small, functionally as yet uncharacterized noncoding RNAs first described in C. elegans, with a few homologous sequences postulated in C. briggsae. In this study, we report on a comprehensive survey of this ncRNA family in the phylum Nematoda. Employing homology search strategies based on both sequence and secondary structure models and a computational promoter screen, we identified a total of 240 new sbRNA homologs. For the majority of these loci we identified both promoter regions and transcription termination signals characteristic of pol-III transcripts. Sequence and structure comparison with known RNA families revealed that sbRNAs are homologs of vertebrate Y RNAs. Most of the sbRNAs show the characteristic Ro protein binding motif, and contain a region highly similar to a functionally required motif for DNA replication previously thought to be unique to vertebrate Y RNAs. The single Y RNA that was previously described in C. elegans, however, does not show this motif, and in general bears the hallmarks of a highly derived family member.
Up to 450,000 non-coding RNAs (ncRNAs) have been predicted to be transcribed from the human genome. However, it still has to be elucidated which of these transcripts represent functional ncRNAs. Since all functional ncRNAs in Eukarya form ribonucleo-protein particles (RNPs), we generated specialized cDNA libraries from size-fractionated RNPs and validated the presence of selected ncRNAs within RNPs by glycerol gradient centrifugation. As a proof of concept, we applied the RNP method to human HeLa cells and total mouse brain, and subjected the cDNA libraries generated from the two model systems to deep sequencing. Bioinformatical analysis of cDNA sequences revealed several hundred ncRNP candidates. The ncRNA candidates were mainly located in intergenic as well as intronic regions of the genome, with a significant overrepresentation of intron-derived ncRNA sequences. Additionally, a number of ncRNAs mapped to repetitive sequences. Thus, our RNP approach provides an efficient way to identify new functional small ncRNA candidates involved in RNP formation.
Reliable structure prediction is a prerequisite for most types of bioinformatical analysis of RNA. Since the accuracy of structure prediction from single sequences is limited, one often resorts to computing the consensus structure for a set of related RNA sequences. Since functionally important RNA structures are expected to evolve much more slowly than the underlying sequences, the pattern of sequence (co-)variation can be exploited to dramatically improve structure prediction. Since a conserved common structure is only expected when the RNA structure is under selective pressure, consensus structure prediction also provides an ideal starting point for the de novo detection of structured non-coding RNAs. Here, we review different strategies for the prediction of consensus secondary structures, and show how these approaches can be used to predict non-coding RNA genes.
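The way sequence (co-)variation supports a conserved pair can be sketched with a toy score for two alignment columns: the fraction of sequences that can form a canonical pair at those columns, and the number of distinct pair types observed. Several distinct pair types at fully compatible columns indicate compensatory mutations. This is a simplified illustration under assumed names (`covariation_support`, `CANONICAL`); it is not the scoring scheme of any particular consensus folding program, and it ignores gaps.

```python
# Canonical pairs (Watson-Crick plus G-U wobble) as two-letter strings.
CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}

def covariation_support(alignment, i, j):
    """Toy covariation measure for alignment columns i and j (0-based).

    Returns (fraction of sequences with a canonical pair,
             number of distinct canonical pair types observed).
    Assumes equal-length, gap-free RNA sequences.
    """
    pairs = [seq[i] + seq[j] for seq in alignment]
    compatible = [p for p in pairs if p in CANONICAL]
    return len(compatible) / len(pairs), len(set(compatible))

if __name__ == "__main__":
    # Hypothetical three-sequence alignment with a short hairpin.
    aln = ["GCAAAGC", "ACAAAGU", "CCAAAGG"]
    print(covariation_support(aln, 0, 6))  # pair types vary: GC, AU, CG
    print(covariation_support(aln, 1, 5))  # conserved CG, no covariation
```

In this toy alignment, columns 0 and 6 change in every sequence yet remain pairable, which is the kind of pattern that consensus structure prediction exploits.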
Burkitt lymphoma is a mature aggressive B-cell lymphoma derived from germinal center B cells. Its cytogenetic hallmark is the Burkitt translocation t(8;14)(q24;q32) and its variants, which juxtapose the MYC oncogene with one of the three immunoglobulin loci. Consequently, MYC is deregulated, resulting in massive perturbation of gene expression. Nevertheless, MYC deregulation alone seems not to be sufficient to drive Burkitt lymphomagenesis. By whole-genome, whole-exome and transcriptome sequencing of four prototypical Burkitt lymphomas with immunoglobulin gene (IG)-MYC translocation, we identified seven recurrently mutated genes. One of these genes, ID3, mapped to a region of focal homozygous loss in Burkitt lymphoma. In an extended cohort, 36 of 53 molecularly defined Burkitt lymphomas (68%) carried potentially damaging mutations of ID3. These were strongly enriched at somatic hypermutation motifs. Only 6 of 47 other B-cell lymphomas with the IG-MYC translocation (13%) carried ID3 mutations. These findings suggest that cooperation between ID3 inactivation and IG-MYC translocation is a hallmark of Burkitt lymphomagenesis.
While there are numerous programs that can predict RNA or DNA secondary structures, a program that predicts RNA/DNA hetero-dimers is still missing. The lack of easy to use tools for predicting their structure may be in part responsible for the small number of reports of biologically relevant RNA/DNA hetero-dimers.
Myzostomida comprise a group of marine worms that have been associated mainly with echinoderms since the Carboniferous. Due to their unusual morphology, their phylogenetic position in relation to other Lophotrochozoa has been debated since their description. According to different morphological and molecular markers, the Myzostomida are close to either Platyzoa or Annelida. Here we investigated small non-coding RNAs of Myzostoma cirriferum to infer the phylogenetic position of myzostomids. Based on transcriptomic data collected by Illumina deep sequencing, we analyzed the microRNA (miRNA) families occurring in M. cirriferum. Phylogenetic analysis revealed the presence of 13 miRNA families exclusively shared by Annelida (including Sipuncula) and Myzostomida, thus highly significantly supporting an annelid origin of myzostomids. Furthermore, using a mapping approach and secondary structure models, we predicted several miRNA candidates unique to myzostomids.