Rapid technological advances have led to an explosion of biomedical data in recent years. The pace of change has inspired new collaborative approaches for sharing materials and resources to help train life scientists both in the use of cutting-edge bioinformatics tools and databases and in how to analyse and interpret large datasets. A prototype platform for sharing such training resources was recently created by the Bioinformatics Training Network (BTN). Building on this work, we have created a centralized portal for sharing training materials and courses, including a catalogue of trainers and course organizers, and an announcement service for training events. For course organizers, the portal provides opportunities to promote their training events; for trainers, the portal offers an environment for sharing materials, for gaining visibility for their work and promoting their skills; for trainees, it offers a convenient one-stop shop for finding suitable training resources and identifying relevant training events and activities locally and worldwide. Availability and implementation: http://mygoblet.org/training-portal CONTACT: email@example.com.
This chapter gives an overview of the current methods for automated modeling of RNA structures, with emphasis on template-based methods. The currently used approaches to RNA modeling are presented with a side glance at the protein world, where many similar ideas have been used. Two main programs for automated template-based modeling are presented: ModeRNA, which assembles structures from fragments, and MacroMoleculeBuilder, which performs a simulation to satisfy spatial restraints. Both approaches require an alignment of the target sequence to a known RNA structure that is used as a modeling template. As a way to find promising template structures and to align the target and template sequences, we propose a pipeline combining the ParAlign and Infernal programs on RNA family data from Rfam. We also briefly summarize template-free methods for RNA 3D structure prediction. Typically, RNA structures generated by automated modeling methods require local or global optimization. Thus, we also discuss methods that can be used for local or global refinement of RNA structures.
The mountains of data thrusting from the new landscape of modern high-throughput biology are irrevocably changing biomedical research and creating a near-insatiable demand for training in data management and manipulation and data mining and analysis. Among life scientists, from clinicians to environmental researchers, a common theme is the need not just to use, and gain familiarity with, bioinformatics tools and resources but also to understand their underlying fundamental theoretical and practical concepts. Providing bioinformatics training to empower life scientists to handle and analyse their data efficiently, and progress their research, is a challenge across the globe. Delivering good training goes beyond traditional lectures and resource-centric demos, using interactivity, problem-solving exercises and cooperative learning to substantially enhance training quality and learning outcomes. In this context, this article discusses various pragmatic criteria for identifying training needs and learning objectives, for selecting suitable trainees and trainers, for developing and maintaining training skills and evaluating training quality. Adherence to these criteria may help not only to guide course organizers and trainers on the path towards bioinformatics training excellence but, importantly, also to improve the training experience for life scientists.
We present iAnn, an open source community-driven platform for dissemination of life science events, such as courses, conferences and workshops. iAnn allows automatic visualisation and integration of customised event reports. A central repository lies at the core of the platform: curators add submitted events, and these are subsequently accessed via web services. Thus, once an iAnn widget is incorporated into a website, it permanently shows timely relevant information as if it were native to the remote site. At the same time, announcements submitted to the repository are automatically disseminated to all portals that query the system. To facilitate the visualization of announcements, iAnn provides powerful filtering options and views, integrated in Google Maps and Google Calendar. All iAnn widgets are freely available.
We present a continuous benchmarking approach for the assessment of RNA secondary structure prediction methods implemented in the CompaRNA web server. As of 3 October 2012, the performance of 28 single-sequence and 13 comparative methods has been evaluated on RNA sequences/structures released weekly by the Protein Data Bank. We also provide a static benchmark generated on RNA 2D structures derived from the RNAstrand database. Benchmarks on both data sets offer insight into the relative performance of RNA secondary structure prediction methods on RNAs of different size and with respect to different types of structure. According to our tests, on average, the most accurate predictions obtained by a comparative approach are generated by CentroidAlifold, MXScarna, RNAalifold and TurboFold. On average, the most accurate predictions obtained by single-sequence analyses are generated by CentroidFold, ContextFold and IPknot. The best comparative methods typically outperform the best single-sequence methods if an alignment of homologous RNA sequences is available. This article presents the results of our benchmarks as of 3 October 2012, whereas the rankings presented online are continuously updated. We will gladly include new prediction methods and new measures of accuracy in new editions of the CompaRNA benchmarks.
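To make the idea of scoring a predicted secondary structure against a reference concrete, here is a minimal sketch in Python. It assumes structures are given in dot-bracket notation without pseudoknots and scores them by sensitivity and positive predictive value over base pairs. These two measures and the function names are illustrative assumptions; the abstract does not state which accuracy measures CompaRNA uses.

```python
from typing import Set, Tuple


def pairs_from_dot_bracket(structure: str) -> Set[Tuple[int, int]]:
    """Return the set of base pairs (i, j), i < j, encoded by a dot-bracket string."""
    stack = []
    pairs = set()
    for index, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(index)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % index)
            pairs.add((stack.pop(), index))
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return pairs


def sensitivity_ppv(reference: str, predicted: str) -> Tuple[float, float]:
    """Compare predicted base pairs with reference base pairs.

    Sensitivity = correctly predicted pairs / pairs in the reference.
    PPV         = correctly predicted pairs / pairs in the prediction.
    Both are reported as 0.0 when their denominator is zero (a sketch-level choice).
    """
    ref_pairs = pairs_from_dot_bracket(reference)
    pred_pairs = pairs_from_dot_bracket(predicted)
    true_positives = len(ref_pairs & pred_pairs)
    sensitivity = true_positives / len(ref_pairs) if ref_pairs else 0.0
    ppv = true_positives / len(pred_pairs) if pred_pairs else 0.0
    return sensitivity, ppv
```

A prediction that recovers only some reference pairs but predicts nothing wrong scores a high PPV and a lower sensitivity, which is why benchmarks of this kind typically report more than one measure.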
Funding bodies are increasingly recognizing the need to provide graduates and researchers with access to short intensive courses in a variety of disciplines, in order both to improve the general skills base and to provide solid foundations on which researchers may build their careers. In response to the development of high-throughput biology, the need for training in the field of bioinformatics, in particular, is seeing a resurgence: it has been defined as a key priority by many institutions and research programmes and is now an important component of many grant proposals. Nevertheless, when it comes to planning and preparing to meet such training needs, tension arises between the reward structures that predominate in the scientific community, which compel individuals to publish or perish, and the time that must be devoted to the design, delivery and maintenance of high-quality training materials. Conversely, there is much relevant teaching material and training expertise available worldwide that, were it properly organized, could be exploited by anyone who needs to provide training or needs to set up a new course. To do this, however, the materials would have to be centralized in a database and clearly tagged in relation to target audiences, learning objectives, etc. Ideally, they would also be peer reviewed, and easily and efficiently accessible for downloading. Here, we present the Bioinformatics Training Network (BTN), a new enterprise that has been initiated to address these needs, and compare it with similar initiatives and collections.
Metal ions are essential for the folding of RNA molecules into stable tertiary structures and are often involved in the catalytic activity of ribozymes. However, the positions of metal ions in RNA 3D structures are difficult to determine experimentally. This motivated us to develop a computational predictor of metal ion sites for RNA structures.
Noncoding RNAs perform important roles in the cell. As their function is tightly connected with structure, and as experimental methods are time-consuming and expensive, the field of RNA structure prediction is developing rapidly. Here, we present a detailed study on using the ModeRNA software. The tool uses the comparative modeling approach and can be applied when a structural template is available and an alignment of reasonable quality can be performed. We guide the reader through the entire process of modeling Escherichia coli tRNA(Thr) in a conformation corresponding to the complex with an aminoacyl-tRNA synthetase (aaRS). We describe the choice of a template structure, preparation of input files, and explore three possible modeling strategies. In the end, we evaluate the resulting models using six alternative benchmarks. The ModeRNA software can be freely downloaded from http://iimcb.genesilico.pl/moderna/ under the conditions of the General Public License. It runs under LINUX, Windows and Mac OS. It is also available as a server at http://iimcb.genesilico.pl/modernaserver/. The models and the script to reproduce the study from this article are available at http://www.genesilico.pl/moderna/examples/.
Creating useful software is a major activity of many scientists, including bioinformaticians. Nevertheless, software development in an academic setting is often unsystematic, which can lead to problems associated with maintenance and long-term availability. Unfortunately, well-documented software development methodology is difficult to adopt, and technical measures that directly improve bioinformatic programming have not been described comprehensively. We have examined 22 software projects and have identified a set of practices for software development in an academic environment. We found them useful to plan a project, support the involvement of experts (e.g. experimentalists), and to promote higher quality and maintainability of the resulting programs. This article describes 12 techniques that facilitate a quick start into software engineering. We describe 3 of the 22 projects in detail and give many examples to illustrate the usage of particular techniques. We expect this toolbox to be useful for many bioinformatics programming projects and to the training of scientific programmers.
The diverse functional roles of non-coding RNA molecules are determined by their underlying structure. ModeRNA server is an online tool for RNA 3D structure modeling by the comparative approach, based on a template RNA structure and a user-defined target-template sequence alignment. It offers an option to search for potential templates, given the target sequence. The server also provides tools for analyzing, editing and formatting of RNA structure files. It facilitates the use of the ModeRNA software and offers new options in comparison to the standalone program.
Understanding the molecular mechanism of protein-RNA recognition and complex formation is a major challenge in structural biology. Unfortunately, the experimental determination of protein-RNA complexes by X-ray crystallography and nuclear magnetic resonance spectroscopy (NMR) is tedious and difficult. Alternatively, protein-RNA interactions can be predicted by computational methods. Although less accurate than experimental observations, computational predictions can be sufficiently accurate to prompt functional hypotheses and guide experiments, e.g. to identify individual amino acid or nucleotide residues. In this article we review 10 methods for predicting protein-RNA interactions, seven of which predict RNA-binding sites from protein sequences and three from structures. We also developed a meta-predictor that uses the output of top three sequence-based primary predictors to calculate a consensus prediction, which outperforms all the primary predictors. In order to fully cover the software for predicting protein-RNA interactions, we also describe five methods for protein-RNA docking. The article highlights the strengths and shortcomings of existing methods for the prediction of protein-RNA interactions and provides suggestions for their further development.
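The consensus idea behind a meta-predictor can be illustrated with a short Python sketch. It assumes that each primary predictor returns one score per residue on a common 0 to 1 scale, and that the consensus is the unweighted mean of those scores followed by a cutoff. The averaging rule, the 0.5 threshold and the function names are assumptions made for illustration; the abstract does not describe how the published meta-predictor combines its inputs.

```python
from typing import List, Sequence


def consensus_scores(predictions: Sequence[Sequence[float]]) -> List[float]:
    """Average per-residue scores from several predictors (unweighted mean)."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    length = len(predictions[0])
    if any(len(scores) != length for scores in predictions):
        raise ValueError("all predictions must cover the same residues")
    # zip(*predictions) yields, for each residue, the scores from all predictors.
    return [sum(column) / len(predictions) for column in zip(*predictions)]


def binding_residues(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Return 0-based indices of residues whose consensus score reaches the threshold."""
    return [index for index, score in enumerate(scores) if score >= threshold]
```

In practice the primary predictors would first have to be rescaled to comparable score ranges, which this sketch takes for granted.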
DNA is continuously exposed to many different damaging agents such as environmental chemicals, UV light, ionizing radiation, and reactive cellular metabolites. DNA lesions can result in different phenotypical consequences ranging from a number of diseases, including cancer, to cellular malfunction, cell death, or aging. To counteract the deleterious effects of DNA damage, cells have developed various repair systems, including biochemical pathways responsible for the removal of single-strand lesions such as base excision repair (BER) and nucleotide excision repair (NER) or specialized polymerases temporarily taking over lesion-arrested DNA polymerases during the S phase in translesion synthesis (TLS). There are also other mechanisms of DNA repair such as homologous recombination repair (HRR), nonhomologous end-joining repair (NHEJ), or DNA damage response system (DDR). This paper reviews bioinformatics resources specialized in disseminating information about DNA repair pathways, proteins involved in repair mechanisms, damaging agents, and DNA lesions.
RNA is a large group of functionally important biomacromolecules. In striking analogy to proteins, the function of RNA depends on its structure and dynamics, which in turn is encoded in the linear sequence. However, while there are numerous methods for computational prediction of protein three-dimensional (3D) structure from sequence, with comparative modeling being the most reliable approach, there are very few such methods for RNA. Here, we present ModeRNA, a software tool for comparative modeling of RNA 3D structures. As an input, ModeRNA requires a 3D structure of a template RNA molecule, and a sequence alignment between the target to be modeled and the template. It must be emphasized that a good alignment is required for successful modeling, and for large and complex RNA molecules the development of a good alignment usually requires manual adjustments of the input data based on previous expertise of the respective RNA family. ModeRNA can model post-transcriptional modifications, a functionally important feature analogous to post-translational modifications in proteins. ModeRNA can also model DNA structures or use them as templates. It is equipped with many functions for merging fragments of different nucleic acid structures into a single model and analyzing their geometry. Windows and UNIX implementations of ModeRNA with comprehensive documentation and a tutorial are freely available.
In analogy to proteins, the function of RNA depends on its structure and dynamics, which are encoded in the linear sequence. While there are numerous methods for computational prediction of protein 3D structure from sequence, there have been very few such methods for RNA. This review discusses template-based and template-free approaches for macromolecular structure prediction, with special emphasis on comparison between the already tried-and-tested methods for protein structure modeling and the very recently developed "protein-like" modeling methods for RNA. We highlight analogies between many successful methods for modeling of these two types of biological macromolecules and argue that RNA 3D structure can be modeled using "protein-like" methodology. We also highlight the areas where the differences between RNA and proteins require the development of RNA-specific solutions.
REPAIRtoire is the first comprehensive database resource for systems biology of DNA damage and repair. The database collects and organizes the following types of information: (i) DNA damage linked to environmental mutagenic and cytotoxic agents, (ii) pathways comprising individual processes and enzymatic reactions involved in the removal of damage, (iii) proteins participating in DNA repair and (iv) diseases correlated with mutations in genes encoding DNA repair proteins. REPAIRtoire also provides links to publications and external databases. REPAIRtoire contains information about eight main DNA damage checkpoint, repair and tolerance pathways: DNA damage signaling, direct reversal repair, base excision repair, nucleotide excision repair, mismatch repair, homologous recombination repair, nonhomologous end-joining and translesion synthesis. The pathway/protein dataset is currently limited to three model organisms: Escherichia coli, Saccharomyces cerevisiae and Homo sapiens. The DNA repair and tolerance pathways are represented as graphs and in tabular form with descriptions of each repair step and corresponding proteins, and individual entries are cross-referenced to supporting literature and primary databases. REPAIRtoire can be queried by the name of pathway, protein, enzymatic complex, damage and disease. In addition, a tool for drawing custom DNA-protein complexes is available online. REPAIRtoire is freely available and can be accessed at http://repairtoire.genesilico.pl/.
Delivering hands-on tutorials on bioinformatics software and web applications is a challenging didactic scenario. The main reason is that trainees have heterogeneous backgrounds and different previous knowledge, and vary in learning speed. In this article, we demonstrate how multi-stage learning aids can be used to allow all trainees to progress at a similar speed. In this technique, the trainees can use cards with hints and answers to guide themselves independently through a complex task. We have successfully conducted a tutorial for the molecular viewer PyMOL using two sets of learning aid cards. The trainees responded positively and were able to complete the task, and the trainer had spare time to respond to individual questions. This encourages us to conclude that multi-stage learning aids overcome many disadvantages of established forms of hands-on software training.
We present HepatoNet1, the first reconstruction of a comprehensive metabolic network of the human hepatocyte that is shown to accomplish a large canon of known metabolic liver functions. The network comprises 777 metabolites in six intracellular and two extracellular compartments and 2539 reactions, including 1466 transport reactions. It is based on the manual evaluation of >1500 original scientific research publications to warrant a high-quality evidence-based model. The final network is the result of an iterative process of data compilation and rigorous computational testing of network functionality by means of constraint-based modeling techniques. Taking the hepatic detoxification of ammonia as an example, we show how the availability of nutrients and oxygen may modulate the interplay of various metabolic pathways to allow an efficient response of the liver to perturbations of the homeostasis of blood compounds.
As bioinformatics becomes increasingly central to research in the molecular life sciences, the need to train non-bioinformaticians to make the most of bioinformatics resources is growing. Here, we review the key challenges and pitfalls to providing effective training for users of bioinformatics services, and discuss successful training strategies shared by a diverse set of bioinformatics trainers. We also identify steps that trainers in bioinformatics could take together to advance the state of the art in current training practices. The ideas presented in this article derive from the first Trainer Networking Session held under the auspices of the EU-funded SLING Integrating Activity, which took place in November 2009.
Mathematical analysis and modeling of biochemical reaction networks requires knowledge of the permitted directionality of reactions and membrane transport processes. This information can be gathered from the standard Gibbs energy changes (ΔG⁰) of reactions and the concentration ranges of their reactants. Currently, experimental ΔG⁰ values are not available for the vast majority of cellular biochemical processes. We propose what we believe to be a novel computational method to infer the unknown ΔG⁰ value of a reaction from the known ΔG⁰ value of the chemically most similar reaction. The chemical similarity of two arbitrary reactions is measured by the relative number (T) of co-occurring changes in the chemical attributes of their reactants. Testing our method across a validated reference set of 173 biochemical reactions with experimentally determined ΔG⁰ values, we found that a minimum reaction similarity of T = 0.6 is required to infer ΔG⁰ values with an error of <10 kJ/mol. Applying this criterion, our method allows us to assign ΔG⁰ values to 458 additional reactions of the BioPath database. We believe our approach permits us to minimize the number of ΔG⁰ measurements required for a full coverage of a given reaction network with reliable ΔG⁰ values.
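The inference step described above, taking the standard Gibbs energy change of the most similar reference reaction and accepting it only when the similarity reaches T = 0.6, can be sketched in Python as follows. The sketch assumes that each reaction is described by a set of attribute-change labels and that T is a Tanimoto-style ratio of shared changes to all changes. That formula, the label strings and the function names are illustrative assumptions; the abstract only describes T as a relative number of co-occurring changes.

```python
from typing import Dict, FrozenSet, Optional, Tuple


def similarity(changes_a: FrozenSet[str], changes_b: FrozenSet[str]) -> float:
    """Assumed form of T: shared attribute changes divided by all attribute changes."""
    union = changes_a | changes_b
    if not union:
        return 0.0
    return len(changes_a & changes_b) / len(union)


def infer_delta_g0(
    query_changes: FrozenSet[str],
    reference: Dict[str, Tuple[FrozenSet[str], float]],
    min_similarity: float = 0.6,
) -> Optional[Tuple[float, str, float]]:
    """Borrow the standard Gibbs energy change of the most similar reference reaction.

    `reference` maps a reaction name to (attribute changes, known value in kJ/mol).
    Returns (inferred value, name of the donor reaction, T), or None when no
    reference reaction reaches the similarity threshold.
    """
    best_name = None
    best_t = 0.0
    for name, (changes, _value) in reference.items():
        t = similarity(query_changes, changes)
        if t > best_t:
            best_name, best_t = name, t
    if best_name is None or best_t < min_similarity:
        return None
    return reference[best_name][1], best_name, best_t
```

Returning None below the threshold keeps low-confidence assignments out of the network rather than filling every gap, which mirrors the role the T = 0.6 criterion plays in the text.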
The packing of protein atoms is an indicator of their stability and functionality, and is applied in determining thermostability, in protein design and ligand binding, and in identifying flexible regions in proteins. Here, we present Voronoia, a database of atomic-scale packing data for protein 3D structures. It is based on an improved Voronoi Cell algorithm using hyperboloid interfaces to construct atomic volumes, and to resolve solvent-accessible and -inaccessible regions of atoms. The database contains atomic volumes, local packing densities and interior cavities calculated for 61 318 biological units from the PDB. A report for each structure summarizes the packing by residue and atom types, and lists the environment of interior cavities. The packing data are compared to a nonredundant set of structures from SCOP superfamilies. Both packing densities and cavities can be visualized in the 3D structures by the Jmol plugin. Additionally, PDB files can be submitted to the Voronoia server for calculation. This service performs calculations for most full-atomic protein structures within a few minutes. For batch jobs, a standalone version of the program with an optional PyMOL plugin is available for download. The database can be freely accessed at: http://bioinformatics.charite.de/voronoia.
MODOMICS, a database devoted to the systems biology of RNA modification, has been subjected to substantial improvements. It provides comprehensive information on the chemical structure of modified nucleosides, pathways of their biosynthesis, sequences of RNAs containing these modifications and RNA-modifying enzymes. MODOMICS also provides cross-references to other databases and to literature. In addition to the previously available manually curated tRNA sequences from a few model organisms, we have now included additional tRNAs and rRNAs, and all RNAs with 3D structures in the Nucleic Acid Database, in which modified nucleosides are present. In total, 3460 modified bases in RNA sequences of different organisms have been annotated. New RNA-modifying enzymes have also been added. The current collection of enzymes includes mainly proteins for the model organisms Escherichia coli and Saccharomyces cerevisiae, and is currently being expanded to include proteins from other organisms, in particular Archaea and Homo sapiens. For enzymes with known structures, links are provided to the corresponding Protein Data Bank entries, while for many others homology models have been created. Many new options for database searching and querying have been included. MODOMICS can be accessed at http://genesilico.pl/modomics.
The structures of biological macromolecules provide a framework for studying their biological functions. Three-dimensional structures of proteins, nucleic acids, or their complexes, are difficult to visualize in detail on flat surfaces, and algorithms for their spatial superposition and comparison are computationally costly. Molecular structures, however, can be represented as 2D maps of interactions between the individual residues, which are easier to visualize and compare, and which can be reconverted to 3D structures with reasonable precision. There are many visualization tools for maps of protein structures, but few for nucleic acids.
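As a minimal illustration of turning a 3D structure into a 2D map of residue interactions, the Python sketch below builds a distance-based contact map. It assumes one representative atom coordinate per residue and a fixed distance cutoff; the 8.0 Å default and the function name are assumptions for illustration and are not taken from the tool described in the abstract, which may define interactions differently.

```python
import math
from typing import List, Sequence, Tuple

Coordinate = Tuple[float, float, float]


def contact_map(coords: Sequence[Coordinate], cutoff: float = 8.0) -> List[List[bool]]:
    """Return a symmetric matrix where entry [i][j] is True if residues i and j
    have representative atoms within `cutoff` (same length unit as the coordinates)."""
    n = len(coords)
    return [
        [math.dist(coords[i], coords[j]) <= cutoff for j in range(n)]
        for i in range(n)
    ]
```

The resulting matrix can be plotted directly as an image or compared cell by cell between two structures, which is the property that makes such maps cheaper to compare than full 3D superpositions.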
Voronoia4RNA (http://proteinformatics.charite.de/voronoia4rna/) is a structural database storing precalculated atomic volumes, atomic packing densities (PDs) and coordinates of internal cavities, currently for 1869 RNAs and RNA-protein complexes. Atomic PDs are a measure of van der Waals interactions. Regions of low PD, containing water-sized internal cavities, indicate local structural flexibility or compressibility. RNA molecules build up the skeleton of large molecular machineries such as ribosomes or form smaller flexible structures such as riboswitches. The wealth of structural data on RNAs and their complexes allows setting up representative data sets and analysis of their structural features. We calculated atomic PDs from atomic volumes determined by the Voronoi cell method and internal cavities analytically by Delaunay triangulation. Reference internal PD values were derived from a non-redundant sub-data set of buried atoms. Comparison of internal PD values shows that RNA is more tightly packed than proteins. Finally, the relation between structure size, resolution and internal PD of the Voronoia4RNA entries is discussed. RNA, protein structures and their complexes can be visualized by the Jmol-based viewer Provi. Variations in PD are depicted by a color code. Internal cavities are represented by their molecular boundaries or schematically as balls.
MODOMICS is a database of RNA modifications that provides comprehensive information concerning the chemical structures of modified ribonucleosides, their biosynthetic pathways, RNA-modifying enzymes and location of modified residues in RNA sequences. In the current database version, accessible at http://modomics.genesilico.pl, we included new features: a census of human and yeast snoRNAs involved in RNA-guided RNA modification, a new section covering the 5′-end capping process, and a catalogue of building blocks for chemical synthesis of a large variety of modified nucleosides. The MODOMICS collections of RNA modifications, RNA-modifying enzymes and modified RNAs have also been updated. A number of newly identified modified ribonucleosides and more than one hundred functionally and structurally characterized proteins from various organisms have been added. In the RNA sequences section, snRNAs and snoRNAs with experimentally mapped modified nucleosides have been added and the current collection of rRNA and tRNA sequences has been substantially enlarged. To facilitate literature searches, each record in MODOMICS has been cross-referenced to other databases and to selected key publications. New options for database searching and querying have been implemented, including a BLAST search of protein sequences and a PARALIGN search of the collected nucleic acid sequences.
Bacterial ribosomes stalled at the 3′ end of malfunctioning messenger RNAs can be rescued by transfer-messenger RNA (tmRNA)-mediated trans-translation. The SmpB protein forms a complex with the tmRNA, and the transfer-RNA-like domain (TLD) of the tmRNA then enters the A site of the ribosome. Subsequently, the TLD-SmpB module is translocated to the P site, a process that is facilitated by the elongation factor EF-G, and translation is switched to the mRNA-like domain (MLD) of the tmRNA. Accurate loading of the MLD into the mRNA path is an unusual initiation mechanism. Despite various snapshots of different ribosome-tmRNA complexes at low to intermediate resolution, it is unclear how the large, highly structured tmRNA is translocated and how the MLD is loaded. Here we present a cryo-electron microscopy reconstruction of a fusidic-acid-stalled ribosomal 70S-tmRNA-SmpB-EF-G complex (carrying both of the large ligands, that is, EF-G and tmRNA) at 8.3 Å resolution. This post-translocational intermediate (TI(POST)) presents the TLD-SmpB module in an intrasubunit ap/P hybrid site and a tRNA(fMet) in an intrasubunit pe/E hybrid site. Conformational changes in the ribosome and tmRNA occur in the intersubunit space and on the solvent side. The key underlying event is a unique extra-large swivel movement of the 30S head, which is crucial for both tmRNA-SmpB translocation and MLD loading, thereby coupling translocation to MLD loading. This mechanism exemplifies the versatile, dynamic nature of the ribosome, and it shows that the conformational modes of the ribosome that normally drive canonical translation can also be used in a modified form to facilitate more complex tasks in specialized non-canonical pathways.
We report the results of a first, collective, blind experiment in RNA three-dimensional (3D) structure prediction, encompassing three prediction puzzles. The goals are to assess the leading edge of RNA structure prediction techniques; compare existing methods and tools; and evaluate their relative strengths, weaknesses, and limitations in terms of sequence length and structural complexity. The results should give potential users insight into the suitability of available methods for different applications and facilitate efforts in the RNA structure prediction community in ongoing efforts to improve prediction tools. We also report the creation of an automated evaluation pipeline to facilitate the analysis of future RNA structure prediction exercises.