Researchers across remarkably diverse fields are applying phylogenetics to their research questions. However, many of them are new to the topic, which presents inherent problems. Here we compile a practical introduction to phylogenetics for nonexperts. We outline, in a step-by-step manner, a pipeline for generating reliable phylogenies from gene sequence datasets. We begin with a user guide for similarity search tools via online interfaces as well as local executables. Next, we explore programs for generating multiple sequence alignments, followed by protocols for using software to determine best-fit models of evolution. We then outline protocols for reconstructing phylogenetic relationships via maximum likelihood and Bayesian criteria, and finally describe tools for visualizing phylogenetic trees. While this is by no means an exhaustive description of phylogenetic approaches, it does provide the reader with practical starting information on key software applications commonly utilized by phylogeneticists. We envision this article serving as a practical training tool for researchers embarking on phylogenetic studies and as an educational resource that could be incorporated into a classroom or teaching lab.
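The model-selection step in this pipeline is commonly decided with an information criterion. As a minimal sketch, and not the procedure of any particular program, the Akaike information criterion (AIC = 2k - 2 ln L) can be computed for each candidate substitution model and the lowest value preferred. The log-likelihoods below are made-up illustrative numbers, only substitution-model parameters are counted (branch lengths are assumed to be shared by all candidates), and `best_fit_model` is a hypothetical helper name.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: AIC = 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def best_fit_model(candidates):
    """Return (name, aic) for the candidate with the lowest AIC.

    candidates maps a model name to (log_likelihood, n_free_params).
    """
    scores = {name: aic(lnl, k) for name, (lnl, k) in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]


# Hypothetical values for illustration only; real values come from
# fitting each substitution model to the alignment.
candidates = {
    "JC69": (-5230.4, 0),   # no free substitution parameters
    "HKY85": (-5102.7, 4),  # kappa + 3 base frequencies
    "GTR": (-5100.9, 8),    # 5 exchange rates + 3 base frequencies
}

name, score = best_fit_model(candidates)
print(name, round(score, 1))
```

With these invented numbers the richer GTR model gains too little likelihood to justify its extra parameters, which is the trade-off the criterion is meant to capture.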
22 Related JoVE Articles!
A Protocol for Computer-Based Protein Structure and Function Prediction
Institutions: University of Michigan , University of Kansas.
Genome sequencing projects have deciphered millions of protein sequences, which require knowledge of their structure and function to improve the understanding of their biological role. Although experimental methods can provide detailed information for a small fraction of these proteins, computational modeling is needed for the majority of protein molecules, which are experimentally uncharacterized. The I-TASSER server is an on-line workbench for high-resolution modeling of protein structure and function. Given a protein sequence, a typical output from the I-TASSER server includes secondary structure prediction, predicted solvent accessibility of each residue, homologous template proteins detected by threading and structure alignments, up to five full-length tertiary structural models, and structure-based functional annotations for enzyme classification, Gene Ontology terms and protein-ligand binding sites. Each prediction is tagged with a confidence score that estimates its accuracy in the absence of experimental data. To accommodate special requests from end users, the server provides channels to accept user-specified inter-residue distance and contact maps to interactively change the I-TASSER modeling; it also allows users to specify any protein as a template, or to exclude any template proteins during the structure assembly simulations. The structural information can be collected by users from experimental evidence or biological insights with the purpose of improving the quality of I-TASSER predictions. The server was ranked as the best program for protein structure and function prediction in recent community-wide CASP experiments. There are currently >20,000 registered scientists from over 100 countries who are using the on-line I-TASSER server.
Biochemistry, Issue 57, On-line server, I-TASSER, protein structure prediction, function prediction
Multimodal Optical Microscopy Methods Reveal Polyp Tissue Morphology and Structure in Caribbean Reef Building Corals
Institutions: University of Illinois at Urbana-Champaign, University of Illinois at Urbana-Champaign, University of Illinois at Urbana-Champaign.
An integrated suite of imaging techniques has been applied to determine the three-dimensional (3D) morphology and cellular structure of polyp tissues comprising the Caribbean reef building corals Montastraea annularis and M. faveolata. These approaches include fluorescence microscopy (FM), serial block face imaging (SBFI), and two-photon confocal laser scanning microscopy (TPLSM). SBFI provides deep tissue imaging after physical sectioning; it details the tissue surface texture and 3D visualization to tissue depths of more than 2 mm. Complementary FM and TPLSM yield ultra-high resolution images of tissue cellular structure. Results have: (1) identified previously unreported lobate tissue morphologies on the outer wall of individual coral polyps and (2) created the first surface maps of the 3D distribution and tissue density of chromatophores and algae-like dinoflagellate zooxanthellae endosymbionts. Spectral absorption peaks of 500 nm and 675 nm, respectively, suggest that M. annularis and M. faveolata contain similar types of chlorophyll and chromatophores. However, M. annularis and M. faveolata exhibit significant differences in the tissue density and 3D distribution of these key cellular components. This study focusing on imaging methods indicates that SBFI is extremely useful for analysis of large mm-scale samples of decalcified coral tissues. Complementary FM and TPLSM reveal subtle submillimeter scale changes in cellular distribution and density in nondecalcified coral tissue samples. The TPLSM technique affords: (1) minimally invasive sample preparation, (2) superior optical sectioning ability, and (3) minimal light absorption and scattering, while still permitting deep tissue imaging.
Environmental Sciences, Issue 91, Serial block face imaging, two-photon fluorescence microscopy, Montastraea annularis, Montastraea faveolata, 3D coral tissue morphology and structure, zooxanthellae, chromatophore, autofluorescence, light harvesting optimization, environmental change
Lesion Explorer: A Video-guided, Standardized Protocol for Accurate and Reliable MRI-derived Volumetrics in Alzheimer's Disease and Normal Elderly
Institutions: Sunnybrook Health Sciences Centre, University of Toronto.
Obtaining in vivo human brain tissue volumetrics from MRI is often complicated by various technical and biological issues. These challenges are exacerbated when significant brain atrophy and age-related white matter changes (e.g. Leukoaraiosis) are present. Lesion Explorer (LE) is an accurate and reliable neuroimaging pipeline specifically developed to address such issues commonly observed on MRI of Alzheimer's disease and normal elderly. The pipeline is a complex set of semi-automatic procedures which has been previously validated in a series of internal and external reliability tests1,2. However, LE's accuracy and reliability are highly dependent on properly trained manual operators to execute commands, identify distinct anatomical landmarks, and manually edit/verify various computer-generated segmentation outputs.
LE can be divided into 3 main components, each requiring a set of commands and manual operations: 1) Brain-Sizer, 2) SABRE, and 3) Lesion-Seg. Brain-Sizer's manual operations involve editing of the automatic skull-stripped total intracranial vault (TIV) extraction mask, designation of ventricular cerebrospinal fluid (vCSF), and removal of subtentorial structures. The SABRE component requires checking of image alignment along the anterior and posterior commissure (ACPC) plane, and identification of several anatomical landmarks required for regional parcellation. Finally, the Lesion-Seg component involves manual checking of the automatic lesion segmentation of subcortical hyperintensities (SH) for false positive errors.
While on-site training of the LE pipeline is preferable, readily available visual teaching tools with interactive training images are a viable alternative. Developed to ensure a high degree of accuracy and reliability, the following is a step-by-step, video-guided, standardized protocol for LE's manual procedures.
Medicine, Issue 86, Brain, Vascular Diseases, Magnetic Resonance Imaging (MRI), Neuroimaging, Alzheimer Disease, Aging, Neuroanatomy, brain extraction, ventricles, white matter hyperintensities, cerebrovascular disease, Alzheimer disease
Automated, Quantitative Cognitive/Behavioral Screening of Mice: For Genetics, Pharmacology, Animal Cognition and Undergraduate Instruction
Institutions: Rutgers University, Koç University, New York University, Fairfield University.
We describe a high-throughput, high-volume, fully automated, live-in 24/7 behavioral testing system for assessing the effects of genetic and pharmacological manipulations on basic mechanisms of cognition and learning in mice. A standard polypropylene mouse housing tub is connected through an acrylic tube to a standard commercial mouse test box. The test box has 3 hoppers, 2 of which are connected to pellet feeders. All are internally illuminable with an LED and monitored for head entries by infrared (IR) beams. Mice live in the environment, which eliminates handling during screening. They obtain their food during two or more daily feeding periods by performing in operant (instrumental) and Pavlovian (classical) protocols, for which we have written protocol-control software and quasi-real-time data analysis and graphing software. The data analysis and graphing routines are written in a MATLAB-based language created to simplify greatly the analysis of large time-stamped behavioral and physiological event records and to preserve a full data trail from raw data through all intermediate analyses to the published graphs and statistics within a single data structure. The data-analysis code harvests the data several times a day and subjects it to statistical and graphical analyses, which are automatically stored in the "cloud" and on in-lab computers. Thus, the progress of individual mice is visualized and quantified daily. The data-analysis code talks to the protocol-control code, permitting the automated advance from protocol to protocol of individual subjects. The behavioral protocols implemented are matching, autoshaping, timed hopper-switching, risk assessment in timed hopper-switching, impulsivity measurement, and the circadian anticipation of food availability. Open-source protocol-control and data-analysis code makes the addition of new protocols simple. 
Eight test environments fit in a 48 in x 24 in x 78 in cabinet; two such cabinets (16 environments) may be controlled by one computer.
Behavior, Issue 84, genetics, cognitive mechanisms, behavioral screening, learning, memory, timing
Preparation of Primary Myogenic Precursor Cell/Myoblast Cultures from Basal Vertebrate Lineages
Institutions: University of Alabama at Birmingham, INRA UR1067, INRA UR1037.
Due to the inherent difficulty and time involved with studying the myogenic program in vivo, primary culture systems derived from the resident adult stem cells of skeletal muscle, the myogenic precursor cells (MPCs), have proven indispensable to our understanding of mammalian skeletal muscle development and growth. Particularly among the basal taxa of Vertebrata, however, data are limited describing the molecular mechanisms controlling the self-renewal, proliferation, and differentiation of MPCs. Of particular interest are potential mechanisms that underlie the ability of basal vertebrates to undergo considerable postlarval skeletal myofiber hyperplasia (i.e. teleost fish) and full regeneration following appendage loss (i.e. urodele amphibians). Additionally, the use of cultured myoblasts could aid in the understanding of regeneration and the recapitulation of the myogenic program and the differences between them. To this end, we describe in detail a robust and efficient protocol (and variations therein) for isolating and maintaining MPCs and their progeny, myoblasts and immature myotubes, in cell culture as a platform for understanding the evolution of the myogenic program, beginning with the more basal vertebrates. Capitalizing on the model organism status of the zebrafish (Danio rerio), we report on the application of this protocol to small fishes of the cyprinid clade Danioninae. In tandem, this protocol can be utilized to realize a broader comparative approach by isolating MPCs from the Mexican axolotl (Ambystoma mexicanum) and even laboratory rodents. This protocol is now widely used in studying myogenesis in several fish species, including rainbow trout, salmon, and sea bream1-4.
Basic Protocol, Issue 86, myogenesis, zebrafish, myoblast, cell culture, giant danio, moustached danio, myotubes, proliferation, differentiation, Danioninae, axolotl
Super-resolution Imaging of the Cytokinetic Z Ring in Live Bacteria Using Fast 3D-Structured Illumination Microscopy (f3D-SIM)
Institutions: University of Technology, Sydney.
Imaging of biological samples using fluorescence microscopy has advanced substantially with new technologies that overcome the resolution barrier imposed by the diffraction of light, allowing super-resolution imaging of live samples. There are currently three main types of super-resolution techniques: stimulated emission depletion (STED), single-molecule localization microscopy (including techniques such as PALM, STORM, and GSDIM), and structured illumination microscopy (SIM). While STED and single-molecule localization techniques show the largest increases in resolution, they have been slower to offer increased speeds of image acquisition. Three-dimensional SIM (3D-SIM) is a wide-field fluorescence microscopy technique that offers a number of advantages over both single-molecule localization and STED. Resolution is improved, with typical lateral and axial resolutions of 110 and 280 nm, respectively, and a depth of sampling of up to 30 µm from the coverslip, allowing for imaging of whole cells. Recent advancements in the technology (fast 3D-SIM) that increase the capture rate of raw images allow fast capture of biological processes occurring in seconds, while significantly reducing phototoxicity and photobleaching. Here we describe the use of one such method to image bacterial cells harboring the fluorescently-labelled cytokinetic FtsZ protein to show how cells are analyzed and the type of unique information that this technique can provide.
Molecular Biology, Issue 91, super-resolution microscopy, fluorescence microscopy, OMX, 3D-SIM, Blaze, cell division, bacteria, Bacillus subtilis, Staphylococcus aureus, FtsZ, Z ring constriction
Measuring the Osmotic Water Permeability Coefficient (Pf) of Spherical Cells: Isolated Plant Protoplasts as an Example
Institutions: The Hebrew University of Jerusalem, Université catholique de Louvain, Université catholique de Louvain.
Studying aquaporin (AQP) regulation mechanisms is crucial for the understanding of water relations at both the cellular and the whole plant levels. Presented here is a simple and very efficient method for the determination of the osmotic water permeability coefficient (Pf) in plant protoplasts, applicable in principle also to other spherical cells such as frog oocytes. The first step of the assay is the isolation of protoplasts from the plant tissue of interest by enzymatic digestion into a chamber with an appropriate isotonic solution. The second step consists of an osmotic challenge assay: protoplasts immobilized on the bottom of the chamber are submitted to a constant perfusion starting with an isotonic solution and followed by a hypotonic solution. The cell swelling is video recorded. In the third step, the images are processed offline to yield volume changes, and the time course of the volume changes is correlated with the time course of the change in osmolarity of the chamber perfusion medium, using a curve fitting procedure written in Matlab (the ‘PfFit’), to yield Pf.
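To make the quantity being estimated concrete, the following is a simplified sketch and not the PfFit procedure: it assumes an instantaneous osmotic step (the article's method explicitly handles a non-instantaneous osmolarity change) and uses the textbook initial-rate relation dV/dt = Pf · S · Vw · (Osm_in - Osm_out), where S is the cell surface area and Vw the partial molar volume of water. The function names and numerical values are illustrative assumptions.

```python
import math

PARTIAL_MOLAR_VOLUME_WATER = 18.0  # cm^3/mol


def sphere_area(radius_cm):
    """Surface area (cm^2) of a spherical cell."""
    return 4.0 * math.pi * radius_cm ** 2


def pf_from_initial_rate(dv_dt, radius_cm, delta_osm,
                         vw=PARTIAL_MOLAR_VOLUME_WATER):
    """Estimate Pf (cm/s) from the initial swelling rate.

    dv_dt      initial rate of volume change, cm^3/s
    radius_cm  initial cell radius, cm
    delta_osm  osmolarity difference (inside minus outside), mol/cm^3
    Assumes an instantaneous osmotic step, unlike the fitted model
    described in the abstract.
    """
    if delta_osm == 0:
        raise ValueError("osmolarity difference must be non-zero")
    return dv_dt / (sphere_area(radius_cm) * vw * delta_osm)


# Illustrative round trip with made-up numbers.
radius = 20e-4     # 20 micrometres expressed in cm
delta_osm = 2e-4   # 200 mOsm = 0.2 mol/L = 2e-4 mol/cm^3
true_pf = 1e-3     # 10 micrometres per second expressed in cm/s
dv_dt = true_pf * sphere_area(radius) * PARTIAL_MOLAR_VOLUME_WATER * delta_osm

pf = pf_from_initial_rate(dv_dt, radius, delta_osm)
print(pf)
```

Keeping every quantity in centimetre-based units means the result comes out directly in cm/s, the unit in which Pf is usually reported.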
Plant Biology, Issue 92, Osmotic water permeability coefficient, aquaporins, protoplasts, curve fitting, non-instantaneous osmolarity change, volume change time course
From Voxels to Knowledge: A Practical Guide to the Segmentation of Complex Electron Microscopy 3D-Data
Institutions: Lawrence Berkeley National Laboratory, Lawrence Berkeley National Laboratory, Lawrence Berkeley National Laboratory.
Modern 3D electron microscopy approaches have recently allowed unprecedented insight into the 3D ultrastructural organization of cells and tissues, enabling the visualization of large macromolecular machines, such as adhesion complexes, as well as higher-order structures, such as the cytoskeleton and cellular organelles in their respective cell and tissue context. Given the inherent complexity of cellular volumes, it is essential to first extract the features of interest in order to allow visualization, quantification, and therefore comprehension of their 3D organization. Each data set is defined by distinct characteristics, e.g., signal-to-noise ratio, crispness (sharpness) of the data, heterogeneity of its features, crowdedness of features, presence or absence of characteristic shapes that allow for easy identification, and the percentage of the entire volume that a specific region of interest occupies. All these characteristics need to be considered when deciding on which approach to take for segmentation.
The six different 3D ultrastructural data sets presented were obtained by three different imaging approaches: resin embedded stained electron tomography, focused ion beam- and serial block face- scanning electron microscopy (FIB-SEM, SBF-SEM) of mildly stained and heavily stained samples, respectively. For these data sets, four different segmentation approaches have been applied: (1) fully manual model building followed solely by visualization of the model, (2) manual tracing segmentation of the data followed by surface rendering, (3) semi-automated approaches followed by surface rendering, or (4) automated custom-designed segmentation algorithms followed by surface rendering and quantitative analysis. Depending on the combination of data set characteristics, it was found that typically one of these four categorical approaches outperforms the others, but depending on the exact sequence of criteria, more than one approach may be successful. Based on these data, we propose a triage scheme that categorizes both objective data set characteristics and subjective personal criteria for the analysis of the different data sets.
Bioengineering, Issue 90, 3D electron microscopy, feature extraction, segmentation, image analysis, reconstruction, manual tracing, thresholding
Cortical Source Analysis of High-Density EEG Recordings in Children
Institutions: UCL Institute of Child Health, University College London.
EEG is traditionally described as a neuroimaging technique with high temporal and low spatial resolution. Recent advances in biophysical modelling and signal processing make it possible to exploit information from other imaging modalities like structural MRI that provide high spatial resolution to overcome this constraint1. This is especially useful for investigations that require high resolution in the temporal as well as spatial domain. In addition, due to the easy application and low cost of EEG recordings, EEG is often the method of choice when working with populations, such as young children, that do not tolerate functional MRI scans well. However, in order to investigate which neural substrates are involved, anatomical information from structural MRI is still needed. Most EEG analysis packages work with standard head models that are based on adult anatomy. The accuracy of these models when used for children is limited2, because the composition and spatial configuration of head tissues changes dramatically over development3.
In the present paper, we provide an overview of our recent work in utilizing head models based on individual structural MRI scans or age specific head models to reconstruct the cortical generators of high density EEG. This article describes how EEG recordings are acquired, processed, and analyzed with pediatric populations at the London Baby Lab, including laboratory setup, task design, EEG preprocessing, MRI processing, and EEG channel level and source analysis.
Behavior, Issue 88, EEG, electroencephalogram, development, source analysis, pediatric, minimum-norm estimation, cognitive neuroscience, event-related potentials
Analysis of Tubular Membrane Networks in Cardiac Myocytes from Atria and Ventricles
Institutions: Heart Research Center Goettingen, University Medical Center Goettingen, German Center for Cardiovascular Research (DZHK) partner site Goettingen, University of Maryland School of Medicine.
In cardiac myocytes a complex network of membrane tubules - the transverse-axial tubule system (TATS) - controls deep intracellular signaling functions. While the outer surface membrane and associated TATS membrane components appear to be continuous, there are substantial differences in lipid and protein content. In ventricular myocytes (VMs), certain TATS components are highly abundant, contributing to rectilinear tubule networks and regular branching 3D architectures. It is thought that peripheral TATS components propagate action potentials from the cell surface to thousands of remote intracellular sarcoendoplasmic reticulum (SER) membrane contact domains, thereby activating intracellular Ca2+ release units (CRUs). In contrast to VMs, the organization and functional role of TATS membranes in atrial myocytes (AMs) is significantly different and much less understood. Taken together, quantitative structural characterization of TATS membrane networks in healthy and diseased myocytes is an essential prerequisite towards better understanding of functional plasticity and pathophysiological reorganization. Here, we present a strategic combination of protocols for direct quantitative analysis of TATS membrane networks in living VMs and AMs. For this, we accompany primary cell isolations of mouse VMs and/or AMs with critical quality control steps and direct membrane staining protocols for fluorescence imaging of TATS membranes. Using an optimized workflow for confocal or superresolution TATS image processing, binarized and skeletonized data are generated for quantitative analysis of the TATS network and its components. Unlike previously published indirect regional aggregate image analysis strategies, our protocols enable direct characterization of specific components and derive complex physiological properties of TATS membrane networks in living myocytes with high throughput and open access software tools. In summary, the combined protocol strategy can be readily applied for quantitative TATS network studies during physiological myocyte adaptation or disease changes, comparison of different cardiac or skeletal muscle cell types, phenotyping of transgenic models, and pharmacological or therapeutic interventions.
Bioengineering, Issue 92, cardiac myocyte, atria, ventricle, heart, primary cell isolation, fluorescence microscopy, membrane tubule, transverse-axial tubule system, image analysis, image processing, T-tubule, collagenase
Simultaneous Multicolor Imaging of Biological Structures with Fluorescence Photoactivation Localization Microscopy
Institutions: University of Maine.
Localization-based super resolution microscopy can be applied to obtain a spatial map (image) of the distribution of individual fluorescently labeled single molecules within a sample with a spatial resolution of tens of nanometers. Using either photoactivatable (PAFP) or photoswitchable (PSFP) fluorescent proteins fused to proteins of interest, or organic dyes conjugated to antibodies or other molecules of interest, fluorescence photoactivation localization microscopy (FPALM) can simultaneously image multiple species of molecules within single cells. By using the following approach, populations of large numbers (thousands to hundreds of thousands) of individual molecules are imaged in single cells and localized with a precision of ~10-30 nm. Data obtained can be applied to understanding the nanoscale spatial distributions of multiple protein types within a cell. One primary advantage of this technique is the dramatic increase in spatial resolution: while diffraction limits resolution to ~200-250 nm in conventional light microscopy, FPALM can image length scales more than an order of magnitude smaller. As many biological hypotheses concern the spatial relationships among different biomolecules, the improved resolution of FPALM can provide insight into questions of cellular organization which have previously been inaccessible to conventional fluorescence microscopy. In addition to detailing the methods for sample preparation and data acquisition, we here describe the optical setup for FPALM. One additional consideration for researchers wishing to do super-resolution microscopy is cost: in-house setups are significantly cheaper than most commercially available imaging machines. Limitations of this technique include the need for optimizing the labeling of molecules of interest within cell samples, and the need for post-processing software to visualize results. We here describe the use of PAFP and PSFP expression to image two protein species in fixed cells. 
Extension of the technique to living cells is also described.
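The gap between the ~200-250 nm diffraction limit and the ~10-30 nm localization precision quoted above can be illustrated with a common back-of-the-envelope relation. This is a simplified sketch and not the analysis used in the article: it assumes a Gaussian point spread function and a photon-limited, background-free estimate in which precision scales as the PSF standard deviation divided by the square root of the number of detected photons. Pixelation and background noise, which worsen the real figure, are ignored, and the function names are hypothetical.

```python
import math


def psf_sigma_from_fwhm(fwhm_nm):
    """Standard deviation of a Gaussian PSF given its full width at half maximum."""
    return fwhm_nm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def localization_precision(fwhm_nm, n_photons):
    """Photon-limited localization precision (nm): sigma / sqrt(N).

    Idealized estimate; neglects background and finite pixel size.
    """
    if n_photons <= 0:
        raise ValueError("photon count must be positive")
    return psf_sigma_from_fwhm(fwhm_nm) / math.sqrt(n_photons)


# Illustrative photon counts for a 250 nm wide PSF.
for n in (50, 100, 500):
    print(n, round(localization_precision(250.0, n), 1))
```

Under these assumptions, a few tens to a few hundred detected photons per molecule already places the estimate in the tens-of-nanometers range, consistent in order of magnitude with the precision stated in the abstract.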
Basic Protocol, Issue 82, Microscopy, Super-resolution imaging, Multicolor, single molecule, FPALM, Localization microscopy, fluorescent proteins
Protein WISDOM: A Workbench for In silico De novo Design of BioMolecules
Institutions: Princeton University.
The aim of de novo protein design is to find the amino acid sequences that will fold into a desired 3-dimensional structure with improvements in specific properties, such as binding affinity, agonist or antagonist behavior, or stability, relative to the native sequence. Protein design lies at the center of current advances in drug design and discovery. Not only does protein design provide predictions for potentially useful drug targets, but it also enhances our understanding of the protein folding process and protein-protein interactions. Experimental methods such as directed evolution have shown success in protein design. However, such methods are restricted by the limited sequence space that can be searched tractably. In contrast, computational design strategies allow for the screening of a much larger set of sequences covering a wide variety of properties and functionality. We have developed a range of computational de novo protein design methods capable of tackling several important areas of protein design. These include the design of monomeric proteins for increased stability and complexes for increased binding affinity.
To disseminate these methods for broader use we present Protein WISDOM (https://www.proteinwisdom.org), a tool that provides automated methods for a variety of protein design problems. Structural templates are submitted to initialize the design process. The first stage of design is an optimization sequence selection stage that aims at improving stability through minimization of potential energy in the sequence space. Selected sequences are then run through a fold specificity stage and a binding affinity stage. A rank-ordered list of the sequences for each step of the process, along with relevant designed structures, provides the user with a comprehensive quantitative assessment of the design. Here we provide the details of each design method, as well as several notable experimental successes attained through the use of the methods.
Genetics, Issue 77, Molecular Biology, Bioengineering, Biochemistry, Biomedical Engineering, Chemical Engineering, Computational Biology, Genomics, Proteomics, Protein, Protein Binding, Computational Biology, Drug Design, optimization (mathematics), Amino Acids, Peptides, and Proteins, De novo protein and peptide design, Drug design, In silico sequence selection, Optimization, Fold specificity, Binding affinity, sequencing
Single Particle Electron Microscopy Reconstruction of the Exosome Complex Using the Random Conical Tilt Method
Institutions: Yale University.
Single particle electron microscopy (EM) reconstruction has recently become a popular tool for obtaining the three-dimensional (3D) structure of large macromolecular complexes. Compared to X-ray crystallography, it has some unique advantages. First, single particle EM reconstruction does not require crystallization of the protein sample, which is the bottleneck in X-ray crystallography, especially for large macromolecular complexes. Secondly, it does not need large amounts of protein sample. Compared with the milligrams of protein necessary for crystallization, single particle EM reconstruction only needs several microliters of protein solution at nanomolar concentrations when using the negative staining EM method. However, apart from a few macromolecular assemblies with high symmetry, single particle EM is limited to relatively low resolution (lower than 1 nm resolution) for many specimens, especially those without symmetry. This technique is also limited by the size of the molecules under study, i.e. 100 kDa for negatively stained specimens and 300 kDa for frozen-hydrated specimens in general.
For a new sample of unknown structure, we generally use a heavy metal solution to embed the molecules by negative staining. The specimen is then examined in a transmission electron microscope to take two-dimensional (2D) micrographs of the molecules. Ideally, the protein molecules have a homogeneous 3D structure but exhibit different orientations in the micrographs. These micrographs are digitized and processed in computers as "single particles". Using two-dimensional alignment and classification techniques, homogenous molecules in the same views are clustered into classes. Their averages enhance the signal of the molecule's 2D shapes. After we assign the particles with the proper relative orientation (Euler angles), we will be able to reconstruct the 2D particle images into a 3D virtual volume.
In single particle 3D reconstruction, an essential step is to correctly assign the proper orientation of each single particle. There are several methods to assign the view for each particle, including the angular reconstitution1 and random conical tilt (RCT) method2. In this protocol, we describe our practice in getting the 3D reconstruction of yeast exosome complex using negative staining EM and RCT. It should be noted that our protocol of electron microscopy and image processing follows the basic principle of RCT but is not the only way to perform the method. We first describe how to embed the protein sample into a layer of Uranyl-Formate with a thickness comparable to the protein size, using a holey carbon grid covered with a layer of continuous thin carbon film. Then the specimen is inserted into a transmission electron microscope to collect untilted (0-degree) and tilted (55-degree) pairs of micrographs that will be used later for processing and obtaining an initial 3D model of the yeast exosome. To this end, we perform RCT and then refine the initial 3D model by using the projection matching refinement method3.
Structural Biology, Issue 49, Electron microscopy, single particle three-dimensional reconstruction, exosome complex, negative staining
Genomic MRI - a Public Resource for Studying Sequence Patterns within Genomic DNA
Institutions: University of Toledo Health Science Campus.
Non-coding genomic regions in complex eukaryotes, including intergenic areas, introns, and untranslated segments of exons, are profoundly non-random in their nucleotide composition and consist of a complex mosaic of sequence patterns. These patterns include so-called Mid-Range Inhomogeneity (MRI) regions -- sequences 30-10000 nucleotides in length that are enriched by a particular base or combination of bases (e.g. (G+T)-rich, purine-rich, etc.). MRI regions are associated with unusual (non-B-form) DNA structures that are often involved in regulation of gene expression, recombination, and other genetic processes (Fedorova & Fedorov 2010). The existence of a strong fixation bias within MRI regions against mutations that tend to reduce their sequence inhomogeneity additionally supports the functionality and importance of these genomic sequences (Prakash et al.
Here we demonstrate a freely available Internet resource, the Genomic MRI program package, designed for computational analysis of genomic sequences in order to find and characterize various MRI patterns within them (Bechtel et al. 2008). This package also allows generation of randomized sequences with various properties and levels of correspondence to the natural input DNA sequences. The main goal of this resource is to facilitate examination of vast regions of non-coding DNA that are still scarcely investigated and await thorough exploration and recognition.
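To make the idea of a base-enriched region concrete, the following is a minimal sketch of a sliding-window scan for windows rich in a chosen set of bases. It is not the Genomic MRI algorithm; the function name `enriched_windows`, the window size, the step and the threshold are all illustrative assumptions.

```python
def enriched_windows(seq, bases="GT", window=50, step=10, threshold=0.7):
    """Return (start, end, fraction) for each window whose content of
    `bases` is at least `threshold`. All parameters are illustrative
    assumptions, not values taken from the Genomic MRI package."""
    seq = seq.upper()
    target = set(bases.upper())
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        fraction = sum(base in target for base in chunk) / window
        if fraction >= threshold:
            hits.append((start, start + window, fraction))
    return hits


# Toy sequence: a 50-nt (G+T)-only stretch flanked by other sequence.
example = "ACGA" * 10 + "GTTGGTGTTG" * 5 + "CACA" * 10
gt_rich = enriched_windows(example, bases="GT", window=30, step=10, threshold=0.9)
for start, end, fraction in gt_rich:
    print(f"{start}-{end}: {fraction:.2f} G+T")
```

A real analysis would also need a null model (for example, randomized sequences of matching composition, as the package is said to generate) to judge whether an observed enrichment is unexpected.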
Genetics, Issue 51, bioinformatics, computational biology, genomics, non-randomness, signals, gene regulation, DNA conformation
Annotation of Plant Gene Function via Combined Genomics, Metabolomics and Informatics
Given the ever-expanding number of model plant species for which complete genome sequences are available and the abundance of bio-resources such as knockout mutants, wild accessions and advanced breeding populations, there is a rising burden for gene functional annotation. In this protocol, annotation of plant gene function using combined co-expression gene analysis, metabolomics and informatics is provided (Figure 1). This approach is based on the theory of using target genes of known function to allow the identification of non-annotated genes likely to be involved in a certain metabolic process, with the identification of target compounds via metabolomics. Strategies are put forward for applying this information to populations generated by both forward and reverse genetics approaches, although none of these is effortless. As a corollary, this approach can also be used to characterise unknown peaks representing new or specific secondary metabolites in a limited set of tissues, plant species or stress treatments, which is currently an important challenge in understanding plant metabolism.
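The co-expression step can be illustrated with a small sketch: genes of known function serve as "baits", and non-annotated genes whose expression profiles correlate strongly with the baits become candidates. The gene names, expression values and the correlation cutoff `min_r` below are hypothetical, and Pearson correlation is only one of several measures that could be used.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / sqrt(var_x * var_y)


def coexpressed_candidates(expression, bait_genes, min_r=0.9):
    """Rank non-bait genes by their mean correlation with the bait genes."""
    candidates = {}
    for gene, profile in expression.items():
        if gene in bait_genes:
            continue
        r_values = [pearson(profile, expression[bait]) for bait in bait_genes]
        mean_r = sum(r_values) / len(r_values)
        if mean_r >= min_r:
            candidates[gene] = mean_r
    return dict(sorted(candidates.items(), key=lambda item: item[1], reverse=True))


# Hypothetical expression values across four conditions.
expression = {
    "bait1": [1.0, 2.0, 3.0, 4.0],
    "bait2": [2.0, 4.1, 6.0, 8.2],
    "unknownA": [0.5, 1.0, 1.6, 2.0],
    "unknownB": [4.0, 3.0, 2.0, 1.0],
}
candidates = coexpressed_candidates(expression, ["bait1", "bait2"])
print(candidates)
```

Candidates found this way are hypotheses only; in the approach described above they would still need support from metabolomics and genetic evidence.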
Plant Biology, Issue 64, Genetics, Bioinformatics, Metabolomics, Plant metabolism, Transcriptome analysis, Functional annotation, Computational biology, Plant biology, Theoretical biology, Spectroscopy and structural analysis
Amide Hydrogen/Deuterium Exchange & MALDI-TOF Mass Spectrometry Analysis of Pak2 Activation
Institutions: Tunghai University, University of California, Riverside.
Amide hydrogen/deuterium exchange (H/D exchange) coupled with mass spectrometry has been widely used to analyze the interface of protein-protein interactions, protein conformational changes, protein dynamics and protein-ligand interactions. H/D exchange on the backbone amide positions has been utilized to measure the deuteration rates of the micro-regions in a protein by mass spectrometry1,2,3. The resolution of this method depends on pepsin digestion of the deuterated protein of interest into peptides that normally range from 3-20 residues. Although the resolution of H/D exchange measured by mass spectrometry is lower than the single-residue resolution measured by the Heteronuclear Single Quantum Coherence (HSQC) method of NMR, the mass spectrometry measurement in H/D exchange is not restricted by the size of the protein4. H/D exchange is carried out in an aqueous solution which maintains protein conformation. We provide a method that utilizes MALDI-TOF for detection2, instead of an HPLC/ESI (electrospray ionization)-MS system5,6. The MALDI-TOF provides accurate mass intensity data for the peptides of the digested protein, in this case protein kinase Pak2 (also called γ-Pak). Proteolysis of Pak2 is carried out in an offline pepsin digestion. This alternative method is useful when the user does not have access to an HPLC and pepsin column connected to a mass spectrometer, or when the pepsin column on the HPLC does not yield an optimal digestion map, as for example with the heavily disulfide-bonded secreted phospholipase A2. Utilizing this method, we successfully monitored changes in the deuteration level during activation of Pak2 by caspase 3 cleavage and autophosphorylation7,8,9
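As an illustration of how a deuteration level might be derived from peptide mass intensity data, the sketch below computes an intensity-weighted centroid for an isotopic envelope and takes the difference between the deuterated and undeuterated peptide. This is a generic calculation, not necessarily the one used in the protocol: the peak lists are invented, the ions are assumed to be singly charged (so an m/z shift equals a mass shift), and no back-exchange correction is applied.

```python
def centroid_mass(peaks):
    """Intensity-weighted mean m/z of an isotopic envelope.
    `peaks` is a list of (m/z, intensity) pairs."""
    total_intensity = sum(intensity for _, intensity in peaks)
    if total_intensity == 0:
        raise ValueError("peak list has no signal")
    return sum(mz * intensity for mz, intensity in peaks) / total_intensity


def deuterium_uptake(undeuterated, deuterated):
    """Centroid shift in Da, assuming singly charged ions."""
    return centroid_mass(deuterated) - centroid_mass(undeuterated)


def percent_deuteration(m_0, m_t, m_full):
    """Deuteration relative to a fully deuterated control (an assumed reference)."""
    return 100.0 * (m_t - m_0) / (m_full - m_0)


# Invented peak lists for one peptide before and after exchange.
undeuterated = [(1000.0, 100.0), (1001.0, 60.0), (1002.0, 20.0)]
deuterated = [(1002.0, 50.0), (1003.0, 100.0), (1004.0, 50.0)]
uptake = deuterium_uptake(undeuterated, deuterated)
print(f"uptake: {uptake:.2f} Da")
```

Comparing the uptake of the same peptide between two protein states (for example, before and after activation) indicates where solvent accessibility or hydrogen bonding has changed.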
Biochemistry, Issue 57, Deuterium, H/D exchange, Mass Spectrometry, Pak2, Caspase 3, MALDI-TOF
The ITS2 Database
Institutions: University of Würzburg, University of Würzburg.
The internal transcribed spacer 2 (ITS2) has been used as a phylogenetic marker for more than two decades. As ITS2 research mainly focused on the very variable ITS2 sequence, it confined this marker to low-level phylogenetics only. However, the combination of the ITS2 sequence and its highly conserved secondary structure improves the phylogenetic resolution1 and allows phylogenetic inference at multiple taxonomic ranks, including species delimitation2-8.
The ITS2 Database9 presents an exhaustive dataset of internal transcribed spacer 2 sequences from NCBI GenBank11. Following an annotation by profile Hidden Markov Models (HMMs), the secondary structure of each sequence is predicted. First, it is tested whether a minimum-energy-based fold12 (direct fold) results in a correct, four-helix conformation. If this is not the case, the structure is predicted by homology modeling13. In homology modeling, an already known secondary structure is transferred to another ITS2 sequence that did not fold correctly in a direct fold.
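The two-stage prediction described above (direct fold first, homology modeling as a fallback) can be sketched as follows. This is an illustration of the decision logic only: the structure is assumed to be given in dot-bracket notation, the four helices are assumed to appear as four top-level stems, and `direct_fold` and `homology_model` are hypothetical callables standing in for the actual folding and modeling tools.

```python
def count_top_level_helices(dot_bracket):
    """Count stems that open at nesting depth zero in a dot-bracket string."""
    depth = 0
    helices = 0
    for symbol in dot_bracket:
        if symbol == "(":
            if depth == 0:
                helices += 1
            depth += 1
        elif symbol == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced structure: extra ')'")
    if depth != 0:
        raise ValueError("unbalanced structure: missing ')'")
    return helices


def predict_structure(sequence, direct_fold, homology_model):
    """Accept the direct fold only if it has four helices; otherwise fall back."""
    structure = direct_fold(sequence)
    if count_top_level_helices(structure) == 4:
        return structure, "direct fold"
    return homology_model(sequence), "homology modeling"


# Toy stand-ins for the real tools (hypothetical, for demonstration only).
four_helix = "((..)).(((...))).((..))..((...))"
two_helix = "((..))....((....))"
result = predict_structure("ACGU", lambda s: two_helix, lambda s: four_helix)
print(result[1])
```

Passing the folding and modeling steps in as functions keeps the check independent of any particular folding program.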
The ITS2 Database is not only a database for storage and retrieval of ITS2 sequence-structures. It also provides several tools to process your own ITS2 sequences, including annotation, structural prediction, motif detection and BLAST14 search on the combined sequence-structure information. Moreover, it integrates trimmed versions of 4SALE15,16 for multiple sequence-structure alignment calculation and Neighbor Joining18 tree reconstruction. Together they form a coherent analysis pipeline from an initial set of sequences to a phylogeny based on sequence and secondary structure.
In a nutshell, this workbench simplifies first phylogenetic analyses to only a few mouse-clicks, while additionally providing tools and data for comprehensive large-scale analyses.
Genetics, Issue 61, alignment, internal transcribed spacer 2, molecular systematics, secondary structure, ribosomal RNA, phylogenetic tree, homology modeling, phylogeny
Mapping Bacterial Functional Networks and Pathways in Escherichia coli using Synthetic Genetic Arrays
Institutions: University of Toronto, University of Toronto, University of Regina.
Phenotypes are determined by a complex series of physical (e.g. protein-protein) and functional (e.g. gene-gene or genetic) interactions (GI)1. While physical interactions can indicate which bacterial proteins are associated as complexes, they do not necessarily reveal pathway-level functional relationships1. GI screens, in which the growth of double mutants bearing two deleted or inactivated genes is measured and compared to the corresponding single mutants, can illuminate epistatic dependencies between loci and hence provide a means to query and discover novel functional relationships2. Large-scale GI maps have been reported for eukaryotic organisms like yeast3-7, but GI information remains sparse for prokaryotes8, which hinders the functional annotation of bacterial genomes. To this end, we and others have developed high-throughput quantitative bacterial GI screening methods9, 10.
Here, we present the key steps required to perform a quantitative E. coli Synthetic Genetic Array (eSGA) screening procedure on a genome scale9, using natural bacterial conjugation and homologous recombination to systematically generate and measure the fitness of large numbers of double mutants in a colony array format.
Briefly, a robot is used to transfer, through conjugation, chloramphenicol (Cm)-marked mutant alleles from engineered Hfr (High frequency of recombination) 'donor strains' into an ordered array of kanamycin (Kan)-marked F- recipient strains. Typically, we use loss-of-function single mutants bearing non-essential gene deletions (e.g. the 'Keio' collection11) and essential gene hypomorphic mutations (i.e. alleles conferring reduced protein expression, stability, or activity9, 12, 13) to query the functional associations of non-essential and essential genes, respectively. After conjugation and ensuing genetic exchange mediated by homologous recombination, the resulting double mutants are selected on solid medium containing both antibiotics. After outgrowth, the plates are digitally imaged and colony sizes are quantitatively scored using an in-house automated image processing system14. GIs are revealed when the growth rate of a double mutant is either significantly better or worse than expected9. Aggravating (or negative) GIs often result between loss-of-function mutations in pairs of genes from compensatory pathways that impinge on the same essential process2. Here, the loss of a single gene is buffered, such that either single mutant is viable. However, the loss of both pathways is deleterious and results in synthetic lethality or sickness (i.e. slow growth). Conversely, alleviating (or positive) interactions can occur between genes in the same pathway or protein complex2, as the deletion of either gene alone is often sufficient to perturb the normal function of the pathway or complex such that additional perturbations do not reduce activity, and hence growth, further. Overall, systematically identifying and analyzing GI networks can provide unbiased, global maps of the functional relationships between large numbers of genes, from which pathway-level information missed by other approaches can be inferred9.
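The comparison of observed and expected double-mutant growth can be sketched numerically. The abstract does not state how the expected value is computed; the sketch below assumes the commonly used multiplicative model, in which the expected double-mutant fitness is the product of the two single-mutant fitnesses. The fitness values and the `cutoff` are hypothetical, and a real screen would assess significance statistically across replicates rather than with a fixed cutoff.

```python
def interaction_score(w_a, w_b, w_ab):
    """Observed minus expected double-mutant fitness.
    Assumes a multiplicative expectation (w_a * w_b); fitness values are
    relative to wild type (1.0)."""
    return w_ab - w_a * w_b


def classify_interaction(w_a, w_b, w_ab, cutoff=0.1):
    """Label an interaction using an illustrative fixed cutoff."""
    score = interaction_score(w_a, w_b, w_ab)
    if score <= -cutoff:
        return "aggravating"
    if score >= cutoff:
        return "alleviating"
    return "neutral"


# Hypothetical relative fitness values (e.g. normalized colony sizes).
print(classify_interaction(0.9, 0.8, 0.2))   # double mutant much sicker than expected
print(classify_interaction(0.6, 0.6, 0.6))   # second deletion adds no further defect
print(classify_interaction(0.9, 0.8, 0.72))  # matches the multiplicative expectation
```

The second case mirrors the same-pathway scenario described above: once one gene is lost, losing the other does not reduce growth further, so the double mutant grows better than the product of the single-mutant fitnesses would predict.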
Genetics, Issue 69, Molecular Biology, Medicine, Biochemistry, Microbiology, Aggravating, alleviating, conjugation, double mutant, Escherichia coli, genetic interaction, Gram-negative bacteria, homologous recombination, network, synthetic lethality or sickness, suppression
Using SCOPE to Identify Potential Regulatory Motifs in Coregulated Genes
Institutions: Dartmouth College.
SCOPE is an ensemble motif finder that uses three component algorithms in parallel to identify potential regulatory motifs by over-representation and motif position preference1. Each component algorithm is optimized to find a different kind of motif. By taking the best of these three approaches, SCOPE performs better than any single algorithm, even in the presence of noisy data1. In this article, we utilize a web version of SCOPE2 to examine genes that are involved in telomere maintenance. SCOPE has been incorporated into at least two other motif finding programs3,4 and has been used in other studies5-8.
The three algorithms that comprise SCOPE are BEAM9, which finds non-degenerate motifs (ACCGGT), PRISM10, which finds degenerate motifs (ASCGWT), and SPACER11, which finds longer bipartite motifs (ACCnnnnnnnnGGT). These three algorithms have been optimized to find their corresponding type of motif. Together, they allow SCOPE to perform extremely well.
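To show what the three motif types mean in practice, the sketch below expands a motif written in IUPAC nucleotide codes into a regular expression and reports where it occurs in a sequence. It only illustrates matching a known motif; it does not reproduce how BEAM, PRISM or SPACER discover or score motifs. The function names and the test sequence are illustrative assumptions, and only the given strand is searched.

```python
import re

# Standard IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_to_regex(motif):
    """Translate an IUPAC motif such as ASCGWT into a regex string."""
    return "".join(IUPAC[symbol] for symbol in motif.upper())


def find_motif(motif, sequence):
    """Return (position, matched text) for every occurrence, overlaps included."""
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


# Made-up sequence containing one instance of each example motif.
sequence = "TTACCGGTAAAGCGATCCACCAAAATTTTGGTCC"
print(find_motif("ACCGGT", sequence))          # non-degenerate motif
print(find_motif("ASCGWT", sequence))          # degenerate motif
print(find_motif("ACCnnnnnnnnGGT", sequence))  # bipartite motif with an 8-nt spacer
```

Searching the reverse complement as well would be a natural extension, since regulatory motifs can occur on either strand.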
Once a gene set has been analyzed and candidate motifs identified, SCOPE can look for other genes that contain the motif which, when added to the original set, will improve the motif score. This can occur through over-representation or motif position preference. Working with partial gene sets that have biologically verified transcription factor binding sites, SCOPE was able to identify most of the rest of the genes also regulated by the given transcription factor.
Output from SCOPE shows candidate motifs, their significance, and other information both as a table and as a graphical motif map. FAQs and video tutorials are available at the SCOPE web site which also includes a "Sample Search" button that allows the user to perform a trial run.
SCOPE has a user-friendly interface that enables novice users to access the algorithm's full power without having to become an expert in the bioinformatics of motif finding. As input, SCOPE can take a list of genes or FASTA sequences. These can be entered in browser text fields or read from a file. The output from SCOPE contains a list of all identified motifs with their scores, number of occurrences, fraction of genes containing the motif, and the algorithm used to identify the motif. For each motif, result details include a consensus representation of the motif, a sequence logo, a position weight matrix, and a list of instances for every motif occurrence (with exact positions and "strand" indicated). Results are returned in a browser window and also optionally by email. Previous papers describe the SCOPE algorithms in detail1,2,9-11.
Genetics, Issue 51, gene regulation, computational biology, algorithm, promoter sequence motif
Analyzing and Building Nucleic Acid Structures with 3DNA
Institutions: Rutgers - The State University of New Jersey, Columbia University.
The 3DNA software package is a popular and versatile bioinformatics tool with capabilities to analyze, construct, and visualize three-dimensional nucleic acid structures. This article presents detailed protocols for a subset of new and popular features available in 3DNA, applicable to both individual structures and ensembles of related structures. Protocol 1 lists the set of instructions needed to download and install the software. This is followed, in Protocol 2, by the analysis of a nucleic acid structure, including the assignment of base pairs and the determination of rigid-body parameters that describe the structure and, in Protocol 3, by a description of the reconstruction of an atomic model of a structure from its rigid-body parameters. The most recent version of 3DNA, version 2.1, has new features for the analysis and manipulation of ensembles of structures, such as those deduced from nuclear magnetic resonance (NMR) measurements and molecular dynamics (MD) simulations; these features are presented in Protocols 4 and 5. In addition to the 3DNA stand-alone software package, the w3DNA web server, located at https://w3dna.rutgers.edu, provides a user-friendly interface to selected features of the software. Protocol 6 demonstrates a novel feature of the site for building models of long DNA molecules decorated with bound proteins at user-specified locations.
Genetics, Issue 74, Molecular Biology, Biochemistry, Bioengineering, Biophysics, Genomics, Chemical Biology, Quantitative Biology, conformational analysis, DNA, high-resolution structures, model building, molecular dynamics, nucleic acid structure, RNA, visualization, bioinformatics, three-dimensional, 3DNA, software
Facilitating the Analysis of Immunological Data with Visual Analytic Techniques
Institutions: University of British Columbia, University of British Columbia, University of British Columbia.
Visual analytics (VA) has emerged as a new way to analyze large datasets through interactive visual display. We demonstrated the utility and the flexibility of a VA approach in the analysis of biological datasets. Examples of these datasets in immunology include flow cytometry, Luminex data, and genotyping (e.g., single nucleotide polymorphism) data. In contrast to the traditional information visualization approach, VA puts the analytical power back in the hands of the analyst by allowing the analyst to engage in a real-time data exploration process. We selected the VA software called Tableau after evaluating several VA tools. Two types of analysis tasks, analysis within and between datasets, were demonstrated in the video presentation using an approach called paired analysis. Paired analysis, as defined in VA, is an analysis approach in which a VA tool expert works side-by-side with a domain expert during the analysis. The domain expert is the one who understands the significance of the data and asks the questions that the collected data might address. The tool expert then creates visualizations to help find patterns in the data that might answer these questions. The short lag time between hypothesis generation and the rapid visual display of the data is the main advantage of a VA approach.
Immunology, Issue 47, Visual analytics, flow cytometry, Luminex, Tableau, cytokine, innate immunity, single nucleotide polymorphism
Use of Arabidopsis eceriferum Mutants to Explore Plant Cuticle Biosynthesis
Institutions: University of British Columbia - UBC, University of British Columbia - UBC.
The plant cuticle is a waxy outer covering on plants that has a primary role in water conservation, but is also an important barrier against the entry of pathogenic microorganisms. The cuticle is made up of a tough crosslinked polymer called "cutin" and a protective wax layer that seals the plant surface. The waxy layer of the cuticle is obvious on many plants, appearing as a shiny film on the ivy leaf or as a dusty outer covering on the surface of a grape or a cabbage leaf thanks to light scattering crystals present in the wax. Because the cuticle is an essential adaptation of plants to a terrestrial environment, understanding the genes involved in plant cuticle formation has applications in both agriculture and forestry. Today, we'll show the analysis of plant cuticle mutants identified by forward and reverse genetics approaches.
Plant Biology, Issue 16, Annual Review, Cuticle, Arabidopsis, Eceriferum Mutants, Cryo-SEM, Gas Chromatography