Many researchers, across incredibly diverse foci, are applying phylogenetics to their research question(s). However, many researchers are new to this topic and so it presents inherent problems. Here we compile a practical introduction to phylogenetics for nonexperts. We outline in a step-by-step manner, a pipeline for generating reliable phylogenies from gene sequence datasets. We begin with a user-guide for similarity search tools via online interfaces as well as local executables. Next, we explore programs for generating multiple sequence alignments followed by protocols for using software to determine best-fit models of evolution. We then outline protocols for reconstructing phylogenetic relationships via maximum likelihood and Bayesian criteria and finally describe tools for visualizing phylogenetic trees. While this is not by any means an exhaustive description of phylogenetic approaches, it does provide the reader with practical starting information on key software applications commonly utilized by phylogeneticists. The vision for this article would be that it could serve as a practical training tool for researchers embarking on phylogenetic studies and also serve as an educational resource that could be incorporated into a classroom or teaching-lab.
25 Related JoVE Articles!
Optimization of Synthetic Proteins: Identification of Interpositional Dependencies Indicating Structurally and/or Functionally Linked Residues
Institutions: The Research Institute at Nationwide Children's Hospital.
Protein alignments are commonly used to evaluate the similarity of protein residues, and the derived consensus sequence used for identifying functional units (e.g.,
domains). Traditional consensus-building models fail to account for interpositional dependencies – functionally required covariation of residues that tend to appear simultaneously throughout evolution and across the phylogentic tree. These relationships can reveal important clues about the processes of protein folding, thermostability, and the formation of functional sites, which in turn can be used to inform the engineering of synthetic proteins. Unfortunately, these relationships essentially form sub-motifs which cannot be predicted by simple “majority rule” or even HMM-based consensus models, and the result can be a biologically invalid “consensus” which is not only never seen in nature but is less viable than any extant protein. We have developed a visual analytics tool, StickWRLD, which creates an interactive 3D representation of a protein alignment and clearly displays covarying residues. The user has the ability to pan and zoom, as well as dynamically change the statistical threshold underlying the identification of covariants. StickWRLD has previously been successfully used to identify functionally-required covarying residues in proteins such as Adenylate Kinase and in DNA sequences such as endonuclease target sites.
Chemistry, Issue 101, protein engineering, covariation, codependent residues, visualization
Predicting the Effectiveness of Population Replacement Strategy Using Mathematical Modeling
Institutions: University of California, Los Angeles.
Charles Taylor and John Marshall explain the utility of mathematical modeling for evaluating the effectiveness of population replacement strategy. Insight is given into how computational models can provide information on the population dynamics of mosquitoes and the spread of transposable elements through A. gambiae subspecies. The ethical considerations of releasing genetically modified mosquitoes into the wild are discussed.
Cellular Biology, Issue 5, mosquito, malaria, popuulation, replacement, modeling, infectious disease
Rapid Analysis and Exploration of Fluorescence Microscopy Images
Institutions: UT Southwestern Medical Center, UT Southwestern Medical Center, Princeton University.
Despite rapid advances in high-throughput microscopy, quantitative image-based assays still pose significant challenges. While a variety of specialized image analysis tools are available, most traditional image-analysis-based workflows have steep learning curves (for fine tuning of analysis parameters) and result in long turnaround times between imaging and analysis. In particular, cell segmentation, the process of identifying individual cells in an image, is a major bottleneck in this regard.
Here we present an alternate, cell-segmentation-free workflow based on PhenoRipper, an open-source software platform designed for the rapid analysis and exploration of microscopy images. The pipeline presented here is optimized for immunofluorescence microscopy images of cell cultures and requires minimal user intervention. Within half an hour, PhenoRipper can analyze data from a typical 96-well experiment and generate image profiles. Users can then visually explore their data, perform quality control on their experiment, ensure response to perturbations and check reproducibility of replicates. This facilitates a rapid feedback cycle between analysis and experiment, which is crucial during assay optimization. This protocol is useful not just as a first pass analysis for quality control, but also may be used as an end-to-end solution, especially for screening. The workflow described here scales to large data sets such as those generated by high-throughput screens, and has been shown to group experimental conditions by phenotype accurately over a wide range of biological systems. The PhenoBrowser interface provides an intuitive framework to explore the phenotypic space and relate image properties to biological annotations. Taken together, the protocol described here will lower the barriers to adopting quantitative analysis of image based screens.
Basic Protocol, Issue 85, PhenoRipper, fluorescence microscopy, image analysis, High-content analysis, high-throughput screening, Open-source, Phenotype
Use of MALDI-TOF Mass Spectrometry and a Custom Database to Characterize Bacteria Indigenous to a Unique Cave Environment (Kartchner Caverns, AZ, USA)
Institutions: Arizona State University, Applied Maths NV.
MALDI-TOF mass spectrometry has been shown to be a rapid and reliable tool for identification of bacteria at the genus and species, and in some cases, strain levels. Commercially available and open source software tools have been developed to facilitate identification; however, no universal/standardized data analysis pipeline has been described in the literature. Here, we provide a comprehensive and detailed demonstration of bacterial identification procedures using a MALDI-TOF mass spectrometer. Mass spectra were collected from 15 diverse bacteria isolated from Kartchner Caverns, AZ, USA, and identified by 16S rDNA sequencing. Databases were constructed in BioNumerics 7.1. Follow-up analyses of mass spectra were performed, including cluster analyses, peak matching, and statistical analyses. Identification was performed using blind-coded samples randomly selected from these 15 bacteria. Two identification methods are presented: similarity coefficient-based and biomarker-based methods. Results show that both identification methods can identify the bacteria to the species level.
Environmental Sciences, Issue 95, Identification, environmental bacteria, MALDI-TOF mass spectrometry, BioNumerics, fingerprint, database, similarity coefficient, biomarker
Creating Objects and Object Categories for Studying Perception and Perceptual Learning
Institutions: Georgia Health Sciences University, Georgia Health Sciences University, Georgia Health Sciences University, Palo Alto Research Center, Palo Alto Research Center, University of Minnesota .
In order to quantitatively study object perception, be it perception by biological systems or by machines, one needs to create objects and object categories with precisely definable, preferably naturalistic, properties1
. Furthermore, for studies on perceptual learning, it is useful to create novel objects and object categories (or object classes
) with such properties2
Many innovative and useful methods currently exist for creating novel objects and object categories3-6
(also see refs. 7,8). However, generally speaking, the existing methods have three broad types of shortcomings.
First, shape variations are generally imposed by the experimenter5,9,10
, and may therefore be different from the variability in natural categories, and optimized for a particular recognition algorithm. It would be desirable to have the variations arise independently of the externally imposed constraints.
Second, the existing methods have difficulty capturing the shape complexity of natural objects11-13
. If the goal is to study natural object perception, it is desirable for objects and object categories to be naturalistic, so as to avoid possible confounds and special cases.
Third, it is generally hard to quantitatively measure the available information in the stimuli created by conventional methods. It would be desirable to create objects and object categories where the available information can be precisely measured and, where necessary, systematically manipulated (or 'tuned'). This allows one to formulate the underlying object recognition tasks in quantitative terms.
Here we describe a set of algorithms, or methods, that meet all three of the above criteria. Virtual morphogenesis (VM) creates novel, naturalistic virtual 3-D objects called 'digital embryos' by simulating the biological process of embryogenesis14
. Virtual phylogenesis (VP) creates novel, naturalistic object categories by simulating the evolutionary process of natural selection9,12,13
. Objects and object categories created by these simulations can be further manipulated by various morphing methods to generate systematic variations of shape characteristics15,16
. The VP and morphing methods can also be applied, in principle, to novel virtual objects other than digital embryos, or to virtual versions of real-world objects9,13
. Virtual objects created in this fashion can be rendered as visual images using a conventional graphical toolkit, with desired manipulations of surface texture, illumination, size, viewpoint and background. The virtual objects can also be 'printed' as haptic objects using a conventional 3-D prototyper.
We also describe some implementations of these computational algorithms to help illustrate the potential utility of the algorithms. It is important to distinguish the algorithms from their implementations. The implementations are demonstrations offered solely as a 'proof of principle' of the underlying algorithms. It is important to note that, in general, an implementation of a computational algorithm often has limitations that the algorithm itself does not have.
Together, these methods represent a set of powerful and flexible tools for studying object recognition and perceptual learning by biological and computational systems alike. With appropriate extensions, these methods may also prove useful in the study of morphogenesis and phylogenesis.
Neuroscience, Issue 69, machine learning, brain, classification, category learning, cross-modal perception, 3-D prototyping, inference
Genome-wide Protein-protein Interaction Screening by Protein-fragment Complementation Assay (PCA) in Living Cells
Institutions: Université Laval.
Proteins are the building blocks, effectors and signal mediators of cellular processes. A protein’s function, regulation and localization often depend on its interactions with other proteins. Here, we describe a protocol for the yeast protein-fragment complementation assay (PCA), a powerful method to detect direct and proximal associations between proteins in living cells. The interaction between two proteins, each fused to a dihydrofolate reductase (DHFR) protein fragment, translates into growth of yeast strains in presence of the drug methotrexate (MTX). Differential fitness, resulting from different amounts of reconstituted DHFR enzyme, can be quantified on high-density colony arrays, allowing to differentiate interacting from non-interacting bait-prey pairs. The high-throughput protocol presented here is performed using a robotic platform that parallelizes mating of bait and prey strains carrying complementary DHFR-fragment fusion proteins and the survival assay on MTX. This protocol allows to systematically test for thousands of protein-protein interactions (PPIs) involving bait proteins of interest and offers several advantages over other PPI detection assays, including the study of proteins expressed from their endogenous promoters without the need for modifying protein localization and for the assembly of complex reporter constructs.
Cellular Biology, Issue 97, Protein-protein interaction (PPI); high-throughput screening; yeast; protein-fragment complementation assay (PCA); dihydrofolate reductase (DHFR); high-density arrays; systems biology; biological networks
A Toolkit to Enable Hydrocarbon Conversion in Aqueous Environments
Institutions: Delft University of Technology, Delft University of Technology.
This work puts forward a toolkit that enables the conversion of alkanes by Escherichia coli
and presents a proof of principle of its applicability. The toolkit consists of multiple standard interchangeable parts (BioBricks)9
addressing the conversion of alkanes, regulation of gene expression and survival in toxic hydrocarbon-rich environments.
A three-step pathway for alkane degradation was implemented in E. coli
to enable the conversion of medium- and long-chain alkanes to their respective alkanols, alkanals and ultimately alkanoic-acids. The latter were metabolized via the native β-oxidation pathway. To facilitate the oxidation of medium-chain alkanes (C5-C13) and cycloalkanes (C5-C8), four genes (alkB2
) of the alkane hydroxylase system from Gordonia
were transformed into E. coli
. For the conversion of long-chain alkanes (C15-C36), theladA
gene from Geobacillus thermodenitrificans
was implemented. For the required further steps of the degradation process, ADH
and ALDH (
originating from G. thermodenitrificans
) were introduced10,11
. The activity was measured by resting cell assays. For each oxidative step, enzyme activity was observed.
To optimize the process efficiency, the expression was only induced under low glucose conditions: a substrate-regulated promoter, pCaiF, was used. pCaiF is present in E. coli
K12 and regulates the expression of the genes involved in the degradation of non-glucose carbon sources.
The last part of the toolkit - targeting survival - was implemented using solvent tolerance genes, PhPFDα and β, both from Pyrococcus horikoshii
OT3. Organic solvents can induce cell stress and decreased survivability by negatively affecting protein folding. As chaperones, PhPFDα and β improve the protein folding process e.g.
under the presence of alkanes. The expression of these genes led to an improved hydrocarbon tolerance shown by an increased growth rate (up to 50%) in the presences of 10% n
-hexane in the culture medium were observed.
Summarizing, the results indicate that the toolkit enables E. coli
to convert and tolerate hydrocarbons in aqueous environments. As such, it represents an initial step towards a sustainable solution for oil-remediation using a synthetic biology approach.
Bioengineering, Issue 68, Microbiology, Biochemistry, Chemistry, Chemical Engineering, Oil remediation, alkane metabolism, alkane hydroxylase system, resting cell assay, prefoldin, Escherichia coli, synthetic biology, homologous interaction mapping, mathematical model, BioBrick, iGEM
A High Throughput MHC II Binding Assay for Quantitative Analysis of Peptide Epitopes
Institutions: Dartmouth College, University of Rhode Island, Dartmouth College.
Biochemical assays with recombinant human MHC II molecules can provide rapid, quantitative insights into immunogenic epitope identification, deletion, or design1,2
. Here, a peptide-MHC II binding assay is scaled to 384-well format. The scaled down protocol reduces reagent costs by 75% and is higher throughput than previously described 96-well protocols1,3-5
. Specifically, the experimental design permits robust and reproducible analysis of up to 15 peptides against one MHC II allele per 384-well ELISA plate. Using a single liquid handling robot, this method allows one researcher to analyze approximately ninety test peptides in triplicate over a range of eight concentrations and four MHC II allele types in less than 48 hr. Others working in the fields of protein deimmunization or vaccine design and development may find the protocol to be useful in facilitating their own work. In particular, the step-by-step instructions and the visual format of JoVE should allow other users to quickly and easily establish this methodology in their own labs.
Biochemistry, Issue 85, Immunoassay, Protein Immunogenicity, MHC II, T cell epitope, High Throughput Screen, Deimmunization, Vaccine Design
Utilization of Microscale Silicon Cantilevers to Assess Cellular Contractile Function In Vitro
Institutions: University of Central Florida.
The development of more predictive and biologically relevant in vitro
assays is predicated on the advancement of versatile cell culture systems which facilitate the functional assessment of the seeded cells. To that end, microscale cantilever technology offers a platform with which to measure the contractile functionality of a range of cell types, including skeletal, cardiac, and smooth muscle cells, through assessment of contraction induced substrate bending. Application of multiplexed cantilever arrays provides the means to develop moderate to high-throughput protocols for assessing drug efficacy and toxicity, disease phenotype and progression, as well as neuromuscular and other cell-cell interactions. This manuscript provides the details for fabricating reliable cantilever arrays for this purpose, and the methods required to successfully culture cells on these surfaces. Further description is provided on the steps necessary to perform functional analysis of contractile cell types maintained on such arrays using a novel laser and photo-detector system. The representative data provided highlights the precision and reproducible nature of the analysis of contractile function possible using this system, as well as the wide range of studies to which such technology can be applied. Successful widespread adoption of this system could provide investigators with the means to perform rapid, low cost functional studies in vitro,
leading to more accurate predictions of tissue performance, disease development and response to novel therapeutic treatment.
Bioengineering, Issue 92, cantilever, in vitro, contraction, skeletal muscle, NMJ, cardiomyocytes, functional
Experimental Protocol for Manipulating Plant-induced Soil Heterogeneity
Institutions: Case Western Reserve University.
Coexistence theory has often treated environmental heterogeneity as being independent of the community composition; however biotic feedbacks such as plant-soil feedbacks (PSF) have large effects on plant performance, and create environmental heterogeneity that depends on the community composition. Understanding the importance of PSF for plant community assembly necessitates understanding of the role of heterogeneity in PSF, in addition to mean PSF effects. Here, we describe a protocol for manipulating plant-induced soil heterogeneity. Two example experiments are presented: (1) a field experiment with a 6-patch grid of soils to measure plant population responses and (2) a greenhouse experiment with 2-patch soils to measure individual plant responses. Soils can be collected from the zone of root influence (soils from the rhizosphere and directly adjacent to the rhizosphere) of plants in the field from conspecific and heterospecific plant species. Replicate collections are used to avoid pseudoreplicating soil samples. These soils are then placed into separate patches for heterogeneous treatments or mixed for a homogenized treatment. Care should be taken to ensure that heterogeneous and homogenized treatments experience the same degree of soil disturbance. Plants can then be placed in these soil treatments to determine the effect of plant-induced soil heterogeneity on plant performance. We demonstrate that plant-induced heterogeneity results in different outcomes than predicted by traditional coexistence models, perhaps because of the dynamic nature of these feedbacks. Theory that incorporates environmental heterogeneity influenced by the assembling community and additional empirical work is needed to determine when heterogeneity intrinsic to the assembling community will result in different assembly outcomes compared with heterogeneity extrinsic to the community composition.
Environmental Sciences, Issue 85, Coexistence, community assembly, environmental drivers, plant-soil feedback, soil heterogeneity, soil microbial communities, soil patch
Quantitative Detection of Trace Explosive Vapors by Programmed Temperature Desorption Gas Chromatography-Electron Capture Detector
Institutions: U.S. Naval Research Laboratory, NOVA Research, Inc., U.S. Naval Research Laboratory, U.S. Naval Research Laboratory.
The direct liquid deposition of solution standards onto sorbent-filled thermal desorption tubes is used for the quantitative analysis of trace explosive vapor samples. The direct liquid deposition method yields a higher fidelity between the analysis of vapor samples and the analysis of solution standards than using separate injection methods for vapors and solutions, i.e.
, samples collected on vapor collection tubes and standards prepared in solution vials. Additionally, the method can account for instrumentation losses, which makes it ideal for minimizing variability and quantitative trace chemical detection. Gas chromatography with an electron capture detector is an instrumentation configuration sensitive to nitro-energetics, such as TNT and RDX, due to their relatively high electron affinity. However, vapor quantitation of these compounds is difficult without viable vapor standards. Thus, we eliminate the requirement for vapor standards by combining the sensitivity of the instrumentation with a direct liquid deposition protocol to analyze trace explosive vapor samples.
Chemistry, Issue 89,
Gas Chromatography (GC), Electron Capture Detector, Explosives, Quantitation, Thermal Desorption, TNT, RDX
Sample Drift Correction Following 4D Confocal Time-lapse Imaging
Institutions: Monash University, Howard Hughes Medical Institute.
The generation of four-dimensional (4D) confocal datasets; consisting of 3D image sequences over time; provides an excellent methodology to capture cellular behaviors involved in developmental processes. The ability to track and follow cell movements is limited by sample movements that occur due to drift of the sample or, in some cases, growth during image acquisition. Tracking cells in datasets affected by drift and/or growth will incorporate these movements into any analysis of cell position. This may result in the apparent movement of static structures within the sample. Therefore prior to cell tracking, any sample drift should be corrected. Using the open source Fiji distribution 1
of ImageJ 2,3
and the incorporated LOCI tools 4
, we developed the Correct 3D drift plug-in to remove erroneous sample movement in confocal datasets. This protocol effectively compensates for sample translation or alterations in focal position by utilizing phase correlation to register each time-point of a four-dimensional confocal datasets while maintaining the ability to visualize and measure cell movements over extended time-lapse experiments.
Bioengineering, Issue 86, Image Processing, Computer-Assisted, Zebrafish, Microscopy, Confocal, Time-Lapse Imaging, imaging, zebrafish, Confocal, fiji, three-dimensional, four-dimensional, registration
Genomic MRI - a Public Resource for Studying Sequence Patterns within Genomic DNA
Institutions: University of Toledo Health Science Campus.
Non-coding genomic regions in complex eukaryotes, including intergenic areas, introns, and untranslated segments of exons, are profoundly non-random in their nucleotide composition and consist of a complex mosaic of sequence patterns. These patterns include so-called Mid-Range Inhomogeneity (MRI) regions -- sequences 30-10000 nucleotides in length that are enriched by a particular base or combination of bases (e.g. (G+T)-rich, purine-rich, etc.). MRI regions are associated with unusual (non-B-form) DNA structures that are often involved in regulation of gene expression, recombination, and other genetic processes (Fedorova & Fedorov 2010). The existence of a strong fixation bias within MRI regions against mutations that tend to reduce their sequence inhomogeneity additionally supports the functionality and importance of these genomic sequences (Prakash et al.
Here we demonstrate a freely available Internet resource -- the Genomic MRI
program package -- designed for computational analysis of genomic sequences in order to find and characterize various MRI patterns within them (Bechtel et al.
2008). This package also allows generation of randomized sequences with various properties and level of correspondence to the natural input DNA sequences. The main goal of this resource is to facilitate examination of vast regions of non-coding DNA that are still scarcely investigated and await thorough exploration and recognition.
Genetics, Issue 51, bioinformatics, computational biology, genomics, non-randomness, signals, gene regulation, DNA conformation
Novel 3D/VR Interactive Environment for MD Simulations, Visualization and Analysis
Institutions: University of California Merced.
The increasing development of computing (hardware and software) in the last decades has impacted scientific research in many fields including materials science, biology, chemistry and physics among many others. A new computational system for the accurate and fast simulation and 3D/VR visualization of nanostructures is presented here, using the open-source molecular dynamics (MD) computer program LAMMPS. This alternative computational method uses modern graphics processors, NVIDIA CUDA technology and specialized scientific codes to overcome processing speed barriers common to traditional computing methods. In conjunction with a virtual reality system used to model materials, this enhancement allows the addition of accelerated MD simulation capability. The motivation is to provide a novel research environment which simultaneously allows visualization, simulation, modeling and analysis. The research goal is to investigate the structure and properties of inorganic nanostructures (e.g.,
silica glass nanosprings) under different conditions using this innovative computational system. The work presented outlines a description of the 3D/VR Visualization System and basic components, an overview of important considerations such as the physical environment, details on the setup and use of the novel system, a general procedure for the accelerated MD enhancement, technical information, and relevant remarks. The impact of this work is the creation of a unique computational system combining nanoscale materials simulation, visualization and interactivity in a virtual environment, which is both a research and teaching instrument at UC Merced.
Physics, Issue 94, Computational systems, visualization and immersive environments, interactive learning, graphical processing unit accelerated simulations, molecular dynamics simulations, nanostructures.
Large Scale Non-targeted Metabolomic Profiling of Serum by Ultra Performance Liquid Chromatography-Mass Spectrometry (UPLC-MS)
Institutions: Colorado State University.
Non-targeted metabolite profiling by ultra performance liquid chromatography coupled with mass spectrometry (UPLC-MS) is a powerful technique to investigate metabolism. The approach offers an unbiased and in-depth analysis that can enable the development of diagnostic tests, novel therapies, and further our understanding of disease processes. The inherent chemical diversity of the metabolome creates significant analytical challenges and there is no single experimental approach that can detect all metabolites. Additionally, the biological variation in individual metabolism and the dependence of metabolism on environmental factors necessitates large sample numbers to achieve the appropriate statistical power required for meaningful biological interpretation. To address these challenges, this tutorial outlines an analytical workflow for large scale non-targeted metabolite profiling of serum by UPLC-MS. The procedure includes guidelines for sample organization and preparation, data acquisition, quality control, and metabolite identification and will enable reliable acquisition of data for large experiments and provide a starting point for laboratories new to non-targeted metabolite profiling by UPLC-MS.
Chemistry, Issue 73, Biochemistry, Genetics, Molecular Biology, Physiology, Genomics, Proteins, Proteomics, Metabolomics, Metabolite Profiling, Non-targeted metabolite profiling, mass spectrometry, Ultra Performance Liquid Chromatography, UPLC-MS, serum, spectrometry
From Voxels to Knowledge: A Practical Guide to the Segmentation of Complex Electron Microscopy 3D-Data
Institutions: Lawrence Berkeley National Laboratory, Lawrence Berkeley National Laboratory, Lawrence Berkeley National Laboratory.
Modern 3D electron microscopy approaches have recently allowed unprecedented insight into the 3D ultrastructural organization of cells and tissues, enabling the visualization of large macromolecular machines, such as adhesion complexes, as well as higher-order structures, such as the cytoskeleton and cellular organelles in their respective cell and tissue context. Given the inherent complexity of cellular volumes, it is essential to first extract the features of interest in order to allow visualization, quantification, and therefore comprehension of their 3D organization. Each data set is defined by distinct characteristics, e.g.
, signal-to-noise ratio, crispness (sharpness) of the data, heterogeneity of its features, crowdedness of features, presence or absence of characteristic shapes that allow for easy identification, and the percentage of the entire volume that a specific region of interest occupies. All these characteristics need to be considered when deciding on which approach to take for segmentation.
The six different 3D ultrastructural data sets presented were obtained by three different imaging approaches: resin embedded stained electron tomography, focused ion beam- and serial block face- scanning electron microscopy (FIB-SEM, SBF-SEM) of mildly stained and heavily stained samples, respectively. For these data sets, four different segmentation approaches have been applied: (1) fully manual model building followed solely by visualization of the model, (2) manual tracing segmentation of the data followed by surface rendering, (3) semi-automated approaches followed by surface rendering, or (4) automated custom-designed segmentation algorithms followed by surface rendering and quantitative analysis. Depending on the combination of data set characteristics, it was found that typically one of these four categorical approaches outperforms the others, but depending on the exact sequence of criteria, more than one approach may be successful. Based on these data, we propose a triage scheme that categorizes both objective data set characteristics and subjective personal criteria for the analysis of the different data sets.
Bioengineering, Issue 90, 3D electron microscopy, feature extraction, segmentation, image analysis, reconstruction, manual tracing, thresholding
Protein WISDOM: A Workbench for In silico De novo Design of BioMolecules
Institutions: Princeton University.
The aim of de novo
protein design is to find the amino acid sequences that will fold into a desired 3-dimensional structure with improvements in specific properties, such as binding affinity, agonist or antagonist behavior, or stability, relative to the native sequence. Protein design lies at the center of current advances drug design and discovery. Not only does protein design provide predictions for potentially useful drug targets, but it also enhances our understanding of the protein folding process and protein-protein interactions. Experimental methods such as directed evolution have shown success in protein design. However, such methods are restricted by the limited sequence space that can be searched tractably. In contrast, computational design strategies allow for the screening of a much larger set of sequences covering a wide variety of properties and functionality. We have developed a range of computational de novo
protein design methods capable of tackling several important areas of protein design. These include the design of monomeric proteins for increased stability and complexes for increased binding affinity.
To disseminate these methods for broader use we present Protein WISDOM (https://www.proteinwisdom.org), a tool that provides automated methods for a variety of protein design problems. Structural templates are submitted to initialize the design process. The first stage of design is an optimization sequence selection stage that aims at improving stability through minimization of potential energy in the sequence space. Selected sequences are then run through a fold specificity stage and a binding affinity stage. A rank-ordered list of the sequences for each step of the process, along with relevant designed structures, provides the user with a comprehensive quantitative assessment of the design. Here we provide the details of each design method, as well as several notable experimental successes attained through the use of the methods.
Genetics, Issue 77, Molecular Biology, Bioengineering, Biochemistry, Biomedical Engineering, Chemical Engineering, Computational Biology, Genomics, Proteomics, Protein, Protein Binding, Computational Biology, Drug Design, optimization (mathematics), Amino Acids, Peptides, and Proteins, De novo protein and peptide design, Drug design, In silico sequence selection, Optimization, Fold specificity, Binding affinity, sequencing
Automated, Quantitative Cognitive/Behavioral Screening of Mice: For Genetics, Pharmacology, Animal Cognition and Undergraduate Instruction
Institutions: Rutgers University, Koç University, New York University, Fairfield University.
We describe a high-throughput, high-volume, fully automated, live-in 24/7 behavioral testing system for assessing the effects of genetic and pharmacological manipulations on basic mechanisms of cognition and learning in mice. A standard polypropylene mouse housing tub is connected through an acrylic tube to a standard commercial mouse test box. The test box has 3 hoppers, 2 of which are connected to pellet feeders. All are internally illuminable with an LED and monitored for head entries by infrared (IR) beams. Mice live in the environment, which eliminates handling during screening. They obtain their food during two or more daily feeding periods by performing in operant (instrumental) and Pavlovian (classical) protocols, for which we have written protocol-control software and quasi-real-time data analysis and graphing software. The data analysis and graphing routines are written in a MATLAB-based language created to simplify greatly the analysis of large time-stamped behavioral and physiological event records and to preserve a full data trail from raw data through all intermediate analyses to the published graphs and statistics within a single data structure. The data-analysis code harvests the data several times a day and subjects it to statistical and graphical analyses, which are automatically stored in the "cloud" and on in-lab computers. Thus, the progress of individual mice is visualized and quantified daily. The data-analysis code talks to the protocol-control code, permitting the automated advance from protocol to protocol of individual subjects. The behavioral protocols implemented are matching, autoshaping, timed hopper-switching, risk assessment in timed hopper-switching, impulsivity measurement, and the circadian anticipation of food availability. Open-source protocol-control and data-analysis code makes the addition of new protocols simple. Eight test environments fit in a 48 in x 24 in x 78 in cabinet; two such cabinets (16 environments) may be controlled by one computer.
Behavior, Issue 84, genetics, cognitive mechanisms, behavioral screening, learning, memory, timing
Phage Phenomics: Physiological Approaches to Characterize Novel Viral Proteins
Institutions: San Diego State University, San Diego State University, San Diego State University, San Diego State University, San Diego State University, Argonne National Laboratory, Broad Institute.
Current investigations into phage-host interactions are dependent on extrapolating knowledge from (meta)genomes. Interestingly, 60 - 95% of all phage sequences share no homology to current annotated proteins. As a result, a large proportion of phage genes are annotated as hypothetical. This reality heavily affects the annotation of both structural and auxiliary metabolic genes. Here we present phenomic methods designed to capture the physiological response(s) of a selected host during expression of one of these unknown phage genes. Multi-phenotype Assay Plates (MAPs) are used to monitor the diversity of host substrate utilization and subsequent biomass formation, while metabolomics provides bi-product analysis by monitoring metabolite abundance and diversity. Both tools are used simultaneously to provide a phenotypic profile associated with expression of a single putative phage open reading frame (ORF). Representative results for both methods are compared, highlighting the phenotypic profile differences of a host carrying either putative structural or metabolic phage genes. In addition, the visualization techniques and high throughput computational pipelines that facilitated experimental analysis are presented.
Immunology, Issue 100, phenomics, phage, viral metagenome, Multi-phenotype Assay Plates (MAPs), continuous culture, metabolomics
Cortical Source Analysis of High-Density EEG Recordings in Children
Institutions: UCL Institute of Child Health, University College London.
EEG is traditionally described as a neuroimaging technique with high temporal and low spatial resolution. Recent advances in biophysical modelling and signal processing make it possible to exploit information from other imaging modalities like structural MRI that provide high spatial resolution to overcome this constraint1
. This is especially useful for investigations that require high resolution in the temporal as well as spatial domain. In addition, due to the easy application and low cost of EEG recordings, EEG is often the method of choice when working with populations, such as young children, that do not tolerate functional MRI scans well. However, in order to investigate which neural substrates are involved, anatomical information from structural MRI is still needed. Most EEG analysis packages work with standard head models that are based on adult anatomy. The accuracy of these models when used for children is limited2
, because the composition and spatial configuration of head tissues changes dramatically over development3
In the present paper, we provide an overview of our recent work in utilizing head models based on individual structural MRI scans or age specific head models to reconstruct the cortical generators of high density EEG. This article describes how EEG recordings are acquired, processed, and analyzed with pediatric populations at the London Baby Lab, including laboratory setup, task design, EEG preprocessing, MRI processing, and EEG channel level and source analysis.
Behavior, Issue 88, EEG, electroencephalogram, development, source analysis, pediatric, minimum-norm estimation, cognitive neuroscience, event-related potentials
Workflow for High-content, Individual Cell Quantification of Fluorescent Markers from Universal Microscope Data, Supported by Open Source Software
Institutions: UCL Cancer Institute.
Advances in understanding the control mechanisms governing the behavior of cells in adherent mammalian tissue culture models are becoming increasingly dependent on modes of single-cell analysis. Methods which deliver composite data reflecting the mean values of biomarkers from cell populations risk losing subpopulation dynamics that reflect the heterogeneity of the studied biological system. In keeping with this, traditional approaches are being replaced by, or supported with, more sophisticated forms of cellular assay developed to allow assessment by high-content microscopy. These assays potentially generate large numbers of images of fluorescent biomarkers, which enabled by accompanying proprietary software packages, allows for multi-parametric measurements per cell. However, the relatively high capital costs and overspecialization of many of these devices have prevented their accessibility to many investigators.
Described here is a universally applicable workflow for the quantification of multiple fluorescent marker intensities from specific subcellular regions of individual cells suitable for use with images from most fluorescent microscopes. Key to this workflow is the implementation of the freely available Cell Profiler software1
to distinguish individual cells in these images, segment them into defined subcellular regions and deliver fluorescence marker intensity values specific to these regions. The extraction of individual cell intensity values from image data is the central purpose of this workflow and will be illustrated with the analysis of control data from a siRNA screen for G1 checkpoint regulators in adherent human cells. However, the workflow presented here can be applied to analysis of data from other means of cell perturbation (e.g.
, compound screens) and other forms of fluorescence based cellular markers and thus should be useful for a wide range of laboratories.
Cellular Biology, Issue 94, Image analysis, High-content analysis, Screening, Microscopy, Individual cell analysis, Multiplexed assays
Purifying the Impure: Sequencing Metagenomes and Metatranscriptomes from Complex Animal-associated Samples
Institutions: San Diego State University, DOE Joint Genome Institute, University of Colorado, University of Colorado.
The accessibility of high-throughput sequencing has revolutionized many fields of biology. In order to better understand host-associated viral and microbial communities, a comprehensive workflow for DNA and RNA extraction was developed. The workflow concurrently generates viral and microbial metagenomes, as well as metatranscriptomes, from a single sample for next-generation sequencing. The coupling of these approaches provides an overview of both the taxonomical characteristics and the community encoded functions. The presented methods use Cystic Fibrosis (CF) sputum, a problematic sample type, because it is exceptionally viscous and contains high amount of mucins, free neutrophil DNA, and other unknown contaminants. The protocols described here target these problems and successfully recover viral and microbial DNA with minimal human DNA contamination. To complement the metagenomics studies, a metatranscriptomics protocol was optimized to recover both microbial and host mRNA that contains relatively few ribosomal RNA (rRNA) sequences. An overview of the data characteristics is presented to serve as a reference for assessing the success of the methods. Additional CF sputum samples were also collected to (i) evaluate the consistency of the microbiome profiles across seven consecutive days within a single patient, and (ii) compare the consistency of metagenomic approach to a 16S ribosomal RNA gene-based sequencing. The results showed that daily fluctuation of microbial profiles without antibiotic perturbation was minimal and the taxonomy profiles of the common CF-associated bacteria were highly similar between the 16S rDNA libraries and metagenomes generated from the hypotonic lysis (HL)-derived DNA. However, the differences between 16S rDNA taxonomical profiles generated from total DNA and HL-derived DNA suggest that hypotonic lysis and the washing steps benefit in not only removing the human-derived DNA, but also microbial-derived extracellular DNA that may misrepresent the actual microbial profiles.
Molecular Biology, Issue 94, virome, microbiome, metagenomics, metatranscriptomics, cystic fibrosis, mucosal-surface
The ITS2 Database
Institutions: University of Würzburg, University of Würzburg.
The internal transcribed spacer 2 (ITS2) has been used as a phylogenetic marker for more than two decades. As ITS2 research mainly focused on the very variable ITS2 sequence, it confined this marker to low-level phylogenetics only. However, the combination of the ITS2 sequence and its highly conserved secondary structure improves the phylogenetic resolution1
and allows phylogenetic inference at multiple taxonomic ranks, including species delimitation2-8
The ITS2 Database9
presents an exhaustive dataset of internal transcribed spacer 2 sequences from NCBI GenBank11
. Following an annotation by profile Hidden Markov Models (HMMs), the secondary structure of each sequence is predicted. First, it is tested whether a minimum energy based fold12
(direct fold) results in a correct, four helix conformation. If this is not the case, the structure is predicted by homology modeling13
. In homology modeling, an already known secondary structure is transferred to another ITS2 sequence, whose secondary structure was not able to fold correctly in a direct fold.
The ITS2 Database is not only a database for storage and retrieval of ITS2 sequence-structures. It also provides several tools to process your own ITS2 sequences, including annotation, structural prediction, motif detection and BLAST14
search on the combined sequence-structure information. Moreover, it integrates trimmed versions of 4SALE15,16
for multiple sequence-structure alignment calculation and Neighbor Joining18
tree reconstruction. Together they form a coherent analysis pipeline from an initial set of sequences to a phylogeny based on sequence and secondary structure.
In a nutshell, this workbench simplifies first phylogenetic analyses to only a few mouse-clicks, while additionally providing tools and data for comprehensive large-scale analyses.
Genetics, Issue 61, alignment, internal transcribed spacer 2, molecular systematics, secondary structure, ribosomal RNA, phylogenetic tree, homology modeling, phylogeny
Facilitating the Analysis of Immunological Data with Visual Analytic Techniques
Institutions: University of British Columbia, University of British Columbia, University of British Columbia.
Visual analytics (VA) has emerged as a new way to analyze large dataset through interactive visual display. We demonstrated the utility and the flexibility of a VA approach in the analysis of biological datasets. Examples of these datasets in immunology include flow cytometry, Luminex data, and genotyping (e.g., single nucleotide polymorphism) data. Contrary to the traditional information visualization approach, VA restores the analysis power in the hands of analyst by allowing the analyst to engage in real-time data exploration process. We selected the VA software called Tableau after evaluating several VA tools. Two types of analysis tasks analysis within and between datasets were demonstrated in the video presentation using an approach called paired analysis. Paired analysis, as defined in VA, is an analysis approach in which a VA tool expert works side-by-side with a domain expert during the analysis. The domain expert is the one who understands the significance of the data, and asks the questions that the collected data might address. The tool expert then creates visualizations to help find patterns in the data that might answer these questions. The short lag-time between the hypothesis generation and the rapid visual display of the data is the main advantage of a VA approach.
Immunology, Issue 47, Visual analytics, flow cytometry, Luminex, Tableau, cytokine, innate immunity, single nucleotide polymorphism
A Novel High-resolution In vivo Imaging Technique to Study the Dynamic Response of Intracranial Structures to Tumor Growth and Therapeutics
Institutions: Hospital for Sick Children, Toronto Medical Discovery Tower, Princess Margaret Hospital, Toronto Western Hospital.
We have successfully integrated previously established Intracranial window (ICW) technology 1-4
with intravital 2-photon confocal microscopy to develop a novel platform that allows for direct long-term visualization of tissue structure changes intracranially. Imaging at a single cell resolution in a real-time fashion provides supplementary dynamic information beyond that provided by standard end-point histological analysis, which looks solely at 'snap-shot' cross sections of tissue.
Establishing this intravital imaging technique in fluorescent chimeric mice, we are able to image four fluorescent channels simultaneously. By incorporating fluorescently labeled cells, such as GFP+ bone marrow, it is possible to track the fate of these cells studying their long-term migration, integration and differentiation within tissue. Further integration of a secondary reporter cell, such as an mCherry glioma tumor line, allows for characterization of cell:cell interactions. Structural changes in the tissue microenvironment can be highlighted through the addition of intra-vital dyes and antibodies, for example CD31 tagged antibodies and Dextran molecules.
Moreover, we describe the combination of our ICW imaging model with a small animal micro-irradiator that provides stereotactic irradiation, creating a platform through which the dynamic tissue changes that occur following the administration of ionizing irradiation can be assessed.
Current limitations of our model include penetrance of the microscope, which is limited to a depth of up to 900 μm from the sub cortical surface, limiting imaging to the dorsal axis of the brain. The presence of the skull bone makes the ICW a more challenging technical procedure, compared to the more established and utilized chamber models currently used to study mammary tissue and fat pads 5-7
. In addition, the ICW provides many challenges when optimizing the imaging.
Cancer Biology, Issue 76, Medicine, Biomedical Engineering, Cellular Biology, Molecular Biology, Genetics, Neuroscience, Neurobiology, Biophysics, Anatomy, Physiology, Surgery, Intracranial Window, In vivo imaging, Stereotactic radiation, Bone Marrow Derived Cells, confocal microscopy, two-photon microscopy, drug-cell interactions, drug kinetics, brain, imaging, tumors, animal model