Researchers across remarkably diverse fields are applying phylogenetics to their research questions. However, many of them are new to the topic, which presents inherent problems. Here we compile a practical introduction to phylogenetics for nonexperts. We outline, step by step, a pipeline for generating reliable phylogenies from gene sequence datasets. We begin with a user guide for similarity search tools, both via online interfaces and as local executables. Next, we explore programs for generating multiple sequence alignments, followed by protocols for using software to determine best-fit models of evolution. We then outline protocols for reconstructing phylogenetic relationships via maximum likelihood and Bayesian criteria, and finally describe tools for visualizing phylogenetic trees. While this is by no means an exhaustive description of phylogenetic approaches, it provides the reader with practical starting information on key software applications commonly used by phylogeneticists. We envision this article serving as a practical training tool for researchers embarking on phylogenetic studies and as an educational resource that could be incorporated into a classroom or teaching lab.
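To make the idea of a "model of evolution" concrete, here is a minimal sketch that computes the uncorrected proportion of differing sites (p-distance) between two aligned sequences and applies the Jukes-Cantor (JC69) correction, the simplest substitution model. The function names and toy sequences are assumptions made for illustration and are not taken from the article; a real analysis would rely on dedicated alignment and model-selection software.

```python
import math

def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences (gap sites skipped)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)

def jukes_cantor(p):
    """JC69 corrected distance d = -3/4 * ln(1 - 4p/3); infinite once p reaches 0.75."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Toy aligned sequences (hypothetical): 2 of 10 sites differ.
seq1 = "ACGTACGTAC"
seq2 = "ACGTTCGTAA"
p = p_distance(seq1, seq2)
d = jukes_cantor(p)
print(f"p-distance = {p:.3f}, JC69 distance = {d:.3f}")
```

The corrected distance is always at least as large as the p-distance because the model accounts for multiple substitutions at the same site. Richer models differ mainly in how many rate and frequency parameters they allow.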
21 Related JoVE Articles!
A PCR-based Genotyping Method to Distinguish Between Wild-type and Ornamental Varieties of Imperata cylindrica
Institutions: The University of Alabama, Huntsville, Center for Plant Health Science and Technology.
Wild-type I. cylindrica (cogongrass) is one of the top ten worst invasive plants in the world, negatively impacting agricultural and natural resources in 73 different countries throughout Africa, Asia, Europe, New Zealand, Oceania and the Americas1-2. Cogongrass forms rapidly-spreading, monodominant stands that displace a large variety of native plant species and in turn threaten the native animals that depend on the displaced native plant species for forage and shelter. To add to the problem, an ornamental variety [I. cylindrica (Retzius)] is widely marketed under the names of Imperata cylindrica 'Rubra', Red Baron, and Japanese blood grass (JBG). This variety is putatively sterile and noninvasive and is considered a desirable ornamental for its red-colored leaves. However, under the correct conditions, JBG can produce viable seed (Carol Holko, 2009 personal communication) and can revert to a green invasive form that is often indistinguishable from cogongrass as it takes on the distinguishing characteristics of the wild-type invasive variety4. This makes identification using morphology a difficult task even for well-trained plant taxonomists. Reversion of JBG to an aggressive green phenotype is also not a rare occurrence. Using sequence comparisons of coding and variable regions in both nuclear and chloroplast DNA, we have confirmed that JBG has reverted to the green invasive within the states of Maryland, South Carolina, and Missouri. JBG has been sold and planted in just about every state in the continental U.S. where there is not an active cogongrass infestation. The extent of the revert problem is not well understood because reverted plants are undocumented and often destroyed.
Application of this molecular protocol provides a method to identify JBG reverts and can help keep these varieties from co-occurring and possibly hybridizing. Cogongrass is an obligate outcrosser and, when crossed with a different genotype, can produce viable wind-dispersed seeds that spread cogongrass over wide distances5-7. JBG has a slightly different genotype than cogongrass and may be able to form viable hybrids with cogongrass. To add to the problem, JBG is more cold and shade tolerant than cogongrass8-10, and gene flow between these two varieties is likely to generate hybrids that are more aggressive, shade tolerant, and cold hardy than wild-type cogongrass. While wild-type cogongrass currently infests over 490 million hectares worldwide, in the Southeast U.S. it infests over 500,000 hectares and is capable of occupying most of the U.S. as it rapidly spreads northward due to its broad niche and geographic potential3,7,11. The potential of a genetic crossing is a serious concern for the USDA-APHIS Federal Noxious Weed Program. Currently, the USDA-APHIS prohibits JBG in states where there are major cogongrass infestations (e.g., Florida, Alabama, Mississippi). However, preventing the two varieties from combining can prove more difficult as cogongrass and JBG expand their distributions. Furthermore, the distribution of the JBG revert is currently unknown and, without the ability to identify these varieties through morphology, some cogongrass infestations may be the result of JBG reverts. Unfortunately, current molecular methods of identification typically rely on AFLP (Amplified Fragment Length Polymorphisms) and DNA sequencing, both of which are time consuming and costly. Here, we present the first cost-effective and reliable PCR-based molecular genotyping method to accurately distinguish between cogongrass and JBG revert.
Molecular Biology, Issue 60, Molecular genotyping, Japanese blood grass, Red Baron, cogongrass, invasive plants
Simple Microfluidic Devices for in vivo Imaging of C. elegans, Drosophila and Zebrafish
Institutions: NCBS-TIFR, TIFR.
Microfabricated fluidic devices provide an accessible micro-environment for in vivo studies on small organisms. Simple fabrication processes are available for microfluidic devices using soft lithography techniques 1-3. Microfluidic devices have been used for sub-cellular imaging 4,5, in vivo laser microsurgery 2,6 and cellular imaging 4,7. In vivo imaging requires immobilization of organisms. This has been achieved using suction 5,8, tapered channels 6,7,9, deformable membranes 2-4,10, suction with additional cooling 5, anesthetic gas 11, temperature sensitive gels 12, cyanoacrylate glue 13 and anesthetics such as levamisole 14,15. Commonly used anesthetics influence synaptic transmission 16,17 and are known to have detrimental effects on sub-cellular neuronal transport 4. In this study we demonstrate a membrane based poly-dimethyl-siloxane (PDMS) device that allows anesthetic free immobilization of intact genetic model organisms such as Caenorhabditis elegans larvae and zebrafish larvae. These model organisms are suitable for in vivo studies in microfluidic devices because of their small diameters and optically transparent or translucent bodies. Body diameters range from ~10 μm to ~800 μm for early larval stages of C. elegans and zebrafish larvae and require microfluidic devices of different sizes to achieve complete immobilization for high resolution time-lapse imaging. These organisms are immobilized using pressure applied by compressed nitrogen gas through a liquid column and imaged using an inverted microscope. Animals released from the trap return to normal locomotion within 10 min.
We demonstrate four applications of time-lapse imaging in C. elegans, namely imaging mitochondrial transport in neurons, pre-synaptic vesicle transport in a transport-defective mutant, glutamate receptor transport and Q neuroblast cell division. Data obtained from such movies show that microfluidic immobilization is a useful and accurate means of acquiring in vivo data of cellular and sub-cellular events when compared to anesthetized animals (Figure 1J and 3C-F 4).
Device dimensions were altered to allow time-lapse imaging of different stages of C. elegans, first instar Drosophila larvae and zebrafish larvae. Transport of vesicles marked with synaptotagmin tagged with GFP (syt.eGFP) in sensory neurons shows directed motion of synaptic vesicle markers expressed in cholinergic sensory neurons in intact first instar Drosophila larvae. A similar device has been used to carry out time-lapse imaging of heartbeat in ~30 hr post fertilization (hpf) zebrafish larvae. These data show that the simple devices we have developed can be applied to a variety of model systems to study several cell biological and developmental phenomena in vivo.
Bioengineering, Issue 67, Molecular Biology, Neuroscience, Microfluidics, C. elegans, Drosophila larvae, zebrafish larvae, anesthetic, pre-synaptic vesicle transport, dendritic transport of glutamate receptors, mitochondrial transport, synaptotagmin transport, heartbeat
Diagnosing Pulmonary Tuberculosis with the Xpert MTB/RIF Test
Institutions: University of Bern, MCL Laboratories Inc..
Tuberculosis (TB) due to Mycobacterium tuberculosis (MTB) remains a major public health issue: the infection affects up to one third of the world population1, and almost two million people are killed by TB each year.2 Universal access to high-quality, patient-centered treatment for all TB patients is emphasized by WHO's Stop TB Strategy.3 The rapid detection of MTB in respiratory specimens and drug therapy based on reliable drug resistance testing results are a prerequisite for the successful implementation of this strategy. However, in many areas of the world, TB diagnosis still relies on insensitive, poorly standardized sputum microscopy methods. Ineffective TB detection and the emergence and transmission of drug-resistant MTB strains increasingly jeopardize global TB control activities.2
Effective diagnosis of pulmonary TB requires the availability - on a global scale - of standardized, easy-to-use, and robust diagnostic tools that would allow the direct detection of both the MTB complex and resistance to key antibiotics, such as rifampicin (RIF). The latter result can serve as a marker for multidrug-resistant MTB (MDR TB) and has been reported in > 95% of the MDR-TB isolates.4, 5 The rapid availability of reliable test results is likely to directly translate into sound patient management decisions that, ultimately, will cure the individual patient and break the chain of TB transmission in the community.2
Cepheid's (Sunnyvale, CA, U.S.A.) Xpert MTB/RIF assay6, 7 meets the demands outlined above in a remarkable manner. It is a nucleic acid amplification test for 1) the detection of MTB complex DNA in sputum or concentrated sputum sediments; and 2) the detection of RIF resistance-associated mutations of the rpoB gene. It is designed for use with Cepheid's GeneXpert Dx System that integrates and automates sample processing, nucleic acid amplification, and detection of the target sequences using real-time PCR and reverse transcriptase PCR. The system consists of an instrument, personal computer, barcode scanner, and preloaded software for running tests and viewing the results.9 It employs single-use disposable Xpert MTB/RIF cartridges that hold PCR reagents and host the PCR process. Because the cartridges are self-contained, cross-contamination between samples is eliminated.6
Current nucleic acid amplification methods used to detect MTB are complex, labor-intensive, and technically demanding. The Xpert MTB/RIF assay has the potential to bring standardized, sensitive and very specific diagnostic testing for both TB and drug resistance to universal-access point-of-care settings3, provided that they will be able to afford it. In order to facilitate access, the Foundation for Innovative New Diagnostics (FIND) has negotiated significant price reductions. Current FIND-negotiated prices, along with the list of countries eligible for the discounts, are available on the web.10
Immunology, Issue 62, tuberculosis, drug resistance, rifampicin, rapid diagnosis, Xpert MTB/RIF test
A Noninvasive Hair Sampling Technique to Obtain High Quality DNA from Elusive Small Mammals
Institutions: University of British Columbia, Okanagan Campus.
Noninvasive genetic sampling approaches are becoming increasingly important to study wildlife populations. A number of studies have reported using noninvasive sampling techniques to investigate population genetics and demography of wild populations1. This approach has proven to be especially useful when dealing with rare or elusive species2. While a number of these methods have been developed to sample hair, feces and other biological material from carnivores and medium-sized mammals, they have largely remained untested in elusive small mammals. In this video, we present a novel, inexpensive and noninvasive hair snare targeted at an elusive small mammal, the American pika (Ochotona princeps). We describe the general set-up of the hair snare, which consists of strips of packing tape arranged in a web-like fashion and placed along travelling routes in the pikas’ habitat. We illustrate the efficiency of the snare at collecting a large quantity of hair that can then be collected and brought back to the lab. We then demonstrate the use of the DNA IQ system (Promega) to isolate DNA and showcase the utility of this method to amplify commonly used molecular markers including nuclear microsatellites, amplified fragment length polymorphisms (AFLPs), mitochondrial sequences (800bp) as well as a molecular sexing marker. Overall, we demonstrate the utility of this novel noninvasive hair snare as a sampling technique for wildlife population biologists. We anticipate that this approach will be applicable to a variety of small mammals, opening up areas of investigation within natural populations, while minimizing impact to study organisms.
Genetics, Issue 49, Conservation genetics, noninvasive genetic sampling, Hair snares, Microsatellites, AFLPs, American pika, Ochotona princeps
The ITS2 Database
Institutions: University of Würzburg, University of Würzburg.
The internal transcribed spacer 2 (ITS2) has been used as a phylogenetic marker for more than two decades. As ITS2 research mainly focused on the very variable ITS2 sequence, it confined this marker to low-level phylogenetics only. However, the combination of the ITS2 sequence and its highly conserved secondary structure improves the phylogenetic resolution1 and allows phylogenetic inference at multiple taxonomic ranks, including species delimitation2-8.
The ITS2 Database9 presents an exhaustive dataset of internal transcribed spacer 2 sequences from NCBI GenBank11. Following an annotation by profile Hidden Markov Models (HMMs), the secondary structure of each sequence is predicted. First, it is tested whether a minimum energy based fold12 (direct fold) results in a correct, four helix conformation. If this is not the case, the structure is predicted by homology modeling13. In homology modeling, an already known secondary structure is transferred to another ITS2 sequence that did not fold correctly in a direct fold.
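The decision between direct fold and homology modeling described above can be sketched as a check on the predicted structure. The example below assumes the structure is given in dot-bracket notation and that the four ITS2 helices appear as four separate top-level stems; both the function names and that representation are assumptions for illustration, not the database's actual implementation.

```python
def count_top_level_helices(dot_bracket):
    """Count stems that open at nesting depth 0 in a dot-bracket structure string."""
    depth = 0
    helices = 0
    for ch in dot_bracket:
        if ch == "(":
            if depth == 0:
                helices += 1  # a new outermost stem begins here
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced structure: too many ')'")
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if depth != 0:
        raise ValueError("unbalanced structure: unclosed '('")
    return helices

def choose_prediction_route(direct_fold_structure):
    """Keep the direct fold only if it shows four helices; otherwise fall back."""
    if count_top_level_helices(direct_fold_structure) == 4:
        return "direct fold"
    return "homology modeling"

# Hypothetical toy structures, far shorter than a real ITS2 fold.
print(choose_prediction_route("((..)).(((...))).((..)).((...))"))  # four stems
print(choose_prediction_route("((..))..((..))"))                   # two stems
```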
The ITS2 Database is not only a database for storage and retrieval of ITS2 sequence-structures. It also provides several tools to process your own ITS2 sequences, including annotation, structural prediction, motif detection and BLAST14 search on the combined sequence-structure information. Moreover, it integrates trimmed versions of 4SALE15,16 for multiple sequence-structure alignment calculation and Neighbor Joining18 tree reconstruction. Together they form a coherent analysis pipeline from an initial set of sequences to a phylogeny based on sequence and secondary structure.
In a nutshell, this workbench simplifies first phylogenetic analyses to only a few mouse-clicks, while additionally providing tools and data for comprehensive large-scale analyses.
Genetics, Issue 61, alignment, internal transcribed spacer 2, molecular systematics, secondary structure, ribosomal RNA, phylogenetic tree, homology modeling, phylogeny
Prediction of HIV-1 Coreceptor Usage (Tropism) by Sequence Analysis using a Genotypic Approach
Institutions: University of Cologne, Max Planck Institute for Informatics, Institute for Immune genetics, University of Duesseldorf, University of Essen, University of Cologne, Augustinerinnen Hospital.
Maraviroc (MVC) is the first licensed antiretroviral drug from the class of coreceptor antagonists. It binds to the host coreceptor CCR5, which is used by the majority of HIV strains in order to infect the human immune cells (Fig. 1). Other HIV isolates use a different coreceptor, the CXCR4. Which receptor is used, is determined in the virus by the Env protein (Fig. 2). Depending on the coreceptor used, the viruses are classified as R5 or X4, respectively. MVC binds to the CCR5 receptor inhibiting the entry of R5 viruses into the target cell. During the course of disease, X4 viruses may emerge and outgrow the R5 viruses. Determination of coreceptor usage (also called tropism) is therefore mandatory prior to administration of MVC, as demanded by EMA and FDA.
The MVC efficacy studies MOTIVATE, MERIT and 1029 were performed with the Trofile assay from Monogram, San Francisco, U.S.A. This is a high quality assay based on sophisticated recombinant tests. Acceptance of this test for daily routine is rather low outside the U.S.A., since European physicians tend to work with decentralized expert laboratories, which also provide concomitant resistance testing. These laboratories have undergone several quality assurance evaluations, the last one being presented in 2011 1.
For several years now, we have performed tropism determinations based on sequence analysis from the HIV env-V3 gene region (V3)2. This region carries enough information to perform a reliable prediction.
The genotypic determination of coreceptor usage presents advantages such as: shorter turnover time (equivalent to resistance testing), lower costs, possibility to adapt the results to the patients' needs and possibility of analysing clinical samples with very low or even undetectable viral load (VL), particularly since the number of samples analysed with VL<1000 copies/μl roughly increased in the last years (Fig. 3).
The main steps for tropism testing (Fig. 4) demonstrated in this video:
1. Collection of a blood sample
2. Isolation of the HIV RNA from the plasma and/or HIV proviral DNA from blood mononuclear cells
3. Amplification of the env
4. Amplification of the V3 region
5. Sequence reaction of the V3 amplicon
6. Purification of the sequencing samples
7. Sequencing the purified samples
8. Sequence editing
9. Sequencing data interpretation and tropism prediction
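As an illustration of what a sequence-based tropism call in the final step can look like, the sketch below applies the classic "11/25 rule": a basic residue (arginine or lysine) at position 11 or 25 of the V3 loop suggests an X4 virus. This simple heuristic is swapped in here for illustration only and is not the interpretation system used in the video. The function name is hypothetical, and the code assumes an ungapped 35-residue V3 amino acid sequence.

```python
BASIC_RESIDUES = {"R", "K"}  # arginine and lysine

def predict_tropism_11_25(v3_aa):
    """Apply the 11/25 rule to an ungapped 35-residue V3 loop (assumed numbering)."""
    v3 = v3_aa.strip().upper()
    if len(v3) != 35:
        raise ValueError("expected an ungapped 35-residue V3 loop")
    # Positions 11 and 25 are 1-based, hence indices 10 and 24.
    if v3[10] in BASIC_RESIDUES or v3[24] in BASIC_RESIDUES:
        return "X4"
    return "R5"

# Toy inputs: a V3 loop with S at position 11 and E at 25, and an S11R variant of it.
v3_r5_like = "CTRPNNNTRKSIHIGPGRAFYTTGEIIGDIRQAHC"
v3_x4_like = "CTRPNNNTRKRIHIGPGRAFYTTGEIIGDIRQAHC"
print(predict_tropism_11_25(v3_r5_like))
print(predict_tropism_11_25(v3_x4_like))
```

Rules of this kind trade sensitivity for simplicity, which is one reason statistical prediction systems are generally preferred for clinical interpretation.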
Immunology, Issue 58, HIV-1, coreceptor, coreceptor antagonist, prediction of coreceptor usage, tropism, R5, X4, maraviroc, MVC
Genomic MRI - a Public Resource for Studying Sequence Patterns within Genomic DNA
Institutions: University of Toledo Health Science Campus.
Non-coding genomic regions in complex eukaryotes, including intergenic areas, introns, and untranslated segments of exons, are profoundly non-random in their nucleotide composition and consist of a complex mosaic of sequence patterns. These patterns include so-called Mid-Range Inhomogeneity (MRI) regions -- sequences 30-10000 nucleotides in length that are enriched by a particular base or combination of bases (e.g. (G+T)-rich, purine-rich, etc.). MRI regions are associated with unusual (non-B-form) DNA structures that are often involved in regulation of gene expression, recombination, and other genetic processes (Fedorova & Fedorov 2010). The existence of a strong fixation bias within MRI regions against mutations that tend to reduce their sequence inhomogeneity additionally supports the functionality and importance of these genomic sequences (Prakash et al.
Here we demonstrate a freely available Internet resource -- the Genomic MRI program package -- designed for computational analysis of genomic sequences in order to find and characterize various MRI patterns within them (Bechtel et al. 2008). This package also allows generation of randomized sequences with various properties and level of correspondence to the natural input DNA sequences. The main goal of this resource is to facilitate examination of vast regions of non-coding DNA that are still scarcely investigated and await thorough exploration and recognition.
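The notion of a region "enriched by a particular base or combination of bases" can be illustrated with a sliding-window scan. The sketch below flags windows whose (G+T) fraction meets a threshold. The window size of 30 matches the lower bound of the MRI length range given above, while the 0.8 threshold, the function name, and the toy sequence are assumptions for illustration; this is not the algorithm implemented in the Genomic MRI package.

```python
def enriched_windows(seq, bases="GT", window=30, threshold=0.8):
    """Return (start, fraction) for every window whose content of `bases` >= threshold."""
    seq = seq.upper()
    wanted = set(bases)
    hits = []
    if len(seq) < window:
        return hits
    # Count matching bases in the first window, then update incrementally.
    count = sum(ch in wanted for ch in seq[:window])
    for start in range(len(seq) - window + 1):
        if start > 0:
            count -= seq[start - 1] in wanted           # base leaving the window
            count += seq[start + window - 1] in wanted  # base entering the window
        fraction = count / window
        if fraction >= threshold:
            hits.append((start, fraction))
    return hits

# Hypothetical toy sequence: a 40-nt (G+T)-only stretch flanked by A/C repeats.
toy = "ACAC" * 10 + "GT" * 20 + "ACAC" * 10
hits = enriched_windows(toy)
print(f"{len(hits)} enriched windows, first starts at {hits[0][0]}")
```

Overlapping hits would normally be merged into a single region before further analysis.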
Genetics, Issue 51, bioinformatics, computational biology, genomics, non-randomness, signals, gene regulation, DNA conformation
Test Samples for Optimizing STORM Super-Resolution Microscopy
Institutions: National Physical Laboratory.
STORM is a recently developed super-resolution microscopy technique with up to 10 times better resolution than standard fluorescence microscopy techniques. However, as the image is acquired in a very different way than normal, by building up an image molecule-by-molecule, there are some significant challenges for users in trying to optimize their image acquisition. In order to aid this process and gain more insight into how STORM works we present the preparation of 3 test samples and the methodology of acquiring and processing STORM super-resolution images with typical resolutions of between 30-50 nm. By combining the test samples with the use of the freely available rainSTORM processing software it is possible to obtain a great deal of information about image quality and resolution. Using these metrics it is then possible to optimize the imaging procedure from the optics, to sample preparation, dye choice, buffer conditions, and image acquisition settings. We also show examples of some common problems that result in poor image quality, such as lateral drift, where the sample moves during image acquisition and density related problems resulting in the 'mislocalization' phenomenon.
Molecular Biology, Issue 79, Genetics, Bioengineering, Biomedical Engineering, Biophysics, Basic Protocols, HeLa Cells, Actin Cytoskeleton, Coated Vesicles, Receptor, Epidermal Growth Factor, Actins, Fluorescence, Endocytosis, Microscopy, STORM, super-resolution microscopy, nanoscopy, cell biology, fluorescence microscopy, test samples, resolution, actin filaments, fiducial markers, epidermal growth factor, cell, imaging
Detecting Somatic Genetic Alterations in Tumor Specimens by Exon Capture and Massively Parallel Sequencing
Institutions: Memorial Sloan-Kettering Cancer Center, Memorial Sloan-Kettering Cancer Center.
Efforts to detect and investigate key oncogenic mutations have proven valuable to facilitate the appropriate treatment for cancer patients. The establishment of high-throughput, massively parallel "next-generation" sequencing has aided the discovery of many such mutations. To enhance the clinical and translational utility of this technology, platforms must be high-throughput, cost-effective, and compatible with formalin-fixed paraffin embedded (FFPE) tissue samples that may yield small amounts of degraded or damaged DNA. Here, we describe the preparation of barcoded and multiplexed DNA libraries followed by hybridization-based capture of targeted exons for the detection of cancer-associated mutations in fresh frozen and FFPE tumors by massively parallel sequencing. This method enables the identification of sequence mutations, copy number alterations, and select structural rearrangements involving all targeted genes. Targeted exon sequencing offers the benefits of high throughput, low cost, and deep sequence coverage, thus conferring high sensitivity for detecting low frequency mutations.
Molecular Biology, Issue 80, Molecular Diagnostic Techniques, High-Throughput Nucleotide Sequencing, Genetics, Neoplasms, Diagnosis, Massively parallel sequencing, targeted exon sequencing, hybridization capture, cancer, FFPE, DNA mutations
Protein WISDOM: A Workbench for In silico De novo Design of BioMolecules
Institutions: Princeton University.
The aim of de novo protein design is to find the amino acid sequences that will fold into a desired 3-dimensional structure with improvements in specific properties, such as binding affinity, agonist or antagonist behavior, or stability, relative to the native sequence. Protein design lies at the center of current advances in drug design and discovery. Not only does protein design provide predictions for potentially useful drug targets, but it also enhances our understanding of the protein folding process and protein-protein interactions. Experimental methods such as directed evolution have shown success in protein design. However, such methods are restricted by the limited sequence space that can be searched tractably. In contrast, computational design strategies allow for the screening of a much larger set of sequences covering a wide variety of properties and functionality. We have developed a range of computational de novo protein design methods capable of tackling several important areas of protein design. These include the design of monomeric proteins for increased stability and complexes for increased binding affinity.
To disseminate these methods for broader use we present Protein WISDOM (https://www.proteinwisdom.org), a tool that provides automated methods for a variety of protein design problems. Structural templates are submitted to initialize the design process. The first stage of design is an optimization sequence selection stage that aims at improving stability through minimization of potential energy in the sequence space. Selected sequences are then run through a fold specificity stage and a binding affinity stage. A rank-ordered list of the sequences for each step of the process, along with relevant designed structures, provides the user with a comprehensive quantitative assessment of the design. Here we provide the details of each design method, as well as several notable experimental successes attained through the use of the methods.
Genetics, Issue 77, Molecular Biology, Bioengineering, Biochemistry, Biomedical Engineering, Chemical Engineering, Computational Biology, Genomics, Proteomics, Protein, Protein Binding, Computational Biology, Drug Design, optimization (mathematics), Amino Acids, Peptides, and Proteins, De novo protein and peptide design, Drug design, In silico sequence selection, Optimization, Fold specificity, Binding affinity, sequencing
Next-generation Sequencing of 16S Ribosomal RNA Gene Amplicons
Institutions: National Research Council Canada.
One of the major questions in microbial ecology is “who is there?” This question can be answered using various tools, but one of the long-lasting gold standards is to sequence 16S ribosomal RNA (rRNA) gene amplicons generated by domain-level PCR reactions amplifying from genomic DNA. Traditionally, this was performed by cloning and Sanger (capillary electrophoresis) sequencing of PCR amplicons. The advent of next-generation sequencing has tremendously simplified 16S rRNA gene sequencing and increased its sequencing depth. The introduction of benchtop sequencers now allows small labs to perform their 16S rRNA sequencing in-house in a matter of days. Here, an approach for 16S rRNA gene amplicon sequencing using a benchtop next-generation sequencer is detailed. The environmental DNA is first amplified by PCR using primers that contain sequencing adapters and barcodes. The amplicons are then coupled to spherical particles via emulsion PCR. The particles are loaded on a disposable chip and the chip is inserted in the sequencing machine, after which the sequencing is performed. The sequences are retrieved in fastq format and filtered, and the barcodes are used to establish the sample membership of the reads. The filtered and binned reads are then further analyzed using publicly available tools. An example analysis where the reads were classified with a taxonomy-finding algorithm within the software package Mothur is given. The method outlined here is simple, inexpensive and straightforward and should help smaller labs to take advantage of the ongoing genomic revolution.
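The filtering and barcode-binning step described above can be sketched in a few lines. The example assumes that barcodes sit at the very start of each read, that matching is exact, that qualities are Phred+33 encoded, and that reads with a mean quality below 20 are discarded. All of these, along with the sample names and toy reads, are assumptions for illustration rather than the settings used in the protocol.

```python
import io
from collections import defaultdict

def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return
        seq = handle.readline().rstrip()
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip()
        yield header[1:], seq, qual

def mean_quality(qual, offset=33):
    """Mean Phred score of a quality string (Phred+33 assumed)."""
    return sum(ord(c) - offset for c in qual) / len(qual)

def demultiplex(handle, barcodes, min_mean_q=20):
    """Bin reads by their leading barcode, dropping low-quality or unassigned reads."""
    barcode_len = len(next(iter(barcodes)))  # assumes equal-length barcodes
    bins = defaultdict(list)
    for name, seq, qual in read_fastq(handle):
        if not qual or mean_quality(qual) < min_mean_q:
            continue
        sample = barcodes.get(seq[:barcode_len])
        if sample is not None:
            bins[sample].append((name, seq[barcode_len:]))  # barcode trimmed off
    return dict(bins)

# Hypothetical toy data: r3 has very low quality, r4 carries an unknown barcode.
FASTQ_TEXT = (
    "@r1\nACGTGGCCTTAA\n+\nIIIIIIIIIIII\n"
    "@r2\nTGCAGGCCTTAA\n+\nIIIIIIIIIIII\n"
    "@r3\nACGTGGCCTTAA\n+\n!!!!!!!!!!!!\n"
    "@r4\nGGGGGGCCTTAA\n+\nIIIIIIIIIIII\n"
)
BARCODES = {"ACGT": "soil_A", "TGCA": "soil_B"}
bins = demultiplex(io.StringIO(FASTQ_TEXT), BARCODES)
for sample, reads in sorted(bins.items()):
    print(sample, [name for name, _ in reads])
```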
Molecular Biology, Issue 90, Metagenomics, Bacteria, 16S ribosomal RNA gene, Amplicon sequencing, Next-generation sequencing, benchtop sequencers
Chromatin Interaction Analysis with Paired-End Tag Sequencing (ChIA-PET) for Mapping Chromatin Interactions and Understanding Transcription Regulation
Institutions: Agency for Science, Technology and Research, Singapore, A*STAR-Duke-NUS Neuroscience Research Partnership, Singapore, National University of Singapore, Singapore.
Genomes are organized into three-dimensional structures, adopting higher-order conformations inside the micron-sized nuclear spaces 7, 2, 12. Such architectures are not random and involve interactions between gene promoters and regulatory elements 13. The binding of transcription factors to specific regulatory sequences brings about a network of transcription regulation and coordination 1, 14.
Chromatin Interaction Analysis by Paired-End Tag Sequencing (ChIA-PET) was developed to identify these higher-order chromatin structures 5,6. Cells are fixed and interacting loci are captured by covalent DNA-protein cross-links. To minimize non-specific noise and reduce complexity, as well as to increase the specificity of the chromatin interaction analysis, chromatin immunoprecipitation (ChIP) is used against specific protein factors to enrich chromatin fragments of interest before proximity ligation. Ligation involving half-linkers subsequently forms covalent links between pairs of DNA fragments tethered together within individual chromatin complexes. The flanking MmeI restriction enzyme sites in the half-linkers allow extraction of paired end tag-linker-tag constructs (PETs) upon MmeI digestion. As the half-linkers are biotinylated, these PET constructs are purified using streptavidin-magnetic beads. The purified PETs are ligated with next-generation sequencing adaptors and a catalog of interacting fragments is generated via next-generation sequencers such as the Illumina Genome Analyzer. Mapping and bioinformatics analysis is then performed to identify ChIP-enriched binding sites and ChIP-enriched chromatin interactions 8.
We have produced a video to demonstrate critical aspects of the ChIA-PET protocol, especially the preparation of ChIP as the quality of ChIP plays a major role in the outcome of a ChIA-PET library. As the protocols are very long, only the critical steps are shown in the video.
Genetics, Issue 62, ChIP, ChIA-PET, Chromatin Interactions, Genomics, Next-Generation Sequencing
Automated, Quantitative Cognitive/Behavioral Screening of Mice: For Genetics, Pharmacology, Animal Cognition and Undergraduate Instruction
Institutions: Rutgers University, Koç University, New York University, Fairfield University.
We describe a high-throughput, high-volume, fully automated, live-in 24/7 behavioral testing system for assessing the effects of genetic and pharmacological manipulations on basic mechanisms of cognition and learning in mice. A standard polypropylene mouse housing tub is connected through an acrylic tube to a standard commercial mouse test box. The test box has 3 hoppers, 2 of which are connected to pellet feeders. All are internally illuminable with an LED and monitored for head entries by infrared (IR) beams. Mice live in the environment, which eliminates handling during screening. They obtain their food during two or more daily feeding periods by performing in operant (instrumental) and Pavlovian (classical) protocols, for which we have written protocol-control software and quasi-real-time data analysis and graphing software. The data analysis and graphing routines are written in a MATLAB-based language created to simplify greatly the analysis of large time-stamped behavioral and physiological event records and to preserve a full data trail from raw data through all intermediate analyses to the published graphs and statistics within a single data structure. The data-analysis code harvests the data several times a day and subjects it to statistical and graphical analyses, which are automatically stored in the "cloud" and on in-lab computers. Thus, the progress of individual mice is visualized and quantified daily. The data-analysis code talks to the protocol-control code, permitting the automated advance from protocol to protocol of individual subjects. The behavioral protocols implemented are matching, autoshaping, timed hopper-switching, risk assessment in timed hopper-switching, impulsivity measurement, and the circadian anticipation of food availability. Open-source protocol-control and data-analysis code makes the addition of new protocols simple. 
Eight test environments fit in a 48 in x 24 in x 78 in cabinet; two such cabinets (16 environments) may be controlled by one computer.
Behavior, Issue 84, genetics, cognitive mechanisms, behavioral screening, learning, memory, timing
Competitive Genomic Screens of Barcoded Yeast Libraries
Institutions: University of Toronto, University of Toronto, University of Toronto, National Human Genome Research Institute, NIH, Stanford University, University of Toronto.
By virtue of advances in next generation sequencing technologies, we have access to new genome sequences almost daily. The tempo of these advances is accelerating, promising greater depth and breadth. In light of these extraordinary advances, the need for fast, parallel methods to define gene function becomes ever more important. Collections of genome-wide deletion mutants in yeasts and E. coli have served as workhorses for the characterization of gene function, but this approach is not scalable: current gene-deletion approaches require each of the thousands of genes that comprise a genome to be deleted and verified. Only after this work is complete can we pursue high-throughput phenotyping. Over the past decade, our laboratory has refined a portfolio of competitive, miniaturized, high-throughput genome-wide assays that can be performed in parallel. This parallelization is possible because of the inclusion of DNA 'tags', or 'barcodes', in each mutant; the barcode serves as a proxy for the mutation, and its abundance can be measured to assess mutant fitness. In this study, we seek to fill the gap between DNA sequence and barcoded mutant collections. To accomplish this, we introduce a combined transposon disruption-barcoding approach that opens up parallel barcode assays to newly sequenced but poorly characterized microbes. To illustrate this approach, we present a new Candida albicans barcoded disruption collection and describe how both microarray-based and next generation sequencing-based platforms can be used to collect 10,000 - 1,000,000 gene-gene and drug-gene interactions in a single experiment.
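As a rough illustration of how barcode abundance can stand in for mutant fitness, the sketch below compares normalized barcode counts between a control pool and a treated pool. The barcode names, the counts, and the pseudocount of 1 are invented for illustration, and the log2-ratio score is a common simplification, not necessarily the scoring used in the study described above.

```python
import math

def barcode_log_ratios(control, treated, pseudocount=1.0):
    """Return log2(treated / control) of normalized barcode frequencies.

    Both arguments map barcode IDs to raw read (or hybridization) counts.
    A pseudocount keeps barcodes that drop to zero from producing log(0).
    """
    n = len(control)
    control_total = sum(control.values()) + pseudocount * n
    treated_total = sum(treated.values()) + pseudocount * n
    ratios = {}
    for barcode, c_count in control.items():
        t_count = treated.get(barcode, 0)
        c_freq = (c_count + pseudocount) / control_total
        t_freq = (t_count + pseudocount) / treated_total
        ratios[barcode] = math.log2(t_freq / c_freq)
    return ratios

# Hypothetical counts: BC002 is depleted after treatment.
control = {"BC001": 500, "BC002": 480, "BC003": 520}
treated = {"BC001": 510, "BC002": 60, "BC003": 530}

scores = barcode_log_ratios(control, treated)
for barcode, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{barcode}\t{score:+.2f}")
```

A strongly negative score flags a mutant whose barcode was depleted in the treated pool, which is the kind of signal a drug-gene interaction screen would look for.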
Biochemistry, Issue 54, chemical biology, chemogenomics, chemical probes, barcode microarray, next generation sequencing
Infinium Assay for Large-scale SNP Genotyping Applications
Institutions: Oklahoma Medical Research Foundation.
Genotyping variants in the human genome has proven to be an efficient method to identify genetic associations with phenotypes. The distribution of variants within families or populations can facilitate identification of the genetic factors of disease. Illumina's panel of genotyping BeadChips allows investigators to genotype thousands or millions of single nucleotide polymorphisms (SNPs) or to analyze other genomic variants, such as copy number, across a large number of DNA samples. These SNPs can be spread throughout the genome or targeted in specific regions in order to maximize potential discovery. The Infinium assay has been optimized to yield high-quality, accurate results quickly. With proper setup, a single technician can process from a few hundred to over a thousand DNA samples per week, depending on the type of array. This assay guides users through every step, starting with genomic DNA and ending with the scanning of the array. Using proprietary reagents, samples are amplified, fragmented, precipitated, resuspended, hybridized to the chip, extended by a single base, stained, and scanned on either an iScan or HiScan high-resolution optical imaging system. One overnight step is required to amplify the DNA. The DNA is denatured and isothermally amplified by whole-genome amplification; therefore, no PCR is required. Samples are hybridized to the arrays during a second overnight step. By the third day, the samples are ready to be scanned and analyzed. Amplified DNA may be stockpiled in large quantities, allowing bead arrays to be processed every day of the week, thereby maximizing throughput.
Basic Protocol, Issue 81, genomics, SNP, Genotyping, Infinium, iScan, HiScan, Illumina
A Protocol for Computer-Based Protein Structure and Function Prediction
Institutions: University of Michigan, University of Kansas.
Genome sequencing projects have deciphered millions of protein sequences, and knowledge of their structure and function is required to improve the understanding of their biological roles. Although experimental methods can provide detailed information for a small fraction of these proteins, computational modeling is needed for the majority of protein molecules, which are experimentally uncharacterized. The I-TASSER server is an on-line workbench for high-resolution modeling of protein structure and function. Given a protein sequence, a typical output from the I-TASSER server includes secondary structure prediction, predicted solvent accessibility of each residue, homologous template proteins detected by threading and structure alignments, up to five full-length tertiary structural models, and structure-based functional annotations for enzyme classification, Gene Ontology terms and protein-ligand binding sites. All the predictions are tagged with a confidence score that estimates how accurate the predictions are in the absence of experimental data. To accommodate special requests from end users, the server provides channels to accept user-specified inter-residue distance and contact maps to interactively change the I-TASSER modeling; it also allows users to specify any protein as a template, or to exclude any template proteins during the structure assembly simulations. This structural information can be collected by the users from experimental evidence or biological insights with the purpose of improving the quality of I-TASSER predictions. The server was evaluated as the best program for protein structure and function prediction in the recent community-wide CASP experiments. There are currently >20,000 registered scientists from over 100 countries who are using the on-line I-TASSER server.
Biochemistry, Issue 57, On-line server, I-TASSER, protein structure prediction, function prediction
Purifying Plasmid DNA from Bacterial Colonies Using the Qiagen Miniprep Kit
Institutions: University of California, Irvine (UCI).
Plasmid DNA purification from E. coli is a core technique for molecular cloning. Small-scale purification (miniprep) from less than 5 ml of bacterial culture is a quick way to verify clones or isolate DNA for further enzymatic reactions (polymerase chain reaction and restriction enzyme digestion). Here, we video-recorded the general miniprep procedure using QIAGEN's QIAprep 8 Miniprep Kit, aiming to introduce this highly efficient technique to beginners in molecular biology. The whole procedure is based on alkaline lysis of E. coli cells followed by adsorption of DNA onto silica in the presence of high salt. It consists of three steps: 1) preparation and clearing of a bacterial lysate, 2) adsorption of DNA onto the QIAprep membrane, 3) washing and elution of plasmid DNA. All steps are performed without the use of phenol, chloroform, CsCl, or ethidium bromide, and without alcohol precipitation. It usually takes less than 2 hours to finish the entire procedure.
Issue 6, Basic Protocols, plasmid, DNA, purification, Qiagen
Molecular Evolution of the Tre Recombinase
Institutions: Max Planck Institute for Molecular Cell Biology and Genetics, Dresden.
Here we report the generation of Tre recombinase through directed, molecular evolution. Tre recombinase recognizes a pre-defined target sequence within the LTR sequences of the HIV-1 provirus, resulting in the excision and eradication of the provirus from infected human cells.
We started with Cre, a 38-kDa recombinase that recognizes a 34-bp double-stranded DNA sequence known as loxP. Because Cre can effectively eliminate genomic sequences, we set out to tailor a recombinase that could remove the sequence between the 5'-LTR and 3'-LTR of an integrated HIV-1 provirus. As a first step, we identified sequences within the LTR sites that were similar to loxP and tested them for recombination activity. Initially, Cre and mutagenized Cre libraries failed to recombine the chosen loxLTR sites of the HIV-1 provirus. As the start of any directed molecular evolution process requires at least residual activity, the original asymmetric loxLTR sequences were split into subsets and tested again for recombination activity. Recombination activity was detected with these subsets, which could therefore serve as intermediates. Next, recombinase libraries were enriched through reiterative evolution cycles. Subsequently, enriched libraries were shuffled and recombined. The combination of different mutations proved synergistic, and recombinases were created that were able to recombine loxLTR1 and loxLTR2. This was evidence that an evolutionary strategy through intermediates can be successful. After a total of 126 evolution cycles, individual recombinases were functionally and structurally analyzed. The most active recombinase, Tre, had 19 amino acid changes compared to Cre. Tre recombinase was able to excise the HIV-1 provirus from the genome of HIV-1 infected HeLa cells (see "HIV-1 Proviral DNA Excision Using an Evolved Recombinase", Hauber J., Heinrich-Pette-Institute for Experimental Virology and Immunology, Hamburg, Germany). While still in its infancy, directed molecular evolution will allow the creation of custom enzymes that will serve as tools of "molecular surgery" and molecular medicine.
Cell Biology, Issue 15, HIV-1, Tre recombinase, Site-specific recombination, molecular evolution
Electroporation of Mycobacteria
Institutions: Barts and the London School of Medicine and Dentistry, Barts and the London School of Medicine and Dentistry.
The difficulty of achieving high-efficiency transformation is a major limitation in the study of mycobacteria. The genus Mycobacterium can be difficult to transform; this is mainly caused by the thick and waxy cell wall, but is compounded by the fact that most molecular techniques have been developed for distantly-related species such as Escherichia coli and Bacillus subtilis. In spite of these obstacles, mycobacterial plasmids have been identified and DNA transformation of many mycobacterial species has now been described. The most successful method for introducing DNA into mycobacteria is electroporation. Many parameters contribute to successful transformation; these include the species/strain, the nature of the transforming DNA, the selectable marker used, the growth medium, and the conditions for the electroporation pulse. Optimized methods for the transformation of both slow- and fast-growing species are detailed here. Transformation efficiencies for different mycobacterial species and with various selectable markers are reported.
Microbiology, Issue 15, Springer Protocols, Mycobacteria, Electroporation, Bacterial Transformation, Transformation Efficiency, Bacteria, Tuberculosis, M. smegmatis
Using SCOPE to Identify Potential Regulatory Motifs in Coregulated Genes
Institutions: Dartmouth College.
SCOPE is an ensemble motif finder that uses three component algorithms in parallel to identify potential regulatory motifs by over-representation and motif position preference1. Each component algorithm is optimized to find a different kind of motif. By taking the best of these three approaches, SCOPE performs better than any single algorithm, even in the presence of noisy data1. In this article, we utilize a web version of SCOPE2 to examine genes that are involved in telomere maintenance. SCOPE has been incorporated into at least two other motif finding programs3,4 and has been used in other studies5-8.
The three algorithms that comprise SCOPE are BEAM9, which finds non-degenerate motifs (ACCGGT), PRISM10, which finds degenerate motifs (ASCGWT), and SPACER11, which finds longer bipartite motifs (ACCnnnnnnnnGGT). These three algorithms have been optimized to find their corresponding type of motif. Together, they allow SCOPE to perform extremely well.
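To make the three motif classes concrete, the sketch below translates each example motif into a regular expression using the standard IUPAC nucleotide codes and scans a short sequence for matches. It only illustrates what non-degenerate, degenerate, and bipartite motifs look like; it is not the BEAM, PRISM, or SPACER algorithm, and the test sequence and the function names are made up for illustration.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def motif_to_regex(motif):
    """Compile an IUPAC motif into a regex that also finds overlapping hits."""
    body = "".join(IUPAC[base] for base in motif.upper())
    # The lookahead lets finditer report matches that overlap each other.
    return re.compile(f"(?=({body}))")

def find_motif(motif, sequence):
    """Return 0-based start positions of every match on the given strand."""
    pattern = motif_to_regex(motif)
    return [m.start() for m in pattern.finditer(sequence.upper())]

# Hypothetical promoter fragment containing one instance of each motif type.
sequence = "TT" + "ACCGGT" + "AA" + "AGCGAT" + "GG" + "ACC" + "TTTTTTTT" + "GGT" + "CC"

for motif in ("ACCGGT", "ASCGWT", "ACCnnnnnnnnGGT"):
    print(motif, find_motif(motif, sequence))
```

This scan covers only the forward strand; a fuller search would also scan the reverse complement.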
Once a gene set has been analyzed and candidate motifs identified, SCOPE can look for other genes that contain the motif which, when added to the original set, will improve the motif score. This can occur through over-representation or motif position preference. Working with partial gene sets that have biologically verified transcription factor binding sites, SCOPE was able to identify most of the rest of the genes also regulated by the given transcription factor.
Output from SCOPE shows candidate motifs, their significance, and other information both as a table and as a graphical motif map. FAQs and video tutorials are available at the SCOPE web site which also includes a "Sample Search" button that allows the user to perform a trial run.
SCOPE has a very friendly user interface that enables novice users to access the algorithm's full power without having to become an expert in the bioinformatics of motif finding. As input, SCOPE can take a list of genes, or FASTA sequences. These can be entered in browser text fields, or read from a file. The output from SCOPE contains a list of all identified motifs with their scores, number of occurrences, fraction of genes containing the motif, and the algorithm used to identify the motif. For each motif, result details include a consensus representation of the motif, a sequence logo, a position weight matrix, and a list of instances for every motif occurrence (with exact positions and "strand" indicated). Results are returned in a browser window and also optionally by email. Previous papers describe the SCOPE algorithms in detail1,2,9-11.
Genetics, Issue 51, gene regulation, computational biology, algorithm, promoter sequence motif
Principles of Site-Specific Recombinase (SSR) Technology
Institutions: Max Planck Institute for Molecular Cell Biology and Genetics, Dresden.
Site-specific recombinase (SSR) technology allows the manipulation of gene structure to explore gene function and has become an integral tool of molecular biology. Site-specific recombinases are proteins that bind to distinct DNA target sequences. The Cre/lox system was first described in bacteriophages during the 1980s. Cre recombinase is a Type I topoisomerase that catalyzes site-specific recombination of DNA between two loxP (locus of X-over P1) sites. The Cre/lox system does not require any cofactors. LoxP sequences contain distinct binding sites for Cre recombinases that surround a directional core sequence where recombination and rearrangement take place. When cells contain loxP sites and express the Cre recombinase, a recombination event occurs. Double-stranded DNA is cut at both loxP sites by the Cre recombinase, rearranged, and ligated ("scissors and glue"). Products of the recombination event depend on the relative orientation of the asymmetric sequences.
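The dependence of the product on site orientation can be sketched in a few lines of code. The example below uses the canonical 34-bp wild-type loxP sequence (two 13-bp inverted-repeat arms around an asymmetric 8-bp spacer) and treats recombination as a string operation: two sites in the same orientation lead to excision, two sites in opposite orientation lead to inversion. The flanking and "gene" sequences are hypothetical, and the model is a deliberate simplification that ignores the excised circular product and the exact crossover position within the spacer.

```python
# Canonical wild-type loxP: 13-bp arm + 8-bp asymmetric spacer + 13-bp arm.
LOXP = "ATAACTTCGTATA" + "GCATACAT" + "TATACGAAGTTAT"

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def find_sites(seq):
    """Return (start, orientation) for each loxP site: '+' forward, '-' reverse."""
    sites = []
    for i in range(len(seq) - len(LOXP) + 1):
        window = seq[i:i + len(LOXP)]
        if window == LOXP:
            sites.append((i, "+"))
        elif window == revcomp(LOXP):
            sites.append((i, "-"))
    return sites

def recombine(seq):
    """Simplified Cre-style recombination on a sequence with two loxP sites."""
    sites = find_sites(seq)
    if len(sites) != 2:
        raise ValueError("expected exactly two loxP sites")
    (a, ori_a), (b, ori_b) = sites
    inner_start = a + len(LOXP)
    if ori_a == ori_b:
        # Same orientation: the flanked segment is excised, one site remains.
        return seq[:inner_start] + seq[b + len(LOXP):]
    # Opposite orientation: the segment between the sites is inverted.
    return seq[:inner_start] + revcomp(seq[inner_start:b]) + seq[b:]

# Hypothetical flanks and "gene", for illustration only.
upstream, gene, downstream = "GGGG", "ATGAAACCC", "TTTT"
floxed = upstream + LOXP + gene + LOXP + downstream
inverted = upstream + LOXP + gene + revcomp(LOXP) + downstream

print(recombine(floxed))    # gene removed, single loxP left behind
print(recombine(inverted))  # gene replaced by its reverse complement
```

The asymmetric spacer is what gives each site a direction: because the two arms are inverted repeats of each other, only the spacer distinguishes a forward site from a reverse one.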
SSR technology is frequently used as a tool to explore gene function. Here the gene of interest is flanked with Cre target sites, loxP ("floxed"). Animals are then crossed with animals expressing the Cre recombinase under the control of a tissue-specific promoter. In tissues that express the Cre recombinase, it binds to target sequences and excises the floxed gene. Controlled gene deletion allows the investigation of gene function in specific tissues and at distinct time points. Analysis of gene function employing SSR technology (conditional mutagenesis) has significant advantages over traditional knock-outs, where gene deletion is frequently lethal.
Cellular Biology, Issue 15, Molecular Biology, Site-Specific Recombinase, Cre recombinase, Cre/lox system, transgenic animals, transgenic technology