Many researchers, across incredibly diverse foci, are applying phylogenetics to their research question(s). However, many of them are new to the topic, which presents inherent problems. Here we compile a practical introduction to phylogenetics for nonexperts. We outline, in a step-by-step manner, a pipeline for generating reliable phylogenies from gene sequence datasets. We begin with a user guide for similarity search tools, both via online interfaces and as local executables. Next, we explore programs for generating multiple sequence alignments, followed by protocols for using software to determine best-fit models of evolution. We then outline protocols for reconstructing phylogenetic relationships via maximum likelihood and Bayesian criteria, and finally describe tools for visualizing phylogenetic trees. While this is by no means an exhaustive description of phylogenetic approaches, it does provide the reader with practical starting information on key software applications commonly utilized by phylogeneticists. We envision this article serving as a practical training tool for researchers embarking on phylogenetic studies and as an educational resource that could be incorporated into a classroom or teaching lab.
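As a rough illustration of the "best-fit model of evolution" step, the sketch below ranks candidate substitution models by the Akaike information criterion (AIC = 2k - 2 lnL). The log-likelihood values are invented for illustration, the function names are hypothetical, and the parameter counts cover substitution-model parameters only (branch lengths are assumed to be shared by all candidates). Dedicated model-selection software does considerably more than this.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion; lower values indicate a better trade-off
    between fit and number of free parameters."""
    return 2 * n_params - 2 * log_likelihood


def best_model(candidates):
    """candidates maps model name -> (log-likelihood, free parameter count).
    Returns the name with the lowest AIC and the full score table."""
    scores = {name: aic(lnl, k) for name, (lnl, k) in candidates.items()}
    return min(scores, key=scores.get), scores


# Hypothetical log-likelihoods for one alignment (not real results).
example = {
    "JC69": (-2450.0, 0),   # equal rates, equal base frequencies
    "HKY85": (-2390.0, 4),  # ts/tv ratio + 3 free base frequencies
    "GTR": (-2388.5, 8),    # 5 free rates + 3 free base frequencies
}

name, scores = best_model(example)
print(name, scores)
```

In this made-up case the small likelihood gain of GTR over HKY85 does not offset its four extra parameters, so the simpler model is preferred.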
26 Related JoVE Articles!
A Novel Bayesian Change-point Algorithm for Genome-wide Analysis of Diverse ChIPseq Data Types
Institutions: Stony Brook University, Cold Spring Harbor Laboratory, University of Texas at Dallas.
ChIPseq is a widely used technique for investigating protein-DNA interactions. Read density profiles are generated by next-generation sequencing of protein-bound DNA and aligning the short reads to a reference genome. Enriched regions are revealed as peaks, which often differ dramatically in shape, depending on the target protein1. For example, transcription factors often bind in a site- and sequence-specific manner and tend to produce punctate peaks, while histone modifications are more pervasive and are characterized by broad, diffuse islands of enrichment2. Reliably identifying these regions was the focus of our work.
Algorithms for analyzing ChIPseq data have employed various methodologies, from heuristics3-5 to more rigorous statistical models, e.g. Hidden Markov Models (HMMs)6-8. We sought a solution that minimized the necessity for difficult-to-define, ad hoc parameters that often compromise resolution and lessen the intuitive usability of the tool. With respect to HMM-based methods, we aimed to curtail the parameter estimation procedures and the simple, finite-state classifications that are often utilized.
Additionally, conventional ChIPseq data analysis involves categorization of the expected read density profiles as either punctate or diffuse followed by subsequent application of the appropriate tool. We further aimed to replace the need for these two distinct models with a single, more versatile model, which can capably address the entire spectrum of data types.
To meet these objectives, we first constructed a statistical framework that naturally modeled ChIPseq data structures using a cutting-edge advance in HMMs9, which utilizes only explicit formulas, an innovation crucial to its performance advantages. More sophisticated than heuristic models, our HMM accommodates infinite hidden states through a Bayesian model. We applied it to identifying reasonable change points in read density, which further define segments of enrichment. Our analysis revealed that our Bayesian Change Point (BCP) algorithm had a reduced computational complexity, evidenced by an abridged run time and memory footprint. The BCP algorithm was successfully applied to both punctate peak and diffuse island identification with robust accuracy and limited user-defined parameters. This illustrated both its versatility and ease of use. Consequently, we believe it can be implemented readily across broad ranges of data types and end users in a manner that is easily compared and contrasted, making it a great tool for ChIPseq data analysis that can aid in collaboration and corroboration between research groups. Here, we demonstrate the application of BCP to existing transcription factor10,11 and epigenetic data12 to illustrate its usefulness.
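To make the idea of a change point in read density concrete, here is a deliberately simplified sketch. It is not the BCP algorithm (which uses an infinite-state HMM in a Bayesian framework). It only finds the single split of a count vector that most improves a Poisson likelihood. The read counts and function names are hypothetical.

```python
import math


def poisson_loglik(counts):
    """Poisson log-likelihood at the maximum-likelihood rate, omitting the
    log(x!) terms, which do not depend on the rate."""
    total = sum(counts)
    n = len(counts)
    if total == 0:
        return 0.0
    rate = total / n
    return total * math.log(rate) - n * rate


def best_single_change_point(counts):
    """Return (index, gain) for the split counts[:index] / counts[index:]
    that gives the largest log-likelihood gain over a single shared rate."""
    base = poisson_loglik(counts)
    best_idx, best_gain = None, 0.0
    for i in range(1, len(counts)):
        gain = poisson_loglik(counts[:i]) + poisson_loglik(counts[i:]) - base
        if gain > best_gain:
            best_idx, best_gain = i, gain
    return best_idx, best_gain


# Hypothetical binned read counts: background followed by an enriched segment.
reads = [2, 1, 3, 2, 2, 15, 18, 14, 16, 17]
idx, gain = best_single_change_point(reads)
print(idx, round(gain, 2))
```

A full segmentation would need multiple change points and a principled way to decide how many are supported by the data, which is the problem the Bayesian model addresses.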
Genetics, Issue 70, Bioinformatics, Genomics, Molecular Biology, Cellular Biology, Immunology, Chromatin immunoprecipitation, ChIP-Seq, histone modifications, segmentation, Bayesian, Hidden Markov Models, epigenetics
Application of a C. elegans Dopamine Neuron Degeneration Assay for the Validation of Potential Parkinson's Disease Genes
Institutions: University of Alabama.
Improvements to the diagnosis and treatment of Parkinson's disease (PD) are dependent upon knowledge about susceptibility factors that render populations at risk. In the process of attempting to identify novel genetic factors associated with PD, scientists have generated many lists of candidate genes, polymorphisms, and proteins that represent important advances, but these leads remain mechanistically undefined. Our work is aimed toward significantly narrowing such lists by exploiting the advantages of a simple animal model system. While humans have billions of neurons, the microscopic roundworm Caenorhabditis elegans has precisely 302, of which only eight produce dopamine (DA) in hermaphrodites. Expression of a human gene encoding the PD-associated protein, alpha-synuclein, in C. elegans DA neurons results in dosage- and age-dependent neurodegeneration.
Worms expressing human alpha-synuclein in DA neurons are isogenic and express both GFP and human alpha-synuclein under the DA transporter promoter (Pdat-1). The presence of GFP serves as a readily visualized marker for following DA neurodegeneration in these animals. We initially demonstrated that alpha-synuclein-induced DA neurodegeneration could be rescued in these animals by torsinA, a protein with molecular chaperone activity 1. Further, candidate PD-related genes identified in our lab via large-scale RNAi screening efforts using an alpha-synuclein misfolding assay were then over-expressed in C. elegans DA neurons. We determined that five of seven genes tested represented significant candidate modulators of PD as they rescued alpha-synuclein-induced DA neurodegeneration 2. Additionally, the Lindquist Lab (this issue of JoVE) has performed yeast screens whereby alpha-synuclein-dependent toxicity is used as a readout for genes that can enhance or suppress cytotoxicity. We subsequently examined the yeast candidate genes in our C. elegans alpha-synuclein-induced neurodegeneration assay and successfully validated many of these targets 3, 4.
Our methodology involves generation of a C. elegans DA neuron-specific expression vector using recombinational cloning of candidate gene cDNAs under control of the Pdat-1 promoter. These plasmids are then microinjected in wild-type (N2) worms, along with a selectable marker for successful transformation. Multiple stable transgenic lines producing the candidate protein in DA neurons are obtained and then independently crossed into the alpha-synuclein degenerative strain and assessed for neurodegeneration, at both the animal and individual neuron level, over the course of aging.
Neuroscience, Issue 17, C. elegans, Parkinson's disease, neuroprotection, alpha-synuclein, Translational Research
Establishment of Microbial Eukaryotic Enrichment Cultures from a Chemically Stratified Antarctic Lake and Assessment of Carbon Fixation Potential
Institutions: Miami University.
Lake Bonney is one of numerous permanently ice-covered lakes located in the McMurdo Dry Valleys, Antarctica. The perennial ice cover maintains a chemically stratified water column and, unlike other inland bodies of water, largely prevents external input of carbon and nutrients from streams. Biota are exposed to numerous environmental stresses, including year-round severe nutrient deficiency, low temperatures, extreme shade, hypersalinity, and 24-hour darkness during the winter 1. These extreme environmental conditions limit the biota in Lake Bonney almost exclusively to microorganisms 2.
Single-celled microbial eukaryotes (called "protists") are important players in global biogeochemical cycling 3 and play important ecological roles in the cycling of carbon in the dry valley lakes, occupying both primary and tertiary roles in the aquatic food web. In the dry valley aquatic food web, protists that fix inorganic carbon (autotrophy) are the major producers of organic carbon for organotrophic organisms 4, 2. Phagotrophic or heterotrophic protists capable of ingesting bacteria and smaller protists act as the top predators in the food web 5. Last, an unknown proportion of the protist population is capable of combined mixotrophic metabolism 6, 7. Mixotrophy in protists involves the ability to combine photosynthetic capability with phagotrophic ingestion of prey microorganisms. This form of mixotrophy differs from mixotrophic metabolism in bacterial species, which generally involves uptake of dissolved carbon molecules. There are currently very few protist isolates from permanently ice-capped polar lakes, and studies of protist diversity and ecology in this extreme environment have been limited 8, 4, 9, 10, 5. A better understanding of protist metabolic versatility in the simple dry valley lake food web will aid in the development of models for the role of protists in the global carbon cycle.
We employed an enrichment culture approach to isolate potentially phototrophic and mixotrophic protists from Lake Bonney. Sampling depths in the water column were chosen based on the location of primary production maxima and protist phylogenetic diversity 4, 11, as well as variability in major abiotic factors affecting protist trophic modes: shallow sampling depths are limited for major nutrients, while deeper sampling depths are limited by light availability. In addition, lake water samples were supplemented with multiple types of growth media to promote the growth of a variety of phototrophic organisms.
RubisCO catalyzes the rate-limiting step in the Calvin Benson Bassham (CBB) cycle, the major pathway by which autotrophic organisms fix inorganic carbon and provide organic carbon for higher trophic levels in aquatic and terrestrial food webs 12. In this study, we applied a radioisotope assay modified for filtered samples 13 to monitor maximum carboxylase activity as a proxy for carbon fixation potential and metabolic versatility in the Lake Bonney enrichment cultures.
Microbiology, Issue 62, Antarctic lake, McMurdo Dry Valleys, Enrichment cultivation, Microbial eukaryotes, RubisCO
Identification of Sleeping Beauty Transposon Insertions in Solid Tumors using Linker-mediated PCR
Institutions: University of Minnesota, Minneapolis, University of Minnesota, Minneapolis.
Genomic, proteomic, transcriptomic, and epigenomic analyses of human tumors indicate that there are thousands of anomalies within each cancer genome compared to matched normal tissue. Based on these analyses it is evident that there are many undiscovered genetic drivers of cancer1. Unfortunately these drivers are hidden within a much larger number of passenger anomalies in the genome that do not directly contribute to tumor formation. Another aspect of the cancer genome is that there is considerable genetic heterogeneity within similar tumor types. Each tumor can harbor different mutations that provide a selective advantage for tumor formation2. Performing an unbiased forward genetic screen in mice provides the tools to generate tumors and analyze their genetic composition, while reducing the background of passenger mutations. The Sleeping Beauty (SB) transposon system is one such method3. The SB system utilizes mobile vectors (transposons) that can be inserted throughout the genome by the transposase enzyme. Mutations are limited to a specific cell type through the use of a conditional transposase allele that is activated by Cre Recombinase. Many mouse lines exist that express Cre Recombinase in specific tissues. By crossing one of these lines to the conditional transposase allele (e.g. Lox-stop-Lox-SB11), the SB system is activated only in cells that express Cre Recombinase. The Cre Recombinase will excise a stop cassette that blocks expression of the transposase allele, thereby activating transposon mutagenesis within the designated cell type. An SB screen is initiated by breeding three strains of transgenic mice so that the experimental mice carry a conditional transposase allele, a concatamer of transposons, and a tissue-specific Cre Recombinase allele. These mice are allowed to age until tumors form and they become moribund. The mice are then necropsied and genomic DNA is isolated from the tumors. Next, the genomic DNA is subjected to linker-mediated PCR (LM-PCR) that results in amplification of genomic loci containing an SB transposon. LM-PCR performed on a single tumor will result in hundreds of distinct amplicons representing the hundreds of genomic loci containing transposon insertions in a single tumor4. The transposon insertions in all tumors are analyzed and common insertion sites (CISs) are identified using an appropriate statistical method5. Genes within the CIS are highly likely to be oncogenes or tumor suppressor genes, and are considered candidate cancer genes. The advantages of using the SB system to identify candidate cancer genes are: 1) the transposon can easily be located in the genome because its sequence is known, 2) transposition can be directed to almost any cell type and 3) the transposon is capable of introducing both gain- and loss-of-function mutations6. The following protocol describes how to devise and execute a forward genetic screen using the SB transposon system to identify candidate cancer genes (Figure 1).
Genetics, Issue 72, Medicine, Cancer Biology, Biomedical Engineering, Genomics, Mice, Genetic Techniques, life sciences, animal models, Neoplasms, Genetic Phenomena, Forward genetic screen, cancer drivers, mouse models, oncogenes, tumor suppressor genes, Sleeping Beauty transposons, insertions, DNA, PCR, animal model
An Allelotyping PCR for Identifying Salmonella enterica serovars Enteritidis, Hadar, Heidelberg, and Typhimurium
Institutions: University of Georgia.
Current commercial PCR tests for identifying Salmonella target genes unique to this genus. However, there are two species, six subspecies, and over 2,500 different Salmonella serovars, and not all are equal in their significance to public health. For example, finding S. enterica subspecies IIIa Arizona on a table egg layer farm is insignificant compared to the isolation of S. enterica subspecies I serovar Enteritidis, the leading cause of salmonellosis linked to the consumption of table eggs. Serovars are identified based on antigenic differences in lipopolysaccharide (LPS) (O antigen) and flagellin (H1 and H2 antigens). These antigenic differences are the outward appearance of the diversity of genes and gene alleles associated with this phenotype.
We have developed an allelotyping, multiplex PCR that keys on genetic differences between four major S. enterica subspecies I serovars found in poultry and associated with significant human disease in the US. The PCR primer pairs were targeted to key genes or sequences unique to a specific Salmonella serovar and designed to produce an amplicon with a size specific for that gene or allele. A Salmonella serovar is assigned to an isolate based on the combination of PCR test results for specific LPS and flagellin gene alleles. The multiplex PCRs described in this article are specific for the detection of S. enterica subspecies I serovars Enteritidis, Hadar, Heidelberg, and Typhimurium.
Here we demonstrate how to use the multiplex PCRs to identify serovar for a Salmonella
Immunology, Issue 53, PCR, Salmonella, multiplex, Serovar
Single-plant, Sterile Microcosms for Nodulation and Growth of the Legume Plant Medicago truncatula with the Rhizobial Symbiont Sinorhizobium meliloti
Institutions: Florida State University.
Rhizobial bacteria form symbiotic, nitrogen-fixing nodules on the roots of compatible host legume plants. One of the most well-developed model systems for studying these interactions is the plant Medicago truncatula cv. Jemalong A17 and the rhizobial bacterium Sinorhizobium meliloti 1021. Repeated imaging of plant roots and scoring of symbiotic phenotypes requires methods that are non-destructive to either plants or bacteria. The symbiotic phenotypes of some plant and bacterial mutants become apparent after relatively short periods of growth, and do not require long-term observation of the host/symbiont interaction. However, subtle differences in symbiotic efficiency and nodule senescence phenotypes that are not apparent in the early stages of the nodulation process require relatively long growth periods before they can be scored. Several methods have been developed for long-term growth and observation of this host/symbiont pair. However, many of these methods require repeated watering, which increases the possibility of contamination by other microbes. Other methods require a relatively large space for growth of large numbers of plants. The method described here, symbiotic growth of M. truncatula/S. meliloti in sterile, single-plant microcosms, has several advantages. Plants in these microcosms have sufficient moisture and nutrients to ensure that watering is not required for up to 9 weeks, preventing cross-contamination during watering. This allows phenotypes to be quantified that might be missed in short-term growth systems, such as subtle delays in nodule development and early nodule senescence. Also, the roots and nodules in the microcosm are easily viewed through the plate lid, so up-rooting of the plants for observation is not required.
Environmental Sciences, Issue 80, Plant Roots, Medicago, Gram-Negative Bacteria, Nitrogen, Microbiological Techniques, Bacterial Processes, Symbiosis, botany, microbiology, Medicago truncatula, Sinorhizobium meliloti, nodule, nitrogen fixation, legume, rhizobia, bacteria
Extracellularly Identifying Motor Neurons for a Muscle Motor Pool in Aplysia californica
Institutions: Case Western Reserve University, Case Western Reserve University, Case Western Reserve University.
In animals with large identified neurons (e.g. mollusks), analysis of motor pools is done using intracellular techniques1,2,3,4. Recently, we developed a technique to extracellularly stimulate and record individual neurons in Aplysia californica5. We now describe a protocol for using this technique to uniquely identify and characterize motor neurons within a motor pool.
This extracellular technique has advantages. First, extracellular electrodes can stimulate and record neurons through the sheath5, so it does not need to be removed. Thus, neurons will be healthier in extracellular experiments than in intracellular ones. Second, if ganglia are rotated by appropriate pinning of the sheath, extracellular electrodes can access neurons on both sides of the ganglion, which makes it easier and more efficient to identify multiple neurons in the same preparation. Third, extracellular electrodes do not need to penetrate cells, and thus can be easily moved back and forth among neurons, causing less damage to them. This is especially useful when one tries to record multiple neurons during repeating motor patterns that may only persist for minutes. Fourth, extracellular electrodes are more flexible than intracellular ones during muscle movements. Intracellular electrodes may pull out and damage neurons during muscle contractions. In contrast, since extracellular electrodes are gently pressed onto the sheath above neurons, they usually stay above the same neuron during muscle contractions, and thus can be used in more intact preparations.
To uniquely identify motor neurons for a motor pool (in particular, the I1/I3 muscle in Aplysia) using extracellular electrodes, one can use features that do not require intracellular measurements as criteria: soma size and location, axonal projection, and muscle innervation4,6,7. For the particular motor pool used to illustrate the technique, we recorded from buccal nerves 2 and 3 to measure axonal projections, and measured the contraction forces of the I1/I3 muscle to determine the pattern of muscle innervation for the individual motor neurons.
We demonstrate the complete process of first identifying motor neurons using muscle innervation, then characterizing their timing during motor patterns, creating a simplified diagnostic method for rapid identification. The simplified and more rapid diagnostic method is superior for more intact preparations, e.g. in the suspended buccal mass preparation8 or in vivo9. This process can also be applied in other motor pools10,11,12 or in other animal systems2,3,13,14.
Neuroscience, Issue 73, Physiology, Biomedical Engineering, Anatomy, Behavior, Neurobiology, Animal, Neurosciences, Neurophysiology, Electrophysiology, Aplysia, Aplysia californica, California sea slug, invertebrate, feeding, buccal mass, ganglia, motor neurons, neurons, extracellular stimulation and recordings, extracellular electrodes, animal model
In vivo and In vitro Rearing of Entomopathogenic Nematodes (Steinernematidae and Heterorhabditidae)
Institutions: University of Arizona, University of Arizona.
Entomopathogenic nematodes (EPN) (Steinernematidae) have a mutualistic partnership with Gram-negative Gamma-Proteobacteria in the family Enterobacteriaceae. Xenorhabdus bacteria are associated with steinernematid nematodes while Photorhabdus are symbionts of heterorhabditids. Together nematodes and bacteria form a potent insecticidal complex that kills a wide range of insect species in an intimate and specific partnership. Herein, we demonstrate in vivo and in vitro techniques commonly used in the rearing of these nematodes under laboratory conditions. Furthermore, these techniques represent key steps for the successful establishment of EPN cultures and also form the basis for other bioassays that utilize these organisms for research. The production of aposymbiotic (symbiont-free) nematodes is often critical for an in-depth and multifaceted approach to the study of symbiosis. This protocol does not require the addition of antibiotics and can be accomplished in a short amount of time with standard laboratory equipment. Nematodes produced in this manner are relatively robust, although their survivorship in storage may vary depending on the species used. The techniques detailed in this presentation correspond to those described by various authors and refined by P. Stock's Laboratory, University of Arizona (Tucson, AZ, USA). These techniques are distinct from the body of techniques that are used in the mass production of these organisms for pest management purposes.
Bioengineering, Issue 91, entomology, nematology, microbiology, entomopathogenic, nematodes, bacteria, rearing, in vivo, in vitro
Colonization of Euprymna scolopes Squid by Vibrio fischeri
Institutions: Northwestern University.
Specific bacteria are found in association with animal tissue1-5. Such host-bacterial associations (symbioses) can be detrimental (pathogenic), have no fitness consequence (commensal), or be beneficial (mutualistic). While much attention has been given to pathogenic interactions, little is known about the processes that dictate the reproducible acquisition of beneficial/commensal bacteria from the environment. The light-organ mutualism between the marine Gram-negative bacterium V. fischeri and the Hawaiian bobtail squid, E. scolopes, represents a highly specific interaction in which one host (E. scolopes) establishes a symbiotic relationship with only one bacterial species (V. fischeri) throughout the course of its lifetime6,7. Bioluminescence produced by V. fischeri during this interaction provides an anti-predatory benefit to E. scolopes during nocturnal activities8,9, while the nutrient-rich host tissue provides V. fischeri with a protected niche10. During each host generation, this relationship is recapitulated, thus representing a predictable process that can be assessed in detail at various stages of symbiotic development. In the laboratory, the juvenile squid hatch aposymbiotically (uncolonized), and, if collected within the first 30-60 minutes and transferred to symbiont-free water, cannot be colonized except by the experimental inoculum6. This interaction thus provides a useful model system in which to assess the individual steps that lead to specific acquisition of a symbiotic microbe from the environment11,12.
Here we describe a method to assess the degree of colonization that occurs when newly hatched aposymbiotic E. scolopes are exposed to (artificial) seawater containing V. fischeri. This simple assay describes inoculation, natural infection, and recovery of the bacterial symbiont from the nascent light organ of E. scolopes. Care is taken to provide a consistent environment for the animals during symbiotic development, especially with regard to water quality and light cues. Methods to characterize the symbiotic population described include (1) measurement of bacterially-derived bioluminescence, and (2) direct colony counting of recovered symbionts.
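For the colony-counting readout, a plate count is usually scaled back to colony-forming units (CFU) per animal. The sketch below shows that arithmetic only. The volumes, dilution factor, and function name are hypothetical and are not taken from the protocol.

```python
def cfu_per_animal(colonies, plated_volume_ul, homogenate_volume_ul,
                   dilution_factor=1):
    """Scale a plate count to CFU in the whole homogenate of one animal.

    colonies             -- colonies counted on the plate
    plated_volume_ul     -- volume spread on the plate (microliters)
    homogenate_volume_ul -- total homogenate volume for the animal (microliters)
    dilution_factor      -- fold dilution applied before plating (10 for 1:10)
    """
    if plated_volume_ul <= 0:
        raise ValueError("plated volume must be positive")
    return colonies * dilution_factor * homogenate_volume_ul / plated_volume_ul


# Hypothetical example: 120 colonies from 50 ul of a 1:10 dilution,
# out of a 700 ul homogenate.
estimate = cfu_per_animal(120, 50, 700, dilution_factor=10)
print(estimate)
```

Plates with very few or very many colonies give unreliable estimates, so in practice several dilutions are plated and only countable plates are used.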
Immunology, Issue 61, Symbiosis, mutualism, specificity, Euprymna scolopes, Vibrio fischeri, colonization, light organ, marine microbiology
Investigating Protein-protein Interactions in Live Cells Using Bioluminescence Resonance Energy Transfer
Institutions: Max Planck Institute for Psycholinguistics, Donders Institute for Brain, Cognition and Behaviour.
Assays based on Bioluminescence Resonance Energy Transfer (BRET) provide a sensitive and reliable means to monitor protein-protein interactions in live cells. BRET is the non-radiative transfer of energy from a 'donor' luciferase enzyme to an 'acceptor' fluorescent protein. In the most common configuration of this assay, the donor is Renilla reniformis luciferase and the acceptor is Yellow Fluorescent Protein (YFP). Because the efficiency of energy transfer is strongly distance-dependent, observation of the BRET phenomenon requires that the donor and acceptor be in close proximity. To test for an interaction between two proteins of interest in cultured mammalian cells, one protein is expressed as a fusion with luciferase and the second as a fusion with YFP. An interaction between the two proteins of interest may bring the donor and acceptor sufficiently close for energy transfer to occur. Compared to other techniques for investigating protein-protein interactions, the BRET assay is sensitive, requires little hands-on time and few reagents, and is able to detect interactions which are weak, transient, or dependent on the biochemical environment found within a live cell. It is therefore an ideal approach for confirming putative interactions suggested by yeast two-hybrid or mass spectrometry proteomics studies, and in addition it is well-suited for mapping interacting regions, assessing the effect of post-translational modifications on protein-protein interactions, and evaluating the impact of mutations identified in patient DNA.
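A BRET signal is commonly summarized as the ratio of acceptor-channel to donor-channel emission, with the ratio from a donor-only control subtracted to remove donor bleed-through. The sketch below illustrates that common convention. The abstract does not specify the exact calculation used, so the correction scheme, function names, and readings here are assumptions.

```python
def bret_ratio(acceptor_emission, donor_emission):
    """Raw BRET ratio: acceptor-channel signal over donor-channel signal."""
    if donor_emission <= 0:
        raise ValueError("donor emission must be positive")
    return acceptor_emission / donor_emission


def corrected_bret(sample, donor_only):
    """Subtract the ratio of a donor-only control from the sample ratio.

    Each argument is an (acceptor_emission, donor_emission) pair.
    """
    return bret_ratio(*sample) - bret_ratio(*donor_only)


# Hypothetical plate-reader values in arbitrary units.
sample = (4500.0, 10000.0)      # luciferase fusion + YFP fusion
donor_only = (3000.0, 10000.0)  # luciferase fusion alone
signal = corrected_bret(sample, donor_only)
print(round(signal, 3))
```

A corrected ratio clearly above that of a non-interacting control pair is what would suggest proximity between the two fusion proteins.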
Cellular Biology, Issue 87, Protein-protein interactions, Bioluminescence Resonance Energy Transfer, Live cell, Transfection, Luciferase, Yellow Fluorescent Protein, Mutations
Determination of Protein-ligand Interactions Using Differential Scanning Fluorimetry
Institutions: University of Exeter.
A wide range of methods are currently available for determining the dissociation constant between a protein and interacting small molecules. However, most of these require access to specialist equipment, and often require a degree of expertise to effectively establish reliable experiments and analyze data. Differential scanning fluorimetry (DSF) is being increasingly used as a robust method for initial screening of proteins for interacting small molecules, either for identifying physiological partners or for hit discovery. This technique has the advantage that it requires only a PCR machine suitable for quantitative PCR, and so suitable instrumentation is available in most institutions; an excellent range of protocols are already available; and there are strong precedents in the literature for multiple uses of the method. Past work has proposed several means of calculating dissociation constants from DSF data, but these are mathematically demanding. Here, we demonstrate a method for estimating dissociation constants from a moderate amount of DSF experimental data. These data can typically be collected and analyzed within a single day. We demonstrate how different models can be used to fit data collected from simple binding events, and where cooperative binding or independent binding sites are present. Finally, we present an example of data analysis in a case where standard models do not apply. These methods are illustrated with data collected on commercially available control proteins, and two proteins from our research program. Overall, our method provides a straightforward way for researchers to rapidly gain further insight into protein-ligand interactions using DSF.
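As a purely illustrative sketch of fitting a dissociation constant to melting-temperature shifts, the code below fits a single-site saturation curve, shift = shift_max * [L] / (Kd + [L]), by scanning candidate Kd values and solving for shift_max in closed form. This simple hyperbolic form is an assumption chosen for clarity. It is not the set of models demonstrated in the article, and the data are synthetic.

```python
def fit_single_site(conc, shift, kd_grid):
    """Least-squares fit of shift = shift_max * c / (kd + c).

    For each candidate kd the optimal shift_max has a closed form, so only kd
    is scanned. Returns (kd, shift_max, sum_of_squared_errors).
    """
    best = None
    for kd in kd_grid:
        frac = [c / (kd + c) for c in conc]
        denom = sum(f * f for f in frac)
        if denom == 0:
            continue
        shift_max = sum(y * f for y, f in zip(shift, frac)) / denom
        sse = sum((y - shift_max * f) ** 2 for y, f in zip(shift, frac))
        if best is None or sse < best[2]:
            best = (kd, shift_max, sse)
    return best


# Synthetic, noise-free data generated from kd = 10 and shift_max = 5
# (concentration and temperature units are arbitrary here).
conc = [0.0, 1.0, 3.0, 10.0, 30.0, 100.0, 300.0]
shift = [5.0 * c / (10.0 + c) for c in conc]

# Logarithmic grid of candidate kd values from 0.1 to 1000.
kd_grid = [10 ** (e / 20) for e in range(-20, 61)]
kd, shift_max, sse = fit_single_site(conc, shift, kd_grid)
print(kd, shift_max, sse)
```

With real, noisy data the grid resolution limits the precision of the estimate, and cooperative or multi-site binding would need a different functional form.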
Biophysics, Issue 91, differential scanning fluorimetry, dissociation constant, protein-ligand interactions, StepOne, cooperativity, WcbI.
Annotation of Plant Gene Function via Combined Genomics, Metabolomics and Informatics
Given the ever-expanding number of model plant species for which complete genome sequences are available and the abundance of bio-resources such as knockout mutants, wild accessions and advanced breeding populations, there is a rising burden for gene functional annotation. In this protocol, annotation of plant gene function using combined co-expression gene analysis, metabolomics and informatics is provided (Figure 1). This approach is based on the theory of using target genes of known function to allow the identification of non-annotated genes likely to be involved in a certain metabolic process, with the identification of target compounds via metabolomics. Strategies are put forward for applying this information to populations generated by both forward and reverse genetics approaches, although none of these approaches is effortless. As a corollary, this approach can also be used to characterise unknown peaks representing new or specific secondary metabolites in particular tissues, plant species or stress treatments, which is currently an important challenge in understanding plant metabolism.
Plant Biology, Issue 64, Genetics, Bioinformatics, Metabolomics, Plant metabolism, Transcriptome analysis, Functional annotation, Computational biology, Plant biology, Theoretical biology, Spectroscopy and structural analysis
From Voxels to Knowledge: A Practical Guide to the Segmentation of Complex Electron Microscopy 3D-Data
Institutions: Lawrence Berkeley National Laboratory, Lawrence Berkeley National Laboratory, Lawrence Berkeley National Laboratory.
Modern 3D electron microscopy approaches have recently allowed unprecedented insight into the 3D ultrastructural organization of cells and tissues, enabling the visualization of large macromolecular machines, such as adhesion complexes, as well as higher-order structures, such as the cytoskeleton and cellular organelles in their respective cell and tissue context. Given the inherent complexity of cellular volumes, it is essential to first extract the features of interest in order to allow visualization, quantification, and therefore comprehension of their 3D organization. Each data set is defined by distinct characteristics, e.g., signal-to-noise ratio, crispness (sharpness) of the data, heterogeneity of its features, crowdedness of features, presence or absence of characteristic shapes that allow for easy identification, and the percentage of the entire volume that a specific region of interest occupies. All these characteristics need to be considered when deciding on which approach to take for segmentation.
The six different 3D ultrastructural data sets presented were obtained by three different imaging approaches: resin embedded stained electron tomography, focused ion beam- and serial block face- scanning electron microscopy (FIB-SEM, SBF-SEM) of mildly stained and heavily stained samples, respectively. For these data sets, four different segmentation approaches have been applied: (1) fully manual model building followed solely by visualization of the model, (2) manual tracing segmentation of the data followed by surface rendering, (3) semi-automated approaches followed by surface rendering, or (4) automated custom-designed segmentation algorithms followed by surface rendering and quantitative analysis. Depending on the combination of data set characteristics, it was found that typically one of these four categorical approaches outperforms the others, but depending on the exact sequence of criteria, more than one approach may be successful. Based on these data, we propose a triage scheme that categorizes both objective data set characteristics and subjective personal criteria for the analysis of the different data sets.
Bioengineering, Issue 90, 3D electron microscopy, feature extraction, segmentation, image analysis, reconstruction, manual tracing, thresholding
Microarray-based Identification of Individual HERV Loci Expression: Application to Biomarker Discovery in Prostate Cancer
Institutions: Joint Unit Hospices de Lyon-bioMérieux, BioMérieux, Hospices Civils de Lyon, Lyon 1 University, BioMérieux, Hospices Civils de Lyon, Hospices Civils de Lyon.
The prostate-specific antigen (PSA) is the main diagnostic biomarker for prostate cancer in clinical use, but it lacks specificity and sensitivity, particularly in low dosage values1. 'How to use PSA' remains a current issue, both for diagnosis, where a gray zone corresponding to a serum concentration of 2.5-10 ng/ml does not allow a clear differentiation between cancer and noncancer2, and for patient follow-up, where analysis of post-operative PSA kinetic parameters can pose considerable challenges for practical application3,4. Alternatively, noncoding RNAs (ncRNAs) are emerging as key molecules in human cancer, with the potential to serve as novel markers of disease, e.g., PCA3 in prostate cancer5,6, and to reveal uncharacterized aspects of tumor biology. Moreover, data from the ENCODE project published in 2012 showed that different RNA types cover about 62% of the genome. It also appears that the amount of transcriptional regulatory motifs is at least 4.5x higher than that corresponding to protein-coding exons. Thus, long terminal repeats (LTRs) of human endogenous retroviruses (HERVs) constitute a wide range of putative/candidate transcriptional regulatory sequences, as this is their primary function in infectious retroviruses. HERVs, which are spread throughout the human genome, originate from ancestral and independent infections within the germ line, followed by copy-paste propagation processes, leading to multicopy families occupying 8% of the human genome (note that exons span 2% of our genome). Some HERV loci still express proteins that have been associated with several pathologies including cancer7-10. We have designed a high-density microarray, in Affymetrix format, aiming to optimally characterize individual HERV loci expression, in order to better understand whether they can be active and whether they drive ncRNA transcription or modulate coding gene expression. This tool has been applied in the prostate cancer field (Figure 1
Medicine, Issue 81, Cancer Biology, Genetics, Molecular Biology, Prostate, Retroviridae, Biomarkers, Pharmacological, Tumor Markers, Biological, Prostatectomy, Microarray Analysis, Gene Expression, Diagnosis, Human Endogenous Retroviruses, HERV, microarray, Transcriptome, prostate cancer, Affymetrix
Mapping Bacterial Functional Networks and Pathways in Escherichia coli using Synthetic Genetic Arrays
Institutions: University of Toronto, University of Toronto, University of Regina.
Phenotypes are determined by a complex series of physical (e.g., protein-protein) and functional (e.g., gene-gene or genetic) interactions (GI)1. While physical interactions can indicate which bacterial proteins are associated as complexes, they do not necessarily reveal pathway-level functional relationships1. GI screens, in which the growth of double mutants bearing two deleted or inactivated genes is measured and compared to the corresponding single mutants, can illuminate epistatic dependencies between loci and hence provide a means to query and discover novel functional relationships2. Large-scale GI maps have been reported for eukaryotic organisms like yeast3-7, but GI information remains sparse for prokaryotes8, which hinders the functional annotation of bacterial genomes. To this end, we and others have developed high-throughput quantitative bacterial GI screening methods9, 10.
Here, we present the key steps required to perform a quantitative E. coli Synthetic Genetic Array (eSGA) screening procedure on a genome scale9, using natural bacterial conjugation and homologous recombination to systematically generate and measure the fitness of large numbers of double mutants in a colony array format.
Briefly, a robot is used to transfer, through conjugation, chloramphenicol (Cm)-marked mutant alleles from engineered Hfr (High frequency of recombination) 'donor strains' into an ordered array of kanamycin (Kan)-marked F- recipient strains. Typically, we use loss-of-function single mutants bearing non-essential gene deletions (e.g., the 'Keio' collection11) and essential gene hypomorphic mutations (i.e., alleles conferring reduced protein expression, stability, or activity9, 12, 13) to query the functional associations of non-essential and essential genes, respectively. After conjugation and ensuing genetic exchange mediated by homologous recombination, the resulting double mutants are selected on solid medium containing both antibiotics. After outgrowth, the plates are digitally imaged and colony sizes are quantitatively scored using an in-house automated image processing system14. GIs are revealed when the growth rate of a double mutant is either significantly better or worse than expected9. Aggravating (or negative) GIs often arise between loss-of-function mutations in pairs of genes from compensatory pathways that impinge on the same essential process2. Here, the loss of a single gene is buffered, such that either single mutant is viable. However, the loss of both pathways is deleterious and results in synthetic lethality or sickness (i.e., slow growth). Conversely, alleviating (or positive) interactions can occur between genes in the same pathway or protein complex2, as the deletion of either gene alone is often sufficient to perturb the normal function of the pathway or complex such that additional perturbations do not reduce activity, and hence growth, further. Overall, systematically identifying and analyzing GI networks can provide unbiased, global maps of the functional relationships between large numbers of genes, from which pathway-level information missed by other approaches can be inferred9.
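The comparison of observed and expected double-mutant growth described above can be illustrated with a minimal Python sketch. It is not the eSGA scoring pipeline: it assumes colony sizes normalized so that wild type equals 1.0, it assumes a multiplicative expectation (the product of the two single-mutant fitness values), and it replaces the statistical significance test with a fixed cutoff. The function name, the field names, and the 0.1 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GIResult:
    expected: float  # double-mutant fitness expected if the genes do not interact
    score: float     # observed minus expected fitness
    label: str       # "aggravating", "alleviating", or "neutral"


def score_interaction(w_a: float, w_b: float, w_ab: float,
                      threshold: float = 0.1) -> GIResult:
    """Classify a gene pair from normalized fitness values (wild type = 1.0).

    Assumption: a multiplicative null model, expected = w_a * w_b.
    Assumption: a fixed cutoff stands in for a proper significance test.
    """
    expected = w_a * w_b
    score = w_ab - expected
    if score <= -threshold:
        label = "aggravating"   # double mutant grows worse than expected
    elif score >= threshold:
        label = "alleviating"   # double mutant grows better than expected
    else:
        label = "neutral"
    return GIResult(expected, score, label)


if __name__ == "__main__":
    # Two mildly sick single mutants whose combination is very sick.
    print(score_interaction(0.9, 0.8, 0.2))
    # Two genes in one pathway: the second deletion adds no further defect.
    print(score_interaction(0.6, 0.6, 0.6))
```

In a real screen the cutoff would be replaced by a test against replicate colonies and plate-level normalization, which this sketch deliberately leaves out.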
Genetics, Issue 69, Molecular Biology, Medicine, Biochemistry, Microbiology, Aggravating, alleviating, conjugation, double mutant, Escherichia coli, genetic interaction, Gram-negative bacteria, homologous recombination, network, synthetic lethality or sickness, suppression
An Affordable HIV-1 Drug Resistance Monitoring Method for Resource Limited Settings
Institutions: University of KwaZulu-Natal, Durban, South Africa, Jembi Health Systems, University of Amsterdam, Stanford Medical School.
HIV-1 drug resistance has the potential to seriously compromise the effectiveness and impact of antiretroviral therapy (ART). As ART programs in sub-Saharan Africa continue to expand, individuals on ART should be closely monitored for the emergence of drug resistance. Surveillance of transmitted drug resistance to track transmission of viral strains already resistant to ART is also critical. Unfortunately, drug resistance testing is still not readily accessible in resource limited settings, because genotyping is expensive and requires sophisticated laboratory and data management infrastructure. An open access genotypic drug resistance monitoring method to manage individuals and assess transmitted drug resistance is described. The method uses free open source software for the interpretation of drug resistance patterns and the generation of individual patient reports. The genotyping protocol has an amplification rate of greater than 95% for plasma samples with a viral load >1,000 HIV-1 RNA copies/ml. The sensitivity decreases significantly for viral loads <1,000 HIV-1 RNA copies/ml. The method described here was validated against a method of HIV-1 drug resistance testing approved by the United States Food and Drug Administration (FDA), the Viroseq genotyping method. Limitations of the method described here include the fact that it is not automated and that it also failed to amplify the circulating recombinant form CRF02_AG from a validation panel of samples, although it amplified subtypes A and B from the same panel.
Medicine, Issue 85, Biomedical Technology, HIV-1, HIV Infections, Viremia, Nucleic Acids, genetics, antiretroviral therapy, drug resistance, genotyping, affordable
Using Coculture to Detect Chemically Mediated Interspecies Interactions
Institutions: University of North Carolina at Chapel Hill.
In nature, bacteria rarely exist in isolation; they are instead surrounded by a diverse array of other microorganisms that alter the local environment by secreting metabolites. These metabolites have the potential to modulate the physiology and differentiation of their microbial neighbors and are likely important factors in the establishment and maintenance of complex microbial communities. We have developed a fluorescence-based coculture screen to identify such chemically mediated microbial interactions. The screen involves combining a fluorescent transcriptional reporter strain with environmental microbes on solid media and allowing the colonies to grow in coculture. The fluorescent transcriptional reporter is designed so that the chosen bacterial strain fluoresces when it is expressing a particular phenotype of interest (e.g., biofilm formation, sporulation, or virulence factor production). Screening is performed under growth conditions where this phenotype is not expressed (and therefore the reporter strain is typically nonfluorescent). When an environmental microbe secretes a metabolite that activates this phenotype, the metabolite diffuses through the agar and activates the fluorescent reporter construct. This allows the inducing-metabolite-producing microbes to be detected: they are the nonfluorescent colonies most proximal to the fluorescent colonies. Thus, this screen allows the identification of environmental microbes that produce diffusible metabolites that activate a particular physiological response in a reporter strain. This publication discusses how to: a) select appropriate coculture screening conditions, b) prepare the reporter and environmental microbes for screening, c) perform the coculture screen, d) isolate putative inducing organisms, and e) confirm their activity in a secondary screen. We developed this method to screen for soil organisms that activate biofilm matrix-production in Bacillus subtilis; however, we also discuss considerations for applying this approach to other genetically tractable bacteria.
Microbiology, Issue 80, High-Throughput Screening Assays, Genes, Reporter, Microbial Interactions, Soil Microbiology, Coculture, microbial interactions, screen, fluorescent transcriptional reporters, Bacillus subtilis
A Restriction Enzyme Based Cloning Method to Assess the In vitro Replication Capacity of HIV-1 Subtype C Gag-MJ4 Chimeric Viruses
Institutions: Emory University, Emory University.
The protective effect of many HLA class I alleles on HIV-1 pathogenesis and disease progression is, in part, attributed to their ability to target conserved portions of the HIV-1 genome that escape with difficulty. Sequence changes attributed to cellular immune pressure arise across the genome during infection, and if found within conserved regions of the genome such as Gag, can affect the ability of the virus to replicate in vitro. Transmission of HLA-linked polymorphisms in Gag to HLA-mismatched recipients has been associated with reduced set point viral loads. We hypothesized this may be due to a reduced replication capacity of the virus. Here we present a novel method for assessing the in vitro replication of HIV-1 as influenced by the gag gene isolated from acute time points from subtype C infected Zambians. This method uses restriction enzyme based cloning to insert the gag gene into a common subtype C HIV-1 proviral backbone, MJ4. This makes it more appropriate to the study of subtype C sequences than previous recombination based methods that have assessed the in vitro replication of chronically derived gag-pro sequences. Nevertheless, the protocol could be readily modified for studies of viruses from other subtypes. Moreover, this protocol details a robust and reproducible method for assessing the replication capacity of the Gag-MJ4 chimeric viruses on a CEM-based T cell line. This method was utilized for the study of Gag-MJ4 chimeric viruses derived from 149 subtype C acutely infected Zambians, and has allowed for the identification of residues in Gag that affect replication. More importantly, the implementation of this technique has facilitated a deeper understanding of how viral replication defines parameters of early HIV-1 pathogenesis such as set point viral load and longitudinal CD4+ T cell decline.
Infectious Diseases, Issue 90, HIV-1, Gag, viral replication, replication capacity, viral fitness, MJ4, CEM, GXR25
Isolation of Fidelity Variants of RNA Viruses and Characterization of Virus Mutation Frequency
Institutions: Institut Pasteur.
RNA viruses use RNA dependent RNA polymerases to replicate their genomes. The intrinsically high error rate of these enzymes is a large contributor to the generation of extreme population diversity that facilitates virus adaptation and evolution. Increasing evidence shows that the intrinsic error rates, and the resulting mutation frequencies, of RNA viruses can be modulated by subtle amino acid changes to the viral polymerase. Although biochemical assays exist for some viral RNA polymerases that permit quantitative measure of incorporation fidelity, here we describe a simple method of measuring mutation frequencies of RNA viruses that has proven to be as accurate as biochemical approaches in identifying fidelity altering mutations. The approach uses conventional virological and sequencing techniques that can be performed in most biology laboratories. Based on our experience with a number of different viruses, we have identified the key steps that must be optimized to increase the likelihood of isolating fidelity variants and generating data of statistical significance. The isolation and characterization of fidelity altering mutations can provide new insights into polymerase structure and function1-3. Furthermore, these fidelity variants can be useful tools in characterizing mechanisms of virus adaptation and evolution4-7.
Immunology, Issue 52, Polymerase fidelity, RNA virus, mutation frequency, mutagen, RNA polymerase, viral evolution
A Hybrid DNA Extraction Method for the Qualitative and Quantitative Assessment of Bacterial Communities from Poultry Production Samples
Institutions: USDA-Agricultural Research Service, USDA-Agricultural Research Service, Oregon State University, University of Georgia, Northern Arizona University.
The efficacy of DNA extraction protocols can be highly dependent upon both the type of sample being investigated and the types of downstream analyses performed. Considering that the use of new bacterial community analysis techniques (e.g., microbiomics, metagenomics) is becoming more prevalent in the agricultural and environmental sciences and many environmental samples within these disciplines can be physicochemically and microbiologically unique (e.g., fecal and litter/bedding samples from the poultry production spectrum), appropriate and effective DNA extraction methods need to be carefully chosen. Therefore, a novel semi-automated hybrid DNA extraction method was developed specifically for use with environmental poultry production samples. This method is a combination of the two major types of DNA extraction: mechanical and enzymatic. A two-step intense mechanical homogenization (using bead-beating specifically formulated for environmental samples) was added to the beginning of the “gold standard” enzymatic DNA extraction method for fecal samples to enhance the removal of bacteria and DNA from the sample matrix and improve the recovery of Gram-positive bacterial community members. Once the enzymatic extraction portion of the hybrid method was initiated, the remaining purification process was automated using a robotic workstation to increase sample throughput and decrease sample processing error. In comparison to the strictly mechanical and strictly enzymatic DNA extraction methods, this novel hybrid method provided the best overall combined performance when considering quantitative (using 16S rRNA qPCR) and qualitative (using microbiomics) estimates of the total bacterial communities when processing poultry feces and litter samples.
Molecular Biology, Issue 94, DNA extraction, poultry, environmental, feces, litter, semi-automated, microbiomics, qPCR
Multimodal Optical Microscopy Methods Reveal Polyp Tissue Morphology and Structure in Caribbean Reef Building Corals
Institutions: University of Illinois at Urbana-Champaign, University of Illinois at Urbana-Champaign, University of Illinois at Urbana-Champaign.
An integrated suite of imaging techniques has been applied to determine the three-dimensional (3D) morphology and cellular structure of polyp tissues comprising the Caribbean reef building corals Montastraea annularis and M. faveolata. These approaches include fluorescence microscopy (FM), serial block face imaging (SBFI), and two-photon confocal laser scanning microscopy (TPLSM). SBFI provides deep tissue imaging after physical sectioning; it details the tissue surface texture and 3D visualization to tissue depths of more than 2 mm. Complementary FM and TPLSM yield ultra-high resolution images of tissue cellular structure. Results have: (1) identified previously unreported lobate tissue morphologies on the outer wall of individual coral polyps and (2) created the first surface maps of the 3D distribution and tissue density of chromatophores and algae-like dinoflagellate zooxanthellae endosymbionts. Spectral absorption peaks of 500 nm and 675 nm, respectively, suggest that M. annularis and M. faveolata contain similar types of chlorophyll and chromatophores. However, M. annularis and M. faveolata exhibit significant differences in the tissue density and 3D distribution of these key cellular components. This study focusing on imaging methods indicates that SBFI is extremely useful for analysis of large mm-scale samples of decalcified coral tissues. Complementary FM and TPLSM reveal subtle submillimeter scale changes in cellular distribution and density in nondecalcified coral tissue samples. The TPLSM technique affords: (1) minimally invasive sample preparation, (2) superior optical sectioning ability, and (3) minimal light absorption and scattering, while still permitting deep tissue imaging.
Environmental Sciences, Issue 91, Serial block face imaging, two-photon fluorescence microscopy, Montastraea annularis, Montastraea faveolata, 3D coral tissue morphology and structure, zooxanthellae, chromatophore, autofluorescence, light harvesting optimization, environmental change
The ITS2 Database
Institutions: University of Würzburg, University of Würzburg.
The internal transcribed spacer 2 (ITS2) has been used as a phylogenetic marker for more than two decades. As ITS2 research mainly focused on the very variable ITS2 sequence, it confined this marker to low-level phylogenetics only. However, the combination of the ITS2 sequence and its highly conserved secondary structure improves the phylogenetic resolution1 and allows phylogenetic inference at multiple taxonomic ranks, including species delimitation2-8.
The ITS2 Database9 presents an exhaustive dataset of internal transcribed spacer 2 sequences from NCBI GenBank11. Following an annotation by profile Hidden Markov Models (HMMs), the secondary structure of each sequence is predicted. First, it is tested whether a minimum energy based fold12 (direct fold) results in a correct, four helix conformation. If this is not the case, the structure is predicted by homology modeling13. In homology modeling, an already known secondary structure is transferred to another ITS2 sequence that could not be folded correctly by a direct fold.
The ITS2 Database is not only a database for storage and retrieval of ITS2 sequence-structures. It also provides several tools to process your own ITS2 sequences, including annotation, structural prediction, motif detection and BLAST14 search on the combined sequence-structure information. Moreover, it integrates trimmed versions of 4SALE15,16 for multiple sequence-structure alignment calculation and Neighbor Joining18 tree reconstruction. Together they form a coherent analysis pipeline from an initial set of sequences to a phylogeny based on sequence and secondary structure.
In a nutshell, this workbench simplifies first phylogenetic analyses to only a few mouse-clicks, while additionally providing tools and data for comprehensive large-scale analyses.
Genetics, Issue 61, alignment, internal transcribed spacer 2, molecular systematics, secondary structure, ribosomal RNA, phylogenetic tree, homology modeling, phylogeny
Unraveling the Unseen Players in the Ocean - A Field Guide to Water Chemistry and Marine Microbiology
Institutions: San Diego State University, University of California San Diego.
Here we introduce a series of thoroughly tested and well standardized research protocols adapted for use in remote marine environments. The sampling protocols include the assessment of resources available to the microbial community (dissolved organic carbon, particulate organic matter, inorganic nutrients), and a comprehensive description of the viral and bacterial communities (via direct viral and microbial counts, enumeration of autofluorescent microbes, and construction of viral and microbial metagenomes). We use a combination of methods drawn from a dispersed field of scientific disciplines, comprising already established protocols and some of the most recently developed techniques. The metagenomic sequencing techniques used for viral and bacterial community characterization, in particular, have been established only in recent years and are thus still subject to constant improvement. This has led to a variety of sampling and sample processing procedures currently in use. The set of methods presented here provides an up-to-date approach to collecting and processing environmental samples. Parameters addressed with these protocols yield the minimum information essential to characterize and understand the underlying mechanisms of viral and microbial community dynamics. The guide gives easy-to-follow guidelines for conducting comprehensive surveys and discusses critical steps and potential caveats pertinent to each technique.
Environmental Sciences, Issue 93, dissolved organic carbon, particulate organic matter, nutrients, DAPI, SYBR, microbial metagenomics, viral metagenomics, marine environment
Building a Better Mosquito: Identifying the Genes Enabling Malaria and Dengue Fever Resistance in A. gambiae and A. aegypti Mosquitoes
Institutions: Johns Hopkins University.
In this interview, George Dimopoulos focuses on the physiological mechanisms used by mosquitoes to combat Plasmodium falciparum and dengue virus infections. Explanation is given for how key refractory genes, those genes conferring resistance to vector pathogens, are identified in the mosquito and how this knowledge can be used to generate transgenic mosquitoes that are unable to carry the malaria parasite or dengue virus.
Cellular Biology, Issue 5, Translational Research, mosquito, malaria, virus, dengue, genetics, injection, RNAi, transgenesis, transgenic
A Strategy to Identify de Novo Mutations in Common Disorders such as Autism and Schizophrenia
Institutions: Universite de Montreal, Universite de Montreal, Universite de Montreal.
There are several lines of evidence supporting the role of de novo mutations as a mechanism for common disorders, such as autism and schizophrenia. First, the de novo mutation rate in humans is relatively high, so new mutations are generated at a high frequency in the population. However, de novo mutations have not been reported in most common diseases. Mutations in genes leading to severe diseases where there is a strong negative selection against the phenotype, such as lethality in embryonic stages or reduced reproductive fitness, will not be transmitted to multiple family members, and therefore will not be detected by linkage gene mapping or association studies. The observation of very high concordance in monozygotic twins and very low concordance in dizygotic twins also strongly supports the hypothesis that a significant fraction of cases may result from new mutations. Such is the case for diseases such as autism and schizophrenia. Second, despite reduced reproductive fitness1 and extremely variable environmental factors, the incidence of some diseases is maintained worldwide at a relatively high and constant rate. This is the case for autism and schizophrenia, with an incidence of approximately 1% worldwide. Mutational load can be thought of as a balance between selection for or against a deleterious mutation and its production by de novo mutation. Lower rates of reproduction constitute a negative selection factor that should reduce the number of mutant alleles in the population, ultimately leading to decreased disease prevalence. These selective pressures tend to be of different intensity in different environments. Nonetheless, these severe mental disorders have been maintained at a constant, relatively high prevalence in the worldwide population across a wide range of cultures and countries despite a strong negative selection against them2. This is not what one would predict in diseases with reduced reproductive fitness, unless there was a high new mutation rate. Finally, there are the effects of paternal age: there is a significantly increased risk of the disease with increasing paternal age, which could result from the age related increase in paternal de novo mutations. This is the case for autism and schizophrenia3. The male-to-female ratio of mutation rate is estimated at about 4–6:1, presumably due to a higher number of germ-cell divisions with age in males. Therefore, one would predict that de novo mutations would more frequently come from males, particularly older males4. A high rate of new mutations may in part explain why genetic studies have so far failed to identify many genes predisposing to complex diseases, such as autism and schizophrenia, and why diseases have been identified for a mere 3% of genes in the human genome. Identification of de novo mutations as a cause of a disease requires a targeted molecular approach, which includes studying parents and affected subjects. The process for determining if the genetic basis of a disease may result in part from de novo mutations and the molecular approach to establish this link will be illustrated, using autism and schizophrenia as examples.
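As a rough illustration of the prediction that de novo mutations more often come from males, the following Python sketch converts a male-to-female mutation rate ratio into the share of new mutations expected to be of paternal origin. It assumes the ratio applies directly to the two parental germ lines of a single child and ignores any dependence on parental age; the function name is hypothetical and the calculation is not taken from the article.

```python
def paternal_fraction(male_to_female_ratio: float) -> float:
    """Share of de novo mutations expected to come from the father.

    Assumption: if the paternal rate is `ratio` times the maternal rate,
    the paternal share is ratio / (ratio + 1).
    """
    if male_to_female_ratio < 0:
        raise ValueError("ratio must be non-negative")
    return male_to_female_ratio / (male_to_female_ratio + 1.0)


if __name__ == "__main__":
    # The abstract quotes a ratio of about 4-6:1.
    for ratio in (4.0, 6.0):
        print(f"ratio {ratio:.0f}:1 -> paternal share {paternal_fraction(ratio):.1%}")
```

Under these assumptions, a ratio of 4:1 corresponds to four in five new mutations being paternal, and 6:1 to six in seven.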
Medicine, Issue 52, de novo mutation, complex diseases, schizophrenia, autism, rare variations, DNA sequencing
Using SCOPE to Identify Potential Regulatory Motifs in Coregulated Genes
Institutions: Dartmouth College.
SCOPE is an ensemble motif finder that uses three component algorithms in parallel to identify potential regulatory motifs by over-representation and motif position preference1. Each component algorithm is optimized to find a different kind of motif. By taking the best of these three approaches, SCOPE performs better than any single algorithm, even in the presence of noisy data1. In this article, we utilize a web version of SCOPE2 to examine genes that are involved in telomere maintenance. SCOPE has been incorporated into at least two other motif finding programs3,4 and has been used in other studies5-8.
The three algorithms that comprise SCOPE are BEAM9, which finds non-degenerate motifs (ACCGGT), PRISM10, which finds degenerate motifs (ASCGWT), and SPACER11, which finds longer bipartite motifs (ACCnnnnnnnnGGT). These three algorithms have been optimized to find their corresponding type of motif. Together, they allow SCOPE to perform extremely well.
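To make the three motif classes concrete, the Python sketch below expands a motif written in IUPAC nucleotide codes into a regular expression and reports where it occurs in a sequence. It only locates instances of a motif that is already known. It is not BEAM, PRISM, or SPACER, which discover motifs by over-representation and position preference, and it scans only the strand it is given. The function names are hypothetical.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_to_regex(motif: str) -> str:
    """Expand an IUPAC motif such as 'ASCGWT' into a regex string."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_motif(motif: str, sequence: str) -> list[int]:
    """Return 0-based start positions of all matches, including overlaps."""
    # A lookahead lets overlapping occurrences be reported as well.
    pattern = re.compile(f"(?=({motif_to_regex(motif)}))")
    return [m.start() for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    print(find_motif("ACCGGT", "TTACCGGTAA"))            # non-degenerate
    print(find_motif("ASCGWT", "AGCGTTACCGAT"))          # degenerate (S = C/G, W = A/T)
    print(find_motif("ACCnnnnnnnnGGT", "ACCTTTTTTTTGGT"))  # bipartite with 8-base spacer
```

The lower-case `n` positions in the bipartite example are upper-cased and treated as "any base", which is how the spacer is read here.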
Once a gene set has been analyzed and candidate motifs identified, SCOPE can look for other genes that contain the motif which, when added to the original set, will improve the motif score. This can occur through over-representation or motif position preference. Working with partial gene sets that have biologically verified transcription factor binding sites, SCOPE was able to identify most of the rest of the genes also regulated by the given transcription factor.
Output from SCOPE shows candidate motifs, their significance, and other information both as a table and as a graphical motif map. FAQs and video tutorials are available at the SCOPE web site which also includes a "Sample Search" button that allows the user to perform a trial run.
SCOPE has a very friendly user interface that enables novice users to access the algorithm's full power without having to become an expert in the bioinformatics of motif finding. As input, SCOPE can take a list of genes, or FASTA sequences. These can be entered in browser text fields, or read from a file. The output from SCOPE contains a list of all identified motifs with their scores, number of occurrences, fraction of genes containing the motif, and the algorithm used to identify the motif. For each motif, result details include a consensus representation of the motif, a sequence logo, a position weight matrix, and a list of instances for every motif occurrence (with exact positions and "strand" indicated). Results are returned in a browser window and also optionally by email. Previous papers describe the SCOPE algorithms in detail1,2,9-11.
Genetics, Issue 51, gene regulation, computational biology, algorithm, promoter sequence motif