Genome sequencing projects have deciphered millions of protein sequences, which require knowledge of their structure and function to improve the understanding of their biological role. Although experimental methods can provide detailed information for a small fraction of these proteins, computational modeling is needed for the majority of protein molecules, which are experimentally uncharacterized. The I-TASSER server is an on-line workbench for high-resolution modeling of protein structure and function. Given a protein sequence, a typical output from the I-TASSER server includes secondary structure prediction, predicted solvent accessibility of each residue, homologous template proteins detected by threading and structure alignments, up to five full-length tertiary structural models, and structure-based functional annotations for enzyme classification, Gene Ontology terms and protein-ligand binding sites. All the predictions are tagged with a confidence score that estimates how accurate the predictions are in the absence of experimental data. To accommodate special requests from end users, the server provides channels to accept user-specified inter-residue distance and contact maps to interactively change the I-TASSER modeling; it also allows users to specify any protein as a template, or to exclude any template proteins during the structure assembly simulations. The structural information can be collected by the users based on experimental evidence or biological insights with the purpose of improving the quality of I-TASSER predictions. The server was ranked as the best program for protein structure and function prediction in the recent community-wide CASP experiments. There are currently >20,000 registered scientists from over 100 countries who are using the on-line I-TASSER server.
26 Related JoVE Articles!
Genomic MRI - a Public Resource for Studying Sequence Patterns within Genomic DNA
Institutions: University of Toledo Health Science Campus.
Non-coding genomic regions in complex eukaryotes, including intergenic areas, introns, and untranslated segments of exons, are profoundly non-random in their nucleotide composition and consist of a complex mosaic of sequence patterns. These patterns include so-called Mid-Range Inhomogeneity (MRI) regions -- sequences 30-10000 nucleotides in length that are enriched by a particular base or combination of bases (e.g. (G+T)-rich, purine-rich, etc.). MRI regions are associated with unusual (non-B-form) DNA structures that are often involved in regulation of gene expression, recombination, and other genetic processes (Fedorova & Fedorov 2010). The existence of a strong fixation bias within MRI regions against mutations that tend to reduce their sequence inhomogeneity additionally supports the functionality and importance of these genomic sequences (Prakash et al.
Here we demonstrate a freely available Internet resource -- the Genomic MRI program package -- designed for computational analysis of genomic sequences in order to find and characterize various MRI patterns within them (Bechtel et al. 2008). This package also allows generation of randomized sequences with various properties and level of correspondence to the natural input DNA sequences. The main goal of this resource is to facilitate examination of vast regions of non-coding DNA that are still scarcely investigated and await thorough exploration and recognition.
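The idea of scanning a sequence for regions enriched in a particular base combination can be illustrated with a simple sliding-window sketch. This is not the Genomic MRI algorithm itself; the function name, the 30-nucleotide window and the 0.8 enrichment threshold are assumptions chosen only for illustration.

```python
def find_enriched_regions(seq, bases="GT", window=30, min_fraction=0.8):
    """Return merged half-open (start, end) intervals covered by windows
    in which at least `min_fraction` of the positions belong to `bases`.

    Hypothetical helper for illustration; thresholds are arbitrary.
    """
    seq = seq.upper()
    target = set(bases.upper())
    if window < 1 or len(seq) < window:
        return []

    hits = [1 if ch in target else 0 for ch in seq]
    count = sum(hits[:window])  # matches in the first window
    regions = []

    for start in range(len(seq) - window + 1):
        if start > 0:
            # Slide the window by one position: add the new base, drop the old one.
            count += hits[start + window - 1] - hits[start - 1]
        if count / window >= min_fraction:
            end = start + window
            if regions and start <= regions[-1][1]:
                # Overlapping or adjacent window: extend the previous region.
                regions[-1] = (regions[-1][0], max(regions[-1][1], end))
            else:
                regions.append((start, end))
    return regions
```

Calling `find_enriched_regions(sequence, bases="AG")` would scan for purine-rich stretches instead. Because whole windows are merged, the reported intervals include some flanking sequence around the enriched core.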
Genetics, Issue 51, bioinformatics, computational biology, genomics, non-randomness, signals, gene regulation, DNA conformation
Pyrosequencing for Microbial Identification and Characterization
Institutions: Johns Hopkins University, Qiagen Sciences, Inc..
Pyrosequencing is a versatile technique that facilitates microbial genome sequencing and can be used to identify bacterial species, discriminate bacterial strains and detect genetic mutations that confer resistance to anti-microbial agents. The advantages of pyrosequencing for microbiology applications include rapid and reliable high-throughput screening and accurate identification of microbes and microbial genome mutations. Pyrosequencing involves sequencing of DNA by synthesizing the complementary strand a single base at a time, while determining the specific nucleotide being incorporated during the synthesis reaction. The reaction occurs on immobilized single stranded template DNA where the four deoxyribonucleotides (dNTP) are added sequentially and the unincorporated dNTPs are enzymatically degraded before addition of the next dNTP to the synthesis reaction. Incorporation of a specific base is detected by the generation of a chemiluminescent signal. The order of dNTPs that produce the chemiluminescent signals determines the DNA sequence of the template. The real-time sequencing capability of pyrosequencing technology enables rapid microbial identification in a single assay. In addition, the pyrosequencing instrument can analyze the full genetic diversity of anti-microbial drug resistance, including typing of SNPs, point mutations, insertions, and deletions, as well as quantification of multiple gene copies that may occur in some anti-microbial resistance patterns.
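The sequential dNTP addition described above can be sketched as a small simulation. It assumes an idealized, noise-free signal in which the light produced by one dispensation is proportional to the number of identical bases incorporated in a row; the function names and the fixed A, C, G, T dispensation cycle are illustrative assumptions, not instrument settings.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def simulate_pyrogram(new_strand, cycle="ACGT", max_dispensations=1000):
    """Dispense one dNTP at a time and record (base, incorporated_count).

    `new_strand` is the strand being synthesized (5'->3'). A count of 0
    means the dispensed dNTP was not incorporated and was degraded.
    """
    signals = []
    pos = 0
    step = 0
    while pos < len(new_strand) and step < max_dispensations:
        base = cycle[step % len(cycle)]
        run = 0
        # A homopolymer stretch is incorporated in a single dispensation.
        while pos < len(new_strand) and new_strand[pos] == base:
            run += 1
            pos += 1
        signals.append((base, run))
        step += 1
    return signals


def decode_pyrogram(signals):
    """Rebuild the synthesized strand from the ordered signals."""
    return "".join(base * run for base, run in signals)


def template_of(new_strand):
    """The template is the reverse complement of the synthesized strand."""
    return new_strand.translate(COMPLEMENT)[::-1]
```

Real pyrograms are analog peak heights, so calling homopolymer lengths from them requires thresholds that this sketch deliberately leaves out.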
Microbiology, Issue 78, Genetics, Molecular Biology, Basic Protocols, Genomics, Eukaryota, Bacteria, Viruses, Bacterial Infections and Mycoses, Virus Diseases, Diagnosis, Therapeutics, Equipment and Supplies, Technology, Industry, and Agriculture, Life Sciences (General), Pyrosequencing, DNA, Microbe, PCR, primers, Next-Generation, high-throughput, sequencing
Depletion of Ribosomal RNA for Mosquito Gut Metagenomic RNA-seq
Institutions: New Mexico State University.
The mosquito gut accommodates dynamic microbial communities across different stages of the insect's life cycle. Characterization of the genetic capacity and functionality of the gut community will provide insight into the effects of gut microbiota on mosquito life traits. Metagenomic RNA-Seq has become an important tool to analyze transcriptomes from various microbes present in a microbial community. Messenger RNA usually comprises only 1-3% of total RNA, while rRNA constitutes approximately 90%. It is challenging to enrich messenger RNA from a metagenomic microbial RNA sample because most prokaryotic mRNA species lack stable poly(A) tails. This prevents oligo d(T) mediated mRNA isolation. Here, we describe a protocol that employs sample derived rRNA capture probes to remove rRNA from a metagenomic total RNA sample. To begin, both mosquito and microbial small and large subunit rRNA fragments are amplified from a metagenomic community DNA sample. Then, the community specific biotinylated antisense ribosomal RNA probes are synthesized in vitro using T7 RNA polymerase. The biotinylated rRNA probes are hybridized to the total RNA. The hybrids are captured by streptavidin-coated beads and removed from the total RNA. This subtraction-based protocol efficiently removes both mosquito and microbial rRNA from the total RNA sample. The mRNA enriched sample is further processed for RNA amplification and RNA-Seq.
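The benefit of subtracting rRNA can be made concrete with a short calculation. The starting fractions below come from the figures quoted above (mRNA 1-3%, rRNA about 90% of total RNA); the removal efficiency is a hypothetical input, since the abstract does not state one.

```python
def mrna_fraction_after_depletion(mrna_frac, rrna_frac, removal_efficiency):
    """Fraction of the remaining RNA that is mRNA after rRNA subtraction.

    All arguments are fractions between 0 and 1. `removal_efficiency` is
    the share of rRNA captured and removed (an assumed value, not a
    measured one).
    """
    for value in (mrna_frac, rrna_frac, removal_efficiency):
        if not 0.0 <= value <= 1.0:
            raise ValueError("fractions must lie between 0 and 1")
    remaining_total = 1.0 - rrna_frac * removal_efficiency
    if remaining_total <= 0.0:
        raise ValueError("nothing would remain after depletion")
    return mrna_frac / remaining_total
```

With 2% mRNA, 90% rRNA and an assumed 95% removal, mRNA would make up roughly 14% of the remaining RNA, which is about a seven-fold enrichment.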
Genetics, Issue 74, Infection, Infectious Diseases, Molecular Biology, Cellular Biology, Microbiology, Genomics, biology (general), genetics (animal and plant), life sciences, Eukaryota, Bacteria, metagenomics, metatranscriptome, RNA-seq, rRNA depletion, mRNA enrichment, mosquito gut microbiome, RNA, DNA, sequencing
Using SecM Arrest Sequence as a Tool to Isolate Ribosome Bound Polypeptides
Institutions: Cleveland State University.
Extensive research has provided ample evidence suggesting that protein folding in the cell is a co-translational process1-5. However, the exact pathway that the polypeptide chain follows during co-translational folding to achieve its functional form is still an enigma. In order to understand this process and to determine the exact conformation of the co-translational folding intermediates, it is essential to develop techniques that allow the isolation of ribosome-nascent chain complexes (RNCs) carrying nascent chains of predetermined sizes for further structural analysis.
SecM (secretion monitor) is a 170 amino acid E. coli protein that regulates expression of the downstream SecA (secretion driving) ATPase in the secM-secA operon. Nakatogawa and Ito originally found that a 17 amino acid long sequence (150-FSTPVWISQAQGIRAGP-166) in the C-terminal region of the SecM protein is necessary and sufficient to cause stalling of SecM elongation at Gly165, thereby producing peptidyl-glycyl-tRNA stably bound to the ribosomal P-site7-9. More importantly, it was found that this 17 amino acid long sequence can be fused to the C-terminus of virtually any full-length and/or truncated protein, thus allowing the production of RNCs carrying nascent chains of predetermined sizes7. Thus, when fused or inserted into the target protein, the SecM stalling sequence arrests polypeptide chain elongation and generates stable RNCs both in vivo in E. coli cells and in vitro in a cell-free system. Sucrose gradient centrifugation is then used to isolate the RNCs.
The isolated RNCs can be used to analyze structural and functional features of the co-translational folding intermediates. Recently, this technique has been successfully used to gain insights into the structure of several ribosome bound nascent chains10,11. Here we describe the isolation of bovine Gamma-B Crystallin RNCs fused to SecM and generated in an in vitro translation system.
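As a rough illustration of how a nascent chain of predetermined size follows from the construct design, the sketch below appends the 17-residue arrest sequence quoted above to a target sequence and computes the chain length at the stall site. The function names are hypothetical, and the sketch ignores any linker residues a real construct might contain.

```python
# SecM residues 150-166, as quoted in the abstract.
SECM_ARREST = "FSTPVWISQAQGIRAGP"
SECM_FIRST_RESIDUE = 150
STALL_RESIDUE = 165  # elongation stalls at Gly165


def build_arrest_fusion(target_seq):
    """C-terminal fusion: target protein followed by the arrest sequence."""
    return target_seq + SECM_ARREST


def stalled_chain_length(target_seq):
    """Length of the nascent chain attached to the P-site tRNA at the stall.

    The chain ends with the glycine corresponding to SecM Gly165, so it
    contains every motif residue up to and including that position.
    """
    motif_residues_made = STALL_RESIDUE - SECM_FIRST_RESIDUE + 1  # 16
    return len(target_seq) + motif_residues_made
```

For a 100-residue target this gives a 116-residue nascent chain; the final proline of the motif is not incorporated because elongation arrests one residue earlier.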
Molecular Biology, Issue 64, Ribosome, nascent polypeptides, co-translational protein folding, translational arrest, in vitro translation
Fluorescence Based Primer Extension Technique to Determine Transcriptional Starting Points and Cleavage Sites of RNases In Vivo
Institutions: University of Tübingen.
Fluorescence based primer extension (FPE) is a molecular method to determine transcriptional starting points or processing sites of RNA molecules. This is achieved by reverse transcription of the RNA of interest using specific fluorescently labeled primers and subsequent analysis of the resulting cDNA fragments by denaturing polyacrylamide gel electrophoresis. Simultaneously, a traditional Sanger sequencing reaction is run on the gel to map the ends of the cDNA fragments to their exact corresponding bases. In contrast to 5'-RACE (Rapid Amplification of cDNA Ends), where the product must be cloned and multiple candidates sequenced, the bulk of cDNA fragments generated by primer extension can be simultaneously detected in one gel run. In addition, the whole procedure (from reverse transcription to final analysis of the results) can be completed in one working day. By using fluorescently labeled primers, the use of hazardous radioactive isotope labeled reagents can be avoided and processing times are reduced as products can be detected during the electrophoresis procedure.
In the following protocol, we describe an in vivo fluorescent primer extension method to reliably and rapidly detect the 5' ends of RNAs to deduce transcriptional starting points and RNA processing sites (e.g., by toxin-antitoxin system components) in S. aureus, E. coli and other bacteria.
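Once the length of a cDNA fragment has been read off against the sequencing ladder, mapping it back to a genome coordinate is simple arithmetic. The sketch below assumes 1-based genome coordinates, a cDNA length that includes the primer, and a known position for the primer's 5' end; these conventions and the function name are assumptions made for illustration.

```python
def rna_five_prime_end(primer_5prime_pos, cdna_length, strand="+"):
    """Genome coordinate (1-based) of the RNA 5' end mapped by one cDNA.

    primer_5prime_pos: genome position matching the primer's 5' end.
    cdna_length: length of the extended cDNA, primer included.
    strand: strand of the transcript, "+" or "-".
    """
    if cdna_length < 1:
        raise ValueError("cdna_length must be at least 1")
    if strand == "+":
        # The cDNA spans from the RNA 5' end up to the primer's 5' end.
        return primer_5prime_pos - cdna_length + 1
    if strand == "-":
        return primer_5prime_pos + cdna_length - 1
    raise ValueError("strand must be '+' or '-'")
```

For example, a 120-nt cDNA made from a primer whose 5' end maps to position 1000 on a plus-strand transcript places the RNA 5' end at position 881.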
Molecular Biology, Issue 92, Primer extension, RNA mapping, 5' end, fluorescent primer, transcriptional starting point, TSP, RNase, toxin-antitoxin, cleavage site, gel electrophoresis, DNA isolation, RNA processing
Metabolic Labeling and Membrane Fractionation for Comparative Proteomic Analysis of Arabidopsis thaliana Suspension Cell Cultures
Institutions: Max Planck Institute of Molecular Plant Physiology, University of Hohenheim.
Plasma membrane microdomains are features based on the physical properties of the lipid and sterol environment and have particular roles in signaling processes. Extracting sterol-enriched membrane microdomains from plant cells for proteomic analysis is a difficult task, mainly due to multiple preparation steps and sources of contamination from other cellular compartments. The plasma membrane constitutes only about 5-20% of all the membranes in a plant cell, and therefore isolation of a highly purified plasma membrane fraction is challenging. A frequently used method involves aqueous two-phase partitioning in polyethylene glycol and dextran, which yields plasma membrane vesicles with a purity of 95% 1. Sterol-rich membrane microdomains within the plasma membrane are insoluble upon treatment with cold nonionic detergents at alkaline pH. This detergent-resistant membrane fraction can be separated from the bulk plasma membrane by ultracentrifugation in a sucrose gradient 2. Subsequently, proteins can be extracted from the low density band of the sucrose gradient by methanol/chloroform precipitation. The extracted protein is then trypsin digested, desalted and finally analyzed by LC-MS/MS. Our extraction protocol for sterol-rich microdomains is optimized for the preparation of clean detergent-resistant membrane fractions from Arabidopsis thaliana.
We use full metabolic labeling of Arabidopsis thaliana suspension cell cultures with K15NO3 as the only nitrogen source for quantitative comparative proteomic studies following the biological treatment of interest 3. By mixing equal ratios of labeled and unlabeled cell cultures for joint protein extraction, the influence of preparation steps on the final quantitative result is kept to a minimum. Also, loss of material during extraction will affect both control and treatment samples in the same way, and therefore the ratio of light to heavy peptide will remain constant. In the proposed method, either the labeled or the unlabeled cell culture undergoes a biological treatment, while the other serves as control 4.
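The argument that joint extraction keeps the light-to-heavy ratio constant can be checked with a few lines of arithmetic. The sketch assumes each preparation step recovers the same fraction of the light and the heavy form of a peptide, which is the premise stated above; the function name and the recovery values are hypothetical.

```python
def light_heavy_ratio(light, heavy, step_recoveries=()):
    """Light/heavy peptide ratio after a series of shared preparation steps.

    Each entry of `step_recoveries` is the fraction of material kept in
    one step (0 < r <= 1). Because the samples are mixed before
    extraction, every step scales both forms by the same factor.
    """
    for recovery in step_recoveries:
        if not 0.0 < recovery <= 1.0:
            raise ValueError("recovery must be in (0, 1]")
        light *= recovery
        heavy *= recovery
    return light / heavy
```

Here `light_heavy_ratio(100.0, 50.0, [0.9, 0.5, 0.7])` gives the same ratio of 2 as the loss-free case. If the two samples were extracted separately, their step recoveries could differ and the ratio would shift.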
Issue 79, Cellular Structures, Plants, Genetically Modified, Arabidopsis, Membrane Lipids, Intracellular Signaling Peptides and Proteins, Membrane Proteins, Isotope Labeling, Proteomics, plants, Arabidopsis thaliana, metabolic labeling, stable isotope labeling, suspension cell cultures, plasma membrane fractionation, two phase system, detergent resistant membranes (DRM), mass spectrometry, membrane microdomains, quantitative proteomics
FtsZ Polymerization Assays: Simple Protocols and Considerations
Institutions: University of Groningen.
During bacterial cell division, the essential protein FtsZ assembles in the middle of the cell to form the so-called Z-ring. FtsZ polymerizes into long filaments in the presence of GTP in vitro, and polymerization is regulated by several accessory proteins. FtsZ polymerization has been extensively studied in vitro using basic methods including light scattering, sedimentation, GTP hydrolysis assays and electron microscopy. Buffer conditions influence both the polymerization properties of FtsZ, and the ability of FtsZ to interact with regulatory proteins. Here, we describe protocols for FtsZ polymerization studies and validate conditions and controls using Escherichia coli and Bacillus subtilis FtsZ as model proteins. A low speed sedimentation assay is introduced that allows the study of the interaction of FtsZ with proteins that bundle or tubulate FtsZ polymers. An improved GTPase assay protocol is described that allows testing of GTP hydrolysis over time using various conditions in a 96-well plate setup, with standardized incubation times that abolish variation in color development in the phosphate detection reaction. The preparation of samples for light scattering studies and electron microscopy is described. Several buffers are used to establish suitable buffer pH and salt concentration for FtsZ polymerization studies. A high concentration of KCl is the best for most of the experiments. Our methods provide a starting point for the in vitro characterization of FtsZ, not only from E. coli and B. subtilis but from any other bacterium. As such, the methods can be used for studies of the interaction of FtsZ with regulatory proteins or the testing of antibacterial drugs which may affect FtsZ polymerization.
Basic Protocols, Issue 81, FtsZ, protein polymerization, cell division, GTPase, sedimentation assay, light scattering
Analysis of Translation Initiation During Stress Conditions by Polysome Profiling
Institutions: Laval University, CHU de Quebec Research Center.
Precise control of mRNA translation is fundamental for eukaryotic cell homeostasis, particularly in response to physiological and pathological stress. Alterations of this program can lead to the growth of damaged cells, a hallmark of cancer development, or to premature cell death such as seen in neurodegenerative diseases. Much of what is known concerning the molecular basis for translational control has been obtained from polysome analysis using a density gradient fractionation system. This technique relies on ultracentrifugation of cytoplasmic extracts on a linear sucrose gradient. Once the spin is completed, the system allows fractionation and quantification of centrifuged zones corresponding to different populations of translating ribosomes, thus resulting in a polysome profile. Changes in the polysome profile are indicative of changes or defects in translation initiation that occur in response to various types of stress. This technique also makes it possible to assess the role of specific proteins in translation initiation, and to measure the translational activity of specific mRNAs. Here we describe our protocol to perform polysome profiles in order to assess translation initiation of eukaryotic cells and tissues under either normal or stress growth conditions.
Cellular Biology, Issue 87, Translation initiation, polysome profile, sucrose gradient, protein and RNA isolation, stress conditions
Purification of the Cystic Fibrosis Transmembrane Conductance Regulator Protein Expressed in Saccharomyces cerevisiae
Institutions: University of Manchester.
Defects in the cystic fibrosis transmembrane conductance regulator (CFTR) protein cause cystic fibrosis (CF), an autosomal recessive disease that currently limits the average life expectancy of sufferers to <40 years of age. The development of novel drug molecules to restore the activity of CFTR is an important goal in the treatment of CF, and the isolation of functionally active CFTR is a useful step towards achieving this goal.
We describe two methods for the purification of CFTR from a eukaryotic heterologous expression system, S. cerevisiae. Like prokaryotic systems, S. cerevisiae can be rapidly grown in the lab at low cost, but can also traffic and posttranslationally modify large membrane proteins. The selection of detergents for solubilization and purification is a critical step in the purification of any membrane protein. Having screened for the solubility of CFTR in several detergents, we have chosen two contrasting detergents for use in the purification that allow the final CFTR preparation to be tailored to the subsequently planned experiments.
In this method, we compare the purification of CFTR in dodecyl-β-D-maltoside (DDM) and 1-tetradecanoyl-sn-glycero-3-phospho-(1'-rac-glycerol) (LPG-14). Protein purified in DDM by this method shows ATPase activity in functional assays. Protein purified in LPG-14 shows high purity and yield, can be employed to study post-translational modifications, and can be used for structural methods such as small-angle X-ray scattering and electron microscopy. However, it displays significantly lower ATPase activity.
Biochemistry, Issue 87, Membrane protein, cystic fibrosis, CFTR, ABCC7, protein purification, Cystic Fibrosis Foundation, green fluorescent protein
Profiling of Methyltransferases and Other S-adenosyl-L-homocysteine-binding Proteins by Capture Compound Mass Spectrometry (CCMS)
Institutions: caprotec bioanalytics GmbH, RWTH Aachen University.
There is a variety of approaches to reduce the complexity of the proteome on the basis of functional small molecule-protein interactions, such as affinity chromatography 1 or Activity Based Protein Profiling 2. Trifunctional Capture Compounds (CCs, Figure 1A) 3 are the basis for a generic approach, in which the initial equilibrium-driven interaction between a small molecule probe (the selectivity function, here S-adenosyl-L-homocysteine, SAH, Figure 1A) and target proteins is irreversibly fixed upon photo-crosslinking between an independent photo-activable reactivity function (here a phenylazide) of the CC and the surface of the target proteins. The sorting function (here biotin) serves to isolate the CC-protein conjugates from complex biological mixtures with the help of a solid phase (here streptavidin magnetic beads). Two configurations of the experiments are possible: "off-bead" 4 or the presently described "on-bead" configuration (Figure 1B). The selectivity function may be virtually any small molecule of interest (substrates, inhibitors, drug molecules).
S-Adenosyl-L-methionine (SAM, Figure 1A) is probably, second to ATP, the most widely used cofactor in nature 5, 6. It is used as the major methyl group donor in all living organisms, with the chemical reaction being catalyzed by SAM-dependent methyltransferases (MTases), which methylate DNA 7, RNA 8, proteins 9, or small molecules 10. Given the crucial role of methylation reactions in diverse physiological scenarios (gene regulation, epigenetics, metabolism), the profiling of MTases can be expected to become of similar importance in functional proteomics as the profiling of kinases. Analytical tools for their profiling, however, have not been available. We recently introduced a CC with SAH as selectivity group to fill this technological gap (Figure 1A).
SAH, the product of SAM after methyl transfer, is a known general MTase product inhibitor 11. For this reason, and because the natural cofactor SAM is also used by other enzymes that transfer other parts of the cofactor or initiate radical reactions, as well as because of its chemical instability 12, SAH is an ideal selectivity function for a CC to target MTases. Here, we report the utility of the SAH-CC and CCMS by profiling MTases and other SAH-binding proteins from the strain DH5α of Escherichia coli (E. coli), one of the best-characterized prokaryotes, which has served as the preferred model organism in countless biochemical, biological, and biotechnological studies. Photo-activated crosslinking enhances yield and sensitivity of the experiment, and the specificity can be readily tested for in competition experiments using an excess of free SAH.
Biochemistry, Issue 46, Capture Compound, photo-crosslink, small molecule-protein interaction, methyltransferase, S-adenosyl-l-homocysteine, SAH, S-adenosyl-l-methionine, SAM, functional proteomics, LC-MS/MS
Assessment of Selective mRNA Translation in Mammalian Cells by Polysome Profiling
Institutions: University of Ottawa, Montreal Neurological Institute, University of Ottawa.
Regulation of protein synthesis represents a key control point in cellular response to stress. In particular, discrete RNA regulatory elements were shown to allow selective translation of specific mRNAs, which typically encode proteins required for a particular stress response. Identification of these mRNAs, as well as the characterization of regulatory mechanisms responsible for selective translation, has been at the forefront of molecular biology for some time. Polysome profiling is a cornerstone method in these studies. The goal of polysome profiling is to capture mRNA translation by immobilizing actively translating ribosomes on different transcripts and to separate the resulting polyribosomes by ultracentrifugation on a sucrose gradient, thus allowing for a distinction between highly translated transcripts and poorly translated ones. These can then be further characterized by traditional biochemical and molecular biology methods. Importantly, combining polysome profiling with high throughput genomic approaches allows for a large scale analysis of translational regulation.
Cellular Biology, Issue 92, cellular stress, translation initiation, internal ribosome entry site, polysome, RT-qPCR, gradient
Substrate Generation for Endonucleases of CRISPR/Cas Systems
Institutions: Max-Planck-Institute for Terrestrial Microbiology.
The interaction of viruses and their prokaryotic hosts shaped the evolution of bacterial and archaeal life. Prokaryotes developed several strategies to evade viral attacks that include restriction modification, abortive infection and CRISPR/Cas systems. These adaptive immune systems found in many Bacteria and most Archaea consist of clustered regularly interspaced short palindromic repeat (CRISPR) sequences and a number of CRISPR associated (Cas) genes (Fig. 1) 1-3. Different sets of Cas proteins and repeats define at least three major divergent types of CRISPR/Cas systems 4. The universal proteins Cas1 and Cas2 are proposed to be involved in the uptake of viral DNA that will generate a new spacer element between two repeats at the 5' terminus of an extending CRISPR cluster 5. The entire cluster is transcribed into a precursor-crRNA containing all spacer and repeat sequences and is subsequently processed by an enzyme of the diverse Cas6 family into smaller crRNAs 6-8. These crRNAs consist of the spacer sequence flanked by a 5' terminal (8 nucleotides) and a 3' terminal tag derived from the repeat sequence 9. A repeated infection by the virus can now be blocked, as the new crRNA will be directed by a Cas protein complex (Cascade) to the viral DNA and identify it as such via base complementarity10. Finally, for CRISPR/Cas type 1 systems, the nuclease Cas3 will destroy the detected invader DNA 11,12.
These processes define CRISPR/Cas as an adaptive immune system of prokaryotes and opened a fascinating research field for the study of the involved Cas proteins. The function of many Cas proteins is still elusive and the causes for the apparent diversity of the CRISPR/Cas systems remain to be illuminated. Potential activities of most Cas proteins were predicted via detailed computational analyses. A major fraction of Cas proteins are either shown or proposed to function as endonucleases 4.
Here, we present methods to generate crRNAs and precursor-crRNAs for the study of Cas endoribonucleases. Different endonuclease assays require either short repeat sequences that can directly be synthesized as RNA oligonucleotides or longer crRNA and pre-crRNA sequences that are generated via in vitro T7 RNA polymerase run-off transcription. This methodology allows the incorporation of radioactive nucleotides for the generation of internally labeled endonuclease substrates and the creation of synthetic or mutant crRNAs. Cas6 endonuclease activity is utilized to mature pre-crRNAs into crRNAs with 5'-hydroxyl and 2',3'-cyclic phosphate termini.
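The relationship between a pre-crRNA and its mature crRNAs can be sketched in a few lines. The sketch assumes a single cleavage inside every repeat, 8 nucleotides upstream of the repeat's 3' end, which is inferred from the 8-nucleotide 5' tag mentioned above; the function names are hypothetical, and the fragments before the first cut and after the last cut are simply discarded.

```python
def build_pre_crrna(repeat, spacers):
    """Concatenate repeat-spacer units, ending with a final repeat."""
    return repeat + "".join(spacer + repeat for spacer in spacers)


def process_pre_crrna(repeat, spacers, five_prime_tag_len=8):
    """Cut the pre-crRNA once within each repeat and return the crRNAs.

    Each product is: 5' tag (last `five_prime_tag_len` nt of the upstream
    repeat) + spacer + 3' tag (the rest of the downstream repeat).
    """
    if not 0 < five_prime_tag_len < len(repeat):
        raise ValueError("tag length must be shorter than the repeat")

    pre = build_pre_crrna(repeat, spacers)
    cuts = []
    offset = 0  # start of the current repeat within the pre-crRNA
    for spacer in spacers:
        cuts.append(offset + len(repeat) - five_prime_tag_len)
        offset += len(repeat) + len(spacer)
    cuts.append(offset + len(repeat) - five_prime_tag_len)  # final repeat

    # Fragments between consecutive cut sites are the mature crRNAs.
    return [pre[a:b] for a, b in zip(cuts, cuts[1:])]
```

The same helper can generate expected product sizes for synthetic or mutant substrates by changing `repeat`, `spacers` or the assumed tag length.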
Molecular biology, Issue 67, CRISPR/Cas, endonuclease, in vitro transcription, crRNA, Cas6
In vivo Interrogation of Central Nervous System Translatome by Polyribosome Fractionation
Institutions: German Cancer Research Center (DKFZ).
Multiple processes are involved in gene expression including transcription, translation and stability of mRNAs and proteins. Each of these steps is tightly regulated, affecting the final dynamics of protein abundance. Various regulatory mechanisms exist at the translation step, rendering mRNA levels alone an unreliable indicator of gene expression. In addition, local regulation of mRNA translation has been particularly implicated in neuronal functions, shifting 'translatomics' to the focus of attention in neurobiology. The presented method can be used to bridge transcriptomics and proteomics.
Here we describe essential modifications to the technique of polyribosome fractionation, which interrogates the translatome based on the association of actively translated mRNAs to multiple ribosomes and their differential sedimentation in sucrose gradients. Traditionally, working with in vivo samples, particularly of the central nervous system (CNS), has proven challenging due to the restricted amounts of material and the presence of fatty tissue components. In order to address this, the described protocol is specifically optimized for use with minimal amount of CNS material, as demonstrated by the use of single mouse spinal cord and brain. Briefly, CNS tissues are extracted and translating ribosomes are immobilized on mRNAs with cycloheximide. Myelin flotation is then performed to remove lipid rich components. Fractionation is performed on a sucrose gradient where mRNAs are separated according to their ribosomal loading. Isolated fractions are suitable for a range of downstream assays, including new genome wide assay technologies.
Neuroscience, Issue 86, central nervous system, CNS, translation, polyribosome fractionation, RNA, Brain, spinal cord, microarray, next-generation sequencing, gradient, translatome
A Practical Guide to Phylogenetics for Nonexperts
Institutions: The George Washington University.
Many researchers, across incredibly diverse foci, are applying phylogenetics to their research question(s). However, many researchers are new to this topic and so it presents inherent problems. Here we compile a practical introduction to phylogenetics for nonexperts. We outline, in a step-by-step manner, a pipeline for generating reliable phylogenies from gene sequence datasets. We begin with a user-guide for similarity search tools via online interfaces as well as local executables. Next, we explore programs for generating multiple sequence alignments followed by protocols for using software to determine best-fit models of evolution. We then outline protocols for reconstructing phylogenetic relationships via maximum likelihood and Bayesian criteria and finally describe tools for visualizing phylogenetic trees. While this is not by any means an exhaustive description of phylogenetic approaches, it does provide the reader with practical starting information on key software applications commonly utilized by phylogeneticists. We intend this article to serve as a practical training tool for researchers embarking on phylogenetic studies and as an educational resource that could be incorporated into a classroom or teaching lab.
Basic Protocol, Issue 84, phylogenetics, multiple sequence alignments, phylogenetic tree, BLAST executables, basic local alignment search tool, Bayesian models
The ITS2 Database
Institutions: University of Würzburg, University of Würzburg.
The internal transcribed spacer 2 (ITS2) has been used as a phylogenetic marker for more than two decades. As ITS2 research mainly focused on the very variable ITS2 sequence, it confined this marker to low-level phylogenetics only. However, the combination of the ITS2 sequence and its highly conserved secondary structure improves the phylogenetic resolution1 and allows phylogenetic inference at multiple taxonomic ranks, including species delimitation2-8.
The ITS2 Database9 presents an exhaustive dataset of internal transcribed spacer 2 sequences from NCBI GenBank11. Following an annotation by profile Hidden Markov Models (HMMs), the secondary structure of each sequence is predicted. First, it is tested whether a minimum energy based fold12 (direct fold) results in a correct, four helix conformation. If this is not the case, the structure is predicted by homology modeling13. In homology modeling, an already known secondary structure is transferred to another ITS2 sequence whose secondary structure could not be folded correctly in a direct fold.
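The decision between direct fold and homology modeling can be illustrated with a check on a dot-bracket structure string. This is a simplified stand-in for the database's own criteria, which the abstract does not spell out: it only counts top-level helices, treats a branched helix as one, and the function names are hypothetical.

```python
def count_top_level_helices(dot_bracket):
    """Count helices that open at nesting depth 0 in a dot-bracket string."""
    depth = 0
    helices = 0
    for ch in dot_bracket:
        if ch == "(":
            if depth == 0:
                helices += 1  # a new helix starts from the unpaired backbone
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced structure: extra ')'")
        elif ch != ".":
            raise ValueError(f"unexpected character: {ch!r}")
    if depth != 0:
        raise ValueError("unbalanced structure: missing ')'")
    return helices


def choose_prediction_route(dot_bracket, expected_helices=4):
    """Keep the direct fold if it shows four helices, else fall back."""
    if count_top_level_helices(dot_bracket) == expected_helices:
        return "direct fold"
    return "homology modeling"
```

A structure string produced by any minimum-energy folding tool could be passed to `choose_prediction_route`; a fold with two or three top-level helices would be routed to homology modeling.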
The ITS2 Database is not only a database for storage and retrieval of ITS2 sequence-structures. It also provides several tools to process your own ITS2 sequences, including annotation, structural prediction, motif detection and BLAST14 search on the combined sequence-structure information. Moreover, it integrates trimmed versions of 4SALE15,16 for multiple sequence-structure alignment calculation and Neighbor Joining18 tree reconstruction. Together they form a coherent analysis pipeline from an initial set of sequences to a phylogeny based on sequence and secondary structure.
In a nutshell, this workbench simplifies first phylogenetic analyses to only a few mouse-clicks, while additionally providing tools and data for comprehensive large-scale analyses.
Genetics, Issue 61, alignment, internal transcribed spacer 2, molecular systematics, secondary structure, ribosomal RNA, phylogenetic tree, homology modeling, phylogeny
Mapping Bacterial Functional Networks and Pathways in Escherichia Coli using Synthetic Genetic Arrays
Institutions: University of Toronto, University of Toronto, University of Regina.
Phenotypes are determined by a complex series of physical (e.g. protein-protein) and functional (e.g. gene-gene or genetic) interactions (GI) [1]. While physical interactions can indicate which bacterial proteins are associated as complexes, they do not necessarily reveal pathway-level functional relationships [1]. GI screens, in which the growth of double mutants bearing two deleted or inactivated genes is measured and compared to the corresponding single mutants, can illuminate epistatic dependencies between loci and hence provide a means to query and discover novel functional relationships [2]. Large-scale GI maps have been reported for eukaryotic organisms like yeast [3-7], but GI information remains sparse for prokaryotes [8], which hinders the functional annotation of bacterial genomes. To this end, we and others have developed high-throughput quantitative bacterial GI screening methods [9, 10].
Here, we present the key steps required to perform a quantitative E. coli Synthetic Genetic Array (eSGA) screening procedure on a genome scale [9], using natural bacterial conjugation and homologous recombination to systematically generate and measure the fitness of large numbers of double mutants in a colony array format.
Briefly, a robot is used to transfer, through conjugation, chloramphenicol (Cm) - marked mutant alleles from engineered Hfr (High frequency of recombination) 'donor strains' into an ordered array of kanamycin (Kan) - marked F- recipient strains. Typically, we use loss-of-function single mutants bearing non-essential gene deletions (e.g. the 'Keio' collection [11]) and essential gene hypomorphic mutations (i.e. alleles conferring reduced protein expression, stability, or activity [9, 12, 13]) to query the functional associations of non-essential and essential genes, respectively. After conjugation and ensuing genetic exchange mediated by homologous recombination, the resulting double mutants are selected on solid medium containing both antibiotics. After outgrowth, the plates are digitally imaged and colony sizes are quantitatively scored using an in-house automated image processing system [14]. GIs are revealed when the growth rate of a double mutant is either significantly better or worse than expected [9]. Aggravating (or negative) GIs often result between loss-of-function mutations in pairs of genes from compensatory pathways that impinge on the same essential process [2]. Here, the loss of a single gene is buffered, such that either single mutant is viable. However, the loss of both pathways is deleterious and results in synthetic lethality or sickness (i.e. slow growth). Conversely, alleviating (or positive) interactions can occur between genes in the same pathway or protein complex [2] as the deletion of either gene alone is often sufficient to perturb the normal function of the pathway or complex such that additional perturbations do not reduce activity, and hence growth, further. Overall, systematically identifying and analyzing GI networks can provide unbiased, global maps of the functional relationships between large numbers of genes, from which pathway-level information missed by other approaches can be inferred [9].
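To make the idea of growth that is "better or worse than expected" concrete, the sketch below scores a gene pair under a multiplicative null model, in which the expected double-mutant fitness is the product of the two single-mutant fitnesses. The abstract does not state which expectation or significance test the eSGA pipeline uses, so the multiplicative model, the fixed cut-off of 0.1, and the function names `gi_score` and `classify_interaction` are all assumptions made for illustration.

```python
def gi_score(fitness_a: float, fitness_b: float, fitness_ab: float) -> float:
    """Observed double-mutant fitness minus the multiplicative expectation.

    Fitness values are assumed to be normalized so that wild type = 1.0.
    """
    expected = fitness_a * fitness_b  # assumed null model: independent effects
    return fitness_ab - expected


def classify_interaction(fitness_a, fitness_b, fitness_ab, threshold=0.1):
    """Label a pair as aggravating, alleviating, or neutral.

    The fixed threshold is a placeholder; a real screen would assess
    significance across replicate colonies.
    """
    score = gi_score(fitness_a, fitness_b, fitness_ab)
    if score <= -threshold:
        return "aggravating"
    if score >= threshold:
        return "alleviating"
    return "neutral"


if __name__ == "__main__":
    # Both single mutants grow well, the double mutant barely grows.
    print(classify_interaction(0.9, 0.8, 0.1))
    # Same pathway: the second deletion adds no further defect.
    print(classify_interaction(0.5, 0.5, 0.5))
```

In the first call the double mutant is far sicker than the product of the single-mutant fitnesses predicts, which corresponds to synthetic sickness; in the second, the double mutant is no worse than either single mutant, which is the pattern described for genes in the same pathway or complex.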
Genetics, Issue 69, Molecular Biology, Medicine, Biochemistry, Microbiology, Aggravating, alleviating, conjugation, double mutant, Escherichia coli, genetic interaction, Gram-negative bacteria, homologous recombination, network, synthetic lethality or sickness, suppression
DNA-affinity-purified Chip (DAP-chip) Method to Determine Gene Targets for Bacterial Two component Regulatory Systems
Institutions: Lawrence Berkeley National Laboratory.
In vivo methods such as ChIP-chip are well-established techniques used to determine global gene targets for transcription factors. However, they are of limited use in exploring bacterial two-component regulatory systems with uncharacterized activation conditions. Such systems regulate transcription only when activated in the presence of unique signals. Since these signals are often unknown, the in vitro microarray-based method described in this video article can be used to determine gene targets and binding sites for response regulators. This DNA-affinity-purified-chip method may be used for any purified regulator in any organism with a sequenced genome. The protocol involves allowing the purified tagged protein to bind to sheared genomic DNA and then affinity purifying the protein-bound DNA, followed by fluorescent labeling of the DNA and hybridization to a custom tiling array. Preceding steps that may be used to optimize the assay for specific regulators are also described. The peaks generated by the array data analysis are used to predict binding site motifs, which are then experimentally validated. The motif predictions can be further used to determine gene targets of orthologous response regulators in closely related species. We demonstrate the applicability of this method by determining the gene targets and binding site motifs and thus predicting the function for a sigma54-dependent response regulator DVU3023 in the environmental bacterium Desulfovibrio vulgaris.
Genetics, Issue 89, DNA-Affinity-Purified-chip, response regulator, transcription factor binding site, two component system, signal transduction, Desulfovibrio, lactate utilization regulator, ChIP-chip
Generation of Enterobacter sp. YSU Auxotrophs Using Transposon Mutagenesis
Institutions: Youngstown State University.
Prototrophic bacteria grow on M-9 minimal salts medium supplemented with glucose (M-9 medium), which is used as a carbon and energy source. Auxotrophs can be generated using a transposome. The commercially available, Tn5-derived transposome used in this protocol consists of a linear segment of DNA containing an R6Kγ replication origin, a gene for kanamycin resistance and two mosaic sequence ends, which serve as transposase binding sites. The transposome, provided as a DNA/transposase protein complex, is introduced by electroporation into the prototrophic strain, Enterobacter sp. YSU, and randomly incorporates itself into this host’s genome. Transformants are replica plated onto Luria-Bertani agar plates containing kanamycin (LB-kan) and onto M-9 medium agar plates containing kanamycin (M-9-kan). The transformants that grow on LB-kan plates but not on M-9-kan plates are considered to be auxotrophs. Purified genomic DNA from an auxotroph is partially digested, ligated and transformed into a pir+ Escherichia coli (E. coli) strain. The R6Kγ replication origin allows the plasmid to replicate in pir+ E. coli strains, and the kanamycin resistance marker allows for plasmid selection. Each transformant possesses a new plasmid containing the transposon flanked by the interrupted chromosomal region. Sanger sequencing and the Basic Local Alignment Search Tool (BLAST) suggest a putative identity of the interrupted gene. There are three advantages to using this transposome mutagenesis strategy. First, it does not rely on the expression of a transposase gene by the host. Second, the transposome is introduced into the target host by electroporation, rather than by conjugation or by transduction and therefore is more efficient. Third, the R6Kγ replication origin makes it easy to identify the mutated gene which is partially recovered in a recombinant plasmid. This technique can be used to investigate the genes involved in other characteristics of Enterobacter sp. YSU or of a wider variety of bacterial strains.
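The replica-plating rule above (growth on LB-kan but not on M-9-kan marks an auxotroph) amounts to a set difference over colony positions. The short sketch below shows this bookkeeping; the colony labels and the function name `find_auxotroph_candidates` are hypothetical and only illustrate how plate scores might be recorded.

```python
def find_auxotroph_candidates(grows_on_lb_kan, grows_on_m9_kan):
    """Return colonies that grow on complex medium with kanamycin (LB-kan)
    but fail to grow on minimal medium with kanamycin (M-9-kan)."""
    return sorted(set(grows_on_lb_kan) - set(grows_on_m9_kan))


if __name__ == "__main__":
    # Hypothetical grid positions scored by eye after replica plating.
    lb_kan = {"A1", "A2", "A3", "B1", "B2"}
    m9_kan = {"A1", "A3", "B1"}
    print(find_auxotroph_candidates(lb_kan, m9_kan))
```

Colonies returned by this check are only candidates; the protocol then confirms the interrupted gene by plasmid rescue, sequencing and BLAST.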
Microbiology, Issue 92, Auxotroph, transposome, transposon, mutagenesis, replica plating, glucose minimal medium, complex medium, Enterobacter
Protein WISDOM: A Workbench for In silico De novo Design of BioMolecules
Institutions: Princeton University.
The aim of de novo protein design is to find the amino acid sequences that will fold into a desired 3-dimensional structure with improvements in specific properties, such as binding affinity, agonist or antagonist behavior, or stability, relative to the native sequence. Protein design lies at the center of current advances in drug design and discovery. Not only does protein design provide predictions for potentially useful drug targets, but it also enhances our understanding of the protein folding process and protein-protein interactions. Experimental methods such as directed evolution have shown success in protein design. However, such methods are restricted by the limited sequence space that can be searched tractably. In contrast, computational design strategies allow for the screening of a much larger set of sequences covering a wide variety of properties and functionality. We have developed a range of computational de novo protein design methods capable of tackling several important areas of protein design. These include the design of monomeric proteins for increased stability and complexes for increased binding affinity.
To disseminate these methods for broader use we present Protein WISDOM (https://www.proteinwisdom.org), a tool that provides automated methods for a variety of protein design problems. Structural templates are submitted to initialize the design process. The first stage of design is an optimization sequence selection stage that aims at improving stability through minimization of potential energy in the sequence space. Selected sequences are then run through a fold specificity stage and a binding affinity stage. A rank-ordered list of the sequences for each step of the process, along with relevant designed structures, provides the user with a comprehensive quantitative assessment of the design. Here we provide the details of each design method, as well as several notable experimental successes attained through the use of the methods.
Genetics, Issue 77, Molecular Biology, Bioengineering, Biochemistry, Biomedical Engineering, Chemical Engineering, Computational Biology, Genomics, Proteomics, Protein, Protein Binding, Computational Biology, Drug Design, optimization (mathematics), Amino Acids, Peptides, and Proteins, De novo protein and peptide design, Drug design, In silico sequence selection, Optimization, Fold specificity, Binding affinity, sequencing
A Restriction Enzyme Based Cloning Method to Assess the In vitro Replication Capacity of HIV-1 Subtype C Gag-MJ4 Chimeric Viruses
Institutions: Emory University, Emory University.
The protective effect of many HLA class I alleles on HIV-1 pathogenesis and disease progression is, in part, attributed to their ability to target conserved portions of the HIV-1 genome that escape with difficulty. Sequence changes attributed to cellular immune pressure arise across the genome during infection, and if found within conserved regions of the genome such as Gag, can affect the ability of the virus to replicate in vitro. Transmission of HLA-linked polymorphisms in Gag to HLA-mismatched recipients has been associated with reduced set point viral loads. We hypothesized this may be due to a reduced replication capacity of the virus. Here we present a novel method for assessing the in vitro replication of HIV-1 as influenced by the gag gene isolated from acute time points from subtype C infected Zambians. This method uses restriction enzyme based cloning to insert the gag gene into a common subtype C HIV-1 proviral backbone, MJ4. This makes it more appropriate to the study of subtype C sequences than previous recombination based methods that have assessed the in vitro replication of chronically derived gag-pro sequences. Nevertheless, the protocol could be readily modified for studies of viruses from other subtypes. Moreover, this protocol details a robust and reproducible method for assessing the replication capacity of the Gag-MJ4 chimeric viruses on a CEM-based T cell line. This method was utilized for the study of Gag-MJ4 chimeric viruses derived from 149 subtype C acutely infected Zambians, and has allowed for the identification of residues in Gag that affect replication. More importantly, the implementation of this technique has facilitated a deeper understanding of how viral replication defines parameters of early HIV-1 pathogenesis such as set point viral load and longitudinal CD4+ T cell decline.
Infectious Diseases, Issue 90, HIV-1, Gag, viral replication, replication capacity, viral fitness, MJ4, CEM, GXR25
High Throughput Quantitative Expression Screening and Purification Applied to Recombinant Disulfide-rich Venom Proteins Produced in E. coli
Institutions: Aix-Marseille Université, Commissariat à l'énergie atomique et aux énergies alternatives (CEA) Saclay, France.
Escherichia coli (E. coli) is the most widely used expression system for the production of recombinant proteins for structural and functional studies. However, purifying proteins is sometimes challenging since many proteins are expressed in an insoluble form. When working with difficult or multiple targets it is therefore recommended to use high throughput (HTP) protein expression screening on a small scale (1-4 ml cultures) to quickly identify conditions for soluble expression. To cope with the various structural genomics programs of the lab, a quantitative (within a range of 0.1-100 mg/L culture of recombinant protein) and HTP protein expression screening protocol was implemented and validated on thousands of proteins. The protocols were automated with the use of a liquid handling robot but can also be performed manually without specialized equipment.
Disulfide-rich venom proteins are gaining increasing recognition for their potential as therapeutic drug leads. They can be highly potent and selective, but their complex disulfide bond networks make them challenging to produce. As a member of the FP7 European Venomics project (www.venomics.eu), our challenge is to develop successful production strategies with the aim of producing thousands of novel venom proteins for functional characterization. Aided by the redox properties of disulfide bond isomerase DsbC, we adapted our HTP production pipeline for the expression of oxidized, functional venom peptides in the E. coli cytoplasm. The protocols are also applicable to the production of diverse disulfide-rich proteins. Here we demonstrate our pipeline applied to the production of animal venom proteins. With the protocols described herein it is likely that soluble disulfide-rich proteins will be obtained in as little as a week. Even from a small scale, there is the potential to use the purified proteins for validating the oxidation state by mass spectrometry, for characterization in pilot studies, or for sensitive micro-assays.
Bioengineering, Issue 89, E. coli, expression, recombinant, high throughput (HTP), purification, auto-induction, immobilized metal affinity chromatography (IMAC), tobacco etch virus protease (TEV) cleavage, disulfide bond isomerase C (DsbC) fusion, disulfide bonds, animal venom proteins/peptides
Unraveling the Unseen Players in the Ocean - A Field Guide to Water Chemistry and Marine Microbiology
Institutions: San Diego State University, University of California San Diego.
Here we introduce a series of thoroughly tested and well standardized research protocols adapted for use in remote marine environments. The sampling protocols include the assessment of resources available to the microbial community (dissolved organic carbon, particulate organic matter, inorganic nutrients), and a comprehensive description of the viral and bacterial communities (via direct viral and microbial counts, enumeration of autofluorescent microbes, and construction of viral and microbial metagenomes). We use a combination of methods drawn from a range of scientific disciplines, comprising established protocols and some of the most recently developed techniques. In particular, the metagenomic sequencing techniques used for viral and bacterial community characterization have been established only in recent years and are thus still subject to constant improvement. This has led to a variety of sampling and sample processing procedures currently in use. The set of methods presented here provides an up-to-date approach to collecting and processing environmental samples. The parameters addressed with these protocols yield the minimum information essential to characterize and understand the underlying mechanisms of viral and microbial community dynamics. The guide gives easy-to-follow guidelines for conducting comprehensive surveys and discusses critical steps and potential caveats pertinent to each technique.
Environmental Sciences, Issue 93, dissolved organic carbon, particulate organic matter, nutrients, DAPI, SYBR, microbial metagenomics, viral metagenomics, marine environment
Annotation of Plant Gene Function via Combined Genomics, Metabolomics and Informatics
Given the ever expanding number of model plant species for which complete genome sequences are available and the abundance of bio-resources such as knockout mutants, wild accessions and advanced breeding populations, there is a rising burden for gene functional annotation. In this protocol, annotation of plant gene function using combined co-expression gene analysis, metabolomics and informatics is provided (Figure 1). This approach is based on the theory of using target genes of known function to allow the identification of non-annotated genes likely to be involved in a certain metabolic process, with the identification of target compounds via metabolomics. Strategies are put forward for applying this information to populations generated by both forward and reverse genetics approaches, although neither of these is effortless. As a corollary, this approach can also be used to characterise unknown peaks representing new or specific secondary metabolites in a limited set of tissues, plant species or stress treatments, which is currently an important challenge in understanding plant metabolism.
Plant Biology, Issue 64, Genetics, Bioinformatics, Metabolomics, Plant metabolism, Transcriptome analysis, Functional annotation, Computational biology, Plant biology, Theoretical biology, Spectroscopy and structural analysis
Staining Proteins in Gels
Institutions: UVP, LLC, Keck Graduate Institute of Applied Life Sciences.
Following separation by electrophoretic methods, proteins in a gel can be detected by several staining methods. This unit describes protocols for detecting proteins by four popular methods. Coomassie blue staining is an easy and rapid method. Silver staining, while more time consuming, is considerably more sensitive and can thus be used to detect smaller amounts of protein. Fluorescent staining is a popular alternative to traditional staining procedures, mainly because it is more sensitive than Coomassie staining, and is often as sensitive as silver staining. Staining of proteins with SYPRO Orange and SYPRO Ruby is also demonstrated here.
Basic Protocols, Issue 17, Current Protocols Wiley, Coomassie Blue Staining, Silver Staining, SYPROruby, SYPROorange, Protein Detection
Using SCOPE to Identify Potential Regulatory Motifs in Coregulated Genes
Institutions: Dartmouth College.
SCOPE is an ensemble motif finder that uses three component algorithms in parallel to identify potential regulatory motifs by over-representation and motif position preference [1]. Each component algorithm is optimized to find a different kind of motif. By taking the best of these three approaches, SCOPE performs better than any single algorithm, even in the presence of noisy data [1]. In this article, we utilize a web version of SCOPE [2] to examine genes that are involved in telomere maintenance. SCOPE has been incorporated into at least two other motif finding programs [3,4] and has been used in other studies [5-8].
The three algorithms that comprise SCOPE are BEAM [9], which finds non-degenerate motifs (ACCGGT), PRISM [10], which finds degenerate motifs (ASCGWT), and SPACER [11], which finds longer bipartite motifs (ACCnnnnnnnnGGT). These three algorithms have been optimized to find their corresponding type of motif. Together, they allow SCOPE to perform extremely well.
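The three motif classes named above differ only in how much ambiguity each position allows, which is easiest to see by expanding the IUPAC nucleotide codes into a regular expression. The sketch below does this for the three example motifs; it is not part of SCOPE and does not reproduce how BEAM, PRISM or SPACER search or score motifs. The helper names `motif_to_regex` and `find_motif` are hypothetical, and only the given strand is scanned.

```python
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def motif_to_regex(motif: str) -> str:
    """Translate an IUPAC motif such as ASCGWT into a regex string."""
    try:
        return "".join(IUPAC[base] for base in motif.upper())
    except KeyError as err:
        raise ValueError(f"not an IUPAC nucleotide code: {err.args[0]!r}") from err


def find_motif(motif: str, sequence: str):
    """Return (start, matched_text) for every occurrence, including
    overlapping ones, on the given strand only."""
    pattern = re.compile(f"(?=({motif_to_regex(motif)}))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    print(find_motif("ACCGGT", "TTACCGGTAA"))          # non-degenerate
    print(find_motif("ASCGWT", "TTACCGATGG"))          # degenerate
    print(find_motif("ACCnnnnnnnnGGT", "ACCAAAATTTTGGT"))  # bipartite
```

Here S matches C or G, W matches A or T, and each n matches any base, so the bipartite example is simply two fixed half-sites separated by eight unconstrained positions.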
Once a gene set has been analyzed and candidate motifs identified, SCOPE can look for other genes that contain the motif which, when added to the original set, will improve the motif score. This can occur through over-representation or motif position preference. Working with partial gene sets that have biologically verified transcription factor binding sites, SCOPE was able to identify most of the rest of the genes also regulated by the given transcription factor.
Output from SCOPE shows candidate motifs, their significance, and other information both as a table and as a graphical motif map. FAQs and video tutorials are available at the SCOPE web site which also includes a "Sample Search" button that allows the user to perform a trial run.
SCOPE has a user-friendly interface that enables novice users to access the algorithm's full power without having to become an expert in the bioinformatics of motif finding. As input, SCOPE can take a list of genes, or FASTA sequences. These can be entered in browser text fields, or read from a file. The output from SCOPE contains a list of all identified motifs with their scores, number of occurrences, fraction of genes containing the motif, and the algorithm used to identify the motif. For each motif, result details include a consensus representation of the motif, a sequence logo, a position weight matrix, and a list of instances for every motif occurrence (with exact positions and "strand" indicated). Results are returned in a browser window and also optionally by email. Previous papers describe the SCOPE algorithms in detail [1,2,9-11].
Genetics, Issue 51, gene regulation, computational biology, algorithm, promoter sequence motif
Electrophoretic Separation of Proteins
Institutions: Keck Graduate Institute of Applied Life Sciences.
Electrophoresis is used to separate complex mixtures of proteins (e.g., from cells, subcellular fractions, column fractions, or immunoprecipitates), to investigate subunit compositions, and to verify homogeneity of protein samples. It can also serve to purify proteins for use in further applications. In polyacrylamide gel electrophoresis, proteins migrate in response to an electrical field through pores in a polyacrylamide gel matrix; pore size decreases with increasing acrylamide concentration. The combination of pore size and protein charge, size, and shape determines the migration rate of the protein. In this unit, the standard Laemmli method is described for discontinuous gel electrophoresis under denaturing conditions, i.e., in the presence of sodium dodecyl sulfate (SDS).
Basic Protocols, Issue 16, Current Protocols Wiley, Electrophoresis, Biochemistry, Protein Separation, Polyacrylamide Gel Electrophoresis, PAGE