Many researchers, across incredibly diverse foci, are applying phylogenetics to their research question(s). However, many researchers are new to this topic and so it presents inherent problems. Here we compile a practical introduction to phylogenetics for nonexperts. We outline in a step-by-step manner, a pipeline for generating reliable phylogenies from gene sequence datasets. We begin with a user-guide for similarity search tools via online interfaces as well as local executables. Next, we explore programs for generating multiple sequence alignments followed by protocols for using software to determine best-fit models of evolution. We then outline protocols for reconstructing phylogenetic relationships via maximum likelihood and Bayesian criteria and finally describe tools for visualizing phylogenetic trees. While this is not by any means an exhaustive description of phylogenetic approaches, it does provide the reader with practical starting information on key software applications commonly utilized by phylogeneticists. The vision for this article would be that it could serve as a practical training tool for researchers embarking on phylogenetic studies and also serve as an educational resource that could be incorporated into a classroom or teaching-lab.
23 Related JoVE Articles!
Large Scale Non-targeted Metabolomic Profiling of Serum by Ultra Performance Liquid Chromatography-Mass Spectrometry (UPLC-MS)
Institutions: Colorado State University.
Non-targeted metabolite profiling by ultra performance liquid chromatography coupled with mass spectrometry (UPLC-MS) is a powerful technique to investigate metabolism. The approach offers an unbiased and in-depth analysis that can enable the development of diagnostic tests, novel therapies, and further our understanding of disease processes. The inherent chemical diversity of the metabolome creates significant analytical challenges and there is no single experimental approach that can detect all metabolites. Additionally, the biological variation in individual metabolism and the dependence of metabolism on environmental factors necessitates large sample numbers to achieve the appropriate statistical power required for meaningful biological interpretation. To address these challenges, this tutorial outlines an analytical workflow for large scale non-targeted metabolite profiling of serum by UPLC-MS. The procedure includes guidelines for sample organization and preparation, data acquisition, quality control, and metabolite identification and will enable reliable acquisition of data for large experiments and provide a starting point for laboratories new to non-targeted metabolite profiling by UPLC-MS.
Chemistry, Issue 73, Biochemistry, Genetics, Molecular Biology, Physiology, Genomics, Proteins, Proteomics, Metabolomics, Metabolite Profiling, Non-targeted metabolite profiling, mass spectrometry, Ultra Performance Liquid Chromatography, UPLC-MS, serum, spectrometry
Glycopeptide Capture for Cell Surface Proteomics
Institutions: Simon Fraser University.
Cell surface proteins, including extracellular matrix proteins, participate in all major cellular processes and functions, such as growth, differentiation, and proliferation. A comprehensive characterization of these proteins provides rich information for biomarker discovery, cell-type identification, and drug-target selection, as well as helping to advance our understanding of cellular biology and physiology. Surface proteins, however, pose significant analytical challenges, because of their inherently low abundance, high hydrophobicity, and heavy post-translational modifications. Taking advantage of the prevalent glycosylation on surface proteins, we introduce here a high-throughput glycopeptide-capture approach that integrates the advantages of several existing N-glycoproteomics means. Our method can enrich the glycopeptides derived from surface proteins and remove their glycans for facile proteomics using LC-MS. The resolved N-glycoproteome comprises the information of protein identity and quantity as well as their sites of glycosylation. This method has been applied to a series of studies in areas including cancer, stem cells, and drug toxicity. The limitation of the method lies in the low abundance of surface membrane proteins, such that a relatively large quantity of samples is required for this analysis compared to studies centered on cytosolic proteins.
Molecular Biology, Issue 87, membrane protein, N-linked glycoprotein, post-translational modification, mass spectrometry, HPLC, hydrazide chemistry, N-glycoproteomics, glycopeptide capture
An Allele-specific Gene Expression Assay to Test the Functional Basis of Genetic Associations
Institutions: University of Oxford.
The number of significant genetic associations with common complex traits is constantly increasing. However, most of these associations have not been understood at molecular level. One of the mechanisms mediating the effect of DNA variants on phenotypes is gene expression, which has been shown to be particularly relevant for complex traits1
This method tests in a cellular context the effect of specific DNA sequences on gene expression. The principle is to measure the relative abundance of transcripts arising from the two alleles of a gene, analysing cells which carry one copy of the DNA sequences associated with disease (the risk variants)2,3
. Therefore, the cells used for this method should meet two fundamental genotypic requirements: they have to be heterozygous both for DNA risk variants and for DNA markers, typically coding polymorphisms, which can distinguish transcripts based on their chromosomal origin (Figure 1). DNA risk variants and DNA markers do not need to have the same allele frequency but the phase (haplotypic) relationship of the genetic markers needs to be understood. It is also important to choose cell types which express the gene of interest. This protocol refers specifically to the procedure adopted to extract nucleic acids from fibroblasts but the method is equally applicable to other cells types including primary cells.
DNA and RNA are extracted from the selected cell lines and cDNA is generated. DNA and cDNA are analysed with a primer extension assay, designed to target the coding DNA markers4
. The primer extension assay is carried out using the MassARRAY (Sequenom)5
platform according to the manufacturer's specifications. Primer extension products are then analysed by matrix-assisted laser desorption/ionization time of-flight mass spectrometry (MALDI-TOF/MS). Because the selected markers are heterozygous they will generate two peaks on the MS profiles. The area of each peak is proportional to the transcript abundance and can be measured with a function of the MassARRAY Typer software to generate an allelic ratio (allele 1: allele 2) calculation. The allelic ratio obtained for cDNA is normalized using that measured from genomic DNA, where the allelic ratio is expected to be 1:1 to correct for technical artifacts. Markers with a normalised allelic ratio significantly different to 1 indicate that the amount of transcript generated from the two chromosomes in the same cell is different, suggesting that the DNA variants associated with the phenotype have an effect on gene expression. Experimental controls should be used to confirm the results.
Cellular Biology, Issue 45, Gene expression, regulatory variant, haplotype, association study, primer extension, MALDI-TOF mass spectrometry, single nucleotide polymorphism, allele-specific
Structure and Coordination Determination of Peptide-metal Complexes Using 1D and 2D 1H NMR
Institutions: The Hebrew University of Jerusalem, The Hebrew University of Jerusalem.
Copper (I) binding by metallochaperone transport proteins prevents copper oxidation and release of the toxic ions that may participate in harmful redox reactions. The Cu (I) complex of the peptide model of a Cu (I) binding metallochaperone protein, which includes the sequence MTCSGCSRPG (underlined is conserved), was determined in solution under inert conditions by NMR spectroscopy.
NMR is a widely accepted technique for the determination of solution structures of proteins and peptides. Due to difficulty in crystallization to provide single crystals suitable for X-ray crystallography, the NMR technique is extremely valuable, especially as it provides information on the solution state rather than the solid state. Herein we describe all steps that are required for full three-dimensional structure determinations by NMR. The protocol includes sample preparation in an NMR tube, 1D and 2D data collection and processing, peak assignment and integration, molecular mechanics calculations, and structure analysis. Importantly, the analysis was first conducted without any preset metal-ligand bonds, to assure a reliable structure determination in an unbiased manner.
Chemistry, Issue 82, solution structure determination, NMR, peptide models, copper-binding proteins, copper complexes
Absolute Quantum Yield Measurement of Powder Samples
Institutions: Hitachi High Technologies America.
Measurement of fluorescence quantum yield has become an important tool in the search for new solutions in the development, evaluation, quality control and research of illumination, AV equipment, organic EL material, films, filters and fluorescent probes for bio-industry.
Quantum yield is calculated as the ratio of the number of photons absorbed, to the number of photons emitted by a material. The higher the quantum yield, the better the efficiency of the fluorescent material.
For the measurements featured in this video, we will use the Hitachi F-7000 fluorescence spectrophotometer equipped with the Quantum Yield measuring accessory and Report Generator program. All the information provided applies to this system.
Measurement of quantum yield in powder samples is performed following these steps:
Generation of instrument correction factors for the excitation and emission monochromators. This is an important requirement for the correct measurement of quantum yield. It has been performed in advance for the full measurement range of the instrument and will not be shown in this video due to time limitations.
Measurement of integrating sphere correction factors. The purpose of this step is to take into consideration reflectivity characteristics of the integrating sphere used for the measurements.
Reference and Sample measurement using direct excitation and indirect excitation.
Quantum Yield calculation using Direct and Indirect excitation. Direct excitation is when the sample is facing directly the excitation beam, which would be the normal measurement setup. However, because we use an integrating sphere, a portion of the emitted photons resulting from the sample fluorescence are reflected by the integrating sphere and will re-excite the sample, so we need to take into consideration indirect excitation. This is accomplished by measuring the sample placed in the port facing the emission monochromator, calculating indirect quantum yield and correcting the direct quantum yield calculation.
Corrected quantum yield calculation.
Chromaticity coordinates calculation using Report Generator program.
The Hitachi F-7000 Quantum Yield Measurement System offer advantages for this
application, as follows:
High sensitivity (S/N ratio 800 or better RMS). Signal is the Raman band of water measured under the following conditions: Ex wavelength 350 nm, band pass Ex and Em 5 nm, response 2 sec), noise is measured at the maximum of the Raman peak. High sensitivity allows measurement of samples even with low quantum yield. Using this system we have measured quantum yields as low as 0.1 for a sample of salicylic acid and as high as 0.8 for a sample of magnesium tungstate.
Highly accurate measurement with a dynamic range of 6 orders of magnitude allows for measurements of both sharp scattering peaks with high intensity, as well as broad fluorescence peaks of low intensity under the same conditions.
High measuring throughput and reduced light exposure to the sample, due to a high scanning speed of up to 60,000 nm/minute and automatic shutter function.
Measurement of quantum yield over a wide wavelength range from 240 to 800 nm.
Accurate quantum yield measurements are the result of collecting instrument spectral response and integrating sphere correction factors before measuring the sample.
Large selection of calculated parameters provided by dedicated and easy to use software.
During this video we will measure sodium salicylate in powder form which is known to have a quantum yield value of 0.4 to 0.5.
Molecular Biology, Issue 63, Powders, Quantum, Yield, F-7000, Quantum Yield, phosphor, chromaticity, Photo-luminescence
Quantitative Analysis of Chromatin Proteomes in Disease
Institutions: David Geffen School of Medicine at UCLA, David Geffen School of Medicine at UCLA, David Geffen School of Medicine at UCLA, Nora Eccles Harrison Cardiovascular Research and Training Institute, University of Utah.
In the nucleus reside the proteomes whose functions are most intimately linked with gene regulation. Adult mammalian cardiomyocyte nuclei are unique due to the high percentage of binucleated cells,1
the predominantly heterochromatic state of the DNA, and the non-dividing nature of the cardiomyocyte which renders adult nuclei in a permanent state of interphase.2
Transcriptional regulation during development and disease have been well studied in this organ,3-5
but what remains relatively unexplored is the role played by the nuclear proteins responsible for DNA packaging and expression, and how these proteins control changes in transcriptional programs that occur during disease.6
In the developed world, heart disease is the number one cause of mortality for both men and women.7
Insight on how nuclear proteins cooperate to regulate the progression of this disease is critical for advancing the current treatment options.
Mass spectrometry is the ideal tool for addressing these questions as it allows for an unbiased annotation of the nuclear proteome and relative quantification for how the abundance of these proteins changes with disease. While there have been several proteomic studies for mammalian nuclear protein complexes,8-13
there has been only one study examining the cardiac nuclear proteome, and it considered the entire nucleus, rather than exploring the proteome at the level of nuclear sub compartments.15
In large part, this shortage of work is due to the difficulty of isolating cardiac nuclei. Cardiac nuclei occur within a rigid and dense actin-myosin apparatus to which they are connected via multiple extensions from the endoplasmic reticulum, to the extent that myocyte contraction alters their overall shape.16
Additionally, cardiomyocytes are 40% mitochondria by volume17
which necessitates enrichment of the nucleus apart from the other organelles. Here we describe a protocol for cardiac nuclear enrichment and further fractionation into biologically-relevant compartments. Furthermore, we detail methods for label-free quantitative mass spectrometric dissection of these fractions-techniques amenable to in vivo
experimentation in various animal models and organ systems where metabolic labeling is not feasible.
Medicine, Issue 70, Molecular Biology, Immunology, Genetics, Genomics, Physiology, Protein, DNA, Chromatin, cardiovascular disease, proteomics, mass spectrometry
Specificity Analysis of Protein Lysine Methyltransferases Using SPOT Peptide Arrays
Institutions: Stuttgart University.
Lysine methylation is an emerging post-translation modification and it has been identified on several histone and non-histone proteins, where it plays crucial roles in cell development and many diseases. Approximately 5,000 lysine methylation sites were identified on different proteins, which are set by few dozens of protein lysine methyltransferases. This suggests that each PKMT methylates multiple proteins, however till now only one or two substrates have been identified for several of these enzymes. To approach this problem, we have introduced peptide array based substrate specificity analyses of PKMTs. Peptide arrays are powerful tools to characterize the specificity of PKMTs because methylation of several substrates with different sequences can be tested on one array. We synthesized peptide arrays on cellulose membrane using an Intavis SPOT synthesizer and analyzed the specificity of various PKMTs. Based on the results, for several of these enzymes, novel substrates could be identified. For example, for NSD1 by employing peptide arrays, we showed that it methylates K44 of H4 instead of the reported H4K20 and in addition H1.5K168 is the highly preferred substrate over the previously known H3K36. Hence, peptide arrays are powerful tools to biochemically characterize the PKMTs.
Biochemistry, Issue 93, Peptide arrays, solid phase peptide synthesis, SPOT synthesis, protein lysine methyltransferases, substrate specificity profile analysis, lysine methylation
Identification of Protein Complexes in Escherichia coli using Sequential Peptide Affinity Purification in Combination with Tandem Mass Spectrometry
Institutions: University of Toronto, University of Regina, University of Toronto.
Since most cellular processes are mediated by macromolecular assemblies, the systematic identification of protein-protein interactions (PPI) and the identification of the subunit composition of multi-protein complexes can provide insight into gene function and enhance understanding of biological systems1, 2
. Physical interactions can be mapped with high confidence vialarge-scale isolation and characterization of endogenous protein complexes under near-physiological conditions based on affinity purification of chromosomally-tagged proteins in combination with mass spectrometry (APMS). This approach has been successfully applied in evolutionarily diverse organisms, including yeast, flies, worms, mammalian cells, and bacteria1-6
. In particular, we have generated a carboxy-terminal Sequential Peptide Affinity (SPA) dual tagging system for affinity-purifying native protein complexes from cultured gram-negative Escherichia coli
, using genetically-tractable host laboratory strains that are well-suited for genome-wide investigations of the fundamental biology and conserved processes of prokaryotes1, 2, 7
. Our SPA-tagging system is analogous to the tandem affinity purification method developed originally for yeast8, 9
, and consists of a calmodulin binding peptide (CBP) followed by the cleavage site for the highly specific tobacco etch virus
(TEV) protease and three copies of the FLAG epitope (3X FLAG), allowing for two consecutive rounds of affinity enrichment. After cassette amplification, sequence-specific linear PCR products encoding the SPA-tag and a selectable marker are integrated and expressed in frame as carboxy-terminal fusions in a DY330 background that is induced to transiently express a highly efficient heterologous bacteriophage lambda recombination system10
. Subsequent dual-step purification using calmodulin and anti-FLAG affinity beads enables the highly selective and efficient recovery of even low abundance protein complexes from large-scale cultures. Tandem mass spectrometry is then used to identify the stably co-purifying proteins with high sensitivity (low nanogram detection limits).
Here, we describe detailed step-by-step procedures we commonly use for systematic protein tagging, purification and mass spectrometry-based analysis of soluble protein complexes from E. coli
, which can be scaled up and potentially tailored to other bacterial species, including certain opportunistic pathogens that are amenable to recombineering. The resulting physical interactions can often reveal interesting unexpected components and connections suggesting novel mechanistic links. Integration of the PPI data with alternate molecular association data such as genetic (gene-gene) interactions and genomic-context (GC) predictions can facilitate elucidation of the global molecular organization of multi-protein complexes within biological pathways. The networks generated for E. coli
can be used to gain insight into the functional architecture of orthologous gene products in other microbes for which functional annotations are currently lacking.
Genetics, Issue 69, Molecular Biology, Medicine, Biochemistry, Microbiology, affinity purification, Escherichia coli, gram-negative bacteria, cytosolic proteins, SPA-tagging, homologous recombination, mass spectrometry, protein interaction, protein complex
Mapping Bacterial Functional Networks and Pathways in Escherichia Coli using Synthetic Genetic Arrays
Institutions: University of Toronto, University of Toronto, University of Regina.
Phenotypes are determined by a complex series of physical (e.g.
protein-protein) and functional (e.g.
gene-gene or genetic) interactions (GI)1
. While physical interactions can indicate which bacterial proteins are associated as complexes, they do not necessarily reveal pathway-level functional relationships1. GI screens, in which the growth of double mutants bearing two deleted or inactivated genes is measured and compared to the corresponding single mutants, can illuminate epistatic dependencies between loci and hence provide a means to query and discover novel functional relationships2
. Large-scale GI maps have been reported for eukaryotic organisms like yeast3-7
, but GI information remains sparse for prokaryotes8
, which hinders the functional annotation of bacterial genomes. To this end, we and others have developed high-throughput quantitative bacterial GI screening methods9, 10
Here, we present the key steps required to perform quantitative E. coli
Synthetic Genetic Array (eSGA) screening procedure on a genome-scale9
, using natural bacterial conjugation and homologous recombination to systemically generate and measure the fitness of large numbers of double mutants in a colony array format.
Briefly, a robot is used to transfer, through conjugation, chloramphenicol (Cm) - marked mutant alleles from engineered Hfr (High frequency of recombination) 'donor strains' into an ordered array of kanamycin (Kan) - marked F- recipient strains. Typically, we use loss-of-function single mutants bearing non-essential gene deletions (e.g.
the 'Keio' collection11
) and essential gene hypomorphic mutations (i.e.
alleles conferring reduced protein expression, stability, or activity9, 12, 13
) to query the functional associations of non-essential and essential genes, respectively. After conjugation and ensuing genetic exchange mediated by homologous recombination, the resulting double mutants are selected on solid medium containing both antibiotics. After outgrowth, the plates are digitally imaged and colony sizes are quantitatively scored using an in-house automated image processing system14
. GIs are revealed when the growth rate of a double mutant is either significantly better or worse than expected9
. Aggravating (or negative) GIs often result between loss-of-function mutations in pairs of genes from compensatory pathways that impinge on the same essential process2
. Here, the loss of a single gene is buffered, such that either single mutant is viable. However, the loss of both pathways is deleterious and results in synthetic lethality or sickness (i.e.
slow growth). Conversely, alleviating (or positive) interactions can occur between genes in the same pathway or protein complex2
as the deletion of either gene alone is often sufficient to perturb the normal function of the pathway or complex such that additional perturbations do not reduce activity, and hence growth, further. Overall, systematically identifying and analyzing GI networks can provide unbiased, global maps of the functional relationships between large numbers of genes, from which pathway-level information missed by other approaches can be inferred9
Genetics, Issue 69, Molecular Biology, Medicine, Biochemistry, Microbiology, Aggravating, alleviating, conjugation, double mutant, Escherichia coli, genetic interaction, Gram-negative bacteria, homologous recombination, network, synthetic lethality or sickness, suppression
Genomic MRI - a Public Resource for Studying Sequence Patterns within Genomic DNA
Institutions: University of Toledo Health Science Campus.
Non-coding genomic regions in complex eukaryotes, including intergenic areas, introns, and untranslated segments of exons, are profoundly non-random in their nucleotide composition and consist of a complex mosaic of sequence patterns. These patterns include so-called Mid-Range Inhomogeneity (MRI) regions -- sequences 30-10000 nucleotides in length that are enriched by a particular base or combination of bases (e.g. (G+T)-rich, purine-rich, etc.). MRI regions are associated with unusual (non-B-form) DNA structures that are often involved in regulation of gene expression, recombination, and other genetic processes (Fedorova & Fedorov 2010). The existence of a strong fixation bias within MRI regions against mutations that tend to reduce their sequence inhomogeneity additionally supports the functionality and importance of these genomic sequences (Prakash et al.
Here we demonstrate a freely available Internet resource -- the Genomic MRI
program package -- designed for computational analysis of genomic sequences in order to find and characterize various MRI patterns within them (Bechtel et al.
2008). This package also allows generation of randomized sequences with various properties and level of correspondence to the natural input DNA sequences. The main goal of this resource is to facilitate examination of vast regions of non-coding DNA that are still scarcely investigated and await thorough exploration and recognition.
Genetics, Issue 51, bioinformatics, computational biology, genomics, non-randomness, signals, gene regulation, DNA conformation
Proteomic Sample Preparation from Formalin Fixed and Paraffin Embedded Tissue
Institutions: Max Planck Institute of Biochemistry.
Preserved clinical material is a unique source for proteomic investigation of human disorders. Here we describe an optimized protocol allowing large scale quantitative analysis of formalin fixed and paraffin embedded (FFPE) tissue. The procedure comprises four distinct steps. The first one is the preparation of sections from the FFPE material and microdissection of cells of interest. In the second step the isolated cells are lysed and processed using 'filter aided sample preparation' (FASP) technique. In this step, proteins are depleted from reagents used for the sample lysis and are digested in two-steps using endoproteinase LysC and trypsin.
After each digestion, the peptides are collected in separate fractions and their content is determined using a highly sensitive fluorescence measurement. Finally, the peptides are fractionated on 'pipette-tip' microcolumns. The LysC-peptides are separated into 4 fractions whereas the tryptic peptides are separated into 2 fractions. In this way prepared samples allow analysis of proteomes from minute amounts of material to a depth of 10,000 proteins. Thus, the described workflow is a powerful technique for studying diseases in a system-wide-fashion as well as for identification of potential biomarkers and drug targets.
Chemistry, Issue 79, Clinical Chemistry Tests, Proteomics, Proteomics, Proteomics, analytical chemistry, Formalin fixed and paraffin embedded (FFPE), sample preparation, proteomics, filter aided sample preparation (FASP), clinical proteomics; microdissection, SAX-fractionation
Application of MassSQUIRM for Quantitative Measurements of Lysine Demethylase Activity
Institutions: University of Arkansas for Medical Sciences .
Recently, epigenetic regulators have been discovered as key players in many different diseases 1-3
. As a result, these enzymes are prime targets for small molecule studies and drug development 4
. Many epigenetic regulators have only recently been discovered and are still in the process of being classified. Among these enzymes are lysine demethylases which remove methyl groups from lysines on histones and other proteins. Due to the novel nature of this class of enzymes, few assays have been developed to study their activity. This has been a road block to both the classification and high throughput study of histone demethylases. Currently, very few demethylase assays exist. Those that do exist tend to be qualitative in nature and cannot simultaneously discern between the different lysine methylation states (un-, mono-, di- and tri-). Mass spectrometry is commonly used to determine demethylase activity but current mass spectrometric assays do not address whether differentially methylated peptides ionize differently. Differential ionization of methylated peptides makes comparing methylation states difficult and certainly not quantitative (Figure 1A). Thus available assays are not optimized for the comprehensive analysis of demethylase activity.
Here we describe a method called MassSQUIRM (mass spectrometric quantitation using isotopic reductive methylation) that is based on reductive methylation of amine groups with deuterated formaldehyde to force all lysines to be di-methylated, thus making them essentially the same chemical species and therefore ionize the same (Figure 1B). The only chemical difference following the reductive methylation is hydrogen and deuterium, which does not affect MALDI ionization efficiencies. The MassSQUIRM assay is specific for demethylase reaction products with un-, mono- or di-methylated lysines. The assay is also applicable to lysine methyltransferases giving the same reaction products. Here, we use a combination of reductive methylation chemistry and MALDI mass spectrometry to measure the activity of LSD1, a lysine demethylase capable of removing di- and mono-methyl groups, on a synthetic peptide substrate 5
. This assay is simple and easily amenable to any lab with access to a MALDI mass spectrometer in lab or through a proteomics facility. The assay has ~8-fold dynamic range and is readily scalable to plate format 5
Molecular Biology, Issue 61, LSD1, lysine demethylase, mass spectrometry, reductive methylation, demethylase quantification
Protease- and Acid-catalyzed Labeling Workflows Employing 18O-enriched Water
Institutions: Boston Biomedical Research Institute.
Stable isotopes are essential tools in biological mass spectrometry. Historically, 18
O-stable isotopes have been extensively used to study the catalytic mechanisms of proteolytic enzymes1-3
. With the advent of mass spectrometry-based proteomics, the enzymatically-catalyzed incorporation of 18
O-atoms from stable isotopically enriched water has become a popular method to quantitatively compare protein expression levels (
reviewed by Fenselau and Yao4
, Miyagi and Rao5
and Ye et al.6)
O-labeling constitutes a simple and low-cost alternative to chemical (e.g.
iTRAQ, ICAT) and metabolic (e.g.
SILAC) labeling techniques7
. Depending on the protease utilized, 18
O-labeling can result in the incorporation of up to two 18
O-atoms in the C-terminal carboxyl group of the cleavage product3
. The labeling reaction can be subdivided into two independent processes, the peptide bond cleavage and the carboxyl oxygen exchange reaction8
. In our PALeO (p
-enriched water) adaptation of enzymatic 18
O-labeling, we utilized 50% 18
O-enriched water to yield distinctive isotope signatures. In combination with high-resolution matrix-assisted laser desorption ionization time-of-flight tandem mass spectrometry (MALDI-TOF/TOF MS/MS), the characteristic isotope envelopes can be used to identify cleavage products with a high level of specificity. We previously have used the PALeO-methodology to detect and characterize endogenous proteases9
and monitor proteolytic reactions10-11
. Since PALeO encodes the very essence of the proteolytic cleavage reaction, the experimental setup is simple and biochemical enrichment steps of cleavage products can be circumvented. The PALeO-method can easily be extended to (i) time course experiments that monitor the dynamics of proteolytic cleavage reactions and (ii) the analysis of proteolysis in complex biological samples that represent physiological conditions. PALeO-TimeCourse experiments help identifying rate-limiting processing steps and reaction intermediates in complex proteolytic pathway reactions. Furthermore, the PALeO-reaction allows us to identify proteolytic enzymes such as the serine protease trypsin that is capable to rebind its cleavage products and catalyze the incorporation of a second 18
O-atom. Such "double-labeling" enzymes can be used for postdigestion 18
O-labeling, in which peptides are exclusively labeled by the carboxyl oxygen exchange reaction. Our third strategy extends labeling employing 18
O-enriched water beyond enzymes and uses acidic pH conditions to introduce 18
O-stable isotope signatures into peptides.
Biochemistry, Issue 72, Molecular Biology, Proteins, Proteomics, Chemistry, Physics, MALDI-TOF mass spectrometry, proteomics, proteolysis, quantification, stable isotope labeling, labeling, catalyst, peptides, 18-O enriched water
Bottom-up and Shotgun Proteomics to Identify a Comprehensive Cochlear Proteome
Institutions: University of South Florida.
Proteomics is a commonly used approach that can provide insights into complex biological systems. The cochlear sensory epithelium contains receptors that transduce the mechanical energy of sound into an electro-chemical energy processed by the peripheral and central nervous systems. Several proteomic techniques have been developed to study the cochlear inner ear, such as two-dimensional difference gel electrophoresis (2D-DIGE), antibody microarray, and mass spectrometry (MS). MS is the most comprehensive and versatile tool in proteomics and in conjunction with separation methods can provide an in-depth proteome of biological samples. Separation methods combined with MS has the ability to enrich protein samples, detect low molecular weight and hydrophobic proteins, and identify low abundant proteins by reducing the proteome dynamic range. Different digestion strategies can be applied to whole lysate or to fractionated protein lysate to enhance peptide and protein sequence coverage. Utilization of different separation techniques, including strong cation exchange (SCX), reversed-phase (RP), and gel-eluted liquid fraction entrapment electrophoresis (GELFrEE) can be applied to reduce sample complexity prior to MS analysis for protein identification.
Biochemistry, Issue 85, Cochlear, chromatography, LC-MS/MS, mass spectrometry, Proteomics, sensory epithelium
The ChroP Approach Combines ChIP and Mass Spectrometry to Dissect Locus-specific Proteomic Landscapes of Chromatin
Institutions: European Institute of Oncology.
Chromatin is a highly dynamic nucleoprotein complex made of DNA and proteins that controls various DNA-dependent processes. Chromatin structure and function at specific regions is regulated by the local enrichment of histone post-translational modifications (hPTMs) and variants, chromatin-binding proteins, including transcription factors, and DNA methylation. The proteomic characterization of chromatin composition at distinct functional regions has been so far hampered by the lack of efficient protocols to enrich such domains at the appropriate purity and amount for the subsequent in-depth analysis by Mass Spectrometry (MS). We describe here a newly designed chromatin proteomics strategy, named ChroP (Chromatin Proteomics
), whereby a preparative chromatin immunoprecipitation is used to isolate distinct chromatin regions whose features, in terms of hPTMs, variants and co-associated non-histonic proteins, are analyzed by MS. We illustrate here the setting up of ChroP for the enrichment and analysis of transcriptionally silent heterochromatic regions, marked by the presence of tri-methylation of lysine 9 on histone H3. The results achieved demonstrate the potential of ChroP
in thoroughly characterizing the heterochromatin proteome and prove it as a powerful analytical strategy for understanding how the distinct protein determinants of chromatin interact and synergize to establish locus-specific structural and functional configurations.
Biochemistry, Issue 86, chromatin, histone post-translational modifications (hPTMs), epigenetics, mass spectrometry, proteomics, SILAC, chromatin immunoprecipitation , histone variants, chromatome, hPTMs cross-talks
Automated, Quantitative Cognitive/Behavioral Screening of Mice: For Genetics, Pharmacology, Animal Cognition and Undergraduate Instruction
Institutions: Rutgers University, Koç University, New York University, Fairfield University.
We describe a high-throughput, high-volume, fully automated, live-in 24/7 behavioral testing system for assessing the effects of genetic and pharmacological manipulations on basic mechanisms of cognition and learning in mice. A standard polypropylene mouse housing tub is connected through an acrylic tube to a standard commercial mouse test box. The test box has 3 hoppers, 2 of which are connected to pellet feeders. All are internally illuminable with an LED and monitored for head entries by infrared (IR) beams. Mice live in the environment, which eliminates handling during screening. They obtain their food during two or more daily feeding periods by performing in operant (instrumental) and Pavlovian (classical) protocols, for which we have written protocol-control software and quasi-real-time data analysis and graphing software. The data analysis and graphing routines are written in a MATLAB-based language created to simplify greatly the analysis of large time-stamped behavioral and physiological event records and to preserve a full data trail from raw data through all intermediate analyses to the published graphs and statistics within a single data structure. The data-analysis code harvests the data several times a day and subjects it to statistical and graphical analyses, which are automatically stored in the "cloud" and on in-lab computers. Thus, the progress of individual mice is visualized and quantified daily. The data-analysis code talks to the protocol-control code, permitting the automated advance from protocol to protocol of individual subjects. The behavioral protocols implemented are matching, autoshaping, timed hopper-switching, risk assessment in timed hopper-switching, impulsivity measurement, and the circadian anticipation of food availability. Open-source protocol-control and data-analysis code makes the addition of new protocols simple. Eight test environments fit in a 48 in x 24 in x 78 in cabinet; two such cabinets (16 environments) may be controlled by one computer.
Behavior, Issue 84, genetics, cognitive mechanisms, behavioral screening, learning, memory, timing
Protein WISDOM: A Workbench for In silico De novo Design of BioMolecules
Institutions: Princeton University.
The aim of de novo
protein design is to find the amino acid sequences that will fold into a desired 3-dimensional structure with improvements in specific properties, such as binding affinity, agonist or antagonist behavior, or stability, relative to the native sequence. Protein design lies at the center of current advances drug design and discovery. Not only does protein design provide predictions for potentially useful drug targets, but it also enhances our understanding of the protein folding process and protein-protein interactions. Experimental methods such as directed evolution have shown success in protein design. However, such methods are restricted by the limited sequence space that can be searched tractably. In contrast, computational design strategies allow for the screening of a much larger set of sequences covering a wide variety of properties and functionality. We have developed a range of computational de novo
protein design methods capable of tackling several important areas of protein design. These include the design of monomeric proteins for increased stability and complexes for increased binding affinity.
To disseminate these methods for broader use we present Protein WISDOM (http://www.proteinwisdom.org), a tool that provides automated methods for a variety of protein design problems. Structural templates are submitted to initialize the design process. The first stage of design is an optimization sequence selection stage that aims at improving stability through minimization of potential energy in the sequence space. Selected sequences are then run through a fold specificity stage and a binding affinity stage. A rank-ordered list of the sequences for each step of the process, along with relevant designed structures, provides the user with a comprehensive quantitative assessment of the design. Here we provide the details of each design method, as well as several notable experimental successes attained through the use of the methods.
Genetics, Issue 77, Molecular Biology, Bioengineering, Biochemistry, Biomedical Engineering, Chemical Engineering, Computational Biology, Genomics, Proteomics, Protein, Protein Binding, Computational Biology, Drug Design, optimization (mathematics), Amino Acids, Peptides, and Proteins, De novo protein and peptide design, Drug design, In silico sequence selection, Optimization, Fold specificity, Binding affinity, sequencing
A New Approach for the Comparative Analysis of Multiprotein Complexes Based on 15N Metabolic Labeling and Quantitative Mass Spectrometry
Institutions: University of Münster, Carnegie Institution for Science.
The introduced protocol provides a tool for the analysis of multiprotein complexes in the thylakoid membrane, by revealing insights into complex composition under different conditions. In this protocol the approach is demonstrated by comparing the composition of the protein complex responsible for cyclic electron flow (CEF) in Chlamydomonas reinhardtii
, isolated from genetically different strains. The procedure comprises the isolation of thylakoid membranes, followed by their separation into multiprotein complexes by sucrose density gradient centrifugation, SDS-PAGE, immunodetection and comparative, quantitative mass spectrometry (MS) based on differential metabolic labeling (14
N) of the analyzed strains. Detergent solubilized thylakoid membranes are loaded on sucrose density gradients at equal chlorophyll concentration. After ultracentrifugation, the gradients are separated into fractions, which are analyzed by mass-spectrometry based on equal volume. This approach allows the investigation of the composition within the gradient fractions and moreover to analyze the migration behavior of different proteins, especially focusing on ANR1, CAS, and PGRL1. Furthermore, this method is demonstrated by confirming the results with immunoblotting and additionally by supporting the findings from previous studies (the identification and PSI-dependent migration of proteins that were previously described to be part of the CEF-supercomplex such as PGRL1, FNR, and cyt f
). Notably, this approach is applicable to address a broad range of questions for which this protocol can be adopted and e.g.
used for comparative analyses of multiprotein complex composition isolated from distinct environmental conditions.
Microbiology, Issue 85, Sucrose density gradients, Chlamydomonas, multiprotein complexes, 15N metabolic labeling, thylakoids
A Strategy for Sensitive, Large Scale Quantitative Metabolomics
Institutions: Cornell University, Cornell University.
Metabolite profiling has been a valuable asset in the study of metabolism in health and disease. However, current platforms have different limiting factors, such as labor intensive sample preparations, low detection limits, slow scan speeds, intensive method optimization for each metabolite, and the inability to measure both positively and negatively charged ions in single experiments. Therefore, a novel metabolomics protocol could advance metabolomics studies. Amide-based hydrophilic chromatography enables polar metabolite analysis without any chemical derivatization. High resolution MS using the Q-Exactive (QE-MS) has improved ion optics, increased scan speeds (256 msec at resolution 70,000), and has the capability of carrying out positive/negative switching. Using a cold methanol extraction strategy, and coupling an amide column with QE-MS enables robust detection of 168 targeted polar metabolites and thousands of additional features simultaneously. Data processing is carried out with commercially available software in a highly efficient way, and unknown features extracted from the mass spectra can be queried in databases.
Chemistry, Issue 87, high-resolution mass spectrometry, metabolomics, positive/negative switching, low mass calibration, Orbitrap
Metabolic Labeling and Membrane Fractionation for Comparative Proteomic Analysis of Arabidopsis thaliana Suspension Cell Cultures
Institutions: Max Plank Institute of Molecular Plant Physiology, University of Hohenheim.
Plasma membrane microdomains are features based on the physical properties of the lipid and sterol environment and have particular roles in signaling processes. Extracting sterol-enriched membrane microdomains from plant cells for proteomic analysis is a difficult task mainly due to multiple preparation steps and sources for contaminations from other cellular compartments. The plasma membrane constitutes only about 5-20% of all the membranes in a plant cell, and therefore isolation of highly purified plasma membrane fraction is challenging. A frequently used method involves aqueous two-phase partitioning in polyethylene glycol and dextran, which yields plasma membrane vesicles with a purity of 95% 1
. Sterol-rich membrane microdomains within the plasma membrane are insoluble upon treatment with cold nonionic detergents at alkaline pH. This detergent-resistant membrane fraction can be separated from the bulk plasma membrane by ultracentrifugation in a sucrose gradient 2
. Subsequently, proteins can be extracted from the low density band of the sucrose gradient by methanol/chloroform precipitation. Extracted protein will then be trypsin digested, desalted and finally analyzed by LC-MS/MS. Our extraction protocol for sterol-rich microdomains is optimized for the preparation of clean detergent-resistant membrane fractions from Arabidopsis thaliana
We use full metabolic labeling of Arabidopsis thaliana
suspension cell cultures with K15
as the only nitrogen source for quantitative comparative proteomic studies following biological treatment of interest 3
. By mixing equal ratios of labeled and unlabeled cell cultures for joint protein extraction the influence of preparation steps on final quantitative result is kept at a minimum. Also loss of material during extraction will affect both control and treatment samples in the same way, and therefore the ratio of light and heave peptide will remain constant. In the proposed method either labeled or unlabeled cell culture undergoes a biological treatment, while the other serves as control 4
Empty Value, Issue 79, Cellular Structures, Plants, Genetically Modified, Arabidopsis, Membrane Lipids, Intracellular Signaling Peptides and Proteins, Membrane Proteins, Isotope Labeling, Proteomics, plants, Arabidopsis thaliana, metabolic labeling, stable isotope labeling, suspension cell cultures, plasma membrane fractionation, two phase system, detergent resistant membranes (DRM), mass spectrometry, membrane microdomains, quantitative proteomics
Sampling Human Indigenous Saliva Peptidome Using a Lollipop-Like Ultrafiltration Probe: Simplify and Enhance Peptide Detection for Clinical Mass Spectrometry
Institutions: Sanford-Burnham Medical Research Institute, University of California, San Diego , VA San Diego Healthcare Center, University of California, San Diego .
Although human saliva proteome and peptidome have been revealed 1-2
they were majorly identified from tryptic digests of saliva proteins. Identification of indigenous peptidome of human saliva without prior digestion with exogenous enzymes becomes imperative, since native peptides in human saliva provide potential values for diagnosing disease, predicting disease progression, and monitoring therapeutic efficacy. Appropriate sampling is a critical step for enhancement of identification of human indigenous saliva peptidome. Traditional methods of sampling human saliva involving centrifugation to remove debris 3-4
may be too time-consuming to be applicable for clinical use. Furthermore, debris removal by centrifugation may be unable to clean most of the infected pathogens and remove the high abundance proteins that often hinder the identification of low abundance peptidome.
Conventional proteomic approaches that primarily utilize two-dimensional gel electrophoresis (2-DE) gels in conjugation with in-gel digestion are capable of identifying many saliva proteins 5-6
. However, this approach is generally not sufficiently sensitive to detect low abundance peptides/proteins. Liquid chromatography-Mass spectrometry (LC-MS) based proteomics is an alternative that can identify proteins without prior 2-DE separation. Although this approach provides higher sensitivity, it generally needs prior sample pre-fractionation 7
and pre-digestion with trypsin, which makes it difficult for clinical use.
To circumvent the hindrance in mass spectrometry due to sample preparation, we have developed a technique called capillary ultrafiltration (CUF) probes 8-11
. Data from our laboratory demonstrated that the CUF probes are capable of capturing proteins in vivo
from various microenvironments in animals in a dynamic and minimally invasive manner 8-11
. No centrifugation is needed since a negative pressure is created by simply syringe withdrawing during sample collection. The CUF probes combined with LC-MS have successfully identified tryptic-digested proteins 8-11
. In this study, we upgraded the ultrafiltration sampling technique by creating a lollipop-like ultrafiltration (LLUF) probe that can easily fit in the human oral cavity. The direct analysis by LC-MS without trypsin digestion showed that human saliva indigenously contains many peptide fragments derived from various proteins. Sampling saliva with LLUF probes avoided centrifugation but effectively removed many larger and high abundance proteins. Our mass spectrometric results illustrated that many low abundance peptides became detectable after filtering out larger proteins with LLUF probes. Detection of low abundance saliva peptides was independent of multiple-step sample separation with chromatography. For clinical application, the LLUF probes incorporated with LC-MS could potentially be used in the future to monitor disease progression from saliva.
Medicine, Issue 66, Molecular Biology, Genetics, Sampling, Saliva, Peptidome, Ultrafiltration, Mass spectrometry
Identification of Key Factors Regulating Self-renewal and Differentiation in EML Hematopoietic Precursor Cells by RNA-sequencing Analysis
Institutions: The University of Texas Graduate School of Biomedical Sciences at Houston.
Hematopoietic stem cells (HSCs) are used clinically for transplantation treatment to rebuild a patient's hematopoietic system in many diseases such as leukemia and lymphoma. Elucidating the mechanisms controlling HSCs self-renewal and differentiation is important for application of HSCs for research and clinical uses. However, it is not possible to obtain large quantity of HSCs due to their inability to proliferate in vitro
. To overcome this hurdle, we used a mouse bone marrow derived cell line, the EML (Erythroid, Myeloid, and Lymphocytic) cell line, as a model system for this study.
RNA-sequencing (RNA-Seq) has been increasingly used to replace microarray for gene expression studies. We report here a detailed method of using RNA-Seq technology to investigate the potential key factors in regulation of EML cell self-renewal and differentiation. The protocol provided in this paper is divided into three parts. The first part explains how to culture EML cells and separate Lin-CD34+ and Lin-CD34- cells. The second part of the protocol offers detailed procedures for total RNA preparation and the subsequent library construction for high-throughput sequencing. The last part describes the method for RNA-Seq data analysis and explains how to use the data to identify differentially expressed transcription factors between Lin-CD34+ and Lin-CD34- cells. The most significantly differentially expressed transcription factors were identified to be the potential key regulators controlling EML cell self-renewal and differentiation. In the discussion section of this paper, we highlight the key steps for successful performance of this experiment.
In summary, this paper offers a method of using RNA-Seq technology to identify potential regulators of self-renewal and differentiation in EML cells. The key factors identified are subjected to downstream functional analysis in vitro
and in vivo
Genetics, Issue 93, EML Cells, Self-renewal, Differentiation, Hematopoietic precursor cell, RNA-Sequencing, Data analysis
Using SCOPE to Identify Potential Regulatory Motifs in Coregulated Genes
Institutions: Dartmouth College.
SCOPE is an ensemble motif finder that uses three component algorithms in parallel to identify potential regulatory motifs by over-representation and motif position preference1
. Each component algorithm is optimized to find a different kind of motif. By taking the best of these three approaches, SCOPE performs better than any single algorithm, even in the presence of noisy data1
. In this article, we utilize a web version of SCOPE2
to examine genes that are involved in telomere maintenance. SCOPE has been incorporated into at least two other motif finding programs3,4
and has been used in other studies5-8
The three algorithms that comprise SCOPE are BEAM9
, which finds non-degenerate motifs (ACCGGT), PRISM10
, which finds degenerate motifs (ASCGWT), and SPACER11
, which finds longer bipartite motifs (ACCnnnnnnnnGGT). These three algorithms have been optimized to find their corresponding type of motif. Together, they allow SCOPE to perform extremely well.
Once a gene set has been analyzed and candidate motifs identified, SCOPE can look for other genes that contain the motif which, when added to the original set, will improve the motif score. This can occur through over-representation or motif position preference. Working with partial gene sets that have biologically verified transcription factor binding sites, SCOPE was able to identify most of the rest of the genes also regulated by the given transcription factor.
Output from SCOPE shows candidate motifs, their significance, and other information both as a table and as a graphical motif map. FAQs and video tutorials are available at the SCOPE web site which also includes a "Sample Search" button that allows the user to perform a trial run.
Scope has a very friendly user interface that enables novice users to access the algorithm's full power without having to become an expert in the bioinformatics of motif finding. As input, SCOPE can take a list of genes, or FASTA sequences. These can be entered in browser text fields, or read from a file. The output from SCOPE contains a list of all identified motifs with their scores, number of occurrences, fraction of genes containing the motif, and the algorithm used to identify the motif. For each motif, result details include a consensus representation of the motif, a sequence logo, a position weight matrix, and a list of instances for every motif occurrence (with exact positions and "strand" indicated). Results are returned in a browser window and also optionally by email. Previous papers describe the SCOPE algorithms in detail1,2,9-11
Genetics, Issue 51, gene regulation, computational biology, algorithm, promoter sequence motif