The Journal of Visualized Experiments (JoVE) is a peer-reviewed, PubMed-indexed video journal. Our mission is to increase the productivity of scientific research.



Articles by David Fenyo in JoVE

 JoVE General

MALDI Sample Preparation: the Ultra Thin Layer Method


JoVE 192 4/29/2007

Laboratory of Mass Spectrometry and Gaseous Ion Chemistry, Rockefeller University

This video demonstrates the preparation of an ultra-thin matrix/analyte layer for analyzing peptides and proteins by matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS).

Other articles by David Fenyo on PubMed

RADARS, a Bioinformatics Solution That Automates Proteome Mass Spectral Analysis, Optimises Protein Identification, and Archives Data in a Relational Database

RADARS, a rapid, automated, data archiving and retrieval software system for high-throughput proteomic mass spectral data processing and storage, is described. The majority of mass spectrometer data files are compatible with RADARS, for consistent processing. The system automatically takes unprocessed data files, identifies proteins via in silico database searching, then stores the processed data and search results in a relational database suitable for customized reporting. The system is robust, used in 24/7 operation, accessible to multiple users of an intranet through a web browser, may be monitored by Virtual Private Network, and is secure. RADARS is scalable for use on one or many computers, and is suited to multiple processor systems. It can incorporate any local database in FASTA format, and can search protein and DNA databases online. A key feature is a suite of visualisation tools (many available gratis), allowing facile manipulation of spectra, by hand annotation, reanalysis, and access to all procedures. We also describe the use of Sonar MS/MS, a novel, rapid search engine requiring 40 MB RAM per process for searches against a genomic or EST database translated in all six reading frames. RADARS reduces the cost of analysis by its efficient algorithms: Sonar MS/MS can identify proteins without accurate knowledge of the parent ion mass and without protein tags. Statistical scoring methods provide close-to-expert accuracy and bring robust data analysis to the non-expert user.

A Model of Random Mass-matching and Its Use for Automated Significance Testing in Mass Spectrometric Proteome Analysis

A rapid and accurate method for testing the significance of protein identities determined by mass spectrometric analysis of protein digests and genome database searching is presented. The method is based on direct computation using a statistical model of the random matching of measured and theoretical proteolytic peptide masses. Protein identification algorithms typically rank the proteins of a genome database according to a score based on the number of matches between the masses obtained by mass spectrometry analysis and the theoretical proteolytic peptide masses of a database protein. The random matching of experimental and theoretical masses can cause false results. A result is significant only if the score characterizing the result deviates significantly from the score expected from a false result. A distribution of the score (number of matches) for random (false) results is computed directly from our model of the random matching, which allows significance testing under any experimental and database search constraints. In order to mimic protein identification data quality in large-scale proteome projects, low-to-high quality proteolytic peptide mass data were generated in silico and subsequently submitted to a database search program designed to include significance testing based on direct computation. This simulation procedure demonstrates the usefulness of direct significance testing for automatically screening for samples that must be subjected to peptide sequence analysis by e.g. tandem mass spectrometry in order to determine the protein identity.
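As a rough illustration of significance testing by direct computation, the sketch below treats each measured mass as matching a false protein independently with the same probability, so the number of random matches follows a binomial distribution. The binomial simplification, the function name, and the example numbers are assumptions made for illustration; they are not the statistical model used in the paper.

```python
from math import comb


def random_match_p_value(n_masses: int, k_matches: int, p_match: float) -> float:
    """Probability of k_matches or more random matches among n_masses.

    Assumes (hypothetically) independent matches that all share the same
    probability p_match, i.e. a binomial model of random matching.
    """
    return sum(
        comb(n_masses, k) * p_match**k * (1.0 - p_match) ** (n_masses - k)
        for k in range(k_matches, n_masses + 1)
    )


# Made-up example: 20 measured masses, 8 matched, 10% random-match chance each.
p_value = random_match_p_value(20, 8, 0.10)
is_significant = p_value < 0.01
```

A result would be called significant at a chosen level only when this tail probability falls below that level; samples that fail the test are the ones flagged for follow-up sequence analysis.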

Informatics and Data Management in Proteomics

Proteomics has become dominated by large amounts of experimental data and interpreted results. This experimental data cannot be effectively used without understanding the fundamental structure of its information content and representing that information in such a way that knowledge can be extracted from it. This review explores the structure of this information with regard to three fundamental issues: the extraction of relevant information from raw data, the scale of the projects involved and the statistical significance of protein identification results.

A Modular Cross-linking Approach for Exploring Protein Interactions

A method is described for the elucidation of protein-protein interactions using novel cross-linking reagents and mass spectrometry. The method incorporates (1) a modular solid-phase synthetic strategy for generating the cross-linking reagents, (2) enrichment and digestion of cross-linked proteins using microconcentrators, (3) mass spectrometric analysis of cross-linked peptides, and (4) comprehensive computational analysis of the cross-linking data. This integrated approach has been applied to the study of cross-linking between the components of the heterodimeric protein complex negative cofactor 2.

A Method for Assessing the Statistical Significance of Mass Spectrometry-based Protein Identifications Using General Scoring Schemes

This paper investigates the use of survival functions and expectation values to evaluate the results of protein identification experiments. These functions are standard statistical measures that can be used to reduce various protein identification scoring schemes to a common, easily interpretable representation. The relative merits of scoring systems were explored using this approach, as well as the effects of altering primary identification parameters. We would advocate the widespread use of these simple statistical measures to simplify and standardize the reporting of the confidence of protein identification results, allowing the users of different identification algorithms to compare their results in a straightforward and statistically significant manner. A method is described for measuring these distributions using information that is being discarded by most protein identification search engines, resulting in accurate survival functions that are specific to any combination of scoring algorithms, sequence databases, and mass spectra.
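The relationship between a survival function and an expectation value can be sketched as follows. The example assumes that a list of scores from random (false) candidates is already available, and it uses a plain empirical survival function; the function names and the sample scores are hypothetical, and the paper's own procedure for measuring the distributions is not reproduced here.

```python
from bisect import bisect_left


def survival_function(random_scores: list[float], score: float) -> float:
    """Fraction of random (false) scores that are >= score."""
    ordered = sorted(random_scores)
    n_at_or_above = len(ordered) - bisect_left(ordered, score)
    return n_at_or_above / len(ordered)


def expectation_value(
    random_scores: list[float], score: float, n_candidates: int
) -> float:
    """Expected number of random candidates scoring >= score."""
    return n_candidates * survival_function(random_scores, score)


# Made-up null scores; in practice these would come from the search itself.
null_scores = [1.0, 2.0, 2.0, 3.0, 5.0]
e_value = expectation_value(null_scores, 3.0, n_candidates=1000)
```

Because the expectation value is expressed in the same units regardless of the underlying scoring scheme, results from different search engines become directly comparable.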

Probity: a Protein Identification Algorithm with Accurate Assignment of the Statistical Significance of the Results

An algorithm for protein identification based on mass spectrometric proteolytic peptide mapping and genome database searching is presented. The algorithm ranks database proteins based on direct calculation of the probability of random matching and assigns the statistical significance to each result. We investigate the performance of the algorithm by simulation and show that the algorithm responds to random data in the desired manner and that the statistical significance computed indicates the risk that a particular identification result is false.

The Statistical Significance of Protein Identification Results As a Function of the Number of Protein Sequences Searched

The potential for obtaining a true mass spectrometric protein identification result depends on the choice of algorithm as well as on experimental factors that influence the information content in the mass spectrometric data. Current methods can never prove definitively that a result is true, but an appropriate choice of algorithm can provide a measure of the statistical risk that a result is false, i.e., the statistical significance. We recently demonstrated an algorithm, Probity, which assigns the statistical significance to each result. For any choice of algorithm, the difficulty of obtaining statistically significant results depends on the number of protein sequences in the sequence collection searched. By simulations of random protein identifications and using the Probity algorithm, we here demonstrate explicitly how the statistical significance depends on the number of sequences searched. We also provide an example of how the practitioner's choice of taxonomic constraints influences the statistical significance.

A Proteomics Approach to the Study of Absorption, Distribution, Metabolism, Excretion, and Toxicity

A proteomics approach was used to identify liver proteins that displayed altered levels in mice following treatment with a candidate drug. Samples from livers of mice treated with candidate drug or untreated were prepared, quantified, labeled with CyDye DIGE Fluors, and subjected to two-dimensional electrophoresis. Following scanning and imaging of gels from three different isoelectric focusing intervals (3-10, 7-11, 6.2-7.5), automated spot handling was performed on a large number of gel spots including those found to differ more than 20% between the treated and untreated condition. Subsequently, differentially regulated proteins were subjected to a three-step approach of mass spectrometry using (a) matrix-assisted laser desorption/ionization time-of-flight mass spectrometry peptide mass fingerprinting, (b) post-source decay utilizing chemically assisted fragmentation, and (c) liquid chromatography-tandem mass spectrometry. Using this approach we have so far resolved 121 differentially regulated proteins following treatment of mice with the candidate drug and identified 110 of these using mass spectrometry. Such data can potentially give improved molecular insight into the metabolism of drugs as well as the proteins involved in potential toxicity following the treatment. The differentially regulated proteins could be used as targets for metabolic studies or as markers for toxicity.

Finding Protein Sequences Using PROWL

PROWL is a collection of tools for the identification of protein sequences, using input data derived from mass spectrometry. Experimental data from various types of mass spectrometers can be input directly into PROWL's component software. This unit presents protocols for several of the individual PROWL tools. Specifically, PepFrag allows for the analysis of a single spectrum derived from tandem mass spectrometry. GPM, on the other hand, provides for the analysis of multiple MS/MS spectra. An additional protocol introduces ProFound for analyzing a single spectrum of peptide mass fingerprinting data.

Phosphotyrosine Signaling Networks in Epidermal Growth Factor Receptor Overexpressing Squamous Carcinoma Cells

Overexpression and enhanced activation of the epidermal growth factor (EGF) receptor are frequent events in human cancers that correlate with poor prognosis. Anti-phosphotyrosine and anti-EGFR affinity chromatography, isotope-coded μLC-MS/MS, and immunoblot methods were combined to describe and measure signaling networks associated with EGF receptor activation and pharmacological inhibition. The squamous carcinoma cell line HN5, which overexpresses EGF receptor and displays sustained receptor kinase activation, was used as a model system, where pharmacological inhibition of EGF receptor kinase by erlotinib markedly reduced auto- and substrate phosphorylation and Src family phosphorylation at EGFR Y845, while increasing total EGF receptor protein. Diverse sets of known and poorly described functional protein classes were unequivocally identified by affinity selection, comprising either proteins tyrosine phosphorylated or complexed therewith, predominantly through EGF receptor and Src family kinases, principally 1) immediate EGF receptor signaling complexes (18%); 2) complexes involved in adhesion and cell-cell contacts (34%); and 3) receptor internalization and degradation signals. Novel and known phosphorylation sites could be located despite the complexity of the peptide mixtures. In addition to interactions with multiple signaling adaptors Grb2, SHC, SCK, and NSP2, EGF receptors in HN5 cells were shown to form direct or indirect physical interactions with additional kinases including ACK1, focal adhesion kinase (FAK), Pyk2, Yes, EphA2, and EphB4. Pharmacological inhibition of EGF receptor kinase activity by erlotinib resulted in reduced phosphorylation of downstream signaling, for example through Cbl/Cbl-B, phospholipase Cγ (PLCγ), Erk1/2, PI-3 kinase, and STAT3/5. The focal adhesion proteins FAK, Pyk2, paxillin, ARF/GIT1, and plakophilin were down-regulated by transient EGF stimulation, suggesting a complex balance between growth factor induced kinase and phosphatase activities in the control of cell adhesion complexes. Functional interactions between IGF-1 receptor, lysophosphatidic acid (LPA) signaling, and EGF receptor were observed, directly and/or indirectly, on phospho-Akt, phospho-Erk1/2, and phospho-ribosomal S6.

Protein Identification in Complex Mixtures

This paper investigates the prospects of successful mass spectrometric protein identification based on mass data from proteolytic digests of complex protein mixtures. Sets of proteolytic peptide masses representing various numbers of digested proteins in a mixture were generated in silico. In each set, different proteins were selected from a protein sequence collection and for each protein the sequence coverage was randomly selected within a particular regime (15-30% or 30-60%). We demonstrate that the Probity algorithm, which is characterized by an optimal tolerance for random interference, employed in an iterative procedure can correctly identify >95% of proteins at a desired significance level in mixtures composed of hundreds of yeast proteins under realistic mass spectrometric experimental constraints. By using a model of the distribution of protein abundance, we demonstrate that the very high efficiency of identification of protein mixtures that can be achieved by appropriate choices of informatics procedures is hampered by limitations of the mass spectrometric dynamic range. The results stress the desire to choose carefully experimental protocols for comprehensive proteome analysis, focusing on truly critical issues such as the dynamic range, which potentially limits the possibilities of identifying low abundance proteins.

An Evaluation, Comparison, and Accurate Benchmarking of Several Publicly Available MS/MS Search Algorithms: Sensitivity and Specificity Analysis

MS/MS and associated database search algorithms are essential proteomic tools for identifying peptides. Due to their widespread use, it is now time to perform a systematic analysis of the various algorithms currently in use. Using blood specimens used in the HUPO Plasma Proteome Project, we have evaluated five search algorithms with respect to their sensitivity and specificity, and have also accurately benchmarked them based on specified false-positive (FP) rates. Spectrum Mill and SEQUEST performed well in terms of sensitivity, but were inferior to MASCOT, X!Tandem, and Sonar in terms of specificity. Overall, MASCOT, a probabilistic search algorithm, correctly identified most peptides based on a specified FP rate. The rescoring algorithm, PeptideProphet, enhanced the overall performance of the SEQUEST algorithm, as well as provided predictable FP error rates. Ideally, score thresholds should be calculated for each peptide spectrum or minimally, derived from a reversed-sequence search as demonstrated in this study based on a validated data set. The availability of open-source search algorithms, such as X!Tandem, makes it feasible to further improve the validation process (manual or automatic) on the basis of "consensus scoring", i.e., the use of multiple (at least two) search algorithms to reduce the number of FPs.
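Deriving a score threshold from a reversed-sequence search can be sketched as below. The sketch assumes the common convention of estimating the FP rate at a cutoff as the number of reversed-sequence (decoy) hits divided by the number of forward (target) hits at or above that cutoff; the function name and the example scores are hypothetical, and the exact procedure used in the study may differ.

```python
def threshold_for_fp_rate(target_scores, decoy_scores, max_fp_rate):
    """Lowest score cutoff whose estimated FP rate is <= max_fp_rate.

    The FP rate at a cutoff is estimated as
    (# decoy hits >= cutoff) / (# target hits >= cutoff).
    Returns None when no cutoff satisfies the requested rate.
    """
    qualifying = []
    for cutoff in set(target_scores):
        n_target = sum(s >= cutoff for s in target_scores)  # always >= 1
        n_decoy = sum(s >= cutoff for s in decoy_scores)
        if n_decoy / n_target <= max_fp_rate:
            qualifying.append(cutoff)
    return min(qualifying) if qualifying else None


# Made-up scores from a forward search and a reversed-sequence search.
targets = [10, 9, 8, 7, 6, 5]
decoys = [7.5, 5.5, 4]
cutoff = threshold_for_fp_rate(targets, decoys, max_fp_rate=0.1)
```

In a consensus-scoring setup, a cutoff of this kind would be computed separately for each search algorithm before the identification lists are intersected.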

Optimizing Search Conditions for the Mass Fingerprint-based Identification of Proteins

The two central problems in protein identification by searching a protein sequence collection with MS data are the optimal use of experimental information to allow for identification of low abundance proteins and the accurate assignment of the probability that a result is false. For comprehensive MS-based protein identification, it is necessary to choose an appropriate algorithm and optimal search conditions. We report a systematic study of the quality of PMF-based protein identifications under different sequence collection search conditions using the Probity algorithm, which assigns the statistical significance to each result. We employed 2244 PMFs from 2-DE-separated human blood plasma proteins, and performed identification under various search constraints: mass accuracy (0.01-0.3 Da), maximum number of missed cleavage sites (0-2), and size of the sequence collection searched (5.6 x 10^4 to 1.8 x 10^5). By counting the number of significant results (significance levels 0.05, 0.01, and 0.001) for each condition, we demonstrate the search condition impact on the successful outcome of proteome analysis experiments. A mass correction procedure utilizing mass deviations of albumin matching peptides was tested in an attempt to improve the statistical significance of identifications and iterative searching was employed for identification of multiple proteins from each PMF.

SwePep, a Database Designed for Endogenous Peptides and Mass Spectrometry

A new database, SwePep, specifically designed for endogenous peptides, has been constructed to significantly speed up the identification process from complex tissue samples utilizing mass spectrometry. In the identification process the experimental peptide masses are compared with the peptide masses stored in the database both with and without possible post-translational modifications. This intermediate identification step is fast and singles out peptides that are potential endogenous peptides and can later be confirmed with tandem mass spectrometry data. Successful applications of this methodology are presented. The SwePep database is a relational database developed using MySQL and Java. The database contains 4180 annotated endogenous peptides from different tissues originating from 394 different species as well as 50 novel peptides from brain tissue identified in our laboratory. Information about the peptides, including mass, isoelectric point, sequence, and precursor protein, is also stored in the database. This new approach holds great potential for removing the bottleneck that occurs during the identification process in the field of peptidomics. The SwePep database is available to the public.

Reproducibility of LC-MS-based Protein Identification

Traditional analysis of liquid chromatography-mass spectrometry (LC-MS) data, typically performed by reviewing chromatograms and the corresponding mass spectra, is both time-consuming and difficult. Detailed data analysis is therefore often omitted in proteomics applications. When analysing multiple proteomics samples, it is usually only the final list of identified proteins that is reviewed. This may lead to unnecessarily complex or even contradictory results because the content of the list of identified proteins depends heavily on the conditions for triggering the collection of tandem mass spectra. Small changes in the signal intensity of a peptide in different LC-MS experiments can lead to the collection of a tandem mass spectrum in one experiment but not in another. Also, the quality of the tandem mass spectrometry experiments can vary, leading to successful identification in some cases but not in others. Using a novel image analysis approach, it is possible to achieve repeat analysis with a very high reproducibility by matching peptides across different LC-MS experiments using the retention time and parent mass over charge (m/z). It is also easy to confirm the final result visually. This approach has been investigated by using tryptic digests of integral membrane proteins from organelle-enriched fractions from Arabidopsis thaliana and it has been demonstrated that very highly reproducible, consistent, and reliable LC-MS data interpretation can be made.
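Matching peptides across LC-MS experiments by retention time and parent m/z can be illustrated with a simple tolerance-window search. This is a simplification and not the image analysis approach of the paper; the `Feature` type, the function name, and the tolerance values are assumptions chosen for illustration.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    mz: float              # parent mass over charge
    retention_time: float  # minutes


def match_features(run_a, run_b, mz_tol=0.02, rt_tol=0.5):
    """Pair each feature in run_a with the feature in run_b that is
    closest in m/z among those within both tolerances.

    Features in run_a without any candidate in run_b are skipped.
    """
    pairs = []
    for a in run_a:
        candidates = [
            b
            for b in run_b
            if abs(a.mz - b.mz) <= mz_tol
            and abs(a.retention_time - b.retention_time) <= rt_tol
        ]
        if candidates:
            closest = min(candidates, key=lambda b: abs(a.mz - b.mz))
            pairs.append((a, closest))
    return pairs


# Made-up features from two LC-MS experiments.
run_a = [Feature(500.25, 12.0), Feature(620.10, 30.0)]
run_b = [Feature(500.26, 12.3), Feature(700.00, 30.1)]
pairs = match_features(run_a, run_b)
```

A matched pair lets a peptide identified by MS/MS in one experiment be carried over to another experiment in which no tandem mass spectrum was triggered for it.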

Detection of Artifacts and Peptide Modifications in Liquid Chromatography/mass Spectrometry Data Using Two-dimensional Signal Intensity Map Data Visualization

We demonstrate how visualization of liquid chromatography/mass spectrometry data as a two-dimensional signal intensity map can be used to assess the overall quality of the data, for the identification of polymer contaminants and artifacts, as well as for the confirmation of post-translational modifications.

Determining the Overall Merit of Protein Identification Data Sets: Rho-diagrams and Rho-scores

This paper described a simple heuristic method for determining the merit of a set of peptide sequence assignments made using tandem mass spectra. The method involved comparing a prediction based on the known stochastic behavior of a sequence assignment algorithm with the assignments generated from a particular data set. A particular formulation of this comparison was defined through the construction of a plot of the data, the rho-diagram, as well as a parameter derived from this plot, the rho-score. This plot and parameter were shown to be able to readily characterize the relative quality of a set of peptide sequence assignments and to allow the straightforward determination of probability threshold values for the interpretation of proteomics data. This plot is independent of the algorithm or scoring scheme used to estimate the statistical significance of a set of experimental results; rather, it can be used as an objective test of the correctness of those estimates. The rho-score can also be used as a parameter to evaluate the relative merit of protein identifications, such as those made across proteome species taxonomic categories.

Neuropeptidomics Strategies for Specific and Sensitive Identification of Endogenous Peptides

A new approach using targeted sequence collections has been developed for identifying endogenous peptides. This approach enables a fast, specific, and sensitive identification of endogenous peptides. Three different sequence collections were constituted in this study to mimic the peptidomic samples: SwePep precursors, SwePep peptides, and SwePep predicted. The searches for neuropeptides performed against these three sequence collections were compared with searches performed against the entire mouse proteome, which is commonly used to identify neuropeptides. These four sequence collections were searched with both Mascot and X!Tandem. Evaluation of the sequence collections was achieved using a set of manually identified and previously verified peptides. By using the three new sequence collections, which more accurately mimic the sample, 3 times as many peptides were significantly identified, with a false-positive rate below 1%, in comparison with the mouse proteome. The new sequence collections were also used to identify previously uncharacterized peptides from brain tissue; 27 previously uncharacterized peptides and potentially bioactive neuropeptides were identified. These novel peptides are cleaved from the peptide precursors at sites that are characteristic for prohormone convertases, and some of them have post-translational modifications that are characteristic for neuropeptides. The targeted protein sequence collections for different species are publicly available for download from SwePep.

Improving the Success Rate of Proteome Analysis by Modeling Protein-abundance Distributions and Experimental Designs

Truly comprehensive proteome analysis is highly desirable in systems biology and biomarker discovery efforts. But complete proteome characterization has been hindered by the dynamic range and detection sensitivity of experimental designs, which are not adequate to the very wide range of protein abundances. Experimental designs for comprehensive analytical efforts involve separation followed by mass spectrometry-based identification of digested proteins. Because results are generally reported as a collection of identifications with no information on the fraction of the proteome that was missed, they are difficult to evaluate and potentially misleading. Here we address this problem by taking a holistic view of the experimental design and using computer simulations to estimate the success rate for any given experiment. Our approach demonstrates that simple changes in typical experimental designs can enhance the success rate of proteome analysis by five- to tenfold.

An Automated Method for Scanning LC-MS Data Sets for Significant Peptides and Proteins, Including Quantitative Profiling and Interactive Confirmation

Differential quantification of proteins and peptides by LC-MS is a promising method to acquire knowledge about biological processes, and for finding drug targets and biomarkers. However, differential protein analysis using LC-MS has been held back by the lack of suitable software tools. Large amounts of experimental data are easily generated in protein and peptide profiling experiments, but data analysis is time-consuming and labor-intensive. Here, we present a fully automated method for scanning LC-MS/MS data for biologically significant peptides and proteins, including support for interactive confirmation and further profiling. By studying peptide mixtures of known composition, we demonstrate that peptides present in different amounts in different groups of samples can be automatically screened for using statistical tests. A linear response can be obtained over almost 3 orders of magnitude, facilitating further profiling of peptides and proteins of interest. Furthermore, we apply the method to study the changes of endogenous peptide levels in mouse brain striatum after administration of reserpine, a classical model drug for inducing Parkinson disease symptoms.

Informatics Development: Challenges and Solutions for MALDI Mass Spectrometry

Matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS) has been successfully applied to elucidating biological questions through the analysis of proteins, peptides, and nucleic acids. Here, we review the different approaches for analyzing the data that is generated by MALDI-MS. The first step in the analysis is the processing of the raw data to find peaks that correspond to the analytes. The peaks are characterized by their areas (or heights) and their centroids. The peak area can be used as a measure of the quantity of the analyte, and the centroid can be used to determine the mass of the analyte. The masses are then compared to models of the analyte, and these models are ranked according to how well they fit the data and their significance is calculated. This allows the determination of the identity (sequence and modifications) of the analytes. We show how this general data analysis workflow is applied to protein and nucleic acid chemistry as well as proteomics.
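The two peak descriptors mentioned above can be computed as in the following minimal sketch. It assumes that peak detection and baseline correction have already been done, so the inputs cover a single peak; the function name and the sample values are hypothetical.

```python
def peak_centroid_and_area(mz, intensity):
    """Intensity-weighted centroid and trapezoidal area of one peak.

    mz and intensity are equal-length sequences covering a single,
    already baseline-corrected peak, with mz in ascending order.
    """
    total = sum(intensity)
    centroid = sum(m * i for m, i in zip(mz, intensity)) / total
    area = sum(
        (mz[k + 1] - mz[k]) * (intensity[k] + intensity[k + 1]) / 2.0
        for k in range(len(mz) - 1)
    )
    return centroid, area


# Made-up symmetric peak sampled at three points.
centroid, area = peak_centroid_and_area(
    [1000.0, 1000.5, 1001.0], [0.0, 10.0, 0.0]
)
```

The centroid feeds the mass comparison against candidate models, while the area serves as the quantity estimate for the analyte.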

Use of DNA Ladders for Reproducible Protein Fractionation by Sodium Dodecyl Sulfate-polyacrylamide Gel Electrophoresis (SDS-PAGE) for Quantitative Proteomics

In proteomics, one-dimensional (1D) sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) is widely used for protein fractionation prior to mass spectrometric analysis to enhance the dynamic range of analysis and to improve the identification of low-abundance proteins. Such protein prefractionation works well for quantitation strategies if the proteins are labeled prior to separation. However, because of the poor reproducibility of cutting gel slices, especially when small amounts of samples are analyzed, its application in label-free and peptide-labeling quantitative proteomics methods has been greatly limited. To overcome this limitation, we developed a new strategy in which a DNA ladder is mixed with the protein sample before PAGE separation. After PAGE separation, the DNA ladder is stained to allow for easy, precise, and reproducible gel cutting. To this end, a novel visible DNA-staining method was developed. This staining method is fast, sensitive, and compatible with mass spectrometry. To evaluate the reproducibility of DNA-ladder-assisted gel cutting for quantitative protein fractionation, we used stable isotope labeling with amino acids in cell culture (SILAC). Our results show that the quantitative error associated with fractionation can be minimized using the DNA-assisted fractionation and multiple replicates of gel cutting. In conclusion, 1D PAGE fractionation in combination with DNA ladders can be used for label-free comparative proteomics without compromising quantitation.

Rapid Isolation and Identification of Bacteriophage T4-encoded Modifications of Escherichia Coli RNA Polymerase: a Generic Method to Study Bacteriophage/host Interactions

Bacteriophages are bacterial viruses that infect bacterial cells, and they have developed ingenious mechanisms to modify the bacterial RNA polymerase. Using a rapid, specific, single-step affinity isolation procedure to purify Escherichia coli RNA polymerase from bacteriophage T4-infected cells, we have identified bacteriophage T4-dependent modifications of the host RNA polymerase. We suggest that this methodology is broadly applicable for the identification of bacteriophage-dependent alterations of the host synthesis machinery.

Efficient Identification of Phosphorylation by Mass Spectrometric Phosphopeptide Fingerprinting

We describe a rapid and efficient method for the identification of phosphopeptides, which we term mass spectrometric (MS) phosphopeptide fingerprinting. The method involves quantitative comparison of proteolytic peptides from native versus completely dephosphorylated proteins. Dephosphorylation of serine, threonine, and tyrosine residues is achieved by in-gel treatment of the separated proteins with hydrogen fluoride (HF). This chemical dephosphorylation results in enrichment of those unmodified peptides that correspond to previously phosphorylated peptides. Quantitative comparison of the signal-to-noise ratios of peaks in the treated versus untreated samples is used to identify phosphopeptides, which can be confirmed and further studied by tandem mass spectrometry (MS/MS). We have applied this method to identify eight known phosphorylation sites of Xenopus Aurora A kinase, as well as several novel sites in the Xenopus chromosome passenger complex (CPC).

Validation of Endogenous Peptide Identifications Using a Database of Tandem Mass Spectra

The SwePep database is designed for endogenous peptides and mass spectrometry. It contains information about the peptides such as mass, pI, precursor protein and potential post-translational modifications. Here, we have improved and extended the SwePep database with tandem mass spectra, by adding a locally curated version of the global proteome machine database (GPMDB). In peptidomic experiment practice, many peptide sequences contain multiple tandem mass spectra with different quality. The new tandem mass spectra database in SwePep enables validation of low quality spectra using high quality tandem mass spectra. The validation is performed by comparing the fragmentation patterns of the two spectra using algorithms for calculating the correlation coefficient between the spectra. The present study is the first step in developing a tandem spectrum database for endogenous peptides that can be used for spectrum-to-spectrum identifications instead of peptide identifications using traditional protein sequence database searches.
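One way to compute a correlation coefficient between two tandem mass spectra is to bin the fragment peaks on a common m/z grid and take the Pearson correlation of the binned intensities. The binning scheme, bin width, m/z range, and function names below are assumptions for illustration; the abstract does not specify which correlation algorithms SwePep uses.

```python
from math import sqrt


def binned(spectrum, bin_width, n_bins):
    """Sum peak intensities into fixed-width m/z bins."""
    bins = [0.0] * n_bins
    for mz, intensity in spectrum:
        index = int(mz / bin_width)
        if 0 <= index < n_bins:
            bins[index] += intensity
    return bins


def spectrum_correlation(spec_a, spec_b, bin_width=1.0, max_mz=2000.0):
    """Pearson correlation between two binned tandem mass spectra.

    Each spectrum is a list of (m/z, intensity) pairs. Returns 0.0 if
    either binned spectrum has no variation (for example, no peaks).
    """
    n_bins = int(max_mz / bin_width)
    a = binned(spec_a, bin_width, n_bins)
    b = binned(spec_b, bin_width, n_bins)
    mean_a, mean_b = sum(a) / n_bins, sum(b) / n_bins
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return cov / sqrt(var_a * var_b)


# Made-up fragment peaks: a reference spectrum and a weaker copy of it.
reference = [(175.1, 100.0), (304.2, 50.0), (433.2, 80.0)]
low_quality = [(mz, 0.2 * intensity) for mz, intensity in reference]
similarity = spectrum_correlation(reference, low_quality)
```

A value near 1 indicates that the low quality spectrum shares the fragmentation pattern of the high quality reference, which is the basis for the validation step.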

Screening for EphB Signaling Effectors Using SILAC with a Linear Ion Trap-orbitrap Mass Spectrometer

Erythropoietin-producing hepatocellular carcinoma (Eph) receptors play important roles in development, neural plasticity, and cancer. We used an Orbitrap mass spectrometer and stable isotope labeling by amino acids in cell culture (SILAC) to identify and quantify 204 proteins with significantly changed abundance in antiphosphotyrosine immunoprecipitates after ephrinB1-Fc stimulation. More than half of all known effectors downstream of EphB receptors were identified in this study, as well as numerous novel candidates for EphB signaling.

Evaluation of the Variation in Sample Preparation for Comparative Proteomics Using Stable Isotope Labeling by Amino Acids in Cell Culture

In comparative proteomic studies, it is important to know the variability associated with sample preparation. In this study, we report the strategy of using SILAC (stable isotope labeling by amino acids in cell culture) to evaluate the effect of the variation in sample preparation for quantitative proteomics. Variability can be measured when equal amounts of light and heavy SILAC samples undergo the same sample preparation procedures in parallel, and the two samples are mixed for relative protein quantitation by mass spectrometry. The high quantitative accuracy of SILAC allows for characterization of small variations. First, the reproducibility of immunoprecipitation (IP) and in-gel digestion was evaluated, and the impact of replicate number on quantitative accuracy was characterized. Second, we evaluated the overall variation in a comparative workflow involving three sequential sample preparation steps: IP, SDS-PAGE fractionation, and in-gel digestion. The evaluation of individual sample preparation steps was very valuable for experimental design: the optimal number of replicates for each step could be readily determined and the overall variation of the workflow could be predicted from the variation of the individual steps involved. By using informed experimental design, we demonstrated that the error associated with multiple steps of sample preparation in a comparative experiment can be limited to a reasonably low level.
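
One simple way to predict overall workflow variation from individual steps is to add the per-step coefficients of variation in quadrature. The abstract does not state which model the authors used, so this sketch assumes independent errors at each step and that averaging n replicates of a step divides its variance by n. The function name and example values are hypothetical.

```python
import math


def combined_cv(step_cvs, replicates=None):
    """Predict overall CV from per-step CVs, assuming independent steps.

    step_cvs: coefficient of variation for each sample preparation step.
    replicates: number of replicates averaged at each step (default 1 each).
    """
    if replicates is None:
        replicates = [1] * len(step_cvs)
    if len(replicates) != len(step_cvs):
        raise ValueError("step_cvs and replicates must have the same length")
    # Variances add for independent steps; replicate averaging scales by 1/n.
    total_variance = sum(cv ** 2 / n for cv, n in zip(step_cvs, replicates))
    return math.sqrt(total_variance)


single = combined_cv([0.3, 0.4])              # one replicate per step
averaged = combined_cv([0.3, 0.4], [4, 4])    # four replicates per step
```

Under these assumptions, two steps with CVs of 0.3 and 0.4 combine to an overall CV of about 0.5, and running four replicates of each step halves that figure.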

Four Histone Variants Mark the Boundaries of Polycistronic Transcription Units in Trypanosoma Brucei

Unusually for a eukaryote, genes transcribed by RNA polymerase II (pol II) in Trypanosoma brucei are arranged in polycistronic transcription units. With one exception, no pol II promoter motifs have been identified, and how transcription is initiated remains an enigma. T. brucei has four histone variants: H2AZ, H2BV, H3V, and H4V. Using chromatin immunoprecipitation (ChIP) and sequencing (ChIP-seq) to examine the genome-wide distribution of chromatin components, we show that histones H4K10ac, H2AZ, H2BV, and the bromodomain factor BDF3 are enriched up to 300-fold at probable pol II transcription start sites (TSSs). We also show that nucleosomes containing H2AZ and H2BV are less stable than canonical nucleosomes. Our analysis also identifies >60 unexpected TSS candidates and reveals the presence of long guanine runs at probable TSSs. Apparently unique to trypanosomes, additional histone variants H3V and H4V are enriched at probable pol II transcription termination sites. Our findings suggest that histone modifications and histone variants play crucial roles in transcription initiation and termination in trypanosomes and that destabilization of nucleosomes by histone variants is an evolutionarily ancient and general mechanism of transcription initiation, demonstrated in an organism in which general pol II transcription factors have been elusive.

Rapid Sensitive Analysis of Cysteine Rich Peptide Venom Components

Disulfide-rich peptide venoms from animals such as snakes, spiders, scorpions, and certain marine snails represent one of nature's great diversity libraries of bioactive molecules. The various species of marine cone shells have alone been estimated to produce >50,000 distinct peptide venoms. These peptides have stimulated considerable interest because of their ability to potently alter the function of specific ion channels. To date, only a small fraction of this immense resource has been characterized because of the difficulty in elucidating their primary structures, which range in size between 10 and 80 aa, include up to 5 disulfide bonds, and can contain extensive posttranslational modifications. The extraordinary complexity of crude venoms and the lack of DNA databases for many of the organisms of interest present major analytical challenges. Here, we describe a strategy that uses mass spectrometry for the elucidation of the mature peptide toxin components of crude venom samples. Key to this strategy is our use of electron transfer dissociation (ETD), a mass spectrometric fragmentation technique that can produce sequence information across the entire peptide backbone. However, because ETD only yields comprehensive sequence coverage when the charge state of the precursor peptide ion is sufficiently high and the m/z ratio is low, we combined ETD with a targeted chemical derivatization strategy to increase the charge state of cysteine-containing peptide toxins. Using this strategy, we obtained full sequences for 31 peptide toxins, using just 7% of the crude venom from the venom gland of a single cone snail (Conus textile).

Interlaboratory Study Characterizing a Yeast Performance Standard for Benchmarking LC-MS Platform Performance

Optimal performance of LC-MS/MS platforms is critical to generating high quality proteomics data. Although individual laboratories have developed quality control samples, there is no widely available performance standard of biological complexity (and associated reference data sets) for benchmarking of platform performance for analysis of complex biological proteomes across different laboratories in the community. Individual preparations of the yeast Saccharomyces cerevisiae proteome have been used extensively by laboratories in the proteomics community to characterize LC-MS platform performance. The yeast proteome is uniquely attractive as a performance standard because it is the most extensively characterized complex biological proteome and the only one associated with several large scale studies estimating the abundance of all detectable proteins. In this study, we describe a standard operating protocol for large scale production of the yeast performance standard and offer aliquots to the community through the National Institute of Standards and Technology where the yeast proteome is under development as a certified reference material to meet the long term needs of the community. Using a series of metrics that characterize LC-MS performance, we provide a reference data set demonstrating typical performance of commonly used ion trap instrument platforms in expert laboratories; the results provide a basis for laboratories to benchmark their own performance, to improve upon current methods, and to evaluate new technologies. Additionally, we demonstrate how the yeast reference, spiked with human proteins, can be used to benchmark the power of proteomics platforms for detection of differentially expressed proteins at different levels of concentration in a complex matrix, thereby providing a metric to evaluate and minimize pre-analytical and analytical variation in comparative proteomics experiments.

GINS Motion Reveals Replication Fork Progression is Remarkably Uniform Throughout the Yeast Genome

Previous studies have led to a picture wherein the replication of DNA progresses at variable rates over different parts of the budding yeast genome. These prior experiments, focused on production of nascent DNA, have been interpreted to imply that the dynamics of replication fork progression are strongly affected by local chromatin structure/architecture, and by interaction with machineries controlling transcription, repair and epigenetic maintenance. Here, we adopted a complementary approach for assaying replication dynamics using whole genome time-resolved chromatin immunoprecipitation combined with microarray analysis of the GINS complex, an integral member of the replication fork. Surprisingly, our data show that this complex progresses at highly uniform rates regardless of genomic location, revealing that replication fork dynamics in yeast is simpler and more uniform than previously envisaged. In addition, we show how the synergistic use of experiment and modeling leads to novel biological insights. In particular, a parsimonious model allowed us to accurately simulate fork movement throughout the genome and also revealed a subtle phenomenon, which we interpret as arising from low-frequency fork arrest.

The Asia Oceania Human Proteome Organisation Membrane Proteomics Initiative. Preparation and Characterisation of the Carbonate-washed Membrane Standard

The Asia Oceania Human Proteome Organisation (AOHUPO) has embarked on a Membrane Proteomics Initiative with goals of systematic comparison of strategies for analysis of membrane proteomes and discovery of membrane proteins. This multilaboratory project is based on the analysis of a subcellular fraction from mouse liver that contains endoplasmic reticulum and other organelles. In this study, we present the strategy used for the preparation and initial characterization of the membrane sample, including validation that the carbonate-washing step enriches for integral and lipid-anchored membrane proteins. Analysis of 17 independent data sets from five types of proteomic workflows is in progress.

Mass Spectrometric Protein Identification Using the Global Proteome Machine

Protein identification by mass spectrometry is widely used in biological research. Here, we describe how the global proteome machine (GPM) can be used for protein identification and for validation of the results. We cover identification by searching protein sequence collections and spectral libraries as well as validation of the results using expectation values, rho-diagrams, and spectrum databases.

Protein Quantitation Using Mass Spectrometry

Mass spectrometry is a method of choice for quantifying low-abundance proteins and peptides in many biological studies. Here, we describe a range of computational aspects of protein and peptide quantitation, including methods for finding and integrating mass spectrometric peptide peaks, and detecting interference to obtain a robust measure of the amount of proteins present in samples.
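
As a minimal sketch of peak integration, the function below applies the trapezoidal rule to an intensity trace between chosen bounds after subtracting a constant baseline. The constant-baseline model, the function name, and the example trace are assumptions for illustration and are not taken from the article.

```python
def integrate_peak(times, intensities, start, end, baseline=0.0):
    """Trapezoidal area of a peak between start and end (inclusive).

    times, intensities: paired samples of an intensity trace.
    baseline: constant background subtracted before integration (assumed model).
    """
    points = [(t, max(i - baseline, 0.0))
              for t, i in zip(times, intensities)
              if start <= t <= end]
    area = 0.0
    for (t0, i0), (t1, i1) in zip(points, points[1:]):
        area += 0.5 * (i0 + i1) * (t1 - t0)
    return area


times = [0.0, 1.0, 2.0, 3.0, 4.0]
intensities = [0.0, 10.0, 20.0, 10.0, 0.0]
full_area = integrate_peak(times, intensities, 0.0, 4.0)
core_area = integrate_peak(times, intensities, 1.0, 3.0)
```

Comparing the area over the full peak with the area over a narrower window is one crude way to see how sensitive a quantitation is to the choice of integration bounds.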

Modeling Experimental Design for Proteomics

The complexity of proteomes makes good experimental design essential for their successful investigation. Here, we describe how proteomics experiments can be modeled and how computer simulations of these models can be used to improve experimental designs.

Modeling Mass Spectrometry-based Protein Analysis

The success of mass spectrometry based proteomics depends on efficient methods for data analysis. These methods require a detailed understanding of the information value of the data. Here, we describe how the information value can be elucidated by performing simulations using synthetic data.

High-Capacity Ion Trap Coupled to a Time-of-Flight Mass Spectrometer for Comprehensive Linked Scans with No Scanning Losses

A high-capacity ion trap coupled to a time-of-flight (TOF) mass spectrometer has been developed to carry out comprehensive linked scan analysis of all stored ions in the ion trap. The approach involves a novel tapered geometry high-capacity ion trap that can store more than 10^6 ions (range 800-4000 m/z) without degrading its performance. Ions are stored and scanned out from the high-capacity ion trap as a function of m/z, collisionally fragmented and analyzed by TOF. Accurate mass analysis is achieved on both the precursor and fragment ions of all species ejected from the ion trap. We demonstrate the approach for comprehensive linked-scan identification of phosphopeptides in mixtures with their corresponding unphosphorylated peptides.

Sequence and Structural Convergence of Broad and Potent HIV Antibodies That Mimic CD4 Binding

Passive transfer of broadly neutralizing HIV antibodies can prevent infection, which suggests that vaccines that elicit such antibodies would be protective. Thus far, however, few broadly neutralizing HIV antibodies that occur naturally have been characterized. To determine whether these antibodies are part of a larger group of related molecules, we cloned 576 new HIV antibodies from four unrelated individuals. All four individuals produced expanded clones of potent broadly neutralizing CD4-binding-site antibodies that mimic binding to CD4. Despite extensive hypermutation, the new antibodies shared a consensus sequence of 68 immunoglobulin H (IgH) chain amino acids and arose independently from two related IgH genes. Comparison of the crystal structure of one of the antibodies to the broadly neutralizing antibody VRC01 revealed conservation of the contacts to the HIV spike.
