Proteogenomics combines large-scale genomic and transcriptomic data with mass-spectrometry-based proteomic data to discover novel protein sequence variants and improve genome annotation. In contrast with conventional proteomic applications, proteogenomic analysis requires a number of additional data processing steps. Ideally, these required steps would be integrated and automated via a single software platform offering accessibility for wet-bench researchers as well as flexibility for user-specific customization and integration of new software tools as they emerge. Toward this end, we have extended the Galaxy bioinformatics framework to facilitate proteogenomic analysis. Using analysis of whole human saliva as an example, we demonstrate Galaxy's flexibility through the creation of a modular workflow incorporating both established and customized software tools that improve depth and quality of proteogenomic results. Our customized Galaxy-based software includes automated, batch-mode BLASTP searching and a Peptide Sequence Match Evaluator tool, both useful for evaluating the veracity of putative novel peptide identifications. Our complex workflow (approximately 140 steps) can be easily shared using built-in Galaxy functions, enabling its use and customization by others. Our results provide a blueprint for the establishment of the Galaxy framework as an ideal solution for the emerging field of proteogenomics.
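As a rough illustration of what evaluating a putative novel peptide involves, the sketch below flags peptides that do not occur in any reference protein sequence. It is not the Galaxy workflow or the BLASTP step described above: the function names and toy sequences are hypothetical, and exact substring matching stands in for a real alignment search. Isoleucine and leucine are treated as equivalent because they have identical mass and are usually not distinguished by MS/MS.

```python
def normalize(seq: str) -> str:
    """Upper-case a sequence and map I to L (the two residues are isobaric)."""
    return seq.upper().replace("I", "L")


def putative_novel_peptides(peptides, reference_proteins):
    """Return the peptides that match no reference protein exactly.

    Assumption: exact substring matching is used here only as a stand-in
    for a proper similarity search such as BLASTP.
    """
    normalized_refs = [normalize(protein) for protein in reference_proteins]
    novel = []
    for peptide in peptides:
        key = normalize(peptide)
        if not any(key in ref for ref in normalized_refs):
            novel.append(peptide)
    return novel


# Hypothetical toy data, not taken from the study.
reference = ["MKTAYIAKQRQISFVK", "MSLLNDELVK"]
candidates = ["AYLAKQR", "NDELVK", "AYIAKQW"]
print(putative_novel_peptides(candidates, reference))
```

Only peptides that survive a filter of this kind would be worth passing on to a more careful check of spectrum quality and sequence similarity.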
To sensitively analyze complex protein mixtures by mass spectrometry-based shotgun proteomics, researchers have employed platforms that couple orthogonal peptide fractionation methods using nanoscale HPLC. Commonly used platforms have coupled either strong cation exchange (SCX) HPLC or preparative isoelectric focusing (IEF) with nanoscale reversed-phase (nanoRP) HPLC fractionation of peptides. Coupling two dimensions of peptide fractionation, prior to mass spectrometric analysis, increases the sensitivity for identifying low abundance proteins. However, the large dynamic range of protein abundance and high level of complexity of protein mixtures derived from many biological sources, such as bodily fluids, require additional steps of peptide fractionation. To address this shortcoming, we have developed a platform combining three dimensions of peptide fractionation as follows: (1) preparative IEF; (2) SCX HPLC; and (3) nanoRP HPLC. This platform significantly increases the sensitivity of shotgun proteomic analysis in complex protein mixtures. Here, we describe the implementation of this three-dimensional peptide fractionation platform for proteomic studies of complex mixtures.
In-depth knowledge of bodily fluid phosphoproteomes, such as whole saliva, is limited. To better understand the whole saliva phosphoproteome, we generated a large-scale catalog of phosphorylated proteins. To circumvent the wide dynamic range of phosphoprotein abundance in whole saliva, we combined dynamic range compression using hexapeptide beads, strong cation exchange HPLC peptide fractionation, and immobilized metal affinity chromatography prior to mass spectrometry. In total, 217 unique phosphopeptides were identified representing 85 distinct phosphoproteins at 2.3% global FDR. From these peptides, 129 distinct phosphorylation sites were identified, of which 57 were previously known but only 11 had been previously identified in whole saliva. Cellular localization analysis revealed salivary phosphoproteins had a distribution similar to all known salivary proteins, but with less relative representation in "extracellular" and "plasma membrane" categories compared to salivary glycoproteins. Sequence alignment showed that phosphorylation occurred at acidic-directed kinase, proline-directed, and basophilic motifs. This differs from plasma phosphoproteins, in which phosphorylation predominantly occurs at sequences recognized by Golgi casein kinase. Collectively, these results suggest diverse functions for salivary phosphoproteins and multiple kinases involved in their processing and secretion. In all, this study should lay groundwork for future elucidation of the functions of salivary protein phosphorylation.
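The three motif classes named above can be illustrated with a minimal sketch that applies simplified residue rules around a phosphorylated site. The rules used here are assumptions chosen for illustration, not the sequence alignment procedure of the study: proline at +1 for proline-directed sites, D/E at +1 to +3 for acidic-directed sites, and R/K at -3 or -2 for basophilic sites. The function name and example sequences are hypothetical.

```python
ACIDIC = set("DE")
BASIC = set("RK")


def classify_site(sequence: str, center: int) -> str:
    """Classify a phosphosite with simplified, illustrative motif rules.

    sequence: peptide or protein sequence (one-letter codes).
    center:   0-based index of the phosphorylated residue.
    """
    def residue(offset: int) -> str:
        # Return the residue at the given offset, or "" past either end.
        i = center + offset
        return sequence[i] if 0 <= i < len(sequence) else ""

    if residue(1) == "P":
        return "proline-directed"
    if any(residue(k) in ACIDIC for k in (1, 2, 3)):
        return "acidic-directed"
    if any(residue(k) in BASIC for k in (-3, -2)):
        return "basophilic"
    return "other"


# Hypothetical example windows, not sites reported in the study.
for seq in ("AAASPAA", "AAASAAEA", "RAASAAA", "AAASAAA"):
    print(seq, classify_site(seq, 3))
```

The order of the checks is a design choice: a site matching more than one rule is assigned to the first class tested, so a real analysis would need explicit tie-breaking or multi-label output.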
Dynamic range compression (DRC) by hexapeptide libraries increases MS/MS-based identification of lower-abundance proteins in complex mixtures. However, two unanswered questions impede fully realizing DRC's potential in shotgun proteomics. First, does DRC enhance identification of post-translationally modified proteins? Second, can DRC be incorporated into a workflow enabling relative protein abundance profiling? We sought to answer both questions by analyzing human whole saliva. Addressing question one, we coupled DRC with covalent glycopeptide enrichment and MS/MS. With DRC we identified ~2 times more N-linked glycoproteins and their glycosylation sites than without DRC, dramatically increasing the known salivary glycoprotein catalog. Addressing question two, we compared differentially stable isotope-labeled saliva samples pooled from healthy women and women with metastatic breast cancer using a multidimensional peptide fractionation-based workflow, analyzing in parallel one sample portion with DRC and one portion without. Our workflow categorizes proteins with higher absolute abundance, whose relative abundance ratios are altered by DRC, from proteins of lower absolute abundance detected only after DRC. Within each of these salivary protein categories, we identified novel abundance changes putatively associated with breast cancer, demonstrating feasibility and benefits of DRC for relative abundance profiling. Collectively, our results bring us closer to realizing the full potential of DRC for proteomic studies.
INTRODUCTION: Tumors lack normal drainage of secreted fluids and consequently build up tumor interstitial fluid (TIF). Unlike other bodily fluids, TIF likely contains a high proportion of tumor-specific proteins with potential as biomarkers. METHODS: Here, we evaluated a novel technique using a unique ultrafiltration catheter for in situ collection of TIF and used it to generate the first catalog of TIF proteins from a head and neck squamous cell carcinoma (HNSCC). To maximize proteomic coverage, TIF was immunodepleted for high abundance proteins and digested with trypsin, and peptides were fractionated in three dimensions prior to mass spectrometry. RESULTS: We identified 525 proteins with high confidence. The HNSCC TIF proteome was distinct compared to proteomes of other bodily fluids. It contained a relatively high proportion of proteins annotated by Gene Ontology as "extracellular" compared to other secreted fluid and cellular proteomes, indicating minimal cell lysis from our in situ collection technique. Several proteins identified are putative biomarkers of HNSCC, supporting our catalog's value as a source of potential biomarkers. CONCLUSIONS: In all, we demonstrate a reliable new technique for in situ TIF collection and provide the first HNSCC TIF protein catalog with value as a guide for others seeking to develop tumor biomarkers. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1007/s12014-010-9050-3) contains supplementary material, which is available to authorized users.
Cellular nutritional and energy status regulates a wide range of nuclear processes important for cell growth, survival, and metabolic homeostasis. Mammalian target of rapamycin (mTOR) plays a key role in the cellular responses to nutrients. However, the nuclear processes governed by mTOR have not been clearly defined. Using isobaric peptide tagging coupled with linear ion trap mass spectrometry, we performed quantitative proteomics analysis to identify nuclear processes in human cells under control of mTOR. Within 3 h of inhibiting mTOR with rapamycin in HeLa cells, we observed down-regulation of nuclear abundance of many proteins involved in translation and RNA modification. Unexpectedly, mTOR inhibition also down-regulated several proteins functioning in chromosomal integrity and up-regulated those involved in DNA damage responses (DDRs) such as 53BP1. Consistent with these proteomic changes and DDR activation, mTOR inhibition enhanced interaction between 53BP1 and p53 and increased phosphorylation of ataxia telangiectasia mutated (ATM) kinase substrates. ATM substrate phosphorylation was also induced by inhibiting protein synthesis and suppressed by inhibiting proteasomal activity, suggesting that mTOR inhibition reduces steady-state (abundance) levels of proteins that function in cellular pathways of DDR activation. Finally, rapamycin-induced changes led to increased survival after radiation exposure in HeLa cells. These findings reveal a novel functional link between mTOR and DDR pathways in the nucleus potentially operating as a survival mechanism against unfavorable growth conditions.
Comprehensive identification of proteins in whole human saliva is critical for appreciating its full diagnostic potential. However, this is challenged by the large dynamic range of protein abundance within the fluid. To address this problem, we used an analysis platform that coupled hexapeptide libraries for dynamic range compression (DRC) with three-dimensional (3D) peptide fractionation. Our approach identified 2340 proteins in whole saliva and represents the largest saliva proteomic dataset generated using a single analysis platform. Three-dimensional peptide fractionation involving sequential steps of preparative isoelectric focusing (IEF), strong cation exchange, and capillary reversed-phase liquid chromatography was essential for maximizing gains from DRC. Compared to saliva not treated with hexapeptide libraries, DRC substantially increased identified proteins across physicochemical and functional categories. Approximately 20% of total salivary proteins are also seen in plasma, and proteins in both fluids show comparable functional diversity and disease-linkage. However, for a subset of diseases, saliva has higher apparent diagnostic potential. These results expand the potential for whole saliva in health monitoring/diagnostics and provide a general platform for improving proteomic coverage of complex biological samples.
LTQ Orbitrap data analyzed with ProteinPilot can be further improved by MaxQuant raw data processing, which utilizes precursor-level high mass accuracy data for peak processing and MGF creation. In particular, ProteinPilot results from MaxQuant-processed peaklists for Orbitrap data sets resulted in improved spectral utilization due to an improved peaklist quality with higher precision and high precursor mass accuracy (HPMA). The output and postsearch analysis tools of both workflows were utilized for previously unexplored features of a three-dimensional fractionated and hexapeptide library (ProteoMiner) treated whole saliva data set comprising 200 fractions. ProteinPilot's ability to simultaneously predict multiple modifications showed an advantage from ProteoMiner treatment for modified peptide identification. We demonstrate that complementary approaches in the analysis pipeline provide comprehensive results for the whole saliva data set acquired on an LTQ Orbitrap. Overall, our results establish a workflow for improved protein identification from high mass accuracy data.
The human salivary proteome is extremely complex, including proteins from salivary glands, serum, and oral microbes. Much has been learned about the host component, but little is known about the microbial component. Here we report a metaproteomic analysis of salivary supernatant pooled from six healthy subjects. For deep interrogation of the salivary proteome, we combined protein dynamic range compression (DRC), multidimensional peptide fractionation, and high-mass accuracy MS/MS with a novel two-step peptide identification method using a database of human proteins plus those translated from oral microbe genomes. Peptides were identified from 124 microbial species as well as uncultured phylotypes such as TM7. Streptococcus, Rothia, Actinomyces, Prevotella, Neisseria, Veillonella, Lactobacillus, Selenomonas, Pseudomonas, Staphylococcus, and Campylobacter were abundant among the 65 genera from 12 phyla represented. Taxonomic diversity in our study was broadly consistent with metagenomic studies of saliva. Proteins mapped to 20 KEGG pathways, with carbohydrate metabolism, amino acid metabolism, energy metabolism, translation, membrane transport, and signal transduction most represented. The communities sampled appear to be actively engaged in glycolysis and protein synthesis. This first deep metaproteomic catalog from human salivary supernatant provides a baseline for future studies of shifts in microbial diversity and protein activities potentially associated with oral disease.