Local ablative techniques such as selective internal radiation therapy (SIRT) have become the mainstay of treating hepatocellular carcinoma (HCC) in the bridging-to-transplant and palliative setting. We recently demonstrated that epithelial circulating tumor cells (CTCs) correlate with an unfavorable outcome. We wanted to scrutinize whether molecular markers detected in this specific CTC subgroup may also have clinical implications.
The aim of this study was the identification of novel biomarker candidates for the diagnosis of cholangiocellular carcinoma (CCC) and its immunohistochemical differentiation from benign liver and bile duct cells. CCC is a primary cancer that arises from the epithelial cells of bile ducts and is characterized by high mortality rates due to its late clinical presentation and limited treatment options. Tumorous tissue and adjacent non-tumorous liver tissue from eight CCC patients were analyzed by means of two-dimensional differential in-gel electrophoresis and mass-spectrometry-based label-free proteomics. After data analysis and statistical evaluation of the proteins found to be differentially regulated between the two experimental groups (fold change ≥ 1.5; p value ≤ 0.05), 14 candidate proteins were chosen for determination of the cell-type-specific expression profile via immunohistochemistry in a cohort of 14 patients. This confirmed the significant up-regulation of serpin H1, 14-3-3 protein sigma, and stress-induced phosphoprotein 1 in tumorous cholangiocytes relative to normal hepatocytes and non-tumorous cholangiocytes, whereas some proteins were detectable specifically in hepatocytes. Because stress-induced phosphoprotein 1 exhibited both sensitivity and specificity of 100%, an immunohistochemical verification examining tissue sections of 60 CCC patients was performed. This resulted in a specificity of 98% and a sensitivity of 64%. We therefore conclude that this protein should be considered as a potential diagnostic biomarker for CCC in an immunohistochemical application, possibly in combination with other candidates from this study in the form of a biomarker panel. This could improve the differential diagnosis of CCC and benign bile duct diseases, as well as metastatic malignancies in the liver.
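The candidate selection and the reported performance figures can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `ProteinResult` record, its field names, and the symmetric treatment of down-regulation (inverting ratios below 1) are assumptions made for illustration, and the default thresholds are simply the cut-offs quoted in the abstract.

```python
from dataclasses import dataclass


@dataclass
class ProteinResult:
    # Hypothetical record; the field names are illustrative only.
    name: str
    fold_change: float  # tumor / non-tumor abundance ratio
    p_value: float


def is_differential(result, min_fold=1.5, max_p=0.05):
    """Return True if a protein passes both the fold-change and p-value cut-offs."""
    if result.fold_change <= 0:
        return False
    # Assumption: ratios below 1 are inverted so that down-regulation
    # is held to the same threshold as up-regulation.
    magnitude = result.fold_change if result.fold_change >= 1 else 1 / result.fold_change
    return magnitude >= min_fold and result.p_value <= max_p


def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)
```

For example, `is_differential(ProteinResult("example", 0.5, 0.01))` evaluates to `True` under the symmetric assumption, because a ratio of 0.5 corresponds to a two-fold decrease.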
The intestinal peptide transporter PEPT-1 plays an important role in development, growth, reproduction, and stress tolerance in Caenorhabditis elegans, as revealed by the severe phenotype of the pept-1-deficient strain. The reduced number of offspring and increased stress resistance were shown to result from changes in the insulin/IGF-signaling cascade. To further elucidate the regulatory network behind the phenotypic alterations in PEPT-1-deficient animals, a quantitative proteome analysis combined with transcriptome profiling was applied. Various target genes of XBP-1, the major mediator of the unfolded protein response, were found to be downregulated at the mRNA and protein levels, accompanied by a reduction of spliced xbp-1 mRNA. Proteome analysis also revealed a markedly reduced content of numerous ribosomal proteins. This was associated with a reduction in the protein synthesis rate in pept-1 C. elegans, a process that is strictly regulated by the TOR (target of rapamycin) complex, the cellular sensor for free amino acids. These data argue for a central role of PEPT-1 in cellular amino acid homeostasis. In PEPT-1 deficiency, amino acid levels dropped systematically, leading to alterations in protein synthesis and in the IRE-1/XBP-1 pathway.
The Baculoviral IAP repeat-containing protein 5 (BIRC5), also known as inhibitor of apoptosis protein survivin, is a member of the chromosomal passenger complex and a key player in mitosis. To investigate the function of BIRC5 in liver regeneration, we analyzed a hepatocyte-specific BIRC5-knockout mouse model using a quantitative label-free proteomics approach. Here, we present the analyses of the proteome changes in hepatocyte-specific BIRC5-knockout mice compared to wildtype mice, as well as proteome changes during liver regeneration induced by partial hepatectomy in wildtype mice and mice lacking hepatic BIRC5, respectively. The BIRC5-knockout mice showed an extensive overexpression of proteins related to cellular maintenance, organization and protein synthesis. Key regulators of cell growth, transcription and translation MTOR and STAT1/STAT2 were found to be overexpressed. During liver regeneration proteome changes representing a response to the mitotic stimulus were detected in wildtype mice. Mainly proteins corresponding to proliferation, cell cycle and cytokinesis were up-regulated. The hepatocyte-specific BIRC5-knockout mice showed impaired liver regeneration, which had severe consequences on the proteome level. However, several proteins with function in mitosis were found to be up-regulated upon the proliferative stimulus. Our results show that the E3 ubiquitin-protein ligase UHRF1 is strongly up-regulated during liver regeneration independently of BIRC5.
Platelets (PLTs) in stored PLT concentrates (PLCs) release PLT extracellular vesicles (PL-EVs) induced by senescence and activation, resembling the PLT storage lesion. No comprehensive classification or molecular characterization of senescence-induced PL-EVs exists to understand PL-EV heterogeneity.
In the nasal cavity, the nonmotile cilium of olfactory sensory neurons (OSNs) constitutes the chemosensory interface between the ambient environment and the brain. The unique sensory organelle facilitates odor detection for which it includes all necessary components of initial and downstream olfactory signal transduction. In addition to its function in olfaction, a more universal role in modulating different signaling pathways is implicated, for example, in neurogenesis, apoptosis, and neural regeneration. To further extend our knowledge about this multifunctional signaling organelle, it is of high importance to establish a most detailed proteome map of the ciliary membrane compartment down to the level of transmembrane receptors. We detached cilia from mouse olfactory epithelia via Ca(2+)/K(+) shock followed by the enrichment of ciliary membrane proteins at alkaline pH, and we identified a total of 4,403 proteins by gel-based and gel-free methods in conjunction with high resolution LC/MS. This study is the first to report the detection of 62 native olfactory receptor proteins and to provide evidence for their heterogeneous expression at the protein level. Quantitative data evaluation revealed four ciliary membrane-associated candidate proteins (the annexins ANXA1, ANXA2, ANXA5, and S100A5) with a suggested function in the regulation of olfactory signal transduction, and their presence in ciliary structures was confirmed by immunohistochemistry. Moreover, we corroborated the ciliary localization of the potassium-dependent Na(+)/Ca(2+) exchanger (NCKX) 4 and the plasma membrane Ca(2+)-ATPase 1 (PMCA1) involved in olfactory signal termination, and we detected for the first time NCKX2 in olfactory cilia. Through comparison with transcriptome data specific for mature, ciliated OSNs, we finally delineated the membrane ciliome of OSNs. 
The membrane proteome of olfactory cilia established here is the most complete to date, thus allowing us to pave new avenues for the study of diverse molecular functions and signaling pathways in and out of olfactory cilia and thus to advance our understanding of the biology of sensory organelles in general.
Inferring which protein species have been detected in bottom-up proteomics experiments has been a challenging problem for which solutions have been maturing over the past decade. While many inference approaches now function well in isolation, comparing and reconciling the results generated across different tools remains difficult. It presently stands as one of the greatest barriers in collaborative efforts such as the Human Proteome Project and public repositories such as the PRoteomics IDEntifications (PRIDE) database. Here we present a framework for reporting protein identifications that seeks to improve capabilities for comparing results generated by different inference tools. This framework standardizes the terminology for describing protein identification results, associated with the HUPO-Proteomics Standards Initiative (PSI) mzIdentML standard, while still allowing for differing methodologies to reach that final state. It is proposed that developers of software for reporting identification results will adopt this terminology in their outputs. While the new terminology does not require any changes to the core mzIdentML model, it represents a significant change in practice, and, as such, the rules will be released via a new version of the mzIdentML specification (version 1.2) so that consumers of files are able to determine whether the new guidelines have been adopted by export software.
In recent years, knowledge about immune-related disorders has substantially increased, especially in the field of central nervous system (CNS) disorders. Recent innovations in protein-related microarray technology have enabled the analysis of interactions between numerous samples and up to 20,000 targets. Antibodies directed against ion channels, receptors and other synaptic proteins have been identified, and their causative roles in different disorders have been identified. Knowledge about immunological disorders is likely to expand further as more antibody targets are discovered. Therefore, protein microarrays may become an established tool for routine diagnostic procedures in the future. The identification of relevant target proteins requires the development of new strategies to handle and process vast quantities of data so that these data can be evaluated and correlated with relevant clinical issues, such as disease progression, clinical manifestations and prognostic factors. This review will mainly focus on new protein array technologies, which allow the processing of a large number of samples, and their various applications with a deeper insight into their potential use as diagnostic tools in neurodegenerative diseases and other diseases. This article is part of a Special Issue entitled: Biomarkers: A Proteomic Challenge.
Amyotrophic lateral sclerosis (ALS), the most common adult-onset motor neuron disorder, is characterized by the progressive and selective loss of upper and lower motor neurons. Diagnosis of this disorder is based on clinical assessment, and the average survival time is less than 3 years. Injections of IgG from ALS patients into mice are known to specifically mark motor neurons. Moreover, IgG has been found in upper and lower motor neurons in ALS patients. These results led us to perform a case-control study using human protein microarrays to identify the antibody profiles of serum samples from 20 ALS patients and 20 healthy controls. We demonstrated high levels of 20 IgG antibodies that distinguished the patients from the controls. These findings suggest that a panel of antibodies may serve as a potential diagnostic biomarker for ALS.
FE65 is a cytosolic adapter protein and an important binding partner of the amyloid precursor protein (APP). Dependent on Thr668 phosphorylation in APP, which influences amyloidogenic APP processing, FE65 undergoes nuclear translocation, thereby transmitting a signal from the cell membrane to the nucleus. As this translocation may be relevant in Alzheimer disease (AD) and as FE65 consists of three protein-protein interaction domains able to bind and affect a variety of other proteins and down-stream signaling pathways, the identification of the FE65 interactome is of central interest in AD research. In this study, we identified 123 proteins as new potential FE65 interacting proteins in a pulldown/mass spectrometry approach using human post-mortem brain samples as protein pools for recombinantly expressed FE65. Co-immunoprecipitation assays further validated the interaction of FE65 with the candidates SV2A and SERCA2. In parallel, we investigated the whole cell proteome of primary hippocampal neurons from FE65/FE65L1 double knockout mice. Notably, the validated FE65-binding proteins were also found to be differentially abundant in neurons derived from the FE65 knockout mice when compared to wild type control neurons. SERCA2 is an important player in cellular calcium homeostasis, which was found to be upregulated in DKO neurons. Indeed, knock-down of FE65 in HEK293T cells also evoked an elevated sensitivity to thapsigargin, a stressor specifically targeting the activity of SERCA2. Thus, our results suggest that FE65 is involved in the regulation of the intracellular calcium homeostasis. While transfection of FE65 alone caused a typical dot-like phenotype in the nucleus, co-transfection of SV2A significantly reduced the percentage of FE65 dot-positive cells, pointing to a possible role for SV2A in the modulation of FE65 intracellular targeting. 
Given that SV2A has a signaling function at the presynapse, its effect on FE65 intracellular localization suggests that the SV2A/FE65 interaction might play a role in synaptic signal transduction.
Circulating tumor cells (CTCs) have been proposed as a monitoring tool in patients with solid tumors. So far, automated approaches are challenged by the cellular heterogeneity of CTC, especially the epithelial-mesenchymal transition. Recently, Yu and colleagues showed that shifts in these cell populations correlated with response and progression, respectively, to chemotherapy in patients with breast cancer. In this study, we assessed which non-hematopoietic cell types were identifiable in the peripheral blood of hepatocellular carcinoma (HCC) patients and whether their distribution during treatment courses is associated with clinical characteristics.
Peroxynitrite is a highly reactive chemical species with antibacterial properties that is synthesized in immune cells. In a proteomic approach, we identified specific target proteins of peroxynitrite-induced modifications in Escherichia coli. Although peroxynitrite caused a fairly indiscriminate nitration of tyrosine residues, reversible modifications of protein thiols were highly specific. We used a quantitative redox proteomic method based on isotope-coded affinity tag chemistry and identified four proteins consistently thiol-modified in cells treated with peroxynitrite as follows: AsnB, FrmA, MaeB, and RidA. All four were required for peroxynitrite stress tolerance in vivo. Three of the identified proteins were modified at highly conserved cysteines, and MaeB and FrmA are known to be directly involved in the oxidative and nitrosative stress response in E. coli. In in vitro studies, we could show that the activity of RidA, a recently discovered enamine/imine deaminase, is regulated in a specific manner by the modification of its single conserved cysteine. Mutation of this cysteine 107 to serine generated a constitutively active protein that was not susceptible to peroxynitrite.
Within the past decade numerous methods for quantitative proteome analysis have been developed of which all exhibit particular advantages and disadvantages. Here, we present the results of a study aiming for a comprehensive comparison of ion-intensity based label-free proteomics and two label-based approaches using isobaric tags incorporated at the peptide and protein levels, respectively. As model system for our quantitative analysis we used the three hepatoma cell lines HepG2, Hep3B and SK-Hep-1. Four biological replicates of each cell line were quantitatively analyzed using an RPLC-MS/MS setup. Each quantification experiment was performed twice to determine technical variances of the different quantification techniques. We were able to show that the label-free approach by far outperforms both TMT methods regarding proteome coverage, as up to threefold more proteins were reproducibly identified in replicate measurements. Furthermore, we could demonstrate that all three methods show comparable reproducibility concerning protein quantification, but slightly differ in terms of accuracy. Here, label-free was found to be less accurate than both TMT approaches. It was also observed that the introduction of TMT labels at the protein level reduces the effect of underestimation of protein ratios, which is commonly monitored in case of TMT peptide labeling. Previously reported differences in protein expression between the particular cell lines were furthermore reproduced, which confirms the applicability of each investigated quantification method to study proteomic differences in such biological systems. This article is part of a Special Issue entitled: Biomarkers: A Proteomic Challenge.
Proteomics methods, especially high-throughput mass spectrometry analysis have been continually developed and improved over the years. The analysis of complex biological samples produces large volumes of raw data. Data storage and recovery management pose substantial challenges to biomedical or proteomic facilities regarding backup and archiving concepts as well as hardware requirements. In this article we describe differences between the terms backup and archive with regard to manual and automatic approaches. We also introduce different storage concepts and technologies from transportable media to professional solutions such as redundant array of independent disks (RAID) systems, network attached storages (NAS) and storage area network (SAN). Moreover, we present a software solution, which we developed for the purpose of long-term preservation of large mass spectrometry raw data files on an object storage device (OSD) archiving system. Finally, advantages, disadvantages, and experiences from routine operations of the presented concepts and technologies are evaluated and discussed. This article is part of a Special Issue entitled: Computational Proteomics in the Post-Identification Era. Guest Editors: Martin Eisenacher and Christian Stephan.
The asparaginyl hydroxylase factor inhibiting HIF-1 (FIH-1) is an important suppressor of hypoxia-inducible factor (HIF) activity. In addition to HIF-α, FIH-1 was previously shown to hydroxylate other substrates within a highly conserved protein interaction domain, termed the ankyrin repeat domain (ARD). However, to date, the biological role of FIH-1-dependent ARD hydroxylation could not be clarified for any ARD-containing substrate. The apoptosis-stimulating p53-binding protein (ASPP) family members were initially identified as highly conserved regulators of the tumour suppressor p53. In addition, ASPP2 was shown to be important for the regulation of cell polarity through interaction with partitioning defective 3 homolog (Par-3). Using mass spectrometry we identified ASPP2 as a new substrate of FIH-1 but inhibitory ASPP (iASPP) was not hydroxylated. We demonstrated that ASPP2 asparagine 986 (N986) is a single hydroxylation site located within the ARD. ASPP2 protein levels and stability were not affected by depletion or inhibition of FIH-1. However, FIH-1 depletion did lead to impaired binding of Par-3 to ASPP2 while the interaction between ASPP2 and p53, apoptosis and proliferation of the cancer cells were not affected. Depletion of FIH-1 and incubation with the hydroxylase inhibitor dimethyloxalylglycine (DMOG) resulted in relocation of ASPP2 from cell-cell contacts to the cytosol. Our data thus demonstrate that protein interactions of ARD-containing substrates can be modified by FIH-1-dependent hydroxylation. The large cellular pool of ARD-containing proteins suggests that FIH-1 can affect a broad range of cellular functions and signalling pathways under certain conditions, for example, in response to severe hypoxia.
The range of heterogeneous approaches available for quantifying protein abundance via mass spectrometry (MS) leads to considerable challenges in modeling, archiving, exchanging, or submitting experimental data sets as supplemental material to journals. To date, there has been no widely accepted format for capturing the evidence trail of how quantitative analysis has been performed by software, for transferring data between software packages, or for submitting to public databases. In the context of the Proteomics Standards Initiative, we have developed the mzQuantML data standard. The standard can represent quantitative data about regions in two-dimensional retention time versus mass/charge space (called features), peptides, and proteins and protein groups (where there is ambiguity regarding peptide-to-protein inference), and it offers limited support for small molecule (metabolomic) data. The format has structures for representing replicate MS runs, grouping of replicates (for example, as study variables), and capturing the parameters used by software packages to arrive at these values. The format has the capability to reference other standards such as mzML and mzIdentML, and thus the evidence trail for the MS workflow as a whole can now be described. Several software implementations are available, and we encourage other bioinformatics groups to use mzQuantML as an input, internal, or output format for quantitative software and for structuring local repositories. All project resources are available in the public domain from the HUPO Proteomics Standards Initiative http://www.psidev.info/mzquantml.
The intracellular domain of the amyloid precursor protein (AICD) is generated following cleavage of the precursor by the γ-secretase complex and is involved in membrane to nucleus signaling, for which the binding of AICD to the adapter protein FE65 is essential. Here we show that FE65 knockdown causes a downregulation of the protein Bloom syndrome protein (BLM) and the minichromosome maintenance (MCM) protein family and that elevated nuclear levels of FE65 result in stabilization of the BLM protein in nuclear mobile spheres. These spheres are able to grow and fuse, and potentially correspond to the nuclear domain 10. BLM plays a role in DNA replication and repair mechanisms and FE65 was also shown to play a role in DNA damage response in the cell. A set of proliferation assays in our work revealed that FE65 knockdown in HEK293T cells reduced cell replication. On the basis of these results, we hypothesize that nuclear FE65 levels (nuclear FE65/BLM containing spheres) may regulate cell cycle re-entry in neurons as a result of increased interaction of FE65 with BLM and/or an increase in MCM protein levels. Thus, FE65 interactions with BLM and MCM proteins may contribute to the neuronal cell cycle re-entry observed in brains affected by Alzheimer's disease.
Proteomics-based clinical studies have been shown to be promising strategies for the discovery of novel biomarkers of a particular disease. Here, we present a study of hepatocellular carcinoma (HCC) that combines complementary two-dimensional difference in gel electrophoresis (2D-DIGE) and liquid chromatography (LC-MS)-based approaches of quantitative proteomics. In our proteomic experiments, we analyzed a set of 14 samples (7 × HCC versus 7 × nontumorous liver tissue) with both techniques. Thereby we identified 573 proteins that were differentially expressed between the experimental groups. Among these, only 51 differentially expressed proteins were identified irrespective of the applied approach. Using Western blotting and immunohistochemical analysis the regulation patterns of six selected proteins from the study overlap (inorganic pyrophosphatase 1 (PPA1), tumor necrosis factor type 1 receptor-associated protein 1 (TRAP1), betaine-homocysteine S-methyltransferase 1 (BHMT)) were successfully verified within the same sample set. In addition, the up-regulations of selected proteins from the complements of both approaches (major vault protein (MVP), gelsolin (GSN), chloride intracellular channel protein 1 (CLIC1)) were also reproducible. Within a second independent verification set (n = 33) the altered protein expression levels of major vault protein and betaine-homocysteine S-methyltransferase were further confirmed by Western blots quantitatively analyzed via densitometry. For the other candidates slight but nonsignificant trends were detectable in this independent cohort. Based on these results we assume that major vault protein and betaine-homocysteine S-methyltransferase have the potential to act as diagnostic HCC biomarker candidates that are worth pursuing in further validation studies.
Contemporary protein microarrays such as the ProtoArray® are used for autoimmune antibody screening studies to discover biomarker panels. For ProtoArray data analysis, the software Prospector and a default workflow are suggested by the manufacturer. While analyzing a large data set of a discovery study for diagnostic biomarkers of Parkinson's disease (ParkCHIP), we have revealed the need for distinct improvements of the suggested workflow concerning raw data acquisition, normalization and preselection method availability, batch effects, feature selection, and feature validation. In this work, appropriate improvements of the default workflow are proposed. It is shown that completely automatic data acquisition as a batch, a re-implementation of Prospector's pre-selection method, multivariate or hybrid feature selection, and validation of the selected protein panel using an independent test set define in combination an improved workflow for large studies.
Mass spectrometry is already a well-established protein identification tool and recent methodological and technological developments have also made possible the extraction of quantitative data of protein abundance in large-scale studies. Several strategies for absolute and relative quantitative proteomics and the statistical assessment of quantifications are possible, each having specific measurements and therefore, different data analysis workflows. The guidelines for Mass Spectrometry Quantification allow the description of a wide range of quantitative approaches, including labeled and label-free techniques and also targeted approaches such as Selected Reaction Monitoring (SRM).
Multi-OMICS approaches aim at the integration of quantitative data obtained for different biological molecules in order to understand their interrelation and the functioning of larger systems. This paper deals with several data integration and data processing issues that frequently occur within this context. To this end, the data processing workflow within the PROFILE project is presented, a multi-OMICS project that aims at the identification of novel biomarkers and the development of new therapeutic targets for seven important liver diseases. Furthermore, a software called CrossPlatformCommander is sketched, which facilitates several steps of the proposed workflow in a semi-automatic manner. Application of the software is presented for the detection of novel biomarkers, their ranking and annotation with existing knowledge using the example of corresponding Transcriptomics and Proteomics data sets obtained from patients suffering from hepatocellular carcinoma. Additionally, a linear regression analysis of Transcriptomics vs. Proteomics data is presented and its performance assessed. It was shown that for capturing profound relations between Transcriptomics and Proteomics data, a simple linear regression analysis is not sufficient and implementation and evaluation of alternative statistical approaches are needed. Additionally, the integration of multivariate variable selection and classification approaches is intended for further development of the software. Although this paper focuses only on the combination of data obtained from quantitative Proteomics and Transcriptomics experiments, several approaches and data integration steps are also applicable for other OMICS technologies. Keeping specific restrictions in mind the suggested workflow (or at least parts of it) may be used as a template for similar projects that make use of different high throughput techniques. 
This article is part of a Special Issue entitled: Computational Proteomics in the Post-Identification Era. Guest Editors: Martin Eisenacher and Christian Stephan.
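As an illustration of the simple linear regression mentioned in the preceding abstract, the following Python sketch fits an ordinary least-squares line to paired transcript and protein values and reports the coefficient of determination. It is a generic textbook fit, not the CrossPlatformCommander implementation; the function name `fit_line` and the choice to return 0.0 for a constant response are assumptions.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept, plus R^2.

    x and y are equally long sequences of paired measurements, for
    example log-scaled transcript and protein abundances per gene.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x values are all identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Assumption: report R^2 = 0.0 when y has no variance at all.
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r_squared
```

A low R² from such a fit would be in line with the abstract's conclusion that a simple linear model does not capture the relation between the two data types.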
This paper focuses on the use of controlled vocabularies (CVs) and ontologies especially in the area of proteomics, primarily related to the work of the Proteomics Standards Initiative (PSI). It describes the relevant proteomics standard formats and the ontologies used within them. Software and tools for working with these ontology files are also discussed. The article also examines the "mapping files" used to ensure correct controlled vocabulary terms that are placed within PSI standards and the fulfillment of the MIAPE (Minimum Information about a Proteomics Experiment) requirements. This article is part of a Special Issue entitled: Computational Proteomics in the Post-Identification Era. Guest Editors: Martin Eisenacher and Christian Stephan.
The peroxisome is a key organelle of low abundance that fulfils various functions essential for human cell metabolism. Severe genetic diseases in humans are caused by defects in peroxisome biogenesis or deficiencies in the function of single peroxisomal proteins. To improve our knowledge of this important cellular structure, we studied for the first time human liver peroxisomes by quantitative proteomics. Peroxisomes were isolated by differential and Nycodenz density gradient centrifugation. A label-free quantitative study of 314 proteins across the density gradient was accomplished using high resolution mass spectrometry. By pairing statistical data evaluation, cDNA cloning and in vivo colocalization studies, we report the association of five new proteins with human liver peroxisomes. Among these, isochorismatase domain containing 1 protein points to the existence of a new metabolic pathway and hydroxysteroid dehydrogenase like 2 protein is likely involved in the transport or β-oxidation of fatty acids in human peroxisomes. The detection of alcohol dehydrogenase 1A suggests the presence of an alternative alcohol-oxidizing system in hepatic peroxisomes. In addition, lactate dehydrogenase A and malate dehydrogenase 1 partially associate with human liver peroxisomes and enzyme activity profiles support the idea that NAD(+) becomes regenerated during fatty acid β-oxidation by alternative shuttling processes in human peroxisomes involving lactate dehydrogenase and/or malate dehydrogenase. Taken together, our data represent a valuable resource for future studies of peroxisome biochemistry that will advance research of human peroxisomes in health and disease.
Detection of yet unknown subgroups showing differential gene or protein expression is a frequent goal in the analysis of modern molecular data. Applications range from cancer biology over developmental biology to toxicology. Often a control and an experimental group are compared, and subgroups can be characterized by differential expression for only a subgroup-specific set of genes or proteins. Finding such genes and corresponding patient subgroups can help in understanding pathological pathways, diagnosis and defining drug targets. The size of the subgroup and the type of differential expression determine the optimal strategy for subgroup identification. To date, commonly used software packages hardly provide statistical tests and methods for the detection of such subgroups. Different univariate methods for subgroup detection are characterized and compared, both on simulated and on real data. We present an advanced design for simulation studies: Data is simulated under different distributional assumptions for the expression of the subgroup, and performance results are compared against theoretical upper bounds. For each distribution, different degrees of deviation from the majority of observations are considered for the subgroup. We evaluate classical approaches as well as various new suggestions in the context of omics data, including outlier sum, PADGE, and kurtosis. We also propose the new FisherSum score. ROC curve analysis and AUC values are used to quantify the ability of the methods to distinguish between genes or proteins with and without certain subgroup patterns. In general, FisherSum for small subgroups and the t-test for large subgroups achieve best results. We apply each method to a case-control study on Parkinson's disease and underline the biological benefit of the new method.
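To make the idea of a univariate subgroup score concrete, here is a simplified Python sketch in the spirit of the outlier sum statistic named in the preceding abstract. It is an assumption-laden illustration, not the FisherSum score and not necessarily the exact outlier sum variant the authors evaluated: values are standardized with the pooled median and median absolute deviation, and only case-group values above the upper quartile plus one interquartile range are summed. The function name and the one-sided (up-regulation only) design are choices made here.

```python
from statistics import median, quantiles


def outlier_sum(control, cases):
    """One-sided outlier-sum-style score for a single gene or protein.

    Large values suggest that a subgroup of the case samples is
    shifted upwards relative to the bulk of all observations.
    """
    pooled = list(control) + list(cases)
    med = median(pooled)
    mad = median([abs(v - med) for v in pooled])
    if mad == 0:
        return 0.0  # no spread to standardize against
    standardized = [(v - med) / mad for v in pooled]
    q1, _, q3 = quantiles(standardized, n=4)
    cutoff = q3 + (q3 - q1)  # upper quartile plus one IQR
    case_scores = [(v - med) / mad for v in cases]
    return sum(s for s in case_scores if s > cutoff)
```

A gene whose case group contains a few strongly elevated samples receives a large score, while a gene with no such subgroup scores zero; ranking genes by this score and comparing the ranking with known subgroup labels is what an ROC/AUC evaluation would quantify.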
Olfactory impairment is increasingly recognized as an early symptom in the development of Parkinson's disease. Testing olfactory function is a non-invasive method but can be time-consuming, which restricts its application in clinical settings and epidemiological studies. Here, we investigate odor identification as a supportive diagnostic tool for Parkinson's disease and estimate the performance of odor subsets to allow a more rapid testing of olfactory impairment.
Controlled vocabularies (CVs), i.e. a collection of predefined terms describing a modeling domain, used for the semantic annotation of data, and ontologies are used in structured data formats and databases to avoid inconsistencies in annotation, to have a unique (and preferably short) accession number and to give researchers and computer algorithms the possibility for more expressive semantic annotation of data. The Human Proteome Organization (HUPO)-Proteomics Standards Initiative (PSI) makes extensive use of ontologies/CVs in their data formats. The PSI-Mass Spectrometry (MS) CV contains all the terms used in the PSI MS-related data standards. The CV contains a logical hierarchical structure to ensure ease of maintenance and the development of software that makes use of complex semantics. The CV contains terms required for a complete description of an MS analysis pipeline used in proteomics, including sample labeling, digestion enzymes, instrumentation parts and parameters, software used for identification and quantification of peptides/proteins and the parameters and scores used to determine their significance. Owing to the range of topics covered by the CV, collaborative development across several PSI working groups, including proteomics research groups, instrument manufacturers and software vendors, was necessary. In this article, we describe the overall structure of the CV, the process by which it has been developed and is maintained and the dependencies on other ontologies. Database URL: http://psidev.cvs.sourceforge.net/viewvc/psidev/psi/psi-ms/mzML/controlledVocabulary/psi-ms.obo.
New developments in proteomics enable scientists to examine hundreds to thousands of proteins in parallel. Quantitative proteomics allows the comparison of different proteomes of cells, tissues, or body fluids with each other. Analyzing and especially organizing these data sets is often a Herculean task. Pathway Analysis software tools aim to take over this task based on present knowledge. Companies promise that their algorithms help to understand the significance of scientists' data, but the benefit remains questionable, and a fundamental systematic evaluation of the potential of such tools has not been performed until now. Here, we tested the commercial Ingenuity Pathway Analysis tool as well as the freely available software STRING using a well-defined study design in regard to the applicability and value of their results for proteome studies. It was our goal to cover a wide range of scientific issues by simulating different established pathways including mitochondrial apoptosis, tau phosphorylation, and Insulin-, App-, and Wnt-signaling. Next to a general assessment and comparison of the pathway analysis tools, we provide recommendations for users as well as for software developers to improve the added value of a pathway study implementation in proteomic pipelines.
The Annual Spring workshop of the HUPO-PSI was this year held at the EMBL International Centre for Advanced Training (EICAT) in Heidelberg, Germany. Delegates briefly reviewed the successes of the group to date. These include the widespread implementation of the molecular interaction data exchange formats, PSI-MI XML2.5 and MITAB, and also of mzML, the standard output format for mass spectrometer output data. These successes have resulted in enhanced accessibility to published data, for example through the development of the PSICQUIC common query interface for interaction data and of databases such as PRIDE that act as public repositories for proteomics data. They have also increased biosharing through the development of consortia, for example IMEx and ProteomeXchange, which will both share the burden of curating the increasing amounts of data being published and work together to make these data more accessible to the bench scientist. Work then started over the three days of the workshop, with a focus on advancing the draft format for handling quantitative mass spectrometry data (mzQuantML) and further developing TraML, a standardized format for the exchange and transmission of transition lists for SRM experiments.
Several projects were initiated by the Human Proteome Organisation (HUPO) focusing on the proteome analysis of distinct human organs. The initiative dedicated to the brain, its development and correlated diseases is the HUPO Brain Proteome Project (HUPO BPP). An objective data submission, storage, and reprocessing strategy has been established with the help of the results gained in a pilot study phase and within subsequent studies. The bioinformatic relevance of the data is drawn from the inter-laboratory comparisons as well as from the recalculation of all data sets submitted by the different groups. In the following, results of the single groups as well as the centralised reprocessing effort are summarised, demonstrating the added value of this concerted work.
To deal with the data flood of current mass spectrometry methods, standard data formats are needed. The Proteomics Standards Initiative (PSI) of the Human Proteome Organisation (HUPO) develops open storage and transfer standards for and with the community. The Proteomics Informatics work group of the PSI has recently released an XML-based format to store the parameters and results of spectrum identification algorithms (the so-called search engines), which identify peptides and/or proteins from mass spectra. Here, this format, called "mzIdentML", is described by giving principal design concepts and presenting examples of important use cases.
In recent years, standardization and quality control have become important key points in industry, e.g. in drug discovery and for developing medical products. Is quality control in academic Proteomics a minor problem nowadays, where standard data formats and public repositories for data sharing exist? In this article, it is discussed how standard formats and repositories already support the documentation of quality control criteria in protein identification and quantification, and what has to be improved in the future: It is stated that the Proteomics community (represented by a group like the Proteomics Standards Initiative) will have to define a minimum document regarding quality control and to extend existing standards with additional quality control criteria enabling a substantial and standardized quality control process.
The plenary session of the Proteomics Standards Initiative of the Human Proteome Organisation at the 8th Annual HUPO World Congress updated the delegates on the current status of the ongoing work of this group. The mass spectrometry group reviewed the progress of mzML since its release last year and detailed new work on providing a common format for SRM/MRM transition lists (TraML). The implementation of mzIdentML, for describing the output of proteomics search engines, was outlined and the release of a new web service PSICQUIC, which allows users to simultaneously search multiple interaction databases, was announced. Finally, the audience participated in a lively debate, discussing both the benefits of these standard formats and issues with their adoption and use in a research environment.
The HUPO Brain Proteome Project (HUPO BPP) held its 12th workshop in Toronto on 26 September 2009 prior to the HUPO VIII World Congress. The principal aim of this project is to obtain a better understanding of neurodiseases and ageing, with the ultimate objective of discovering prognostic and diagnostic biomarkers, in addition to the development of novel diagnostic techniques and new medications. The attendees came together to discuss progress in human clinical neuroproteomics and to define the needs and guidelines required for more advanced proteomic approaches.
Today, label-free mass spectrometry methods are frequently used for quantification of proteins and peptides. There have been several proposals of measurable parameters that best reflect quantities, such as peak areas as well as spectral counts. This review provides a systematic overview of the proposed methods. Owing to the shotgun proteomics approach generally used today for label-free mass spectrometry, any quantitative measure in the first place is a measure of peptide quantity. There has been no systematic research on how to best infer protein quantity from its measured peptide quantities. The way peptide identifications are assembled to protein lists may especially lead to significantly different results in protein quantification. A further focus of this review will thus be the assembly of measured peptide quantities to a protein quantity.
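The dependence of protein quantities on the peptide assembly can be made concrete with a minimal sketch. It sums spectral counts per protein under two assumed assembly rules: counting every peptide for each protein it maps to, or counting only peptides unique to one protein. The function name, peptide labels and counts are hypothetical, and neither rule is presented as the method recommended by the review.

```python
from collections import defaultdict


def protein_spectral_counts(psm_counts, peptide_to_proteins, unique_only):
    """Sum peptide spectral counts per protein.

    unique_only=True skips peptides shared between several proteins;
    unique_only=False credits a shared peptide to every matching protein.
    """
    totals = defaultdict(int)
    for peptide, count in psm_counts.items():
        proteins = peptide_to_proteins[peptide]
        if unique_only and len(proteins) > 1:
            continue  # shared peptide: ignored under the unique-only rule
        for protein in proteins:
            totals[protein] += count
    return dict(totals)


# Hypothetical toy data: PEP_B is shared between proteins P1 and P2.
psm_counts = {"PEP_A": 4, "PEP_B": 2, "PEP_C": 3}
peptide_to_proteins = {"PEP_A": ["P1"], "PEP_B": ["P1", "P2"], "PEP_C": ["P2"]}

print(protein_spectral_counts(psm_counts, peptide_to_proteins, unique_only=False))
print(protein_spectral_counts(psm_counts, peptide_to_proteins, unique_only=True))
```

Even in this tiny example the two rules give different protein-level counts from identical peptide-level measurements.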
The organization and storage of proteomics data are challenging issues today and even more for the rising amount of information in the future. This review article describes the advantages of using Laboratory Information Management Systems (LIMS) in proteomics laboratories. Seven typical LIMS are explored in detail to describe their role in an even bigger interrelation. They are a central part of the proteomics data workflow, starting with data generation and ending with the publication in journals and repositories. Therefore, they enable community-wide data utilization and further Systems Biology discoveries.
ProDaC (Proteomics Data Collection), a "Coordination Action" within the 6th EU framework programme, was created to support the collection, distribution and public availability of data from proteomics experiments. Within the consortium, standards are created and maintained, enabling an extensive data collection within the proteomics community. Important elements of ProDaC are workshops held twice a year to allow communication between the ProDaC partners and to report the ongoing progress. The most recent assembly was the 4th ProDaC workshop on August 15th, 2008, in Amsterdam, The Netherlands. It took place directly before the 7th HUPO Annual World Congress (Human Proteome Organisation). Work package coordinators and partners presented the progress achieved since the last meeting. Additionally, an EU official presented funding opportunities for proteomics in the next EU framework programme and five external speakers presented talks about their work in relation to ProDaC.
The Proteomics Data Collection (ProDaC) consortium, a "Coordination Action" funded by the 6th EU Framework Programme, started in October 2006. Its aim was to facilitate the collection and distribution of proteomics data and the public availability of data sets from proteomics experiments. Within the consortium, standard formats are created and tools are developed to allow extensive data collection within the proteomics community. An important part of ProDaC is the organization of workshops twice a year to inform about the consortium's progress and to stimulate communication between the ProDaC partners and between partners and interested members of the proteomics community. ProDaC ends on March 31, 2009. The most recent (and final) workshop was the 5th ProDaC workshop held on March 4, 2009 in Kolympari, Crete, Greece. The progress since the last meeting and an overall summary were presented by the work package coordinators and partners. Four external speakers presented talks about their work in relation to ProDaC.
In proteomics, rapid developments in instrumentation led to the acquisition of increasingly large data sets. Correspondingly, ProDaC was founded in 2006 as a Coordination Action project within the 6th European Union Framework Programme to support data sharing and community-wide data collection. The objectives of ProDaC were the development of documentation and storage standards, setup of a standardized data submission pipeline and collection of data. Ending in March 2009, ProDaC has delivered a comprehensive toolbox of standards and computer programs to achieve these goals.
Alteration in the expression level of peripheral myelin protein 22 (PMP22) is the most frequent cause for demyelinating neuropathies of Charcot-Marie-Tooth type. Here, we demonstrate a loss of motoneurons (MNs) in the spinal cords from transgenic mice over-expressing Pmp22 (Pmp22(tg)) while mice lacking Pmp22 [Pmp22(ko); knockout (ko)] exhibited normal MN numbers at the symptomatic age of 60 days. In order to describe the molecular changes in affected MNs, these cells were isolated from lumbar spinal cords by laser-capture microdissection. Remarkably, the MNs of the Pmp22(ko) and Pmp22(tg) mice showed different expression profiles because of the altered Pmp22 expression. The changes in the expression profile of MNs from Pmp22(ko) mice resemble those described in MNs from mice after nerve injury and included genes that had been described in neuronal growth and regeneration like Gap43 and Sprr11a. The changes detected in the expression pattern of MNs from Pmp22(tg) mice exhibited fewer similarities to other expression patterns. The specific expression pattern in the MNs of the Pmp22(ko) mice might contribute to the better survival of the MNs. Our study also revealed induction of genes like brain-expressed X-linked 1 (Bex1) and desmoplakin (Dsp) that had recently been found up-regulated in MNs of human amyotrophic lateral sclerosis patients.
The HUPO Brain Proteome Project (HUPO BPP) held its 11th workshop in Kolymbari on March 3, 2009. The principal aim of this project is to obtain a better understanding of neurodiseases and ageing, with the ultimate objective of discovering prognostic and diagnostic biomarkers, in addition to the development of novel diagnostic techniques and new medications. The attendees came together to discuss sub-project progress in the clinical neuroproteomics of human or mouse models of Alzheimer's and Parkinson's disease, and to define the needs and guidelines required for more advanced proteomics approaches. With the election of new steering committees, the members of the HUPO BPP elaborated an actual plan promoting activities, outcomes, and future directions of the HUPO BPP to acquire new funding and new participants.
The Functional Genomics Experiment data model (FuGE) has been developed to increase the consistency and efficiency of experimental data modeling in the life sciences, and it has been adopted by a number of high-profile standardization organizations. FuGE can be used: (1) directly, whereby generic modeling constructs are used to represent concepts from specific experimental activities; or (2) as a framework within which method-specific models can be developed. FuGE is both rich and flexible, providing a considerable number of modeling constructs, which can be used in a range of different ways. However, such richness and flexibility also mean that modelers and application developers have choices to make when applying FuGE in a given context. This paper captures emerging best practice in the use of FuGE in the light of the experience of several groups by: (1) proposing guidelines for the use and extension of the FuGE data model; (2) presenting design patterns that reflect recurring requirements in experimental data modeling; and (3) describing a community software tool kit (STK) that supports application development using FuGE. We anticipate that these guidelines will encourage consistent usage of FuGE, and as such, will contribute to the development of convergent data standards in omics research.
The plenary session of the Proteomics Standards Initiative (PSI) of the Human Proteome Organisation at the 7th annual HUPO world congress updated the delegates on the current status of the ongoing work of this group. The release of the new MS interchange format, mzML, was formally announced and delegates were also updated on the advances in the area of molecular interactions, protein separations, proteomics informatics and also on PEFF, a common sequence database format currently under review in the PSI documentation process. Community input on this initiative was requested. Finally, the impact these new data standards are having on the data submission process, which increasingly is an integral part of the publication process, was reviewed and discussed.
Accurate quantification of proteins is one of the major tasks in current proteomics research. To address this issue, a wide range of stable isotope labeling techniques have been developed, allowing one to quantitatively study thousands of proteins by means of mass spectrometry. In this article, the FindPairs module of the PeakQuant software suite is detailed. It facilitates the automatic determination of protein abundance ratios based on the automated analysis of stable isotope-coded mass spectrometric data. Furthermore, it implements statistical methods to determine outliers due to biological as well as technical variance of proteome data obtained in replicate experiments. This provides an important means to evaluate the significance in obtained protein expression data. For demonstrating the high applicability of FindPairs, we focused on the quantitative analysis of proteome data acquired in ¹⁴N/¹⁵N labeling experiments. We further provide a comprehensive overview of the features of the FindPairs software, and compare these with existing quantification packages. The software presented here supports a wide range of proteomics applications, allowing one to quantitatively assess data derived from different stable isotope labeling approaches, such as ¹⁴N/¹⁵N labeling, SILAC, and iTRAQ. The software is publicly available at http://www.medizinisches-proteom-center.de/software and free for academic use.
The development of the nematode Caenorhabditis elegans is a highly dynamic process. Although various studies have assessed global transcriptome changes, information on the dynamics of the proteome during ontogenesis is not available. We metabolically labeled C. elegans by using ¹⁵N ammonium chloride as a precursor in Escherichia coli feeding bacteria grown in minimal media as a new cost-effective technique. Quantitative proteome analysis was performed by LC-MS/MS in animals harvested at different times during ontogenesis. We identified and quantified 245 proteins at all larval stages in two independent replicates. Between larval stages (20 and 40 h after hatching), 61 proteins were found to change significantly in level. Among those, ribosomal proteins, aminoacyl tRNA synthetases and enzymes of energy metabolism increased in abundance, while extracellular matrix proteins and muscle proteins dominated groups displaying reduced levels. Moreover, changes observed for selected proteins such as VIT-6 and SOD-1 matched previously published findings, confirming the validity of our approach. The metabolic labeling technique applied seems well suited to assess changes in the proteome of C. elegans in a quantitative manner during larval development. The data set generated provides the basis for further exploitation of the role of individual proteins or protein clusters during ontogenesis.
In recent years, the generation and interpretation of MS/MS spectra for the identification of peptides and proteins has matured to a frequently used automatic workflow in Proteomics. Several software solutions for the automated analysis of MS/MS spectra allow for high-throughput/high-performance analyses of complex samples. Related to MS/MS searches, target-decoy approaches have gained more and more popularity: in a "decoy" part of the search database nonexistent sequences mimic real sequences (the "target" sequences). With their help, the number of falsely identified peptides/proteins can be estimated after a search and the resulting protein list can be cut at a specified false discovery rate (FDR). This is an essential prerequisite for all quantitative approaches, as they rely on correct identifications. Especially the label-free approach "spectral counting", which is gaining more and more popularity due to low costs and simplicity, depends directly on the correctness of peptide-spectrum matches (PSMs). The aim of this work is to describe five popular search engines, especially their general properties regarding protein identification, but also their quantification abilities, if those go beyond spectral counting. By doing so, Proteomics researchers are enabled to compare their features and to choose an appropriate solution for their specific question. Furthermore, the search engines are applied to a spectrum data set generated from a complex sample with a Thermo LTQ Velos OrbiTrap (Thermo Fisher Scientific, Waltham, MA, USA). The results of the search engines are compared, e.g., regarding time requirements, peptides and proteins found, and the search engines' behavior using the decoy approach.
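The target-decoy idea described above can be sketched in a few lines. This minimal example assumes one common estimator, FDR = decoy hits / target hits above a score threshold, and assumes that higher scores are better; other estimators exist, and none of the five search engines is claimed to work exactly this way. Function names and scores are hypothetical.

```python
def fdr_at_threshold(psms, threshold):
    """Estimate the FDR as decoys / targets among PSMs scoring >= threshold.

    psms is a list of (score, is_decoy) tuples; higher scores are better.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0
    return decoys / targets


def score_cutoff_for_fdr(psms, max_fdr):
    """Return the lowest observed score whose estimated FDR is <= max_fdr.

    Returns None if no threshold satisfies the limit.
    """
    best = None
    for threshold in sorted({score for score, _ in psms}, reverse=True):
        if fdr_at_threshold(psms, threshold) <= max_fdr:
            best = threshold
    return best


# Hypothetical search result: one decoy hit at score 7.
psms = [(10, False), (9, False), (8, False), (7, True), (6, False), (5, False)]
print(score_cutoff_for_fdr(psms, max_fdr=0.0))
print(score_cutoff_for_fdr(psms, max_fdr=0.25))
```

Cutting the list at the returned score keeps as many identifications as possible while the estimated share of false hits stays within the chosen limit.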
Mass spectrometry is frequently used in quantitative proteomics to detect differentially regulated proteins. A very important but unfortunately oftentimes neglected part in detecting differential proteins is the statistical analysis. Data from proteomics experiments are usually high-dimensional and hence require profound statistical methods. It is especially important to already correctly design a proteomic experiment before it is conducted in the laboratory. Only this can ensure that the statistical analysis is capable of detecting truly differential proteins afterwards. This chapter thus covers aspects of both statistical planning and the actual analysis of quantitative proteomic experiments.
We report the release of mzIdentML, an exchange standard for peptide and protein identification data, designed by the Proteomics Standards Initiative. The format was developed by the Proteomics Standards Initiative in collaboration with instrument and software vendors, and the developers of the major open-source projects in proteomics. Software implementations have been developed to enable conversion from most popular proprietary and open-source formats, and mzIdentML will soon be supported by the major public repositories. These developments enable proteomics scientists to start working with the standard for exchanging and publishing data sets in support of publications and they provide a stable platform for bioinformatics groups and commercial software vendors to work with a single file format for identification data.
The plenary session of the Proteomics Standards Initiative (PSI) of the Human Proteome Organization at the Tenth annual HUPO World Congress updated the delegates on the ongoing activities of this group. The Molecular Interactions workgroup described the success of the PSICQUIC web service, which enables users to access multiple interaction resources with a single query. One user instance is the IMEx Consortium, which uses the service to enable users to access a non-redundant set of protein-protein interaction records. The mass spectrometry data formats, mzML for mass spectrometer output files and mzIdentML for the output of search engines, are now successfully established with increasing numbers of implementations. A format for the output of quantitative proteomics data, mzQuantML, and also TraML, for SRM/MRM transition lists, are both currently nearing completion. The corresponding MIAPE documents are being updated in line with advances in the field, as is the shared controlled vocabulary PSI-MS. In addition, the mzTab format was introduced, as a simpler way to report MS proteomics and metabolomics results. Finally, the ProteomeXchange Consortium, which will supply a single entry point for the submission of MS proteomics data to multiple data resources including PRIDE and PeptideAtlas, is currently being established.
Over the past years, phosphoproteomics has advanced to a prime tool in signaling research. Since then, an enormous amount of information about in vivo protein phosphorylation events has been collected providing a treasure trove for gaining a better understanding of the molecular processes involved in cell signaling. Yet, we still face the problem of how to achieve correct modification site localization. Here we use alternative fragmentation and different bioinformatics approaches for the identification and confident localization of phosphorylation sites. Phosphopeptide-enriched fractions were analyzed by multistage activation, collision-induced dissociation and electron transfer dissociation (ETD), yielding complementary phosphopeptide identifications. We further found that MASCOT, OMSSA and Andromeda each identified a distinct set of phosphopeptides, allowing us to increase the number of site assignments. The post-search engine SLoMo provided confident phosphorylation site localization, whereas different versions of PTM-Score integrated in MaxQuant differed in performance. Based on high-resolution ETD and higher-energy collisional dissociation (HCD) datasets from a large synthetic peptide and phosphopeptide reference library reported by Marx et al. [Nat. Biotechnol. 2013, 31 (6), 557-564], we show that an Andromeda/PTM-Score probability of 1 is required to provide a false localization rate (FLR) of 1% for HCD data, while 0.55 is sufficient for high-resolution ETD spectra. Additional analyses of HCD data demonstrated that for phosphotyrosine peptides and phosphopeptides containing two potential phosphorylation sites, PTM-Score probability cut-off values of < 1 can be applied to ensure an FLR of 1%. Proper adjustment of localization probability cut-offs allowed us to significantly increase the number of confident sites with an FLR of < 1%.
Our findings underscore the need for the systematic assessment of FLRs for different score values to report confident modification site localization.
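A systematic assessment of FLRs across score values could look like the following sketch. It assumes a reference set in which the true phosphorylation site of every peptide is known, as with a synthetic library, and defines the FLR at a cut-off as the fraction of wrongly localized sites among all sites with a localization probability at or above that cut-off. The function names and probabilities are hypothetical and not taken from the study.

```python
def false_localization_rate(sites, cutoff):
    """FLR among site assignments with localization probability >= cutoff.

    sites is a list of (probability, is_correct) tuples, where is_correct
    comes from a reference library with known modification sites.
    """
    accepted = [is_correct for probability, is_correct in sites if probability >= cutoff]
    if not accepted:
        return 0.0
    return accepted.count(False) / len(accepted)


def flr_profile(sites, cutoffs):
    """Map each candidate probability cut-off to its observed FLR."""
    return {cutoff: false_localization_rate(sites, cutoff) for cutoff in cutoffs}


# Hypothetical localization results against a known reference.
sites = [(1.0, True), (0.99, True), (0.9, False), (0.8, True), (0.6, False)]
for cutoff, flr in flr_profile(sites, [0.95, 0.75, 0.5]).items():
    print(f"cutoff {cutoff:.2f}: FLR {flr:.2f}")
```

Tabulating the FLR per cut-off in this way, separately for each fragmentation method, is what allows a probability threshold to be chosen for a target FLR instead of applying one fixed value everywhere.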