Precise protein quantification is essential in comparative proteomics. Currently, quantification bias is unavoidable in proteotypic peptide-based quantitative proteomics because peptides differ in their measurability. To improve quantification accuracy, we previously proposed an "empirical rule for linearly correlated peptide selection (ERLPS)" for quantitative proteomics. However, the general applicability of ERLPS under diverse experimental conditions still needs to be evaluated systematically. In this study, the practical workflow of ERLPS is explicitly illustrated, and experimental variables, including MS systems, sample complexities, sample preparations, elution gradients, matrix effects, loading amounts, and other factors, were comprehensively investigated to evaluate the applicability, reproducibility, and transferability of ERLPS. The results demonstrated that ERLPS was highly reproducible and transferable within appropriate loading amounts, and that linearly correlated response peptides should be selected for each specific experiment. ERLPS was applied to proteome samples from yeast, mouse, and human, and to quantitative methods ranging from label-free to 18O/16O-labeled and SILAC analyses, and it enabled accurate measurements over a large dynamic range in all of these proteotypic peptide-based quantitative settings.
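A minimal sketch of the selection idea, assuming (our reading, not the study's stated procedure) that a "linearly correlated response" means a peptide's intensity scales linearly with the amount of sample loaded across a dilution series. The function names and the r² cut-off of 0.95 are hypothetical.

```python
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series carries no linear information
    return sxy / (sxx * syy) ** 0.5


def select_linear_peptides(loadings, peptide_intensities, r2_min=0.95):
    """Keep peptides whose signal rises linearly with the loading amount.

    loadings            : loading amounts of a dilution series
    peptide_intensities : dict peptide -> intensities, one per loading
    r2_min              : hypothetical cut-off on the squared Pearson r
    Returns dict peptide -> r^2 for the retained peptides.
    """
    kept = {}
    for peptide, intensities in peptide_intensities.items():
        r = pearson_r(loadings, intensities)
        if r > 0 and r * r >= r2_min:
            kept[peptide] = r * r
    return kept


if __name__ == "__main__":
    loadings = [0.5, 1.0, 2.0, 4.0]
    peptides = {
        "pepA": [100, 200, 400, 800],  # proportional response
        "pepB": [100, 200, 250, 260],  # saturates at high load
        "pepC": [50, 50, 50, 50],      # no response
    }
    print(select_linear_peptides(loadings, peptides))
```

Peptides that saturate or do not respond are dropped, so only the retained peptides would feed protein-level quantification in that particular experiment.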
Identifying new drug target (DT) proteins is important in pharmaceutical and biomedical research. General machine learning method (GMLM) classifiers perform fairly well at prediction if the training dataset is well prepared. However, a common problem in preparing the training dataset is the lack of a negative dataset. To address this problem, we proposed two methods that help GMLM classifiers select negative training samples from the test dataset. Prediction accuracy improved when the training dataset was constructed with the proposed strategies. The classifier identified two sets of 1797 and 227 potential DT proteins, some of which have been reported in previous studies, lending support to the new method. These two sets of potential DT proteins, or their homologues, merit further consideration.
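The two selection methods are not spelled out here, so the sketch below shows only a generic, commonly used heuristic for picking "reliable negatives" from unlabeled data: take the unlabeled proteins farthest from the centroid of the known targets. It is not the study's method; the feature vectors and function names are hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def select_reliable_negatives(positives, unlabeled, k):
    """Pick the k unlabeled proteins least similar to the known drug targets.

    positives : list of (name, feature_vector) for known drug targets
    unlabeled : list of (name, feature_vector) for proteins of unknown status
    Dissimilarity is Euclidean distance to the positive centroid
    (an illustrative choice, not the study's criterion).
    """
    center = centroid([vec for _, vec in positives])
    ranked = sorted(unlabeled, key=lambda item: math.dist(item[1], center), reverse=True)
    return [name for name, _ in ranked[:k]]
```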
With advances in experimental technologies, various stable isotope labeling methods have been widely applied in quantitative proteomics. Here, we present SILVER, an efficient tool for processing stable isotope labeling mass spectrometry data. SILVER implements novel methods for quality control of quantification at the spectrum, peptide, and protein levels. Several new quantification confidence filters and indices are used to improve the accuracy of quantification results. The performance of SILVER was verified and compared with that of MaxQuant and Proteome Discoverer using a large-scale dataset and two standard datasets. The results suggest that SILVER is highly accurate and robust while requiring much less processing time. Additionally, SILVER provides user-friendly interfaces for parameter setting, result visualization, manual validation, and several useful statistical analyses. Availability and implementation: SILVER and its source code are freely available under the GNU General Public License v3.0 at http://bioinfo.hupo.org.cn/silver.
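A minimal sketch of protein-level ratio estimation with a confidence filter, assuming (hypothetically) a minimum-intensity filter on each isotope channel and a median of feature log2 ratios. SILVER's actual filters and indices are not described here, so these names and thresholds are illustrative only.

```python
import math
from statistics import median


def feature_log2_ratio(light, heavy, min_intensity=1e3):
    """log2(heavy/light) for one quantified feature, or None if either
    channel falls below a hypothetical minimum-intensity filter."""
    if light < min_intensity or heavy < min_intensity:
        return None
    return math.log2(heavy / light)


def protein_ratio(features, min_intensity=1e3, min_features=2):
    """Heavy/light protein ratio as 2 ** median of the passing log2 ratios.

    features : list of (light_intensity, heavy_intensity) pairs
    Returns None when fewer than min_features pass the filter.
    """
    logs = []
    for light, heavy in features:
        r = feature_log2_ratio(light, heavy, min_intensity)
        if r is not None:
            logs.append(r)
    if len(logs) < min_features:
        return None
    return 2 ** median(logs)
```

Taking the median rather than the mean limits the influence of a single mis-quantified feature on the protein ratio.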
Discovering the regulatory relationships of protein interactions is crucial for mechanistic research on signaling networks. Bioinformatics methods can accelerate this discovery by distinguishing activation relations from inhibition relations between interacting proteins. In this paper, we describe a novel method to predict the regulatory relations of protein interactions in signaling networks. We detected 4,417 domain pairs that were significantly enriched in the activation or inhibition dataset. Three machine learning methods, logistic regression, support vector machines (SVMs), and naïve Bayes, were explored as classifier models. The predictive power of the three models was evaluated by 5-fold cross-validation and on an independent test dataset. The areas under the receiver operating characteristic curve for the logistic regression, SVM, and naïve Bayes models were 0.946, 0.905, and 0.809, respectively. Finally, the logistic regression classifier was applied to a human proteome-wide interaction dataset, and regulatory relations were predicted for 2,591 interactions: 2,048 activation and 543 inhibition. This domain-based model can be used to identify the regulatory relations between interacting proteins and, further, to reconstruct signaling pathways.
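To make the setup concrete, here is a minimal, dependency-free sketch of a domain-pair logistic regression: each interaction becomes a binary vector over the enriched domain pairs, and the model outputs a probability of activation versus inhibition. The domain names, learning rate, and epoch count are hypothetical; in practice a library implementation (for example scikit-learn's `LogisticRegression`) would normally be used.

```python
import math


def domain_pair(a, b):
    """Order-independent key for a pair of domains."""
    return tuple(sorted((a, b)))


def featurize(interaction_pairs, enriched_pairs):
    """1.0 where the interaction carries an enriched domain pair, else 0.0."""
    return [1.0 if p in interaction_pairs else 0.0 for p in enriched_pairs]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the logistic loss; y is 1 (activation) or 0 (inhibition)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict_activation(w, b, x):
    """Predicted probability that the interaction is an activation."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```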
The ubiquitin-like protein FAT10 (HLA-F adjacent transcript 10) is expressed only in mammals. The FAT10 gene is located in the MHC class I locus of the human genome and is associated with specific processes such as apoptosis, the immune response, and cancer. However, biological knowledge of FAT10 is limited because few of its conjugates have been identified. FAT10 covalently modifies proteins in eukaryotes, but only a few FAT10 substrates have been reported to date, and no FATylation sites had been identified. Here, we report the proteome-scale identification of FATylated proteins by liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS). We identified 175 proteins with high confidence as FATylation candidates. A total of 13 modified sites were identified for the first time by a modified search of the raw MS data. The modified sites were highly enriched in hydrophilic amino acids. Furthermore, FATylation of hnRNP C2, PCNA, and PDIA3 was verified by coimmunoprecipitation assays. We confirmed that most of the substrates were covalently attached to a FAT10 monomer. The functional distribution of the FAT10 targets suggests that FAT10 participates in various biological processes, such as translation, protein folding, RNA processing, and macromolecular complex assembly. These results should be very useful for investigating the biological functions of FAT10.
The directionality of protein interactions is a prerequisite for forming signaling networks, and constructing signaling networks is critical for uncovering the mechanisms of life processes. In this paper, we proposed a novel method to infer the directionality of protein-protein interactions and, further, to construct signaling networks. Based on the functional annotations of proteins, we defined a novel parameter, GODS, and established a prediction model. Evaluated by fivefold cross-validation, the method predicts the directionality of protein interactions with high sensitivity and specificity. With a GODS threshold of 2, we achieved an accuracy of 95.56 percent and a coverage of 74.69 percent on the human test set. This method was also successfully applied to reconstruct classical human signaling pathways. This study provides not only an effective method for unraveling unknown signaling pathways but also a deeper understanding of signaling networks from the perspective of protein function.
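The formula for GODS is not given here, so the sketch below treats it as an arbitrary signed score and shows only how a threshold turns scores into direction calls, and how accuracy and coverage then follow. The sign convention (positive meaning the first protein is upstream) and the function name are assumptions.

```python
def evaluate_direction_calls(scored_pairs, threshold=2.0):
    """Accuracy and coverage of threshold-based direction calls.

    scored_pairs: list of (score, true_direction), true_direction in {+1, -1},
    where +1 means the first protein is upstream (a hypothetical convention).
    A pair is called +1 if score >= threshold, -1 if score <= -threshold,
    and is left uncalled otherwise.
    """
    called = correct = 0
    for score, truth in scored_pairs:
        if score >= threshold:
            prediction = 1
        elif score <= -threshold:
            prediction = -1
        else:
            continue  # not confident enough to call a direction
        called += 1
        correct += int(prediction == truth)
    coverage = called / len(scored_pairs) if scored_pairs else 0.0
    accuracy = correct / called if called else 0.0
    return accuracy, coverage
```

Raising the threshold typically trades coverage for accuracy, which is the trade-off behind choosing a single operating value such as 2.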
Advances in proteomics research rely on high-precision, high-resolution mass spectrometry instruments. Although hardware improvements are the main impetus for acquiring high-quality data, enhancements in software tools are also needed. In this study, recalibration was verified as an important way to improve data accuracy. A new version of the tool, FTDR 2.0, was developed to recalibrate the mass-to-charge ratio errors of most observed parent ions to the sub-part-per-million level in routine experiments. First, many new parameters were introduced and screened online as features to reduce systematic error and to adapt to various data sets. Second, a support vector regression model was trained to characterize the complex nonlinear mapping from features to mass-to-charge ratio measurement errors. Third, a specific mass-to-charge ratio error tolerance was estimated for each parent ion by considering the impact of signal intensity. FTDR 2.0 is a user-friendly tool that supports most commonly used data standards and formats. A C++ library and the source code are provided to support redevelopment and integration into other mass spectrometry data processing tools. The performance of FTDR 2.0 was verified using several experimental data sets from different research programs. Recalibration with FTDR 2.0 was shown to improve peptide identification in qualitative, quantitative, and post-translational modification analyses.
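The recalibration idea can be sketched with a deliberately simpler model than FTDR 2.0's support vector regression: fit the ppm error of confidently identified ions as a linear function of m/z, then divide the predicted error out of every observed value. The single-feature linear model and the function names are assumptions for illustration only.

```python
def ppm_error(observed, theoretical):
    """Relative m/z error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope (needs >= 2 distinct xs)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def recalibrate(calibrants, observed_mzs):
    """Remove a systematic ppm error that varies linearly with m/z.

    calibrants   : (observed_mz, theoretical_mz) for confidently identified ions
    observed_mzs : values to correct
    Since observed = theoretical * (1 + e * 1e-6), dividing by that factor
    undoes the predicted error e.
    """
    xs = [obs for obs, _ in calibrants]
    ys = [ppm_error(obs, theo) for obs, theo in calibrants]
    intercept, slope = fit_line(xs, ys)
    return [mz / (1.0 + (intercept + slope * mz) * 1e-6) for mz in observed_mzs]
```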
While the concept of Quality-by-Design is addressed at the upstream and downstream process development stages, we asked whether there are advantages to addressing biologics quality early, in the design of the molecule itself, based on fundamental biophysical characterization, thereby reducing complexity in the later product development stages. Although a limited number of bispecific therapeutics are in the clinic, their development has been plagued by difficulty in producing material of sufficient quality and quantity for both preclinical and clinical studies. The engineered heterodimeric Fc is an industry-wide favorite scaffold for the design of bispecific protein therapeutics because of its structural, and potentially pharmacokinetic, similarity to the natural antibody. Development of molecules based on this concept, however, is challenged by potential homodimer contamination and loss of stability relative to the natural Fc. We engineered a heterodimeric Fc with high heterodimeric specificity that also retains natural Fc-like biophysical properties, and we demonstrate here that using engineered Fc domains that mirror the natural system translates into an efficient and robust upstream stable cell line selection process, as a first step toward a more developable therapeutic.
Bispecific IgG asymmetric (heterodimeric) antibodies offer enhanced therapeutic efficacy but present unique challenges for drug development. These challenges relate to the proper assembly of heavy and light chains: improper assembly can give rise to impurities such as symmetric (homodimeric) antibodies. A new method to assess the heterodimer purity of such bispecific antibody products is needed because traditional separation-based purity assays are unable to separate or quantify homodimer impurities. This paper presents a liquid chromatography-mass spectrometry (LC-MS)-based method for evaluating the heterodimeric purity of a prototype asymmetric antibody containing two different heavy chains and two identical light chains. The heterodimer and independently expressed homodimeric standards were characterized by two complementary LC-MS techniques: intact protein mass measurement of the deglycosylated antibody, and peptide map analysis. Intact protein mass analysis was used to check molecular integrity and composition. LC-MS(E) peptide mapping of Lys-C digests was used to verify protein sequences and characterize post-translational modifications, including C-terminal truncation species. Guided by the characterization results, a heterodimer purity assay was demonstrated by intact protein mass analysis of pure deglycosylated heterodimer spiked with each deglycosylated homodimeric standard. The assay detected low levels (2%) of spiked homodimers in the presence of co-eluting half antibodies and the multiple mass species present in the homodimer standards, and it provided relative purity differences between samples. Detection of minor homodimer and half-antibody C-terminal truncation species at levels as low as 0.6% demonstrates the sensitivity of the method. This method is suitable for purity assessment of heterodimer samples during process and purification development of bispecific antibodies, e.g., clone selection.
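Why a mass-based assay can work here: with a common light chain, the heterodimer mass is the average of the two homodimer masses, so each homodimer appears offset from the heterodimer by the heavy-chain mass difference. The sketch below, with hypothetical chain masses and a hypothetical 2 Da matching tolerance, computes the expected species and their relative abundances from deconvoluted peaks. It assumes disulfide and modification corrections are identical for all three species.

```python
def expected_masses(heavy_a, heavy_b, light):
    """Intact masses of the AB heterodimer and the AA/BB homodimers, assuming
    two copies of a common light chain. Disulfide and modification corrections
    are omitted; assumed equal for all three species, they shift every mass by
    the same offset, which should be added before matching real data."""
    return {
        "AB": heavy_a + heavy_b + 2 * light,
        "AA": 2 * heavy_a + 2 * light,
        "BB": 2 * heavy_b + 2 * light,
    }


def relative_abundance(peaks, targets, tol_da=2.0):
    """Percent of matched intensity per species.

    peaks   : list of (deconvoluted_mass, intensity)
    targets : dict species -> expected mass
    Peaks matching no species (e.g. half antibodies) are ignored.
    """
    totals = {name: 0.0 for name in targets}
    for mass, intensity in peaks:
        for name, expected in targets.items():
            if abs(mass - expected) <= tol_da:
                totals[name] += intensity
    matched = sum(totals.values())
    if matched == 0:
        return totals
    return {name: 100.0 * v / matched for name, v in totals.items()}
```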
The discovery of novel cancer genes is one of the main goals of cancer research. Bioinformatics methods can accelerate cancer gene discovery, which may aid the understanding of cancer and the development of drug targets. In this paper, we describe a classifier we developed to predict potential cancer genes by integrating multiple lines of biological evidence, including protein-protein interaction network properties and sequence and functional features. We detected 55 features that differed significantly between cancer genes and non-cancer genes, and 14 cancer-associated features were chosen to train the classifier. Four machine learning methods, logistic regression, support vector machines (SVMs), BayesNet, and decision tree (J48), were explored as classifier models to distinguish cancer genes from non-cancer genes. The predictive power of the models was evaluated by 5-fold cross-validation. The areas under the receiver operating characteristic curve for the logistic regression, SVM, BayesNet, and J48 tree models were 0.834, 0.740, 0.800, and 0.782, respectively. Finally, the logistic regression classifier with multiple biological features was applied to the genes in the Entrez database, and 1976 cancer gene candidates were identified. The integrated prediction model performed much better than models based on individual types of biological evidence, and the network and functional features had stronger predictive power than the sequence features.
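The AUC values above can be read as the probability that a randomly chosen cancer gene scores higher than a randomly chosen non-cancer gene. A minimal sketch of that computation, together with a 5-fold split of the kind used for cross-validation (the helper names and seed are hypothetical):

```python
import random


def roc_auc(pos_scores, neg_scores):
    """AUC as the probability that a random positive outscores a random
    negative (Mann-Whitney formulation; ties count one half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def kfold_indices(n, k=5, seed=0):
    """Shuffle range(n) and split it into k disjoint test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]
```

In each round, a model would be trained on four folds and its AUC computed on the held-out fold.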
Unlike conventional unipolar 1D-1R RRAM devices, a bipolar 1D-1R memory device concept is proposed and successfully demonstrated by integrating a Ni/TiOx/Ti diode with a Pt/HfO2/Cu bipolar RRAM cell to suppress undesired sneak currents in a cross-point array. The bipolar 1D-1R memory device not only achieves self-compliance resistive switching through the reverse bias current of the Ni/TiOx/Ti diode, but also exhibits excellent bipolar resistive switching characteristics, such as uniform switching, satisfactory data retention, and excellent scalability, giving it high potential for high-density integrated nonvolatile memory applications.
The identification of ubiquitin (Ub) and Ub-like protein (Ubl) conjugation sites is important for understanding their roles in the regulation of biological pathways. However, identifying Ub/Ubl conjugation sites unambiguously and sensitively by high-throughput MS remains challenging. We introduce an improved workflow for identifying Ub/Ubl conjugation sites based on the ChopNSpice and X!Tandem software. ChopNSpice is modified to generate Ub/Ubl conjugation peptides in the form of a cross-link, and a combinatorial FASTA database can be generated using the modified ChopNSpice (MchopNSpice). The modified X!Tandem (UblSearch) introduces a new fragmentation model for Ub/Ubl conjugation peptides, so that MS/MS spectra can be matched unambiguously to linear peptides or Ub/Ubl conjugation peptides using the combinatorial FASTA database. Compared with the original ChopNSpice method, the new workflow performed better in analyzing a Ub and Ubl spectral library and a large-scale Trypanosoma cruzi small Ub-related modifier (SUMO) dataset. The proposed workflow is better suited to processing large-scale MS datasets of Ub/Ubl modification. MchopNSpice and UblSearch are freely available under the GNU General Public License v3.0 at http://sourceforge.net/projects/maublsearch.
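To illustrate what the cross-link form of a conjugation peptide means for mass, the sketch below digests a substrate with a simple trypsin rule (cleave after K/R, not before P), keeps peptides in which a lysine is internal (a missed cleavage at the modified site), and attaches the Ub C-terminal "GG" remnant through an isopeptide bond, which releases one water. It illustrates the mass arithmetic only and is not MchopNSpice's code; the protein sequence and function names are hypothetical.

```python
# Monoisotopic amino acid residue masses (Da)
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified linear peptide."""
    return sum(RESIDUE[aa] for aa in seq) + WATER


def tryptic_peptides(protein, missed=1):
    """Cleave after K/R unless followed by P; allow up to `missed` missed cleavages."""
    sites = [i + 1 for i, aa in enumerate(protein[:-1])
             if aa in "KR" and protein[i + 1] != "P"]
    bounds = [0] + sites + [len(protein)]
    peptides = []
    for i in range(len(bounds) - 1):
        for j in range(i + 1, min(i + 2 + missed, len(bounds))):
            peptides.append(protein[bounds[i]:bounds[j]])
    return peptides


def conjugate_candidates(protein, remnant="GG", missed=1):
    """(substrate peptide, 1-based K position, cross-linked neutral mass).

    Only internal lysines are used, since trypsin is assumed not to cleave
    after a modified K (this simplification also skips a protein C-terminal K).
    The isopeptide bond between remnant and lysine releases one water.
    """
    out = []
    for pep in tryptic_peptides(protein, missed):
        for pos, aa in enumerate(pep[:-1]):
            if aa == "K":
                mass = peptide_mass(pep) + peptide_mass(remnant) - WATER
                out.append((pep, pos + 1, mass))
    return out
```

For the GG remnant, the mass added to the substrate peptide is about 114.043 Da, the familiar diglycine signature of ubiquitination sites.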
The directionality of protein interactions is a prerequisite for forming signaling networks, and constructing signaling networks is critical for uncovering the mechanisms of life processes. In this paper, we proposed a novel method to infer the directionality of protein-protein interactions and, further, to construct signaling networks. Based on the functional annotations of proteins, we defined a novel parameter, GODS, and established a prediction model. Evaluated by 5-fold cross-validation, the method predicts the directionality of protein interactions with high sensitivity and specificity. With a GODS threshold of 2, we achieved an accuracy of 95.56% and a coverage of 74.69% on the human test set. This method was also successfully applied to reconstruct classical human signaling pathways. This study provides not only an effective method for unraveling unknown signaling pathways but also a deeper understanding of signaling networks from the perspective of protein function.
To sensitively analyze complex protein mixtures by mass spectrometry-based shotgun proteomics, researchers have employed platforms that couple orthogonal peptide fractionation methods with nanoscale HPLC. Commonly used platforms couple either strong cation exchange (SCX) HPLC or preparative isoelectric focusing (IEF) with nanoscale reversed-phase (nanoRP) HPLC fractionation of peptides. Coupling two dimensions of peptide fractionation before mass spectrometric analysis increases the sensitivity for identifying low-abundance proteins. However, the large dynamic range of protein abundance and the high complexity of protein mixtures derived from many biological sources, such as bodily fluids, require additional peptide fractionation steps. To address this shortcoming, we developed a platform combining three dimensions of peptide fractionation: (1) preparative IEF; (2) SCX HPLC; and (3) nanoRP HPLC. This platform significantly increases the sensitivity of shotgun proteomic analysis of complex protein mixtures. Here, we describe the implementation of this three-dimensional peptide fractionation platform for proteomic studies of complex mixtures.
Stabilizing resistive switching characteristics is important for resistive random access memory (RRAM) device development. In this paper, an alternative approach for improving the resistive switching characteristics of ZrO2-based resistive memory devices is investigated. By embedding a thin TiOx layer between the ZrO2 and the Cu top electrode, the Cu/TiOx-ZrO2/Pt device exhibits much better resistive switching characteristics than the Cu/ZrO2/Pt device. This improvement might be attributed to modulation of the barrier height at the electrode/oxide interfaces.
A new hydrophilic interaction chromatography (HILIC) column packed with 1.7 µm amide sorbent was applied to the characterization of glycoprotein digests. Because of the hydrophilic carbohydrate moiety, glycopeptides were more strongly retained on the column and were separated from the nonglycosylated peptides in the digest. Glycoforms of the same parent peptide were also chromatographically resolved and analyzed using ultraviolet and mass spectrometry detectors. The HILIC method was applied to glycoprofiling of a therapeutic monoclonal antibody and of proteins with several N-linked and O-linked glycosylation sites. For characterization of complex proteins with multiple glycosylation sites, we used 2D LC, in which the RP dimension isolated the glycopeptides and the HILIC dimension resolved the peptide glycoforms. The analysis of site-specific glycan microheterogeneity is illustrated for the CD44 fusion protein.
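When glycoforms of one parent peptide are resolved chromatographically, each is usually located in the MS data by its expected m/z. A minimal sketch, assuming standard monoisotopic monosaccharide residue masses and a peptide mass supplied by the user (the function names are hypothetical):

```python
# Monoisotopic residue masses of common monosaccharides (Da)
GLYCAN_RESIDUES = {
    "HexNAc": 203.07937,
    "Hex": 162.05282,
    "dHex": 146.05791,  # deoxyhexose, e.g. fucose
    "NeuAc": 291.09542,
}
PROTON = 1.007276


def glycan_mass(composition):
    """Mass added by a glycan given as {residue: count}."""
    return sum(GLYCAN_RESIDUES[res] * n for res, n in composition.items())


def glycopeptide_mz(peptide_mass, composition, charge):
    """m/z of the [M + zH]z+ ion of a glycopeptide."""
    neutral = peptide_mass + glycan_mass(composition)
    return (neutral + charge * PROTON) / charge
```

For example, the core-fucosylated biantennary glycan often called G0F (HexNAc4Hex3dHex1) adds about 1444.53 Da to the peptide.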
Influenza vaccination is recognized as the most effective method for reducing morbidity and mortality due to seasonal influenza. To improve vaccine supply and increase flexibility in vaccine manufacturing, cell culture-based vaccine production has emerged to overcome the limitations of egg-based production. The switch in production system, and the need to re-evaluate vaccine effectiveness annually because of frequent viral antigenic changes, call for methods that fully characterize the hemagglutinin (HA) antigens and the final vaccine products. This study describes advanced liquid chromatography-mass spectrometry (LC-MS) methods for simultaneous identification of HA proteins and process-related impurities in a trivalent influenza candidate vaccine composed of purified recombinant HA (rHA) antigens produced in an insect cell-baculovirus expression vector system (BEVS). N-linked glycosylation sites and glycoforms of the three rHA proteins (corresponding to influenza A subtypes H1N1 and H3N2 and influenza B virus) were profiled by peptide mapping using reversed-phase (RP) LC-MS(E) (data independent acquisition LC-MS using an alternating low and elevated collision energy scan mode). The detected site-specific glycoforms were further confirmed and quantified by hydrophilic interaction LC (HILIC)-multiple reaction monitoring (MRM) assays. LC-MS(E) was used to characterize the vaccine candidate, providing both protein identities and site-specific information on glycosylation and degradation of each rHA protein. HILIC-MRM was used to rapidly confirm and quantify site-specific glycoforms and potential degradation products on each rHA protein. These methods can contribute to monitoring vaccine quality, especially in product comparability studies that evaluate the impact of production process changes.
Although genes associated with cytoplasmic male sterility (CMS) and fertility restoration (Rf) have been well characterized, the evolutionary relationship between nuclear Rf genes and mitochondrial CMS factors in Oryza species is still poorly understood. Here, 41 accessions from 7 Oryza species with the AA genome were used to analyze the evolutionary relationships between the CMS factors and Rf candidates on chromosome 10. A phylogenetic tree based on restriction fragment length polymorphism patterns of CMS-associated mitochondrial genes showed that these 41 Oryza accessions fell into 3 distinct groups. A second phylogenetic tree, based on PCR profiles of the nuclear Rf candidates on chromosome 10, also resolved three distinct groups. The accessions in each subgroup/group of the two phylogenetic trees closely parallel each other. Furthermore, the 41 accessions were test-crossed with Honglian (gametophytic type) and Wild-abortive (sporophytic type) CMS lines and classified into 5 groups according to their restoring ability. Accessions in the same subgroup of the two phylogenetic trees shared similar fertility restoration patterns. Therefore, we conclude that the CMS-associated mitotypes are compatible with the Rf candidate-related nucleotypes, and that CMS and Rf have a parallel evolutionary relationship in Oryza species.
This study shows that state-of-the-art liquid chromatography (LC) and mass spectrometry (MS) can be used for rapid verification of identity and characterization of sequence variants and posttranslational modifications (PTMs) of antibody products. A candidate biosimilar IgG1 monoclonal antibody (mAb) was compared in detail with a commercially available innovator product. Intact protein mass, primary sequence, PTMs, and the micro-differences between the two mAbs were identified and quantified simultaneously. Although the two mAbs were very similar in sequence and modifications, a mass difference observed in LC-MS intact mass measurements indicated that they were not identical. Peptide mapping, performed with data independent acquisition LC-MS using an alternating low and elevated collision energy scan mode (LC-MS(E)), localized the mass difference between the biosimilar and the innovator to a difference of two amino acid residues in the heavy chain sequences. The peptide mapping technique was also used to comprehensively catalogue and compare the PTMs of the biosimilar and innovator mAbs. Comprehensive glycosylation profiling confirmed that the proportions of individual glycans differed between the biosimilar and the innovator, although the number and identity of glycans were the same. These results demonstrate that the combination of accurate intact mass measurement, released glycan profiling, and LC-MS(E) peptide mapping provides a set of routine tools that can be used to comprehensively compare a candidate biosimilar and an innovator mAb.
Oral cancer survival rates increase significantly when the disease is detected and treated early. Unfortunately, clinicians currently lack tests that easily and reliably distinguish pre-malignant oral lesions from those that have already transitioned to malignancy. A test for proteins found in non-invasively collected whole saliva, whose abundances distinguish these lesion types, would meet this critical need.
Motif discovery is an important topic in computational studies of transcriptional regulation. In the past decade, many researchers have contributed to the field, and many de novo motif-finding tools have been developed, each with different strengths. However, most of these tools lack a user-friendly interface, and their results are not easily comparable. We present Toolbox of Motif Discovery (Tmod), a software package for Windows operating systems. The current version of Tmod integrates 12 widely used motif discovery programs: MDscan, BioProspector, AlignACE, Gibbs Motif Sampler, MEME, CONSENSUS, MotifRegressor, GLAM, MotifSampler, SeSiMCMC, Weeder, and YMF. Tmod provides a unified interface that eases the use of these programs and helps users understand their tuning parameters. It allows plug-in motif-finding programs to run either separately or in batch mode with predetermined parameters, and it provides a summary of the outputs from multiple programs. Tmod is developed in C++ with the support of Microsoft Foundation Classes and Cygwin, and it can easily be expanded to include future algorithms.
Cellular nutritional and energy status regulates a wide range of nuclear processes important for cell growth, survival, and metabolic homeostasis. Mammalian target of rapamycin (mTOR) plays a key role in the cellular responses to nutrients. However, the nuclear processes governed by mTOR have not been clearly defined. Using isobaric peptide tagging coupled with linear ion trap mass spectrometry, we performed quantitative proteomics analysis to identify nuclear processes in human cells under control of mTOR. Within 3 h of inhibiting mTOR with rapamycin in HeLa cells, we observed down-regulation of nuclear abundance of many proteins involved in translation and RNA modification. Unexpectedly, mTOR inhibition also down-regulated several proteins functioning in chromosomal integrity and up-regulated those involved in DNA damage responses (DDRs) such as 53BP1. Consistent with these proteomic changes and DDR activation, mTOR inhibition enhanced interaction between 53BP1 and p53 and increased phosphorylation of ataxia telangiectasia mutated (ATM) kinase substrates. ATM substrate phosphorylation was also induced by inhibiting protein synthesis and suppressed by inhibiting proteasomal activity, suggesting that mTOR inhibition reduces steady-state (abundance) levels of proteins that function in cellular pathways of DDR activation. Finally, rapamycin-induced changes led to increased survival after radiation exposure in HeLa cells. These findings reveal a novel functional link between mTOR and DDR pathways in the nucleus potentially operating as a survival mechanism against unfavorable growth conditions.
The proteome of human salivary fluid has the potential to open new doors for disease biomarker discovery. A recent study to comprehensively identify and catalog the human ductal salivary proteome compiled 1166 proteins. Both saliva and plasma have highly complex proteomes, suggesting that comparing them will provide valuable insight into their physiological significance and into the unique and overlapping diagnostic potential of each fluid. To create a more comprehensive catalog of human salivary proteins, we first compiled an extensive list of proteins from whole saliva (WS) identified through MS experiments. The WS list was then combined with the proteins identified from ductal parotid, submandibular, and sublingual (parotid/SMSL) saliva. In parallel, a core dataset of the human plasma proteome with 3020 protein identifications was recently released. A total of 1939 nonredundant salivary proteins were compiled from 19,474 unique peptide sequences identified from whole and ductal saliva; 740 of these 1939 salivary proteins were identified in both whole and ductal saliva. A total of 597 of the salivary proteins have been observed in plasma. Gene Ontology (GO) analysis showed similar distributions of the saliva and plasma proteomes with regard to cellular localization, biological process, and molecular function, but also revealed differences that may be related to the different physiological functions of saliva and plasma. The comprehensive catalog of the salivary proteome and its comparison with the plasma proteome provide insights useful for future studies, such as the exploration of potential biomarkers for disease diagnostics.
Normalization is a critical step in the analysis of microarray gene expression data. For dual-labeled arrays, traditional normalization methods assume that most genes are not differentially expressed and that the number of overexpressed genes approximately equals the number of underexpressed genes. However, these assumptions do not hold under some conditions. Differentially expressed genes have a negative impact on normalization and can be regarded statistically as outliers. We propose a new outlier removal-based normalization method. Analyses of simulated and real data sets demonstrate that our approach can significantly improve the precision of normalization by eliminating the impact of outliers, and can efficiently identify candidates for differential expression.
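A minimal sketch of outlier-resistant normalization for two-channel log ratios, assuming a median/MAD exclusion rule (our choice for illustration; the study's exact criterion is not given here): genes whose log ratio lies far from the median are set aside as likely differentially expressed, and the normalization constant is recomputed from the remaining genes. The function name and the cut-off k = 3 are hypothetical.

```python
import math
from statistics import median


def normalize_log_ratios(red, green, k=3.0, max_iter=20):
    """Center log2(red/green) ratios on the median of non-outlying genes.

    Genes whose ratio lies more than k robust SDs (1.4826 * MAD) from the
    current median are excluded, and the median is recomputed until the
    retained set stops changing.
    Returns (normalized log2 ratios for all genes, indices of excluded genes).
    """
    m = [math.log2(r / g) for r, g in zip(red, green)]
    kept = list(range(len(m)))
    for _ in range(max_iter):
        center = median(m[i] for i in kept)
        scale = 1.4826 * median(abs(m[i] - center) for i in kept)
        if scale == 0:
            break  # too little spread left to define outliers
        new_kept = [i for i in kept if abs(m[i] - center) <= k * scale]
        if new_kept == kept:
            break
        kept = new_kept
    center = median(m[i] for i in kept)
    outliers = sorted(set(range(len(m))) - set(kept))
    return [x - center for x in m], outliers
```

In an unbalanced case where several genes are strongly up-regulated, the plain median of all log ratios is pulled toward them, whereas the iterated estimate is not; the excluded genes double as differential-expression candidates.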
Twelve facultatively anaerobic, endophytic diazotrophs were isolated from surface-sterilized roots of the wild rice species Oryza latifolia and characterized by phenotypic and molecular methods. Six isolates were assigned to group A on the basis of phenotypic characters, and this grouping was confirmed by SDS-PAGE whole-cell protein patterns and insertion sequence-based PCR (IS-PCR). Phylogenetic analysis of 16S rRNA gene sequences indicated that group A, represented by strain Ola 51(T), is closely related to Enterobacter radicincitans D5/23(T) (98.9 % similarity, although E. radicincitans D5/23(T) has a 70 bp insertion) and Enterobacter cloacae (98.0 % similarity to the type strain). rpoB gene sequence analysis also showed that strain Ola 51(T) had the highest sequence similarity to E. radicincitans DSM 16656(T) (98.3 %), but supported its distinct position. Biological and biochemical tests, protein patterns, genomic DNA fingerprinting, antibiotic resistance, and comparison of cellular fatty acids showed differences among group A, E. radicincitans DSM 16656(T), and E. cloacae ATCC 13047(T). DNA-DNA hybridization distinguished strain Ola 51(T) from phylogenetically closely related Enterobacter species. Based on these data, a novel species, Enterobacter oryzae sp. nov., is proposed, with strain Ola 51(T) (=LMG 24251(T) =CGMCC 1.7012(T)) as the type strain.
Liquid chromatography (LC)-based peptide mapping is extensively used in the biopharmaceutical industry to establish protein identity, assess purity, and detect post-translational modifications (PTMs) of recombinant proteins. However, current LC-UV/MS peptide mapping methods require multiple analyses and MS/MS experiments to identify protein contaminants and site-specific PTMs. This study evaluated an alternative approach to protein characterization via peptide mapping, employing a data independent LC-MS acquisition strategy with alternating low and elevated collision energy scans. The acquired peptide precursor and fragment information was used to identify peptide sequences and site-specific modifications within a single LC run. Peptide MS signal intensities were reliably measured and used to estimate the relative concentrations of PTMs and/or of proteins contaminating the target protein. The method was evaluated using tryptic digests of yeast enolase and alcohol dehydrogenase. LC-eluted peptides were successfully sequenced and covered 97% of the target protein sequences. Protein impurities and site-specific modifications (e.g., M-oxidation and N-deamidation) were identified and quantified.
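The relative quantities mentioned above can be estimated from peptide intensities in a simple way. The sketch below computes the modified fraction at a site from modified and unmodified peptide signals, and a contaminant level relative to the target protein using the mean of each protein's three most intense peptides. That "top-3" proportionality to molar amount is an assumption borrowed from a commonly used approach, not something stated in this study, and the function names are hypothetical.

```python
def modification_percent(modified, unmodified):
    """Percent of a site carrying the modification, from peptide signal intensities."""
    total = modified + unmodified
    return 100.0 * modified / total if total else 0.0


def top3_mean(intensities):
    """Mean of the three most intense peptide signals of one protein."""
    top = sorted(intensities, reverse=True)[:3]
    return sum(top) / len(top)


def contaminant_percent(contaminant_peptides, target_peptides):
    """Contaminant amount relative to the target protein, in percent,
    assuming the top-3 mean scales with molar amount."""
    return 100.0 * top3_mean(contaminant_peptides) / top3_mean(target_peptides)
```

For example, an M-oxidized peptide signal of 5 against an unmodified signal of 95 corresponds to 5% oxidation at that site.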
Signal flow direction is one of the most important features of protein-protein interactions in signaling networks. However, the protein-protein interaction maps produced by current high-throughput techniques are almost always assumed to be non-directional. Based on pairwise interaction domains, we defined a novel parameter, the protein interaction directional score, and used it to predict the direction of signal flow between proteins in proteome-wide signaling networks. In 5-fold cross-validation, our approach achieved satisfactory performance, with an accuracy of 89.79%, a coverage of 48.08%, and an error ratio of 16.91%. As an application, we established an integrated human directional protein interaction network comprising 2,237 proteins and 5,530 interactions, and inferred a large number of novel signaling pathways. The directional protein interaction network was strongly supported by the known signaling pathway literature (87.5% accuracy) and by further analyses of biological annotation, subcellular localization, and network topology. Thus, this study provides an effective method to define the upstream/downstream relations of interacting protein pairs and a powerful tool to unravel unknown signaling pathways.
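The abstract does not give the formula for the protein interaction directional score, so the following is only a hypothetical illustration of the general idea of domain-based direction inference. It counts how often a domain in an upstream protein pairs with a domain in a downstream protein among known directed interactions, then scores a new pair A, B by the balance of forward and backward evidence in the range [-1, 1]. Positive means A is predicted upstream of B. Pairs with no domain-pair evidence return `None`, which is one way a coverage below 100 % can arise. All names and the scoring rule are assumptions, not the authors' definition.

```python
from collections import Counter
from itertools import product


def train_domain_direction_counts(directed_pairs, domains):
    """Count (upstream_domain, downstream_domain) occurrences in known directed interactions."""
    counts = Counter()
    for upstream, downstream in directed_pairs:
        for da, db in product(domains.get(upstream, ()), domains.get(downstream, ())):
            counts[(da, db)] += 1
    return counts


def direction_score(a, b, domains, counts):
    """Hypothetical directional score for A -> B; None if no domain-pair evidence exists."""
    forward = backward = 0
    for da, db in product(domains.get(a, ()), domains.get(b, ())):
        forward += counts[(da, db)]   # evidence that A is upstream of B
        backward += counts[(db, da)]  # evidence that B is upstream of A
    total = forward + backward
    if total == 0:
        return None  # pair not covered by the training data
    return (forward - backward) / total
```

Usage: after training on known pairs where an SH2-domain protein lies upstream of a kinase-domain protein, a new SH2/kinase pair scores 1.0 in that order and -1.0 in reverse.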
Tandem mass spectrometry combined with database searching allows high-throughput identification of peptides in shotgun proteomics. However, validating database search results, a problem for which many solutions have been proposed, still needs improvement in some respects, such as the sensitivity, specificity, and generalizability of the validation algorithms. Here, a Bayesian nonparametric (BNP) model for validating database search results was developed that incorporates several popular techniques in statistical learning: compression of the feature space with a linear discriminant function, flexible nonparametric estimation of the probability density function to capture the variable probability structure of complex problems, and Bayesian calculation of the posterior probability. Importantly, the BNP model is naturally compatible with the popular target-decoy database search strategy. We tested the BNP model on standard protein and real, complex sample data sets from multiple MS platforms and compared it with PeptideProphet, the cutoff-based method, and a simple nonparametric method (proposed by us previously). The BNP model showed superior sensitivity and generalizability on all data sets searched. Some high-quality matches that had been filtered out by other methods were detected and assigned high probabilities by the BNP model. Thus, the BNP model can validate database search results effectively and extract more information from MS/MS data.
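The combination of nonparametric density estimation, target-decoy searching and Bayes' rule described above can be sketched as follows. This is not the authors' BNP model. It assumes each peptide-spectrum match (PSM) has already been compressed to a single discriminant score (for example by a linear discriminant function), that decoy scores follow the same distribution as incorrect target matches, and that a Gaussian kernel density estimate with Silverman's bandwidth stands in for the nonparametric density. Under those assumptions, P(incorrect | s) = π0 · f_decoy(s) / f_target(s), where π0 is estimated as the ratio of decoy to target hits.

```python
import numpy as np


def gaussian_kde(samples, points, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate built from `samples` at `points`."""
    samples = np.asarray(samples, dtype=float)
    points = np.asarray(points, dtype=float)
    if bandwidth is None:
        # Silverman's rule-of-thumb bandwidth (an assumption; other bandwidth rules would also work)
        bandwidth = 1.06 * samples.std(ddof=1) * samples.size ** (-1 / 5)
    z = (points[:, None] - samples[None, :]) / bandwidth
    return np.exp(-0.5 * z**2).sum(axis=1) / (samples.size * bandwidth * np.sqrt(2 * np.pi))


def posterior_correct(target_scores, decoy_scores):
    """Posterior probability that each target PSM is correct, given its discriminant score."""
    target_scores = np.asarray(target_scores, dtype=float)
    decoy_scores = np.asarray(decoy_scores, dtype=float)
    # Prior fraction of incorrect target matches, estimated as (#decoy hits) / (#target hits)
    pi0 = min(1.0, decoy_scores.size / target_scores.size)
    f_all = gaussian_kde(target_scores, target_scores)   # density of all target scores
    f_wrong = gaussian_kde(decoy_scores, target_scores)  # density of incorrect matches, modeled by decoys
    # Bayes' rule: P(incorrect | s) = pi0 * f_wrong(s) / f_all(s)
    return np.clip(1.0 - pi0 * f_wrong / f_all, 0.0, 1.0)
```

On simulated data where half of the target scores come from the decoy-like distribution and half from a well-separated "correct" distribution, the posterior is close to 0 for the former and close to 1 for the latter.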
The hybrid linear trap quadrupole Fourier-transform (LTQ-FT) ion cyclotron resonance mass spectrometer, an instrument with high accuracy and resolution, is widely used for the identification and quantification of peptides and proteins. However, time-dependent errors in the system may degrade the accuracy of these instruments, which complicates the choice of mass error tolerance (MET) in database searches. Here, LTQ-FT precursor ion mass error is discussed comprehensively. On the basis of an investigation of the mass error distribution, we propose an improved recalibration formula and introduce a new tool, FTDR (Fourier-transform data recalibration), which provides a graphical user interface (GUI) for automatic calibration. Calibration brought the mass error distribution closer to a normal distribution and reduced its standard deviation (SD). We further present a new strategy, LDSF (Large MET database search and small MET filtration), for specifying the database search MET and validating database search results. As the name implies, a database search is conducted with a large MET, and the search results are then filtered using a statistical MET estimated from high-confidence results. By applying this strategy to a standard protein data set and a complex data set, we demonstrate that LDSF can significantly improve the sensitivity of the result validation procedure.
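The filtration half of LDSF can be sketched in a few lines. This sketch assumes the statistical MET is a window of mean ± k standard deviations of the ppm precursor errors of high-confidence PSMs, with k = 3 chosen purely for illustration. The abstract does not specify the window rule or FTDR's recalibration formula, so both helper names and the threshold are hypothetical.

```python
import numpy as np


def ppm_error(observed_mz, theoretical_mz):
    """Precursor mass error in parts per million."""
    observed_mz = np.asarray(observed_mz, dtype=float)
    theoretical_mz = np.asarray(theoretical_mz, dtype=float)
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def ldsf_filter(errors_ppm, high_confidence, k=3.0):
    """Filter PSMs from a wide-tolerance search using a tolerance estimated from high-confidence PSMs."""
    errors_ppm = np.asarray(errors_ppm, dtype=float)
    reference = errors_ppm[np.asarray(high_confidence, dtype=bool)]
    mu, sd = reference.mean(), reference.std(ddof=1)
    # "Small MET": a window of mean +/- k standard deviations (k = 3 is an assumption)
    lower, upper = mu - k * sd, mu + k * sd
    keep = (errors_ppm >= lower) & (errors_ppm <= upper)
    return keep, (lower, upper)
```

Because the window is centered on the observed mean error rather than on zero, a systematic offset left after calibration does not push correct matches outside the tolerance. A narrower SD after recalibration tightens the window directly.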
The relationship between sample loading amount and peptide identification is crucial for optimizing proteomics experiments, but few studies have addressed it. Herein, we present a systematic study using a replicate run strategy to probe the inherent influence of both peptide physicochemical properties and matrix effects on the relationship between peptide identification and sample loading amount, as well as its applications in protein quantification. Ten replicate runs for each of a series of laddered loading amounts (ranging from 0.01 to 10 µg) of total digested proteins from Saccharomyces cerevisiae were performed with nanoscale liquid chromatography coupled with linear ion trap/Fourier transform ion cyclotron resonance mass spectrometry (nanoLC-LTQ-FT) to obtain nearly saturated peptide identification. This allowed us to assess, for each identified peptide, the linear correlation between loading amount and the commonly used peptide quantitative index, the area of constructed ion chromatograms (XIC) (SA, from MS and tandem MS data), in the given experiments. The absolute loading amount of a given complex sample affected the final qualitative identification result; thus, optimizing the sample loading amount before every proteomics study is essential. Peptide physicochemical properties had little effect on the linear correlation between SA-based peptide quantification and loading amount. Matrix effects, rather than the static physicochemical properties of individual peptides, affect peptide measurability. We also quantified the target protein by selecting, as signature peptides, those with good parallel linear correlation based on SA, and corrected the data by multiplying by the reciprocal of the slope coefficient. This improved the linearity of relative protein abundance in every loading range and thus extended the linear dynamic range of label-free quantification.
This empirical rule for linear peptide selection (ERLPS) can be adopted to correct comparison results in proteolytic peptide-based quantitative proteomics, such as accurate mass tag (AMT) and targeted quantitative proteomics, as well as in tag-labeled comparative proteomics.
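The two ERLPS steps described above, keeping peptides whose SA responds linearly to loading amount and correcting each retained peptide by the reciprocal of its slope, can be sketched as follows. The R² threshold of 0.95, the positive-slope requirement, and averaging the corrected values into a protein estimate are assumptions made for illustration; the helper names are hypothetical and this is not the authors' code.

```python
import numpy as np


def select_linear_peptides(loadings, areas, r2_min=0.95):
    """Return {peptide: slope} for peptides whose SA vs. loading fit has R^2 >= r2_min (assumed cutoff)."""
    x = np.asarray(loadings, dtype=float)
    selected = {}
    for peptide, y in areas.items():
        y = np.asarray(y, dtype=float)
        slope, intercept = np.polyfit(x, y, 1)  # least-squares line SA = slope * load + intercept
        residuals = y - (slope * x + intercept)
        ss_tot = ((y - y.mean()) ** 2).sum()
        r2 = 1.0 - (residuals**2).sum() / ss_tot if ss_tot > 0 else 0.0
        if r2 >= r2_min and slope > 0:
            selected[peptide] = slope
    return selected


def protein_estimate(sample_areas, slopes):
    """Average of slope-corrected SA values (SA multiplied by 1/slope) over the selected peptides."""
    corrected = [sample_areas[p] / s for p, s in slopes.items() if p in sample_areas]
    return float(np.mean(corrected)) if corrected else float("nan")
```

In a toy ladder of 0.1 to 10 µg, a peptide whose SA saturates at high loading fails the linearity test and is excluded, while linearly responding peptides are rescaled onto a common loading-equivalent scale before averaging. This rescaling is what lets peptides with very different ionization efficiencies contribute comparably.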
Online reversed-phase liquid chromatography (RPLC) contributes substantially to large-scale, mass spectrometry-based protein identification in proteomics. Retention time (RT) is important evidence that can be used to distinguish false-positive from true-positive peptide identifications. Because the organic phase concentration curve is nonlinear over the whole run time and peptides interact with one another, sequence-based RT prediction of peptides has low accuracy and is difficult to generalize in practice, and is therefore less effective for validating peptide identifications. A serial and parallel support vector machine (SP-SVM) method was proposed to characterize the nonlinear effect of organic phase concentration and the interactions among peptides. SP-SVM contains one support vector regression (SVR) model used only for model training (named p-SVR) and 4 SVM models (named C-SVM, 1-SVR, s-SVR and n-SVR) for RT prediction. After the chromatographic behavior of each peptide is classified by C-SVM, 1-SVR and s-SVR are used to predict peptide RT specifically, improving accuracy. The peptide RT is then normalized by n-SVR to characterize peptide interactions. Applying this method to a complex sample data set improved prediction accuracy significantly. The coefficient of determination between predicted and experimental RTs reached 0.95; the prediction error was less than 20% of the total LC run time in more than 95% of cases, and less than 10% of the total LC run time in more than 70% of cases. To our knowledge, this is the best performance reported so far. More importantly, the SP-SVM method provides a framework that takes into account the interactions among peptides in chromatographic separation, and its performance can be improved further by introducing new data processing and experimental strategies.
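The performance figures quoted above (a coefficient of determination and the fraction of peptides predicted within 10 % or 20 % of the total run time) can be computed from any predicted and observed RTs as sketched below. This sketch assumes the coefficient of determination is 1 − SS_res/SS_tot and that "within x % of run time" means |predicted − observed| < x · run time. It evaluates predictions only and does not reproduce the SP-SVM models. The function name is hypothetical.

```python
import numpy as np


def rt_prediction_metrics(observed, predicted, run_time, thresholds=(0.10, 0.20)):
    """R^2 and fraction of peptides whose RT error is below each fraction of the total run time."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    ss_res = ((obs - pred) ** 2).sum()
    ss_tot = ((obs - obs.mean()) ** 2).sum()
    r2 = 1.0 - ss_res / ss_tot
    rel_err = np.abs(pred - obs) / run_time  # error as a fraction of the total LC run time
    within = {t: float((rel_err < t).mean()) for t in thresholds}
    return r2, within
```

For observed RTs of 10, 20, 30, 40 and 50 min, predictions of 12, 19, 33, 38 and 61 min, and a 100 min run, R² is 0.861; 80 % of peptides fall within 10 % of the run time and all fall within 20 %.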
Database search-based methods for label-free quantification aim to reconstruct the peptide extracted ion chromatogram from identification information, which limits the search space and thus makes data processing much faster. The random nature of MS/MS sampling can be remedied by cross-assignment among different runs. Here, we present LFQuant, a new, fast label-free quantitative analysis tool for high-resolution LC-MS/MS proteomics data based on database searching. It accepts raw data in two common formats (mzXML and Thermo RAW) and database search results from mainstream tools (MASCOT, SEQUEST, and X!Tandem) as input. LFQuant can handle large-scale label-free data with fractionation such as SDS-PAGE and 2D LC. It is easy to use and provides convenient user interfaces for data loading, parameter setting, quantitative analysis, and quantitative data visualization. LFQuant was compared with two common quantification software packages, MaxQuant and IDEAL-Q, on the replication data set and the UPS1 standard data set. The results show that LFQuant outperforms both in precision and accuracy and requires significantly less processing time. LFQuant is freely available under the GNU General Public License v3.0 at http://sourceforge.net/projects/lfquant/.
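The core of identification-guided label-free quantification, reconstructing an extracted ion chromatogram (XIC) around an identified peptide and integrating its area, can be sketched as below. This is not LFQuant's implementation. It assumes centroided MS1 scans represented as hypothetical `(rt, mz_array, intensity_array)` tuples, a ppm m/z tolerance, and an RT window taken from the identification; the area is integrated with the trapezoidal rule.

```python
import numpy as np


def xic_area(scans, target_mz, rt_window, ppm_tol=10.0):
    """Trapezoidal area of the XIC for `target_mz` within `rt_window` (assumed scan format)."""
    lo_rt, hi_rt = rt_window
    tol = target_mz * ppm_tol * 1e-6  # absolute m/z tolerance
    rts, intensities = [], []
    for rt, mz, inten in scans:
        if lo_rt <= rt <= hi_rt:
            mz = np.asarray(mz, dtype=float)
            inten = np.asarray(inten, dtype=float)
            mask = np.abs(mz - target_mz) <= tol
            rts.append(rt)
            intensities.append(inten[mask].sum())  # 0 if no peak matches in this scan
    if len(rts) < 2:
        return 0.0
    order = np.argsort(rts)
    rts = np.asarray(rts, dtype=float)[order]
    ys = np.asarray(intensities, dtype=float)[order]
    return float(np.sum((ys[1:] + ys[:-1]) / 2.0 * np.diff(rts)))
```

Cross-assignment follows the same pattern: a peptide identified in one run is quantified in another run by calling the same function on that run's scans with the identified m/z and an RT window transferred (after alignment) from the identifying run.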
Better quantification of nitrogen mineralization and nitrification after fumigation would indicate whether any adjustment of fertilizer application is needed. The effects of chloropicrin (Pic), 1,3-dichloropropene (1,3-D), dimethyl disulfide (DMDS) and metham sodium (MS) fumigation on soil nitrogen dynamics were evaluated in lab incubation and field studies. Although some differences in NH₄⁺-N and NO₃⁻-N concentrations were observed between the lab incubation and field experiments, both studies led to the same conclusions: (1) Soil fumigation increased soil mineral nitrogen only during the first 2 weeks after fumigation (WAF). In particular, Pic significantly increased soil mineral nitrogen in both studies at 1 WAF. However, for all fumigant treatments the effect was temporary; the soil mineral nitrogen content of treated samples returned to the level observed in the untreated control. (2) All fumigation treatments depressed nitrification temporarily, although the treatments differed significantly in the duration of nitrification inhibition. In both studies, Pic showed a stronger inhibitory effect on nitrification than the other fumigant treatments for a limited period of time. An S-shaped function was fitted to the NO₃⁻-N concentrations in lab incubation samples. The times of maximum nitrification (t_max) in the DMDS and MS treatments were 0.97 and 1.03 weeks, similar to the untreated control (t_max = 1.02 weeks). Pic had the longest effect on nitrifying bacteria, and nitrification appeared to restart at a much later time (t_max = 14.37 weeks).
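The abstract does not state which S-shaped function was used. A common choice is a four-parameter logistic curve, NO₃⁻-N(t) = b + a / (1 + exp(−k(t − t0))), whose nitrification rate peaks at the inflection point t0, so t_max = t0. The sketch below fits that assumed model with only NumPy: it grid-searches k and t0, solving for a and b in closed form by linear least squares at each grid point. The grid ranges and the function name are assumptions for illustration.

```python
import numpy as np


def fit_logistic_tmax(t, no3, k_grid=None, t0_grid=None):
    """Fit y = b + a / (1 + exp(-k (t - t0))) by grid search; t0 is the time of maximum rate."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(no3, dtype=float)
    if k_grid is None:
        k_grid = np.linspace(0.1, 10.0, 100)  # assumed range of rate constants (1/week)
    if t0_grid is None:
        t0_grid = np.linspace(t.min(), t.max(), 601)  # candidate inflection times
    K, T0 = np.meshgrid(k_grid, t0_grid, indexing="ij")
    # Logistic basis evaluated at every (k, t0) grid point: shape (len(k), len(t0), len(t))
    s = 1.0 / (1.0 + np.exp(-K[..., None] * (t - T0[..., None])))
    s_mean = s.mean(axis=-1, keepdims=True)
    ds = s - s_mean
    var_s = (ds**2).sum(axis=-1)
    # Closed-form least squares for amplitude a and baseline b at each grid point
    a = (ds * (y - y.mean())).sum(axis=-1) / np.where(var_s > 0, var_s, np.inf)
    b = y.mean() - a * s_mean[..., 0]
    sse = ((y - (b[..., None] + a[..., None] * s)) ** 2).sum(axis=-1)
    i, j = np.unravel_index(np.argmin(sse), sse.shape)
    return {"a": float(a[i, j]), "b": float(b[i, j]), "k": float(K[i, j]), "t_max": float(T0[i, j])}
```

On noise-free synthetic data generated with an inflection at 1.0 week, the fit recovers t_max close to 1.0. A dedicated nonlinear least-squares routine would refine the estimate, but the grid keeps the example dependency-free.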
Over the past decade, network data, including signaling, transcriptional regulation, metabolic reaction, protein-protein interaction and genetic interaction networks, have increased rapidly. Many biological questions have been investigated by analyzing these diverse networks, providing new insights into biology. Networks also play an important role in disease studies, including disease gene screening and clinical diagnosis. Numerous databases and software tools have been developed to facilitate the storage, exchange, integration, and analysis of network data, and network analysis is becoming a routine procedure for biologists to infer biological information. In this review, several main aspects of network studies are discussed, including network construction, analysis, application, and resources.
Gelatin capsules containing chloropicrin (Pic gel cap) were developed as a new formulation to reduce the potential human exposure risks associated with injection application methods. The objective of this study was to test the efficacy of a Pic gel cap formulation against soilborne pathogens and to determine its effects on strawberry plant growth and fruit yield. Three field experiments were conducted in strawberry greenhouses in Mancheng County, China, in 2008-2010. The effects of Pic gel cap on soilborne pathogens were similar to those of Pic injection: Pic gel cap effectively reduced populations of key soilborne pathogens, was partially effective against weeds, improved strawberry plant growth, and significantly increased fruit yield compared with the untreated control. Pic gel cap applied to preformed beds uses less fumigant than broadcast application of Pic gel cap and can provide an equivalent level of disease control. The present study confirms that Pic gel cap is a promising new formulation that provides field efficacy and marketable yields similar to Pic injection or methyl bromide in strawberry cultivation in China.