JoVE   

In JoVE (2)

Other Publications (131)

Articles by Andrew Emili in JoVE

 JoVE Biology

Mapping Bacterial Functional Networks and Pathways in Escherichia coli using Synthetic Genetic Arrays

1Department of Molecular Genetics, University of Toronto, 2Banting and Best Department of Medical Research, Donnelly Centre, University of Toronto, 3Department of Biochemistry, Research and Innovation Centre, University of Regina


JoVE 4056

Systematic, large-scale synthetic genetic (gene-gene or epistasis) interaction screens can be used to explore genetic redundancy and pathway cross-talk. Here, we describe eSGA, a high-throughput quantitative synthetic genetic array screening technology that we developed to elucidate epistatic relationships and explore genetic interaction networks in Escherichia coli.
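
A common way to quantify the epistasis that such screens measure is to compare double-mutant fitness with the product of the two single-mutant fitnesses (a multiplicative null model). The sketch below assumes that model, fitness values normalized so that wild type equals 1.0, and a hypothetical ±0.1 cutoff. It illustrates the general idea only and is not the eSGA scoring procedure itself; the gene names and fitness values are made up.

```python
def epistasis_score(w_a: float, w_b: float, w_ab: float) -> float:
    """Deviation of double-mutant fitness from the multiplicative expectation.

    Fitness values are assumed to be normalized so that wild type = 1.0.
    """
    expected = w_a * w_b
    return w_ab - expected


def classify(eps: float, threshold: float = 0.1) -> str:
    """Label an interaction; the default threshold is a hypothetical cutoff."""
    if eps < -threshold:
        return "aggravating"   # double mutant grows worse than expected (synthetic sick/lethal)
    if eps > threshold:
        return "alleviating"   # double mutant grows better than expected
    return "neutral"


# Hypothetical fitness values (single A, single B, double AB) for three gene pairs.
pairs = {
    ("geneA", "geneB"): (0.9, 0.8, 0.30),
    ("geneA", "geneC"): (0.9, 0.7, 0.63),
    ("geneB", "geneD"): (0.8, 0.5, 0.60),
}

for (g1, g2), (wa, wb, wab) in pairs.items():
    eps = epistasis_score(wa, wb, wab)
    print(f"{g1} x {g2}: eps = {eps:+.2f} ({classify(eps)})")
```

Negative scores flag pairs that may buffer each other (redundancy), while positive scores often point to genes acting in the same pathway or complex.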

 JoVE Biology

Identification of Protein Complexes in Escherichia coli using Sequential Peptide Affinity Purification in Combination with Tandem Mass Spectrometry

1Banting and Best Department of Medical Research, Donnelly Centre, University of Toronto, 2Department of Biochemistry, Research and Innovation Centre, University of Regina, 3Department of Medical Genetics and Microbiology, University of Toronto


JoVE 4057

Affinity purification of tagged proteins in combination with mass spectrometry (APMS) is a powerful method for the systematic mapping of protein interaction networks and for investigating the mechanistic basis of biological processes. Here, we describe an optimized sequential peptide affinity (SPA) APMS procedure developed for the bacterium Escherichia coli that can be used to isolate and characterize stable multi-protein complexes to near homogeneity, even when the tagged protein is present at low copy numbers per cell.

Other articles by Andrew Emili on PubMed

Multiple Substrates for Paraoxonase-1 During Oxidation of Phosphatidylcholine by Peroxynitrite

Paraoxonase (PON-1) is a high-density lipoprotein (HDL)-bound enzyme with activity toward multiple substrates. It hydrolyzes organic phosphate and aromatic carboxylic acid esters. It also inhibits accumulation of oxidized phospholipids in plasma lipoproteins by a mechanism yet to be determined. To investigate this mechanism, we subjected apolipoprotein A-I proteoliposomes containing either 1-palmitoyl-2-linoleoyl-sn-glycero-3-phosphocholine or 1-palmitoyl-2-arachidonoyl-sn-glycero-3-phosphocholine to oxidation by a peroxynitrite generator, SIN-1, in the presence and absence of purified PON-1. PON-1 modified the proportion of oxidation products without affecting the overall extent of phosphatidylcholine (PC) oxidation. However, in the presence of PON-1, phosphatidylcholine isoprostanes were hydrolyzed to lysophosphatidylcholine. In addition, PON-1 hydrolyzed the phosphatidylcholine core aldehydes 1-palmitoyl-2-(9-oxo)nonanoyl-sn-glycero-3-phosphocholine and 1-palmitoyl-2-(5-oxo)valeroyl-sn-glycero-3-phosphocholine to lysophosphatidylcholine. This hydrolysis was not affected by pefabloc, a serine esterase inhibitor. There was no detectable release of linoleate, arachidonate, or their hydroperoxy or hydroxy derivatives in the presence of PON-1. We conclude that PON-1 minimizes the accumulation of phosphatidylcholine oxidation products by hydrolyzing phosphatidylcholine isoprostanes and core aldehydes to lysophosphatidylcholine through a serine esterase-independent mechanism.

De Novo Peptide Sequencing and Quantitative Profiling of Complex Protein Mixtures Using Mass-coded Abundance Tagging

Proteomic studies require efficient, robust, and practical methods of characterizing proteins present in biological samples. Here we describe an integrated strategy for systematic proteome analysis based on differential guanidination of C-terminal lysine residues on tryptic peptides followed by capillary liquid chromatography-electrospray tandem mass spectrometry. The approach, termed mass-coded abundance tagging (MCAT), facilitates automated, large-scale, and comprehensive de novo determination of peptide sequence and relative quantitation of proteins in biological samples in a single analysis. MCAT offers marked advantages over previously described methods and is simple, economical, and effective when applied to complex proteomic mixtures. MCAT is used to identify proteins, including polymorphic variants, from complex mixtures and to measure variation in protein levels across diverse cell types.
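
The peak-pairing idea behind differential guanidination can be sketched as follows. Guanidination converts lysine to homoarginine, adding CH2N2 (about 42.0218 Da monoisotopic), so if one sample is guanidinated and the other is not, the same C-terminal-lysine peptide should appear as two peaks separated by 42.0218/z in m/z, and their intensity ratio reports relative abundance. The function name, the tolerance, and the assumption of exactly one lysine per peptide are illustrative choices, not details of the published MCAT software.

```python
# Monoisotopic mass added when guanidination converts lysine to homoarginine (CH2N2).
GUANIDINATION_SHIFT = 42.021798


def find_mcat_pairs(peaks, charge=1, tol=0.01):
    """Pair unmodified ("light") and guanidinated ("heavy") forms of a peptide.

    peaks: list of (mz, intensity) tuples.
    Assumes one lysine per peptide, so the m/z offset is GUANIDINATION_SHIFT / charge.
    Returns (light_mz, heavy_mz, heavy/light intensity ratio) tuples.
    """
    delta = GUANIDINATION_SHIFT / charge
    peaks = sorted(peaks)
    pairs = []
    for i, (mz1, int1) in enumerate(peaks):
        for mz2, int2 in peaks[i + 1:]:
            diff = mz2 - mz1
            if diff > delta + tol:
                break  # peaks are sorted, so later ones are even farther away
            if abs(diff - delta) <= tol:
                pairs.append((mz1, mz2, int2 / int1))
    return pairs


# Hypothetical doubly charged peak list: one pair plus an unrelated peak.
peaks = [(500.25, 1000.0), (521.26, 2000.0), (600.30, 500.0)]
for light, heavy, ratio in find_mcat_pairs(peaks, charge=2):
    print(f"{light:.2f} <-> {heavy:.2f}  heavy/light = {ratio:.2f}")
```

Because only the C-terminal residue carries the tag, fragment ions that contain the C terminus shift while those that do not stay put, which is what makes the labeling useful for de novo sequencing as well as quantitation.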

RNA Polymerase II Elongation Factors of Saccharomyces cerevisiae: a Targeted Proteomics Approach

To physically characterize the web of interactions connecting the Saccharomyces cerevisiae proteins suspected to be RNA polymerase II (RNAPII) elongation factors, subunits of Spt4/Spt5 and Spt16/Pob3 (corresponding to human DSIF and FACT), Spt6, TFIIF (Tfg1, -2, and -3), TFIIS, Rtf1, and Elongator (Elp1, -2, -3, -4, -5, and -6) were affinity purified under conditions designed to minimize loss of associated polypeptides and then identified by mass spectrometry. Spt16/Pob3 was discovered to associate with three distinct complexes: histones; Chd1/casein kinase II (CKII); and Rtf1, Paf1, Ctr9, Cdc73, and a previously uncharacterized protein, Leo1. Rtf1 and Chd1 have previously been implicated in the control of elongation, and the sensitivity to 6-azauracil of strains lacking Paf1, Cdc73, or Leo1 suggested that these proteins are involved in elongation by RNAPII as well. Confirmation came from chromatin immunoprecipitation (ChIP) assays demonstrating that all components of this complex, including Leo1, cross-linked to the promoter, coding region, and 3' end of the ADH1 gene. In contrast, the three subunits of TFIIF cross-linked only to the promoter-containing fragment of ADH1. Spt6 interacted with the uncharacterized, essential protein Iws1 (interacts with Spt6), and Spt5 interacted either with Spt4 or with a truncated form of Spt6. ChIP on Spt6 and the novel protein Iws1 resulted in the cross-linking of both proteins to all three regions of the ADH1 gene, suggesting that Iws1 is likely an Spt6-interacting elongation factor. Spt5, Spt6, and Iws1 are phosphorylated on consensus CKII sites in vivo, conceivably by the Chd1/CKII associated with Spt16/Pob3. All the elongation factors but Elongator copurified with RNAPII.

Splicing and Transcription-associated Proteins PSF and p54nrb/NonO Bind to the RNA Polymerase II CTD

The carboxyl-terminal domain (CTD) of the largest subunit of eukaryotic RNA polymerase II (pol II) plays an important role in promoting steps of pre-mRNA processing. To identify proteins in human cells that bind to the CTD and that could mediate its functions in pre-mRNA processing, we used the mouse CTD expressed in bacterial cells in affinity chromatography experiments. Two proteins present in HeLa cell extract, the splicing and transcription-associated factors, PSF and p54nrb/NonO, bound specifically and could be purified to virtual homogeneity by chromatography on immobilized CTD matrices. Both hypo- and hyperphosphorylated CTD matrices bound these proteins with similar selectivity. PSF and p54nrb/NonO also copurified with a holoenzyme form of pol II containing hypophosphorylated CTD and could be coimmunoprecipitated with antibodies specific for this and the hyperphosphorylated form of pol II. That PSF and p54nrb/NonO promoted the binding of RNA to immobilized CTD matrices suggested these proteins can interact with the CTD and RNA simultaneously. PSF and p54nrb/NonO may therefore provide a direct physical link between the pol II CTD and pre-mRNA processing components, at both the initiation and elongation phases of transcription.

Hsc70 Regulates Accumulation of Cyclin D1 and Cyclin D1-dependent Protein Kinase

The cyclin D-dependent kinase is a critical mediator of mitogen-dependent G1 phase progression in mammalian cells. Given the high incidence of cyclin D1 overexpression in human neoplasias, the nature and complexity of cyclin D complexes in vivo have been subjects of intense interest; apart from the catalytic partner, however, the composition of these complexes remains ambiguous. To address this issue, we purified native cyclin D1 complexes from proliferating mouse fibroblasts by affinity chromatography and began to identify and functionally characterize the associated proteins. In this report, we describe the identification of Hsc70 and its functional importance for cyclin D1 and cyclin D1-dependent kinase maturation. We demonstrate that Hsc70 associates with newly synthesized cyclin D1 and is a component of a mature, catalytically active cyclin D1/CDK4 holoenzyme complex. Our data suggest that Hsc70 promotes stabilization of newly synthesized cyclin D1, thereby increasing its availability for assembly with CDK4. In addition, our data demonstrate that Hsc70 remains bound to cyclin D1 following its assembly with CDK4 and Cip/Kip proteins, where it ensures the formation of a catalytically active complex.

JlpA of Campylobacter jejuni Interacts with Surface-exposed Heat Shock Protein 90alpha and Triggers Signalling Pathways Leading to the Activation of NF-kappaB and p38 MAP Kinase in Epithelial Cells

Campylobacter jejuni is a leading cause of acute bacterial gastroenteritis in humans. The mechanism by which C. jejuni interacts with host cells, however, is still poorly understood. Our previous study has shown that the C. jejuni surface lipoprotein JlpA mediates adherence of the bacterium to epithelial cells. In this report, we demonstrated that JlpA interacts with HEp-2 cell surface heat shock protein (Hsp) 90alpha and initiates signalling pathways leading to activation of NF-kappaB and p38 MAP kinase. Gel overlay and GST pull-down assays showed that JlpA interacts with Hsp90alpha. Geldanamycin, a specific inhibitor of Hsp90, and anti-human Hsp90alpha antibody significantly blocked the interaction between JlpA and Hsp90alpha, suggesting a direct interaction between JlpA and HEp-2 cell surface-exposed Hsp90alpha. Treatment of HEp-2 cells with GST-JlpA initiated two signalling pathways: one leading to the phosphorylation and degradation of IkappaB and nuclear translocation of NF-kappaB, and the other leading to the phosphorylation of p38 MAP kinase. The activation of NF-kappaB and p38 MAP kinase in HEp-2 cells suggests that JlpA triggers inflammatory/immune responses in host cells following C. jejuni infection.

PRISM, a Generic Large Scale Proteomic Investigation Strategy for Mammals

We have developed a systematic analytical approach, termed PRISM (Proteomic Investigation Strategy for Mammals), that permits routine, large scale protein expression profiling of mammalian cells and tissues. PRISM combines subcellular fractionation, multidimensional liquid chromatography-tandem mass spectrometry-based protein shotgun sequencing, and two newly developed computer algorithms, STATQUEST and GOClust, as a means to rapidly identify, annotate, and categorize thousands of expressed mammalian proteins. The application of PRISM to adult mouse lung and liver resulted in the high confidence identification of over 2,100 unique proteins including more than 100 integral membrane proteins, 400 nuclear proteins, and 500 uncharacterized proteins, the largest proteome study carried out to date on this important model organism. Automated clustering of the identified proteins into Gene Ontology annotation groups allowed for streamlined analysis of the large data set, revealing interesting and physiologically relevant patterns of tissue and organelle specificity. PRISM therefore offers an effective platform for in-depth investigation of complex mammalian proteomes.

Methylation of Histone H3 by Set2 in Saccharomyces cerevisiae is Linked to Transcriptional Elongation by RNA Polymerase II

Set2 methylates Lys36 of histone H3. We show here that yeast Set2 copurifies with RNA polymerase II (RNAPII). Chromatin immunoprecipitation analyses demonstrated that Set2 and histone H3 Lys36 methylation are associated with the coding regions of several genes that were tested and correlate with active transcription. Both also depend on the Paf1 elongation factor complex. The C terminus of Set2, which contains a WW domain, is also required for effective Lys36 methylation. Deletion of CTK1, encoding an RNAPII CTD kinase, prevents Lys36 methylation and Set2 recruitment, suggesting that methylation may be triggered by contact of the WW domain or C terminus of Set2 with Ser2-phosphorylated CTD. A set2 deletion results in slight sensitivity to 6-azauracil and in much less beta-galactosidase being produced from a reporter plasmid, reflecting a defect in transcription. In synthetic genetic array (SGA) analysis, synthetic growth defects were obtained when a set2 deletion was combined with deletions of all five components of the Paf1 complex, the chromodomain elongation factor Chd1, the putative elongation factor Soh1, the Bre1 or Lge1 components of the histone H2B ubiquitination complex, or the histone H2A variant Htz1. SET2 also interacts genetically with components of the Set1 and Set3 complexes, suggesting that Set1, Set2, and Set3 similarly affect transcription by RNAPII.

Mapping of Determinants Required for the Function of the HIV-1 Env Nuclear Retention Sequence

Control of HIV-1 RNA processing and transport is critical to the successful replication of the virus. In previous work, we identified a region within the HIV-1 env that is involved in mediating nuclear retention of unspliced viral RNA. To define this sequence further and identify elements required for function, deletion mutagenesis was carried out. Progressive 5' and 3' deletions map the nuclear retention sequence (NRS) within the intron between nucleotides 8281 and 8381. While deletion of sequences comprising the 3'ss had no effect, removal of the 5'ss resulted in cytoplasmic accumulation of unspliced RNA. Sequence analysis determined that the region corresponding to the NRS is highly conserved among HIV-1 strains. To evaluate whether this NRS interacts with cellular factors, RNA electrophoretic mobility shift assays (REMSA) were performed. We show that the NRS specifically interacts with cellular factors present in HeLa nuclear extracts and, by UV crosslinking, that this interaction correlates with the binding of a 49-kDa protein. Immunoprecipitation of the UV crosslinked products determined that this 49-kDa protein corresponds to hnRNP C.

A Panoramic View of Yeast Noncoding RNA Processing

Predictive analysis using publicly available yeast functional genomics and proteomics data suggests that many more proteins may be involved in biogenesis of ribonucleoproteins than are currently known. Using a microarray that monitors abundance and processing of noncoding RNAs, we analyzed 468 yeast strains carrying mutations in protein-coding genes, most of which have not previously been associated with RNA or RNP synthesis. Many strains mutated in uncharacterized genes displayed aberrant noncoding RNA profiles. Ten factors involved in noncoding RNA biogenesis were verified by further experimentation, including a protein required for 20S pre-rRNA processing (Tsr2p), a protein associated with the nuclear exosome (Lrp1p), and a factor required for box C/D snoRNA accumulation (Bcd1p). These data present a global view of yeast noncoding RNA processing and confirm that many currently uncharacterized yeast proteins are involved in biogenesis of noncoding RNA.

Going Global: Protein Expression Profiling Using Shotgun Mass Spectrometry

Protein expression profiling, the science of monitoring global sets of proteins produced by any given cell type, tissue or organism, has been invigorated by the introduction of proteomic technologies capable of characterizing large numbers of proteins. This review summarizes recent advances in mass spectrometry-based techniques for high-throughput protein identification and quantitation that are fueling rapid growth in the field. Key publications applying state-of-the-art 'shotgun' methods for investigating the entire protein complement of whole organelles, cells and tissues are highlighted. An overview of current proteomic challenges, particularly in the area of data analysis, and the long-term prospects of protein profiling strategies in basic biomedical research, therapeutics development and clinical discovery is also provided.

In Silico Proteome Analysis to Facilitate Proteomics Experiments Using Mass Spectrometry

Proteomics experiments typically involve protein or peptide separation steps coupled to the identification of many hundreds to thousands of peptides by mass spectrometry. Development of methodology and instrumentation in this field is proceeding rapidly, and effective software is needed to link the different stages of proteomic analysis. We have developed an application, proteogest, written in Perl, that generates descriptive and statistical analyses of the biophysical properties of many (e.g., thousands of) protein sequences submitted by the user, for instance the protein sequences inferred from the complete genome sequence of a model organism. The application also carries out in silico proteolytic digestion of the submitted proteomes, or subsets thereof, and presents the distribution of biophysical properties of the resulting peptides. proteogest is customizable: the user can select many options, for instance the cleavage pattern of the digestion treatment or the presence of modifications on specific amino acid residues. We show how proteogest can be used to compare the proteomes and digested proteome products of model organisms, to examine the added complexity generated by modification of residues, and to facilitate the design of proteomics experiments for optimal representation of component proteins.
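
proteogest itself is written in Perl; the Python sketch below only illustrates the kind of in silico digestion it performs, using the textbook trypsin rule (cleave after K or R unless the next residue is P) and standard monoisotopic residue masses. The function names and the `fixed_mods` parameter are hypothetical and do not mirror proteogest's options.

```python
import re

# Standard monoisotopic residue masses (Da).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # added once per peptide for the free termini


def trypsin_digest(seq, missed_cleavages=0):
    """Return tryptic peptides, allowing up to `missed_cleavages` skipped sites."""
    # Insert a separator after every K/R that is not followed by P.
    fragments = [f for f in re.sub(r"(?<=[KR])(?!P)", "|", seq).split("|") if f]
    peptides = []
    for i in range(len(fragments)):
        for j in range(i, min(i + missed_cleavages + 1, len(fragments))):
            peptides.append("".join(fragments[i:j + 1]))
    return peptides


def peptide_mass(pep, fixed_mods=None):
    """Monoisotopic neutral mass; fixed_mods maps residue -> mass delta (Da)."""
    fixed_mods = fixed_mods or {}
    return WATER + sum(RESIDUE_MASS[aa] + fixed_mods.get(aa, 0.0) for aa in pep)


# Example with carbamidomethylated cysteine (+57.02146 Da) as a fixed modification.
for pep in trypsin_digest("PEPTIDEKAAKPGGRCCR", missed_cleavages=1):
    print(pep, round(peptide_mass(pep, {"C": 57.02146}), 4))
```

Tabulating such peptide masses or lengths across a whole predicted proteome is one way to see how cleavage rules and residue modifications change the set of observable peptides.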

A Bayesian Networks Approach for Predicting Protein-protein Interactions from Genomic Data

We have developed an approach using Bayesian networks to predict protein-protein interactions genome-wide in yeast. Our method naturally weights genomic features that are only weakly associated with interaction (e.g., messenger RNA coexpression, coessentiality, and colocalization) and combines them into reliable predictions. In addition to making de novo predictions, it can integrate often noisy experimental interaction data sets. We observe that, at given levels of sensitivity, our predictions are more accurate than the existing high-throughput experimental data sets. We validate our predictions with TAP (tandem affinity purification) tagging experiments. Our analysis, which gives a comprehensive view of yeast interactions, is available at genecensus.org/intint.
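
The simplest Bayesian network for combining weak evidence is a naive Bayes model, which assumes the features are conditionally independent given interaction status; the per-feature likelihood ratios then multiply, and the product converts prior odds into posterior odds. The sketch below shows only that idea. The feature bins, likelihood ratios, and prior odds are hypothetical numbers, not values from the study.

```python
# Hypothetical likelihood ratios: P(bin | interacting) / P(bin | not interacting).
LIKELIHOOD_RATIOS = {
    "coexpression": {"high": 8.0, "medium": 2.0, "low": 0.5},
    "coessentiality": {"both_essential": 3.0, "mixed": 0.8, "both_nonessential": 1.1},
    "colocalization": {"same": 4.0, "different": 0.4},
}


def combined_likelihood_ratio(observation):
    """Multiply per-feature likelihood ratios (naive Bayes independence assumption)."""
    lr = 1.0
    for feature, value in observation.items():
        lr *= LIKELIHOOD_RATIOS[feature][value]
    return lr


def posterior_probability(observation, prior_odds):
    """Convert prior odds of interaction into a posterior probability."""
    odds = prior_odds * combined_likelihood_ratio(observation)
    return odds / (1.0 + odds)


pair = {"coexpression": "high", "coessentiality": "both_essential", "colocalization": "same"}
prior_odds = 1 / 600  # hypothetical: interacting pairs are rare among all protein pairs
print(combined_likelihood_ratio(pair), round(posterior_probability(pair, prior_odds), 3))
```

Because interacting pairs are rare, even a large combined likelihood ratio can leave the posterior probability modest, which is why a cutoff on the likelihood ratio is usually tuned against a gold-standard set.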

A Snf2 Family ATPase Complex Required for Recruitment of the Histone H2A Variant Htz1

Deletions of three yeast genes, SET2, CDC73, and DST1, involved in transcriptional elongation and/or chromatin metabolism were used in conjunction with genetic array technology to screen approximately 4700 yeast deletions and identify double deletion mutants that produce synthetic growth defects. Of the five deletions interacting genetically with all three starting mutations, one encoded the histone H2A variant Htz1 and three encoded components of a novel 13-protein complex, SWR-C, containing the Snf2 family ATPase, Swr1. The SWR-C also copurified with Htz1 and Bdf1, a TFIID-interacting protein that recognizes acetylated histone tails. Deletions of the genes encoding Htz1 and seven nonessential SWR-C components caused a similar spectrum of synthetic growth defects when combined with deletions of 384 genes involved in transcription, suggesting that Htz1 and SWR-C belong to the same pathway. We show that recruitment of Htz1 to chromatin requires the SWR-C. Moreover, like Htz1 and Bdf1, the SWR-C promotes gene expression near silent heterochromatin.

Characterization of the Proteins Released from Activated Platelets Leads to Localization of Novel Platelet Proteins in Human Atherosclerotic Lesions

Proteins secreted by activated platelets can adhere to the vessel wall and promote the development of atherosclerosis and thrombosis. Despite this biologic significance, however, the complement of proteins comprising the platelet releasate is largely unknown. Using a proteomics approach, we have identified more than 300 proteins released by human platelets following thrombin activation. Many of the proteins identified were not previously attributed to platelets, including secretogranin III, a potential monocyte chemoattractant precursor; cyclophilin A, a vascular smooth muscle cell growth factor; calumenin, an inhibitor of the vitamin K epoxide reductase-warfarin interaction, as well as proteins of unknown function that map to expressed sequence tags. Secretogranin III, cyclophilin A, and calumenin were confirmed to localize in platelets and to be released upon activation. Furthermore, while absent in normal vasculature, they were identified in human atherosclerotic lesions. Therefore, these and other proteins released from platelets may contribute to atherosclerosis and to the thrombosis that complicates the disease. Moreover, as soluble extracellular proteins, they may prove suitable as novel therapeutic targets.

The Ctf13-30/CTF13 Genomic Haploinsufficiency Modifier Screen Identifies the Yeast Chromatin Remodeling Complex RSC, Which is Required for the Establishment of Sister Chromatid Cohesion

The budding yeast centromere-kinetochore complex ensures high-fidelity chromosome segregation in mitosis and meiosis by mediating the attachment and movement of chromosomes along spindle microtubules. To identify new genes and pathways whose function impinges on chromosome transmission, we developed a genomic haploinsufficiency modifier screen and used ctf13-30, encoding a mutant core kinetochore protein, as the reference point. We demonstrate through a series of secondary screens that the genomic modifier screen is a successful method for identifying genes that encode nonessential proteins required for the fidelity of chromosome segregation. One gene isolated in our screen was RSC2, a nonessential subunit of the RSC chromatin remodeling complex. rsc2 mutants have defects in both chromosome segregation and cohesion, but the localization of kinetochore proteins to centromeres is not affected. We determined that, in the absence of RSC2, cohesin could still associate with chromosomes but fails to achieve proper cohesion between sister chromatids, indicating that RSC has a role in the establishment of cohesion. In addition, numerous subunits of RSC were affinity purified and a new component of RSC, Rtt102, was identified. Our work indicates that only a subset of the nonessential RSC subunits function in maintaining chromosome transmission fidelity.

High-definition Macromolecular Composition of Yeast RNA-processing Complexes

A remarkably large collection of evolutionarily conserved proteins has been implicated in processing of noncoding RNAs and biogenesis of ribonucleoproteins. To better define the physical and functional relationships among these proteins and their cognate RNAs, we performed 165 highly stringent affinity purifications of known or predicted RNA-related proteins from Saccharomyces cerevisiae. We systematically identified and estimated the relative abundance of stably associated polypeptides and RNA species using a combination of gel densitometry, protein mass spectrometry, and oligonucleotide microarray hybridization. Ninety-two discrete proteins or protein complexes were identified comprising 489 different polypeptides, many associated with one or more specific RNA molecules. Some of the pre-rRNA-processing complexes that were obtained are discrete sub-complexes of those previously described. Among these, we identified the IPI complex required for proper processing of the ITS2 region of the ribosomal RNA primary transcript. This study provides a high-resolution overview of the modular topology of noncoding RNA-processing machinery.

Identification of Biochemical Adaptations in Hyper- or Hypocontractile Hearts from Phospholamban Mutant Mice by Expression Proteomics

Phospholamban (PLN) is a critical regulator of cardiac contractility through its binding to and regulation of the activity of the sarco(endo)plasmic reticulum Ca2+ ATPase. To uncover biochemical adaptations associated with extremes of cardiac muscle contractility, we used high-throughput gel-free tandem MS to monitor differences in the relative abundance of membrane proteins in standard microsomal fractions isolated from the hearts of PLN-null mice (PLN-KO) with high contractility and from transgenic mice overexpressing a superinhibitory PLN mutant in a PLN-null background (I40A-KO) with diminished contractility. Significant differential expression was detected for a subset of the 782 proteins identified, including known membrane-associated biomarkers, components of signaling pathways, and previously uninvestigated proteins. Proteins involved in fat and carbohydrate metabolism and proteins linked to G protein-signaling pathways activating protein kinase C were enriched in I40A-KO cardiac muscle, whereas proteins linked to enhanced contractile function were enriched in PLN-KO mutant hearts. These data demonstrate that Ca2+ dysregulation, leading to elevated or depressed cardiac contractility, induces compensatory biochemical responses.

Sequential Peptide Affinity (SPA) System for the Identification of Mammalian and Bacterial Protein Complexes

A vector system is described that combines reliable, very low level, regulated protein expression in human cells with two affinity purification tags (Sequential Peptide Affinity, or SPA, system). By avoiding overproduction of the target protein, this system allows for the efficient purification of natural protein complexes and their identification by mass spectrometry. We also present an adaptation of the SPA system for the efficient purification and identification of protein complexes in E. coli and, potentially, other bacteria.

Informatics Platform for Global Proteomic Profiling and Biomarker Discovery Using Liquid Chromatography-tandem Mass Spectrometry

We have developed an integrated suite of algorithms, statistical methods, and computer applications to support large-scale LC-MS-based gel-free shotgun profiling of complex protein mixtures using basic experimental procedures. The programs automatically detect and quantify large numbers of peptide peaks in feature-rich ion mass chromatograms, compensate for spurious fluctuations in peptide signal intensities and retention times, and reliably match related peaks across many different datasets. Application of this toolkit markedly facilitates pattern recognition and biomarker discovery in global comparative proteomic studies, simplifying mechanistic investigation of physiological responses and the detection of proteomic signatures of disease.
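
Two of the steps named above, compensating for run-to-run intensity fluctuations and matching related peaks across datasets, can be illustrated with deliberately simple stand-ins: median scaling of intensities and greedy matching of (m/z, retention time) features within fixed tolerances. The sketch below is not the authors' algorithm; the function names and the 10 ppm / 0.5 min tolerances are hypothetical, and it does not perform the retention-time alignment a real pipeline would apply before matching.

```python
from statistics import median


def median_normalize(intensities, reference):
    """Scale a run so its median intensity equals the reference run's median."""
    factor = median(reference) / median(intensities)
    return [i * factor for i in intensities]


def match_peaks(run_a, run_b, ppm_tol=10.0, rt_tol=0.5):
    """Greedily pair (mz, rt) features between two runs within m/z and RT tolerances.

    Each feature in run_b is used at most once; returns (index_a, index_b) pairs.
    """
    used = set()
    matches = []
    for i, (mz_a, rt_a) in enumerate(run_a):
        best, best_score = None, None
        for j, (mz_b, rt_b) in enumerate(run_b):
            if j in used:
                continue
            ppm = abs(mz_a - mz_b) / mz_a * 1e6
            drt = abs(rt_a - rt_b)
            if ppm <= ppm_tol and drt <= rt_tol:
                score = ppm / ppm_tol + drt / rt_tol  # smaller means a closer match
                if best_score is None or score < best_score:
                    best, best_score = j, score
        if best is not None:
            used.add(best)
            matches.append((i, best))
    return matches


run_a = [(500.2500, 20.1), (650.3300, 35.0)]
run_b = [(650.3320, 35.2), (500.2510, 20.3), (800.0000, 40.0)]
print(match_peaks(run_a, run_b))
print(median_normalize([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

A greedy pass is easy to follow but order-dependent; an optimal assignment (for example, a bipartite matching over the same score) would be the natural upgrade.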

Regulation of Chromosome Stability by the Histone H2A Variant Htz1, the Swr1 Chromatin Remodeling Complex, and the Histone Acetyltransferase NuA4

NuA4, the only essential histone acetyltransferase complex in Saccharomyces cerevisiae, acetylates the N-terminal tails of histones H4 and H2A. Affinity purification of NuA4 revealed the presence of three previously undescribed subunits, Vid21/Eaf1/Ydr359c, Swc4/Eaf2/Ygr002c, and Eaf7/Ynl136w. Experimental analyses revealed at least two functionally distinct sets of polypeptides in NuA4: (i) Vid21 and Yng2, and (ii) Eaf5 and Eaf7. Vid21 and Yng2 are required for bulk histone H4 acetylation and are functionally linked to the histone H2A variant Htz1 and the Swr1 ATPase complex (SWR-C) that assembles Htz1 into chromatin, whereas Eaf5 and Eaf7 have a different, as yet undefined, role. Mutations in Htz1, the SWR-C, and NuA4 cause defects in chromosome segregation that are consistent with genetic interactions we have observed between the genes encoding these proteins and genes encoding kinetochore components. Because SWR-C-dependent recruitment of Htz1 occurs in both transcribed and centromeric regions, a NuA4/SWR-C/Htz1 pathway may regulate both transcription and centromere function in S. cerevisiae.

Sarcolipin Retention in the Endoplasmic Reticulum Depends on Its C-terminal RSYQY Sequence and Its Interaction with Sarco(endo)plasmic Ca(2+)-ATPases

Sarcolipin (SLN) and phospholamban (PLN) are effective inhibitors of the sarco(endo)plasmic reticulum Ca(2+)-ATPase (SERCA). These homologous proteins differ at their N and C termini: the C-terminal Met-Leu-Leu in PLN is replaced by Arg-Ser-Tyr-Gln-Tyr in SLN. The role of the C-terminal sequence of SLN tagged N-terminally with the FLAG epitope (NF-SLN) in endoplasmic reticulum (ER) retention was investigated by transfecting human embryonic kidney-293 cells with cDNAs encoding NF-SLN or a series of NF-SLN mutants in which C-terminal amino acids were deleted progressively. Immunofluorescence and immunoblotting of transfected cells by using anti-FLAG antibodies indicated that NF-SLN and PLN tagged at its N terminus with the FLAG epitope, even when overexpressed, were restricted to the ER. However, C-terminal truncation deletions of SLN, which lacked RSYQY, were not localized to ER and did not inhibit Ca(2+)-dependent Ca(2+) uptake by SERCA. The shortest deletion constructs, NF-SLN 1-22 and NF-SLN 1-23, did not express stable protein products. However, all NF-SLN cDNA constructs, including NF-SLN 1-22 and NF-SLN 1-23, were expressed stably and localized to the ER when they were coexpressed with SERCA2a. These results show that NF-SLN subcellular distribution depends on SERCA coexpression and on its luminal, C-terminal RSYQY sequence. By using immunoprecipitation and MS, glucose-regulated protein 78/BiP and glucose-regulated protein 94 were identified as proteins that interact with NF-SLN through the RSYQY sequence. Thus, in the absence of SERCA, retention of NF-SLN in the ER is mediated through its association with other components through the C-terminal RSYQY sequence.

Proteasome Involvement in the Repair of DNA Double-strand Breaks

Affinity purification of the yeast 19S proteasome revealed the presence of Sem1 as a subunit. Its human homolog, DSS1, was found likewise to copurify with the human 19S proteasome. DSS1 is known to associate with the tumor suppressor protein BRCA2 involved in repair of DNA double-strand breaks (DSBs). We demonstrate that Sem1 is required for efficient repair of an HO-generated yeast DSB using both homologous recombination (HR) and nonhomologous end joining (NHEJ) pathways. Deletion of SEM1 or genes encoding other nonessential 19S or 20S proteasome subunits also results in synthetic growth defects and hypersensitivity to genotoxins when combined with mutations in well-established DNA DSB repair genes. Chromatin immunoprecipitation showed that Sem1 is recruited along with the 19S and 20S proteasomes to a DSB in vivo, and this recruitment is dependent on components of both the HR and NHEJ repair pathways, suggesting a direct role of the proteasome in DSB repair.

Modeling Protein Tandem Mass Spectrometry Data with an Extended Linear Regression Strategy

Tandem mass spectrometry (MS/MS) has emerged as a cornerstone of proteomics, owing in part to robust spectral interpretation algorithms. The peak intensity patterns in mass spectra carry useful information for identifying peptides and proteins. However, widely used algorithms cannot predict these intensity patterns precisely. We have developed a systematic analytical approach based on a family of extended regression models, which permits routine, large-scale modeling of protein expression profiles. We prove an important technical result: the regression coefficient vector is the eigenvector corresponding to the smallest eigenvalue of a space-transformed version of the original data. The extended regression problem therefore reduces to a singular value decomposition (SVD) problem, which makes it robust and efficient to solve. To evaluate the performance of our model, we selected 2,859 high-confidence, non-redundant matches from 60,960 spectra as training data and, based on this specific problem, derived several goodness-of-fit measures showing that our modeling method is reasonable. The issues of overfitting and underfitting are also discussed. This extended regression strategy therefore offers an effective and efficient framework for in-depth investigation of complex mammalian proteomes.
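The reduction of a regression problem to "the eigenvector of the smallest eigenvalue", solved by SVD, is the same idea that underlies total least squares (orthogonal regression). The following is a generic total least squares fit in Python with NumPy, offered only as an illustration of that idea and not as the authors' model: the paper's "space-transformed" data matrix is not specified here, so this sketch simply stacks centered predictors and responses into one matrix Z. The function name `tls_fit` is hypothetical. The coefficients come from the right singular vector of Z with the smallest singular value, which is the same vector as the eigenvector of ZᵀZ with the smallest eigenvalue.

```python
import numpy as np


def tls_fit(X, y):
    """Total least squares fit of y ~ X @ b + c (illustrative sketch).

    Errors are allowed in both X and y. The coefficient vector is read
    off the right singular vector belonging to the smallest singular
    value of the centered, stacked data matrix Z = [X, y].
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)  # treat a 1-D input as a single predictor
    y = np.asarray(y, dtype=float).reshape(-1, 1)

    # Center the data so the intercept can be recovered afterwards.
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Z = np.hstack([X - x_mean, y - y_mean])

    # Singular values are returned in descending order, so the last row
    # of vt is the direction of least variance, i.e. the eigenvector of
    # Z.T @ Z with the smallest eigenvalue.
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    v = vt[-1]
    if abs(v[-1]) < 1e-12:
        raise ValueError("degenerate fit: the y component of the vector is zero")

    # v is proportional to [b, -1]; rescale so the y component is -1.
    b = -v[:-1] / v[-1]
    c = y_mean - x_mean @ b
    return b, c
```

For example, with x = [0, 1, 2, 3, 4] and y = 2x + 1, `tls_fit` recovers a slope of 2 and an intercept of 1 up to floating-point error. Working with the SVD of Z directly, rather than forming ZᵀZ and calling an eigensolver, avoids squaring the condition number of the data matrix.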

A Role for SlyD in the Escherichia coli Hydrogenase Biosynthetic Pathway

The [NiFe] centers at the active sites of the Escherichia coli hydrogenase enzymes are assembled by a team of accessory proteins that includes the products of the hyp genes. To determine whether any other proteins are involved in this process, the sequential peptide affinity system was used. The analysis of the proteins in a complex with HypB revealed the peptidyl-prolyl cis/trans-isomerase SlyD, a metal-binding protein that has not been previously linked to the hydrogenase biosynthetic pathway. The association between HypB and SlyD was confirmed by chemical cross-linking of purified proteins. Deletion of the slyD gene resulted in a marked reduction of the hydrogenase activity in cell extracts prepared from anaerobic cultures, and an in-gel assay was used to demonstrate diminished activities of both hydrogenase 1 and 2. Western analysis revealed a decrease in the final proteolytic processing of the hydrogenase 3 HycE protein, indicating that the metal center was not assembled properly. These deficiencies were all rescued by growth in medium containing excess nickel, but zinc did not have any phenotypic effect. Experiments with radioactive nickel demonstrated that less nickel accumulated in ΔslyD cells compared with wild type, and overexpression of SlyD from an inducible promoter doubled the level of cellular nickel. These experiments demonstrate that SlyD has a role in the nickel insertion step of the hydrogenase maturation pathway, and the possible functions of SlyD are discussed.

Defining the SUMO-modified Proteome by Multiple Approaches in Saccharomyces cerevisiae

SUMO, or Smt3 in Saccharomyces cerevisiae, is a ubiquitin-like protein that is post-translationally attached to multiple proteins in vivo. Many of these substrate modifications are cell cycle-regulated, and SUMO conjugation is essential for viability in most eukaryotes. However, only a limited number of SUMO-modified proteins have been definitively identified to date, and this has hampered study of the mechanisms by which SUMO ligation regulates specific cellular pathways. Here we use a combination of yeast two-hybrid screening, a high copy suppressor selection with a SUMO isopeptidase mutant, and tandem mass spectrometry to define a large set of proteins (>150) that can be modified by SUMO in budding yeast. These three approaches yielded overlapping sets of proteins with the most extensive set by far being those identified by mass spectrometry. The two-hybrid data also yielded a potential SUMO-binding motif. Functional categories of SUMO-modified proteins include SUMO conjugation system enzymes, chromatin- and gene silencing-related factors, DNA repair and genome stability proteins, stress-related proteins, transcription factors, proteins involved in translation and RNA metabolism, and a variety of metabolic enzymes. The results point to a surprisingly broad array of cellular processes regulated by SUMO conjugation and provide a starting point for detailed studies of how SUMO ligation contributes to these different regulatory mechanisms.

H2B Ubiquitin Protease Ubp8 and Sgf11 Constitute a Discrete Functional Module Within the Saccharomyces cerevisiae SAGA Complex

The SAGA complex is a multisubunit protein complex involved in transcriptional regulation in Saccharomyces cerevisiae. SAGA combines proteins involved in interactions with DNA-bound activators and TATA-binding protein (TBP), as well as enzymes for histone acetylation (Gcn5) and histone deubiquitylation (Ubp8). We recently showed that H2B ubiquitylation and Ubp8-mediated deubiquitylation are both required for transcriptional activation. For this study, we investigated the interaction of Ubp8 with SAGA. Using mutagenesis, we identified a putative zinc (Zn) binding domain within Ubp8 as being critical for the association with SAGA. The Zn binding domain is required for H2B deubiquitylation and for growth on media requiring Ubp8's function in gene activation. Furthermore, we identified an 11-kDa subunit of SAGA, Sgf11, and showed that it is required for the Ubp8 association with SAGA and for H2B deubiquitylation. Different approaches indicated that the functions of Ubp8 and Sgf11 are related and separable from those of other components of SAGA. In particular, the profiles of Ubp8 and Sgf11 deletions were remarkably similar in microarray analyses and synthetic genetic interactions and were distinct from those of the Spt3 and Spt8 subunits of SAGA, which are involved in TBP regulation. These data indicate that Ubp8 and Sgf11 likely represent a new functional module within SAGA that is involved in gene regulation through H2B deubiquitylation.

Interaction Network Containing Conserved and Essential Protein Complexes in Escherichia coli

Proteins often function as components of multi-subunit complexes. Despite the long history of Escherichia coli as a model organism, no large-scale analysis of its protein complexes has yet been reported. To address this gap, we targeted DNA cassettes into the E. coli chromosome to create carboxy-terminal, affinity-tagged alleles of 1,000 open reading frames (approximately 23% of the genome). A total of 857 proteins, including 198 of the most highly conserved, soluble, non-ribosomal proteins essential in at least one bacterial species, were tagged successfully, and 648 of these could be purified to homogeneity and their interacting protein partners identified by mass spectrometry. An interaction network of protein complexes involved in diverse biological processes was uncovered and validated by sequential rounds of tagging and purification. This network includes many new interactions as well as interactions predicted based solely on genomic inference or limited phenotypic data. This study provides insight into the function of previously uncharacterized bacterial proteins and the overall topology of a microbial interaction network, the core components of which are broadly conserved across Prokaryota.

Integrating Gene and Protein Expression Data: Pattern Analysis and Profile Mining

Proteomics and functional genomics are emerging research fields devoted to the study of the entire collection of proteins and mRNA transcripts (collectively known as gene products) that define a biological system. DNA microarrays are now a popular platform for measuring changes in messenger RNA transcript levels on a genome-wide scale, while gel-free shotgun profiling methods based on tandem mass spectrometry are increasingly used to determine the identity, modification states, and relative abundance of large numbers of proteins. By defining the behavior of entire biological pathways and networks under various physiological states, these studies aim to extend traditional reductionist molecular genetic approaches to uncovering the biological roles of the vast array of uncharacterized gene products. A key goal is to determine how the information encoded by the myriad of expressed gene products is integrated at the molecular, cellular, and even whole-organism level to create the dynamic biochemical processes and complex physiological controls that sustain life. While comparison of the complementary information contained in proteomic and mRNA data sets poses considerable analytical challenges, these efforts should provide added insight into the fundamental mechanisms underlying physiology, development, and the emergence of disease. Here, we outline several analytical approaches, methods, and tools that have proven helpful in addressing this important challenge.

Integrating Global Proteomic and Genomic Expression Profiles Generated from Islet Alpha Cells: Opportunities and Challenges to Deriving Reliable Biological Inferences

Systematic profiling of expressed gene products represents a promising research strategy for elucidating the molecular phenotypes of islet cells. To this end, we have combined complementary genomic and proteomic methods to better assess the molecular composition of murine pancreatic islet glucagon-producing αTC-1 cells as a model system, with the expectation of bypassing limitations inherent to either technology alone. Gene expression was measured with an Affymetrix MG_U74Av2 oligonucleotide array, while protein expression was examined by performing high-resolution gel-free shotgun MS/MS on a nuclear-enriched cell extract. Both analyses were carried out in triplicate to control for experimental variability. Using a stringent detection p value cutoff of 0.04, 48% of all potential mRNA transcripts were predicted to be expressed (probes classified as present in at least two of three replicates), while 1,651 proteins were identified with high confidence using rigorous database searching. Although 762 of 888 cross-referenced cognate mRNA-protein pairs were jointly detected by both platforms, a sizeable number (126) of gene products was detected by MS alone. Conversely, marginal protein identifications often had convincing microarray support. Based on these findings, we present an operational framework for both interpreting and integrating dual genomic and proteomic datasets so as to obtain a more reliable perspective into islet alpha cell function.

Statistical and Computational Methods for Comparative Proteomic Profiling Using Liquid Chromatography-tandem Mass Spectrometry

Liquid chromatography-tandem mass spectrometry (LC-MS/MS) is increasingly used to explore differences in the proteomic composition of complex biological systems. The reliability and utility of such comparative protein expression profiling studies are critically dependent on an accurate and rigorous assessment of quantitative changes in the relative abundance of the many proteins typically present in a biological sample such as blood or tissue. In this review, we provide an overview of key statistical and computational issues relevant to bottom-up shotgun global proteomic analysis, with an emphasis on methods that can be applied to improve the dependability of biological inferences drawn from large proteomic datasets. Focusing on a start-to-finish approach, we address the following topics: 1) low-level data processing steps, such as formation of a data matrix, filtering, and baseline subtraction to minimize noise; 2) mid-level processing steps, such as data normalization, alignment in time, peak detection, peak quantification, peak matching, and error models, to facilitate profile comparisons; and 3) high-level processing steps, such as sample classification and biomarker discovery, and related topics such as significance testing, multiple testing, and choice of feature space. We report on approaches that have recently been developed for these steps, discussing their merits and limitations, and propose areas deserving of further research.

Navigating the Chaperone Network: an Integrative Map of Physical and Genetic Interactions Mediated by the Hsp90 Chaperone

Physical, genetic, and chemical-genetic interactions centered on the conserved chaperone Hsp90 were mapped at high resolution in yeast using systematic proteomic and genomic methods. Physical interactions were identified using genome-wide two-hybrid screens combined with large-scale affinity purification of Hsp90-containing protein complexes. Genetic interactions were uncovered using synthetic genetic array technology and by a microarray-based chemical-genetic screen of a set of about 4,700 viable yeast gene deletion mutants for hypersensitivity to the Hsp90 inhibitor geldanamycin. An extended network, consisting of 198 putative physical interactions and 451 putative genetic and chemical-genetic interactions, was found to connect Hsp90 to cofactors and substrates involved in a wide range of cellular functions. Two novel Hsp90 cofactors, Tah1 (YCR060W) and Pih1 (YHR034C), were also identified. These cofactors interact physically and functionally with the conserved AAA(+)-type DNA helicases Rvb1/Rvb2, which are key components of several chromatin remodeling factors, thereby linking Hsp90 to epigenetic gene regulation.

Proteome Dynamics During C2C12 Myoblast Differentiation

Mouse-derived C2C12 myoblasts serve as an experimentally tractable model system for investigating the molecular basis of skeletal muscle cell specification and development. To examine the biochemical adaptations associated with myocyte formation comprehensively, we used large scale gel-free tandem mass spectrometry to monitor global proteome alterations throughout a time course analysis of the myogenic C2C12 differentiation program. The relative abundance of approximately 1,800 high confidence proteins was tracked across multiple time points using capillary scale multidimensional liquid chromatography coupled to high throughput shotgun sequencing. Hierarchical clustering of the resulting profiles revealed differential waves of expression of proteins linked to intracellular signaling, transcription, cytoarchitecture, adhesion, metabolism, and muscle contraction across the early, mid, and late stages of differentiation. Several hundred previously uncharacterized proteins were likewise detected in a stage-specific manner, suggesting novel roles in myogenesis and/or muscle function. These proteomic data are complementary to recent microarray-based studies of gene expression patterns in developing myotubes and provide a holistic framework for understanding how diverse biochemical processes are coordinated at the cellular level during skeletal muscle development.

Global Protein Shotgun Expression Profiling of Proliferating MCF-7 Breast Cancer Cells

Protein expression becomes altered in breast epithelium during malignant transformation. Knowledge of these perturbations should provide insight into the molecular basis of breast cancer, as well as reveal possible new therapeutic targets. To this end, we have performed an extensive comparative proteomic survey of global protein expression patterns in proliferating MCF-7 breast cancer cells and normal human mammary epithelial cells using gel-free shotgun tandem mass spectrometry. Pathophysiological alterations associated with the malignant breast cancer phenotype were detected, including differences in the apparent levels of key regulators of the cell cycle, signal transduction, apoptosis, transcriptional regulation, and cell metabolism.

Multidimensional Protein Identification Technology: Current Status and Future Prospects

Protein profiling using high-throughput tandem mass spectrometry has become a powerful method for analyzing changes in global protein expression patterns in cells and tissues as a function of developmental, physiologic, and disease processes. This review summarizes the utility and practical application of multidimensional protein identification technology as a platform for comprehensive proteomic profiling of complex biologic samples. The strengths, potential problems, and limitations of this powerful technology are discussed, with an emphasis on one of the biggest challenges currently facing large-scale expression profiling projects: data analysis. Complementary bioinformatic data-mining strategies, such as clustering, functional annotation, and statistical inference, are also discussed, as these are increasingly necessary for interpreting the results of global proteomic profiling studies.

Multidimensional Protein Identification Technology (MudPIT): Technical Overview of a Profiling Method Optimized for the Comprehensive Proteomic Investigation of Normal and Diseased Heart Tissue

An optimized analytical expression profiling strategy based on gel-free multidimensional protein identification technology (MudPIT) is reported for the systematic investigation of biochemical (mal)adaptations associated with healthy and diseased heart tissue. Enhanced shotgun proteomic detection coverage and improved biological inference are achieved by pre-fractionation of excised mouse cardiac muscle into subcellular components, with each organellar fraction investigated exhaustively using multiple repeat MudPIT analyses. Functional enrichment, high-confidence identification, and relative quantification of hundreds of organelle- and tissue-specific proteins are achieved readily, including detection of low-abundance transcriptional regulators, signaling factors, and proteins linked to cardiac disease. Important technical issues relating to data validation, including minimization of artifacts stemming from biased under-sampling and spurious false discovery, together with suggestions for further fine-tuning of sample preparation, are discussed. A framework for follow-up bioinformatic examination, pattern recognition, and data mining is also presented in the context of a stringent application of MudPIT for probing fundamental aspects of heart muscle physiology as well as the discovery of perturbations associated with heart failure.

Role for PSF in Mediating Transcriptional Activator-dependent Stimulation of Pre-mRNA Processing in vivo

In a recent study, we provided evidence that strong promoter-bound transcriptional activators result in higher levels of splicing and 3'-end cleavage of nascent pre-mRNA than do weak promoter-bound activators and that this effect of strong activators requires the carboxyl-terminal domain (CTD) of RNA polymerase II (pol II). In the present study, we have investigated the mechanism of activator- and CTD-mediated stimulation of pre-mRNA processing. Affinity chromatography experiments reveal that two factors previously implicated in the coupling of transcription and pre-mRNA processing, PSF and p54(nrb)/NonO, preferentially bind a strong rather than a weak activation domain. Elevated expression of PSF in human 293 cells bypasses the requirement for a strong activator to promote efficient splicing and 3'-end cleavage. Truncation of the pol II CTD, which consists of 52 repeats of the consensus heptapeptide sequence YSPTSPS, to 15 heptapeptide repeats prevents PSF-dependent stimulation of splicing and 3'-end cleavage. Moreover, PSF and p54(nrb)/NonO bind in vitro to the wild-type CTD but not to the truncated 15-repeat CTD, and domains in PSF that are required for binding to activators and to the CTD are also important for the stimulation of pre-mRNA processing. Interestingly, activator- and CTD-dependent stimulation of splicing mediated by PSF appears to primarily affect the removal of first introns. Collectively, these results suggest that the recruitment of PSF to activated promoters and the pol II CTD provides a mechanism by which transcription and pre-mRNA processing are coordinated within the cell.

Proteomic Analysis of SRm160-containing Complexes Reveals a Conserved Association with Cohesin

In this study, we describe a rapid immunoaffinity purification procedure for gel-free tandem mass spectrometry-based analysis of endogenous protein complexes and apply it to the characterization of complexes containing the SRm160 (serine/arginine repeat-related nuclear matrix protein of 160 kDa) splicing coactivator. In addition to promoting splicing, SRm160 stimulates 3'-end processing via its N-terminal PWI nucleic acid-binding domain and is found in a post-splicing exon junction complex that has been implicated in coupling splicing with mRNA turnover, export, and translation. Consistent with these known functional associations, we found that the majority of proteins identified in SRm160-containing complexes are associated with pre-mRNA processing. Interestingly, SRm160 is also associated with factors involved in chromatin regulation and sister chromatid cohesion, specifically the cohesin subunits SMC1α, SMC3, RAD21, and SA2. Gradient fractionation suggested that there are two predominant SRm160-containing complexes, one enriched in splicing components and the other enriched in cohesin subunits. Co-immunoprecipitation and co-localization experiments, as well as combinatorial RNA interference in Caenorhabditis elegans, support the existence of conserved and functional interactions between SRm160 and cohesin.

A Skeleton of the Human Protein Interactome

In this issue of Cell, Wanker and colleagues (Stelzl et al., 2005) present a large-scale two-hybrid map of more than 3000 putative human protein-protein interactions. These new data will serve as an important source of information regarding individual protein partners and offer preliminary insight into the global molecular organization of human cells.

Uncovering Early Markers of Cardiac Disease by Proteomics: Avoiding (heart) Failure!

Human Tissue Profiling with Multidimensional Protein Identification Technology

Profiling of tissues and cell types through systematic characterization of expressed genes or proteins shows promise as a basic research tool, and has potential applications in disease diagnosis and classification. We used multidimensional protein identification technology (MudPIT) to analyze the proteomes of enriched nuclear extracts from eight human tissues: brain, heart, liver, lung, muscle, pancreas, spleen, and testis. We show that the method is approximately 80% reproducible. We address issues of relative abundance, tissue specificity, and selectivity, as well as the significance of proteins whose expression does not correlate with that of the corresponding mRNA. Surprisingly, most proteins are detected in a single tissue. These proteins tend to fulfill specialist (and potentially tissue-specific) functions compared with proteins expressed in two or more tissues.

Identification and Characterization of Elf1, a Conserved Transcription Elongation Factor in Saccharomyces cerevisiae

To identify previously unknown transcription elongation factors, a genetic screen was carried out for mutations that cause lethality when combined with mutations in the genes encoding the elongation factors TFIIS and Spt6. This screen identified a mutation in YKL160W, hereafter named ELF1 (elongation factor 1). Further analysis identified synthetic lethality between an elf1Δ mutation and mutations in genes encoding several known elongation factors, including Spt4, Spt5, Spt6, and members of the Paf1 complex. Genome-wide synthetic lethality studies confirmed that elf1Δ specifically interacts with mutations in genes affecting transcription elongation. Chromatin immunoprecipitation experiments show that Elf1 is cotranscriptionally recruited over actively transcribed regions and that this association is partially dependent on Spt4 and Spt6. Analysis of elf1Δ mutants suggests a role for this factor in maintaining proper chromatin structure in regions of active transcription. Finally, purification of Elf1 suggests an association with casein kinase II, previously implicated in transcription. Together, these results suggest an important role for Elf1 in the regulation of transcription elongation.

Cotranscriptional Set2 Methylation of Histone H3 Lysine 36 Recruits a Repressive Rpd3 Complex

The yeast histone deacetylase Rpd3 can be recruited to promoters to repress transcription initiation. Biochemical, genetic, and gene-expression analyses show that Rpd3 exists in two distinct complexes. The smaller complex, Rpd3C(S), shares Sin3 and Ume1 with Rpd3C(L) but contains the unique subunits Rco1 and Eaf3. Rpd3C(S) mutants exhibit phenotypes remarkably similar to those of Set2, a histone methyltransferase associated with elongating RNA polymerase II. Chromatin immunoprecipitation and biochemical experiments indicate that the chromodomain of Eaf3 recruits Rpd3C(S) to nucleosomes methylated by Set2 on histone H3 lysine 36, leading to deacetylation of transcribed regions. This pathway apparently acts to negatively regulate transcription because deleting the genes for Set2 or Rpd3C(S) bypasses the requirement for the positive elongation factor Bur1/Bur2.

Practical Proteomic Biomarker Discovery: Taking a Step Back to Leap Forward

There is a pressing need for radically improved proteomic screening methods that allow for earlier diagnosis of disease, for systematic monitoring of physiological responses and for uncovering the fundamental mechanisms of drug action. Recent developments in proteomic technology offer tremendous, yet untapped, potential to yield novel biomarkers that are translatable to routine clinical use. Despite the significant conceptual promise of comparative proteomic profiling as a research platform for biomarker discovery, however, major hurdles remain for practical and clinical implementation. In particular, there is growing recognition that rigorous experimental design principles are urgently required to validate conclusively the unproven methodologies currently being touted. Debate and confusion persist about where the burden of proof lies: statistically, biologically or clinically? Moreover, there is no consensus about what constitutes a meaningful benchmark. An important question is how to achieve a scientifically rigorous, and therefore convincing, proof-of-concept that can be accepted by the field. Key analytical challenges related to these issues that must be addressed by the burgeoning biomarker community are discussed here.

Slx4 Regulates DNA Damage Checkpoint-dependent Phosphorylation of the BRCT Domain Protein Rtt107/Esc4

RTT107 (ESC4, YHR154W) encodes a BRCA1 C-terminal-domain protein that is important for recovery from DNA damage during S phase. Rtt107 is a substrate of the checkpoint protein kinase Mec1, although the mechanism by which Rtt107 is targeted by Mec1 after checkpoint activation is currently unclear. Slx4, a component of the Slx1-Slx4 structure-specific nuclease, formed a complex with Rtt107. Deletion of SLX4 conferred many of the same DNA-repair defects observed in rtt107Δ, including DNA damage sensitivity, prolonged DNA damage checkpoint activation, and increased spontaneous DNA damage. These phenotypes were not shared by the Slx4 binding partner Slx1, suggesting that the functions of the Slx4 and Slx1 proteins in the DNA damage response were not identical. Of particular interest, Slx4, but not Slx1, was required for phosphorylation of Rtt107 by Mec1 in vivo, indicating that Slx4 was a mediator of DNA damage-dependent phosphorylation of the checkpoint effector Rtt107. We propose that Slx4 has roles in the DNA damage response that are distinct from the function of Slx1-Slx4 in maintaining rDNA structure and that Slx4-dependent phosphorylation of Rtt107 by Mec1 is critical for replication restart after alkylation damage.

A Phosphatase Complex That Dephosphorylates γH2AX Regulates DNA Damage Checkpoint Recovery

One of the earliest marks of a double-strand break (DSB) in eukaryotes is serine phosphorylation of the histone variant H2AX at the carboxy-terminal SQE motif to create γH2AX-containing nucleosomes. Budding-yeast histone H2A is phosphorylated in a similar manner by the checkpoint kinases Tel1 and Mec1 (ref. 2; orthologous to mammalian ATM and ATR, respectively) over a 50-kilobase region surrounding the DSB. This modification is important for recruiting numerous DSB-recognition and repair factors to the break site, including DNA damage checkpoint proteins, chromatin remodellers and cohesins. Multiple mechanisms for eliminating γH2AX as DNA repair completes are possible, including removal by histone exchange followed potentially by degradation, or, alternatively, dephosphorylation. Here we describe a three-protein complex (HTP-C, for histone H2A phosphatase complex) containing the phosphatase Pph3 that regulates the phosphorylation status of γH2AX in vivo and efficiently dephosphorylates γH2AX in vitro. γH2AX is lost from chromatin surrounding a DSB independently of the HTP-C, indicating that the phosphatase targets γH2AX after its displacement from DNA. The dephosphorylation of γH2AX by the HTP-C is necessary for efficient recovery from the DNA damage checkpoint.

Formation of a Distinctive Complex Between the Inducible Bacterial Lysine Decarboxylase and a Novel AAA+ ATPase

AAA+ ATPases are ubiquitous proteins that employ the energy obtained from ATP hydrolysis to remodel proteins, DNA, or RNA. The MoxR family of AAA+ proteins is widespread throughout bacteria and archaea but is largely uncharacterized. Limited work with specific members has suggested a potential role as molecular chaperones involved in the assembly of protein complexes. As part of an effort aimed at determining the function of novel AAA+ chaperones in Escherichia coli, we report the characterization of a representative member of the MoxR family, YieN, which we have renamed RavA (regulatory ATPase variant A). We show that the ravA gene lies in an operon with another gene encoding a protein of unknown function, YieM, which contains a von Willebrand factor type A domain. RavA expression is under the control of the σS transcription factor, and its levels increase toward late log/early stationary phase, consistent with a possible role as a general stress-response protein. RavA functions as an ATPase and forms hexameric oligomers. Importantly, we demonstrate that RavA interacts strongly with inducible lysine decarboxylase (LdcI or CadA), forming a large cage-like structure consisting of two LdcI decamers linked by a maximum of five RavA oligomers. Surprisingly, the activity of LdcI does not appear to be affected by binding to RavA in a number of in vitro and in vivo assays; however, complex formation results in the stimulation of RavA ATPase activity. Our data suggest that the RavA-LdcI interaction may be important for the regulation of RavA activity against its targets.

Interactions of the Escherichia coli Hydrogenase Biosynthetic Proteins: HybG Complex Formation

Assembly of the active site of the [NiFe]-hydrogenase enzymes involves a multi-step pathway and the coordinated activity of many accessory proteins. To analyze complex formation between these factors in Escherichia coli, they were genomically tagged and native multi-protein complexes were isolated. This method validated multiple interactions reported in separate studies from several organisms and defined a new complex containing the putative chaperone HybG and the large subunit of hydrogenase 1 or 2. The complex also includes HypE and HypD, which interact with each other before joining the larger complex.

Cardiac-specific Overexpression of Sarcolipin in Phospholamban Null Mice Impairs Myocyte Function That is Restored by Phosphorylation

Sarcolipin (SLN) inhibits the cardiac sarco(endo)plasmic reticulum Ca2+ ATPase (SERCA2a) by direct binding and is superinhibitory if it binds as a binary complex with phospholamban (PLN). To determine whether overexpression of SLN in the heart might directly impair cardiac function, transgenic (TG) mice with cardiac-specific overexpression of NF-SLN (SLN tagged at its N terminus with the FLAG epitope) were generated on a PLN null (PLN KO) background. In NF-SLN TG/PLN KO cardiac microsomes, the apparent affinity of SERCA2a for Ca2+ was decreased compared with non-TG littermate PLN KO hearts. Analyses of isolated NF-SLN/PLN KO cardiomyocytes revealed impaired cardiac contractility, reduced calcium transient peak amplitude, and slower decay kinetics compared with PLN KO animals. In these cardiomyocytes, isoproterenol restored calcium dynamics to the levels seen in PLN KO. Invasive hemodynamic and echocardiographic analyses of NF-SLN/PLN KO mouse cardiac muscle in vivo showed no direct effects of NF-SLN overexpression when compared with PLN KO mice. A possible explanation for the lack of effects in the whole heart is responsiveness to phosphorylation: we found that NF-SLN can be phosphorylated in cardiomyocytes in response to isoproterenol, and we provide evidence that serine/threonine kinase 16 is a kinase that can phosphorylate NF-SLN. Site-directed mutagenesis showed that SLN Thr-5 is the target site for this kinase. These data show that overexpression of NF-SLN can inhibit SERCA2a in the absence of PLN and that the inhibition of SERCA2a is correlated with impairment of contractility and calcium cycling in cardiomyocytes.

Global Landscape of Protein Complexes in the Yeast Saccharomyces cerevisiae

Identification of protein-protein interactions often provides insight into protein function, and many cellular processes are performed by stable protein complexes. We used tandem affinity purification to process 4,562 different tagged proteins of the yeast Saccharomyces cerevisiae. Each preparation was analysed by both matrix-assisted laser desorption/ionization-time of flight mass spectrometry and liquid chromatography tandem mass spectrometry to increase coverage and accuracy. Machine learning was used to integrate the mass spectrometry scores and assign probabilities to the protein-protein interactions. Among 4,087 different proteins identified with high confidence by mass spectrometry from 2,357 successful purifications, our core data set (median precision of 0.69) comprises 7,123 protein-protein interactions involving 2,708 proteins. A Markov clustering algorithm organized these interactions into 547 protein complexes averaging 4.9 subunits per complex, about half of them absent from the MIPS database, as well as 429 additional interactions between pairs of complexes. The data (all of which are available online) will help future studies on individual proteins as well as functional genomics and systems biology.
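
The Markov clustering (MCL) step mentioned above alternates two operations on a column-stochastic matrix built from the interaction network: expansion (matrix powers, which simulate random-walk flow) and inflation (element-wise powering followed by renormalisation, which strengthens strong flows and starves weak ones). The following is a minimal pure-Python sketch of the generic algorithm, not the study's pipeline; the expansion and inflation values and the 1e-6 read-out cut-off are illustrative assumptions.

```python
from typing import List

Matrix = List[List[float]]


def normalize_columns(m: Matrix) -> Matrix:
    """Rescale every column to sum to 1 (column-stochastic); all-zero columns stay 0."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] > 0 else 0.0 for j in range(n)]
            for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def markov_cluster(adjacency: Matrix, expansion: int = 2, inflation: float = 2.0,
                   max_iter: int = 100, tol: float = 1e-9) -> List[List[int]]:
    n = len(adjacency)
    # Add self-loops so every node retains some of its own flow, then normalise.
    m = normalize_columns([[float(adjacency[i][j]) + (1.0 if i == j else 0.0)
                            for j in range(n)] for i in range(n)])
    for _iteration in range(max_iter):
        # Expansion: M^e spreads random-walk flow along longer paths.
        e = m
        for _power in range(expansion - 1):
            e = matmul(e, m)
        # Inflation: element-wise power, then renormalise the columns.
        new = normalize_columns([[v ** inflation for v in row] for row in e])
        delta = max(abs(new[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = new
        if delta < tol:
            break
    # Read-out: each row still carrying flow is an attractor; the columns
    # with non-negligible flow into that row form one cluster.
    clusters = {frozenset(j for j in range(n) if m[i][j] > 1e-6) for i in range(n)}
    return sorted(sorted(c) for c in clusters if c)


# Hypothetical toy network: two disconnected triangles of interacting proteins.
triangles = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(markov_cluster(triangles))  # [[0, 1, 2], [3, 4, 5]]
```

Raising the inflation parameter generally yields smaller, tighter clusters; on a real interaction network with tens of thousands of edges a sparse-matrix implementation would be needed instead of nested lists.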

A Genome-wide Screen Identifies the Evolutionarily Conserved KEOPS Complex As a Telomere Regulator

Telomere capping is the essential function of telomeres. To identify new genes involved in telomere capping, we carried out a genome-wide screen in Saccharomyces cerevisiae for suppressors of cdc13-1, an allele of CDC13, which encodes the telomere-capping protein Cdc13. We report the identification of five novel suppressors, including the previously uncharacterized gene YML036W, which we name CGI121. Cgi121 is part of a conserved protein complex, the KEOPS complex, which also contains the protein kinase Bud32, the putative peptidase Kae1, and the uncharacterized protein Gon7. Deletion of CGI121 suppresses cdc13-1 through a dramatic reduction in ssDNA accumulation in cdc13-1 cgi121 mutants. Deletion of BUD32 or other KEOPS components leads to short telomeres and a failure to add telomeres de novo to DNA double-strand breaks. Our results therefore indicate that the KEOPS complex promotes both telomere uncapping and telomere elongation.

Global Survey of Organ and Organelle Protein Expression in Mouse: Combined Proteomic and Transcriptomic Profiling

Organs and organelles represent core biological systems in mammals, but the diversity in protein composition remains unclear. Here, we combine subcellular fractionation with exhaustive tandem mass spectrometry-based shotgun sequencing to examine the protein content of four major organellar compartments (cytosol, membranes [microsomes], mitochondria, and nuclei) in six organs (brain, heart, kidney, liver, lung, and placenta) of the laboratory mouse, Mus musculus. Using rigorous statistical filtering and machine-learning methods, the subcellular localization of 3274 of the 4768 proteins identified was determined with high confidence, including 1503 previously uncharacterized factors, while tissue selectivity was evaluated by comparison to previously reported mRNA expression patterns. This molecular compendium, fully accessible via a searchable web-browser interface, serves as a reliable reference of the expressed tissue and organelle proteomes of a leading model mammal.

Systems-level Analyses Identify Extensive Coupling Among Gene Expression Machines

Here, we develop computational methods to assess and consolidate large, diverse protein interaction data sets, with the objective of identifying proteins involved in the coupling of multicomponent complexes within the yeast gene expression pathway. From among approximately 43,000 total interactions and 2,100 proteins, our methods identify known structural complexes, such as the spliceosome and SAGA, and functional modules, such as the DEAD-box helicases, within the interaction network of proteins involved in gene expression. Our process identifies and ranks instances of three distinct, biologically motivated motifs, or patterns of coupling among distinct machineries involved in different subprocesses of gene expression. Our results confirm known coupling among transcription, RNA processing, and export, and predict further coupling with translation and nonsense-mediated decay. We systematically corroborate our analysis with two independent, comprehensive experimental data sets. The methods presented here may be generalized to other biological processes and organisms to generate principled, systems-level network models that provide experimentally testable hypotheses for coupling among biological machines.

PIPE: a Protein-protein Interaction Prediction Engine Based on the Re-occurring Short Polypeptide Sequences Between Known Interacting Protein Pairs

Identification of protein interaction networks has received considerable attention in the post-genomic era. The currently available biochemical approaches for detecting protein-protein interactions are all time- and labour-intensive. Consequently, there is a growing need for computational tools that can effectively identify such interactions.
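
PIPE's premise, as its title states, is that short polypeptide sequences re-occurring across known interacting protein pairs can signal an interaction. The sketch below illustrates that idea under heavy simplifying assumptions: windows are compared by exact k-mer identity (an assumption made here for brevity, not necessarily how PIPE compares windows), the helper names are hypothetical, and the toy sequences are invented. For every window of query protein A and every window of query protein B, it counts how many known interacting pairs contain one window in each partner.

```python
from typing import List, Set, Tuple


def kmer_set(seq: str, k: int) -> Set[str]:
    """All length-k windows (k-mers) of a sequence, as a set."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def window_cooccurrence_matrix(seq_a: str, seq_b: str,
                               known_pairs: List[Tuple[str, str]],
                               k: int = 3) -> List[List[int]]:
    """H[i][j] = number of known interacting pairs (X, Y) in which window i of A
    occurs in one partner and window j of B occurs in the other.
    Hypothetical simplification: windows match only if identical."""
    partner_kmers = [(kmer_set(x, k), kmer_set(y, k)) for x, y in known_pairs]
    windows_a = [seq_a[i:i + k] for i in range(len(seq_a) - k + 1)]
    windows_b = [seq_b[j:j + k] for j in range(len(seq_b) - k + 1)]
    return [[sum(1 for kx, ky in partner_kmers
                 if (wa in kx and wb in ky) or (wa in ky and wb in kx))
             for wb in windows_b]
            for wa in windows_a]


def interaction_score(matrix: List[List[int]]) -> int:
    """Crude hypothetical score: the largest co-occurrence count in the matrix."""
    return max((max(row) for row in matrix if row), default=0)


# Toy database: two "known" interacting pairs that share the motifs KLM and PQR.
known = [("AAAKLM", "WWWPQR"), ("GGKLMG", "CCPQRC")]
h = window_cooccurrence_matrix("KLMSTV", "DEPQRF", known)
print(interaction_score(h))  # 2: window KLM of A and PQR of B co-occur in both pairs
```

A peak in the matrix points to a candidate pair of binding regions; turning the matrix into a yes/no prediction would require a calibrated decision rule, which this sketch does not attempt.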

Cardiovascular Proteomics: Tools to Develop Novel Biomarkers and Potential Applications

Proteomics is a systems biology approach to the large-scale study of proteins and of the protein variation that results from biological processes and perturbations. The field is undergoing a dramatic transformation, owing to the completion and annotation of the human genome and to technological advances in large-scale protein analysis. Proteomics can potentially yield novel biomarkers of cardiovascular disease, establish earlier detection strategies, and monitor responses to therapy. Technological advances permit unprecedented large-scale identification of peptide sequences in a biological sample by mass spectrometry, whereas gel-based techniques provide additional information on post-translational modification status. High-throughput evaluation of a subset of predefined protein targets, identified through proteomics, microarray profiling, and pathway analysis in animal models and human tissues, is gaining momentum in research and clinical applications. Proteomic analysis has provided important insights into ischemic heart disease, heart failure, and cardiovascular pathophysiology. Combining proteomic biomarkers with clinical phenotypes and genetic haplotype information can lead to more precise diagnosis and therapy on an individual basis, the fundamental premise of "personalized medicine."

Tissue Subcellular Fractionation and Protein Extraction for Use in Mass-spectrometry-based Proteomics

We have shown that sample fractionation is an effective way to increase the proteome coverage obtained when complex samples, such as organs, are analyzed by mass-spectrometric techniques. Further fractionating a sample by subcellular compartment can generate molecular information on the state of a tissue and the distribution of its protein components. Although many methods exist for fractionating proteins, the method described here captures most subcellular fractions simultaneously at reasonable purity. It scales to small samples, such as embryonic tissues, as well as to larger tissues. The protocol is intended for the general fractionation and extraction of proteins from organs or tissues for subsequent analysis by mass spectrometry. It uses differential centrifugation in density gradients to isolate nuclear, cytosolic, mitochondrial and mixed microsomal (Golgi, endoplasmic reticulum, other vesicles and plasma membrane) fractions. Proteins are then extracted from each isolated fraction, and the samples can be frozen for processing and analysis at a later date. The procedure can typically be completed in 5 h.

Investigating the in Vivo Activity of the DeaD Protein Using Protein-protein Interactions and the Translational Activity of Structured Chloramphenicol Acetyltransferase mRNAs

Here, we report the use of an in vivo protein-protein interaction detection approach, together with focused follow-up experiments, to study the function of the DeaD protein in Escherichia coli. In this method, functions are assigned to proteins based on the interactions they make with other proteins in the living cell, and the assigned functions are then confirmed by follow-up experiments. DeaD has been characterized in vitro as a putative prokaryotic factor required for the formation of translation initiation complexes on structured mRNAs. Although the RNA helicase activity of DeaD has been demonstrated in vitro, its in vivo activity remains controversial. Using a method called sequential peptide affinity (SPA) tagging, we show that DeaD interacts with certain ribosomal proteins as well as a series of other nucleic acid binding proteins. Focused follow-up experiments provide evidence for the mRNA helicase activity of the DeaD protein complex during translation initiation. DeaD overexpression compensates for the reduced translation caused by a structure placed at the initiation region of a chloramphenicol acetyltransferase (cat) reporter gene. Deletion of the deaD gene abolishes translation of the mRNA carrying an inhibitory structure at its initiation region. Increasing the growth temperature disrupts RNA secondary structures and bypasses the DeaD requirement. These observations suggest that DeaD is involved in destabilizing mRNA structures during translation initiation. This study also further confirms that large-scale protein-protein interaction data can be suitable for studying protein function in E. coli.

Computational Prediction of Cancer-gene Function

Most cancer genes remain functionally uncharacterized in the physiological context of disease development. High-throughput molecular profiling and interaction studies are increasingly being used to identify clusters of functionally linked gene products related to neoplastic cell processes. However, in vivo determination of cancer-gene function is laborious and inefficient, so accurately predicting cancer-gene function is a significant challenge for oncologists and computational biologists alike. How can modern computational and statistical methods be used to reliably deduce the function(s) of poorly characterized cancer genes from the newly available genomic and proteomic datasets? We explore plausible solutions to this important challenge.

Analyzing the Cardiac Muscle Proteome by Liquid Chromatography-mass Spectrometry-based Expression Proteomics

Cardiomyopathies are diseases of the heart resulting in impaired cardiac muscle function, which can lead to heart dilation or overt heart failure. These diseases represent a major cause of global morbidity and death. Innovative preventive and therapeutic measures are urgently needed for early detection, categorization, and treatment of patients at risk of cardiomyopathy. These developments will require a more complete understanding of the molecular effects of impaired cardiac function, even prior to overt disease. The use of gel-free expression proteomics in the detailed analysis of cardiac tissues should yield significant insight into the pathophysiology of these diseases.

Difference Detection in LC-MS Data for Protein Biomarker Discovery

There is a pressing need for improved proteomic screening methods that allow earlier diagnosis of disease, systematic monitoring of physiological responses and the uncovering of fundamental mechanisms of drug action. The combined platform of LC-MS (liquid chromatography-mass spectrometry) has shown promise in these areas. In this paper we present a technique for discovering differences in protein signal between two classes of samples in LC-MS serum proteomic data without the use of tandem mass spectrometry, gels or labeling. The method works on data from a lower-precision MS instrument, the type routinely used by and available to the community at large today. We test our technique on a controlled (spike-in) but realistic (serum biomarker discovery) experiment, which is therefore verifiable. We also develop a new method to help assess the difficulty of a given spike-in problem. Lastly, we show that class prediction, sometimes mistaken for a solution to biomarker discovery, is actually a much simpler problem.

Functional Dissection of Protein Complexes Involved in Yeast Chromosome Biology Using a Genetic Interaction Map

Defining the functional relationships between proteins is critical for understanding virtually all aspects of cell biology. Large-scale identification of protein complexes has provided one important step towards this goal; however, even knowledge of the stoichiometry, affinity and lifetime of every protein-protein interaction would not reveal the functional relationships between and within such complexes. Genetic interactions can provide functional information that is largely invisible to protein-protein interaction data sets. Here we present an epistatic miniarray profile (E-MAP) consisting of quantitative pairwise measurements of the genetic interactions between 743 Saccharomyces cerevisiae genes involved in various aspects of chromosome biology (including DNA replication/repair, chromatid segregation and transcriptional regulation). This E-MAP reveals that physical interactions fall into two well-represented classes, distinguished by whether or not the individual proteins act coherently to carry out a common function. Thus, genetic interaction data make it possible to functionally dissect multi-protein complexes, including Mediator, and to organize distinct protein complexes into pathways. In one pathway defined here, we show that Rtt109 is the founding member of a novel class of histone acetyltransferases responsible for Asf1-dependent acetylation of histone H3 on lysine 56. This modification, in turn, enables a ubiquitin ligase complex containing the cullin Rtt101 to ensure genomic integrity during DNA replication.

Identifying Functional Modules in the Physical Interactome of Saccharomyces cerevisiae

Reliable information on the physical and functional interactions between gene products is an important prerequisite for deriving meaningful system-level descriptions of cellular processes. The available information about protein interactions in Saccharomyces cerevisiae has recently been vastly increased by two comprehensive tandem affinity purification/mass spectrometry (TAP/MS) studies. However, these studies, which used somewhat different approaches, produced diverging descriptions of the yeast interactome, clearly illustrating that converting purification data into accurate sets of protein-protein interactions and complexes remains a major challenge. Here, we review the major analytical steps involved in this process, with special focus on the task of deriving complexes from the network of binary interactions. Applying the Markov Cluster procedure to an alternative yeast interaction network, recently derived by combining the data from the two latest TAP/MS studies, we produce a new description of yeast protein complexes. Several objective criteria suggest that this new description is more accurate and meaningful than those previously published. The same criteria are also used to gauge the influence that different methods for deriving binary interactions and complexes may have on the results. Lastly, we show that applying identical procedures to the latest purification datasets significantly improves the convergence between the resulting interactome descriptions.

Hardware Flexibility of Laboratory Automation Systems: Analysis and New Flexible Automation Architectures

Development of flexible laboratory automation systems has attracted tremendous attention in recent years, as biotechnology scientists perform diverse types of protocols and tend to modify them continuously as part of their research. This article is a system-level study of the hardware flexibility of laboratory automation architectures for high-throughput automation of various sample preparation protocols. Hardware flexibility (the adaptability of system components to protocol variations) is addressed by introducing three main parametric flexibility measures: functional, structural, and throughput. A new quantitative method for measuring these parameters within the framework of the Axiomatic Theory is introduced. The method defines probability-of-success functions for the flexibility parameters and calculates their information content. As flexibility information content decreases, automation system flexibility increases.
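
In axiomatic design, the information content of a requirement is I = log2(1/p), where p is the probability of success, so a requirement that is always met carries 0 bits. The abstract does not give the paper's probability-of-success functions, so the sketch below assumes the textbook case of a uniformly distributed system output, where p is the fraction of the system range falling inside the design range, and sums I over independent parameters. The ranges and the parameter list are hypothetical.

```python
import math
from typing import Iterable


def probability_of_success(design_lo: float, design_hi: float,
                           system_lo: float, system_hi: float) -> float:
    """p = common range / system range, assuming the system's output is
    uniformly distributed over [system_lo, system_hi]."""
    common = max(0.0, min(design_hi, system_hi) - max(design_lo, system_lo))
    return common / (system_hi - system_lo)


def information_content(p: float) -> float:
    """I = log2(1/p) in bits: 0 when success is certain, infinite when impossible."""
    if p <= 0.0:
        return math.inf
    return math.log2(1.0 / p)


def total_information(probabilities: Iterable[float]) -> float:
    """Sum of I over independent parameters (hypothetically: functional,
    structural and throughput flexibility)."""
    return sum(information_content(p) for p in probabilities)


# Hypothetical throughput case: a protocol needs 0-10 units, the system delivers 5-15.
p_throughput = probability_of_success(0, 10, 5, 15)
print(p_throughput, information_content(p_throughput))  # 0.5 1.0
print(total_information([1.0, 0.5, 0.25]))  # 3.0 bits; lower totals mean more flexibility
```

The logarithm makes information content additive across independent parameters, which is why a single total can be compared between candidate automation architectures.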

Retention of Protein Complex Membership by Ancient Duplicated Gene Products in Budding Yeast

To investigate functional divergence of gene duplicates, we examined the protein-protein interactions and coexistence in complexes of paralogs resulting from an ancient whole-genome duplication in yeast. Strikingly, half of the surveyed paralog pairs were co-clustered in protein complexes, and these pairs were more conserved and more highly expressed than non-co-clustered paralogs; however, their discordant expression patterns and conservation rates indicate differential regulation of subfunctionalized paralogs. These results highlight the value of protein complex membership in studying functional divergence among gene duplicates.

Beta-Subunit Appendages Promote 20S Proteasome Assembly by Overcoming an Ump1-dependent Checkpoint

Proteasomes are responsible for most intracellular protein degradation in eukaryotes. The 20S proteasome comprises a dyad-symmetric stack of four heptameric rings made from 14 distinct subunits. How it assembles is not understood. Most subunits in the central pair of beta-subunit rings are synthesized in precursor form. Normally, the beta5 (Doa3) propeptide is essential for yeast proteasome biogenesis, but overproduction of beta7 (Pre4) bypasses this requirement. Bypass depends on a unique beta7 extension, which contacts the opposing beta ring. The resulting proteasomes appear normal but assemble inefficiently, facilitating identification of assembly intermediates. Assembly occurs stepwise into precursor dimers, and intermediates contain the Ump1 assembly factor and a novel complex, Pba1-Pba2. beta7 incorporation occurs late and is closely linked to the association of two half-proteasomes. We propose that dimerization is normally driven by the beta5 propeptide, an intramolecular chaperone, but beta7 addition overcomes an Ump1-dependent assembly checkpoint and stabilizes the precursor dimer.

HnRNP E1 and E2 Have Distinct Roles in Modulating HIV-1 Gene Expression

Pre-mRNA processing events, including 5' end capping, splicing, and 3' end cleavage/polyadenylation, are coordinated by transcription and can influence the subsequent export and translation of mRNAs. Coordination of RNA processing is crucial in retroviruses such as HIV-1, where inefficient splicing and the export of intron-containing RNAs are required for expression of the full complement of viral proteins. RNA processing can be affected by both viral and cellular proteins, and in this study we demonstrate that a member of the hnRNP E family of proteins can modulate HIV-1 RNA metabolism and expression. We show that hnRNP E1/E2 interact with the ESS3a element of the bipartite ESS in tat/rev exon 3 of HIV-1 and that modulating hnRNP E1 expression alters HIV-1 structural protein synthesis. Overexpression of hnRNP E1 reduces Rev levels, in part through a decrease in rev mRNA levels. However, the reduction in Rev levels cannot fully account for the effect of hnRNP E1, suggesting that hnRNP E1 might also suppress viral RNA translation. Deletion mutagenesis showed that the C-terminal end of hnRNP E1 is required for the reduction in Rev expression, and that replacing this portion of hnRNP E1 with the corresponding region of hnRNP E2, despite their high degree of conservation, could not rescue the loss of function.

Integrated Proteomic and Transcriptomic Profiling of Mouse Lung Development and Nmyc Target Genes

Although microarray analysis has provided information about the dynamics of gene expression during mouse lung development, no extensive correlations have been made with the levels of the corresponding protein products. Here, we present a global survey of protein expression during mouse lung organogenesis, from embryonic day 13.5 (E13.5) until adulthood, using gel-free two-dimensional liquid chromatography coupled to shotgun tandem mass spectrometry (MudPIT). Mathematical modeling of the proteomic profiles together with parallel DNA microarray data identified large groups of gene products whose protein and transcript levels showed statistically significant correlation or divergence in coregulation during lung development. We also present an integrative analysis of mRNA and protein expression in Nmyc loss- and gain-of-function mutants, which revealed a set of 90 positively and negatively regulated putative target genes. These targets provide evidence that Nmyc regulates genes involved in mRNA processing and represses the imprinted gene Igf2r in the developing lung.
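
One simple way to ask whether a protein and its transcript are co-regulated across developmental stages is to correlate the two abundance profiles. The abstract does not describe the study's mathematical model, so this sketch substitutes a plain Pearson correlation; the labels, the 0.5 cut-off and the example profiles are all hypothetical.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / (sx * sy)


def classify(r: float, cutoff: float = 0.5) -> str:
    """Hypothetical labels for protein/transcript co-regulation."""
    if r >= cutoff:
        return "concordant"
    if r <= -cutoff:
        return "divergent"
    return "unclear"


# Hypothetical abundance profiles for one gene at five stages (E13.5 ... adult).
protein = [1.0, 2.0, 3.5, 4.0, 5.0]
mrna = [1.2, 2.1, 3.0, 4.4, 5.1]
r = pearson(protein, mrna)
print(round(r, 3), classify(r))
```

With only a handful of time points, a correlation on its own says little about significance; grouping many genes, as the study does, is what makes such patterns statistically interpretable.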

Experimental and Computational Procedures for the Assessment of Protein Complexes on a Genome-wide Scale

Predict, Prevent and Personalize: Genomic and Proteomic Approaches to Cardiovascular Medicine

Genomic and proteomic approaches to cardiovascular medicine promise to revolutionize our understanding of disease initiation and progression. This improved appreciation of pathophysiology may be translated into avenues of clinical utility. Gene-based presymptomatic prediction of illness, finer diagnostic subclassifications and improved risk assessment tools will permit earlier and more targeted intervention. Pharmacogenetics will guide our therapeutic decisions and monitor response to therapy. Personalized medicine will require the integration of clinical information, stable and dynamic genomics, and molecular phenotyping. Bioinformatics will be crucial in translating these data into useful applications, leading to improved diagnosis, prediction, prognostication and treatment. The present paper reviews the potential contributions of genomic and proteomic approaches in developing a more personalized approach to cardiovascular medicine.

Association with the Origin Recognition Complex Suggests a Novel Role for Histone Acetyltransferase Hat1p/Hat2p

Histone modifications have been implicated in the regulation of transcription and, more recently, in DNA replication and repair. In yeast, a major conserved histone acetyltransferase, Hat1p, preferentially acetylates lysine residues 5 and 12 on histone H4.

Impaired tRNA Nuclear Export Links DNA Damage and Cell-cycle Checkpoint

In response to genotoxic stress, cells evoke a plethora of physiological responses collectively aimed at enhancing viability and maintaining the integrity of the genome. Here, we report that unspliced tRNA rapidly accumulates in the nuclei of the yeast Saccharomyces cerevisiae after DNA damage. This response requires an intact MEC1- and RAD53-dependent signaling pathway that impedes the nuclear export of intron-containing tRNA via differential relocalization of the karyopherin Los1 to the cytoplasm. The accumulation of unspliced tRNA in the nucleus signals the activation of the Gcn4 transcription factor, which in turn contributes to cell-cycle arrest in G1, in part by delaying accumulation of the cyclin Cln2. Regulated nucleocytoplasmic tRNA trafficking thus constitutes an integral physiological adaptation to DNA damage. These data further illustrate how signal-mediated crosstalk between distinct functional modules, namely tRNA nucleocytoplasmic trafficking, protein synthesis, and checkpoint execution, allows functional coupling of tRNA biogenesis and cell-cycle progression.

Nine Steps to Proteomic Wisdom: A Practical Guide to Using Protein-protein Interaction Networks and Molecular Pathways As a Framework for Interpreting Disease Proteomic Profiles

A major aim of proteomic profiling of disease is to uncover the mechanistic basis of a given pathology. High-throughput experimental techniques continue to advance rapidly, but are still plagued by high rates of false negatives, false positives, and other spurious findings. By reducing a disease profile to a subset of differentially expressed proteins and determining functional over-representation, one can often make a reasonable first-pass assessment as to what might be happening in disease. Integrating mRNA expression patterns together with prior knowledge of protein-protein interaction networks and biological pathway information goes a step further, providing clues into the core processes that are aberrant in the disease state, and indicating which cellular functions are activated or repressed as a maladaptive pathophysiological response. This multi-step framework allows one to hypothesize as to possible cause and effect of pathology, and highlights potentially instructive pathways or sub-networks for subsequent experimental validation. Indeed, efficiently exploiting data regarding the myriad of physical and genetic interactions among expressed gene products, in parallel with the systematic sampling of genetic variation among diverse human populations, promises to revolutionize our current understanding of disease action at a deeper molecular level.
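
Functional over-representation of a category among differentially expressed proteins is commonly assessed with a one-sided hypergeometric test: given N background proteins, K of them annotated to a pathway, and n differentially expressed proteins, it asks how likely it is to see k or more annotated proteins by chance. This is a generic sketch, not necessarily the exact test the authors use; the numbers are hypothetical, and a real analysis over many categories would also need multiple-testing correction.

```python
from math import comb


def overrepresentation_p(N: int, K: int, n: int, k: int) -> float:
    """One-sided hypergeometric P(X >= k).

    N: proteins in the background (e.g. all detected proteins)
    K: background proteins annotated to the pathway or function
    n: differentially expressed proteins
    k: differentially expressed proteins annotated to the pathway or function
    """
    upper = min(n, K)
    # Exact integer sum of favourable draws; math.comb returns 0 when n - i > N - K.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical numbers: 2000 detected proteins, 100 in a pathway,
# 50 differentially expressed, 10 of which fall in the pathway (2.5 expected by chance).
p = overrepresentation_p(N=2000, K=100, n=50, k=10)
print(f"p = {p:.2e}")
```

Keeping the numerator as an exact integer avoids the rounding drift that a sum of floating-point probabilities would accumulate.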

Bacteriome.org: an Integrated Protein Interaction Database for E. coli

High-throughput methods are increasingly being used to examine the functions and interactions of gene products on a genome scale. These include systematic large-scale proteomic studies of protein complexes and protein-protein interaction networks, functional genomic studies examining patterns of gene expression, and comparative genomics studies examining patterns of conservation. Since these datasets offer different yet highly complementary perspectives on cell behavior, their integration is expected to lead to conceptual advances in our understanding of the fundamental design and evolutionary principles that underlie the organization and function of proteins within biochemical pathways. Here we present Bacteriome.org, a resource that combines locally generated interaction and evolutionary datasets with a previously generated knowledgebase to provide an integrated view of the Escherichia coli interactome. Tools are provided that allow the user to select and visualize functional, evolutionary and structural relationships between groups of interacting proteins and to focus on genes of interest. Currently the database contains three interaction datasets: a functional dataset of 3989 interactions between 1927 proteins; a 'core' high-quality experimental dataset of 4863 interactions between 1100 proteins; and an 'extended' experimental dataset of 9860 interactions between 2131 proteins. Bacteriome.org is available online at http://www.bacteriome.org.

Comparative Proteomics Profiling of a Phospholamban Mutant Mouse Model of Dilated Cardiomyopathy Reveals Progressive Intracellular Stress Responses

Defective mobilization of Ca2+ by cardiomyocytes can lead to cardiac insufficiency, but the causative mechanisms leading to congestive heart failure (HF) remain unclear. In the present study we performed exhaustive global proteomics surveys of cardiac ventricle isolated from a mouse model of cardiomyopathy that overexpresses a phospholamban mutant, R9C (PLN-R9C), exhibits impaired Ca2+ handling and dies at 24 weeks, and compared them with normal control littermates. The relative expression patterns of 6190 high-confidence proteins were monitored by shotgun tandem mass spectrometry at 8, 16, and 24 weeks of disease progression. Significant differential abundance of 593 proteins was detected. These proteins mapped to select biological pathways such as endoplasmic reticulum stress response, cytoskeletal remodeling, and apoptosis, and included known biomarkers of HF (e.g. brain natriuretic peptide/atrial natriuretic factor and angiotensin-converting enzyme) and other indicators of presymptomatic functional impairment. These altered proteomic profiles were concordant with cognate mRNA patterns recorded in parallel using high-density mRNA microarrays, and top candidates were validated by RT-PCR and Western blotting. Mapping of our highest-ranked proteins against a diseased human explant and to available data sets indicated that many of these proteins could serve as markers of disease. Indeed, we showed that several of these proteins are detectable in mouse and human plasma and display differential abundance in the plasma of diseased mice and affected patients. These results offer a systems-wide perspective of the dynamic maladaptations associated with impaired Ca2+ homeostasis that perturb myocyte function and ultimately converge to cause HF.

Interactions of Elongation Factor EF-P with the Escherichia coli Ribosome

EF-P (eubacterial elongation factor P) is a highly conserved protein essential for protein synthesis. We report that EF-P protects 16S rRNA near the G526 streptomycin and the S12 and mRNA binding sites (30S T-site). EF-P also protects domain V of the 23S rRNA proximal to the A-site (50S T-site) and more strongly the A-site of 70S ribosomes. We suggest that EF-P: (a) may play a role in translational fidelity and (b) prevents entry of fMet-tRNA into the A-site enabling it to bind to the 50S P-site. We also report that EF-P promotes a ribosome-dependent accommodation of fMet-tRNA into the 70S P-site.

Proteomic Methods for Drug Target Discovery

The field of drug target discovery is currently very active, with great potential for advancing biomedical research and chemical genomics. Innovative strategies have been developed to aid target identification, whether by elucidating the primary mechanism of action of a drug, by understanding side effects involving unanticipated 'off-target' interactions, or by finding new therapeutic value for an established drug. Several promising proteomic methods have been introduced for directly isolating and identifying the protein targets bound by active small molecules, or for visualizing enzyme activities affected by drug treatment. Significant progress has been made in this rapidly advancing field, speeding the clinical validation of drug candidates and the discovery of novel targets for lead compounds developed using cell-based phenotypic screens. Using these proteomic methods, further insight into drug activity and toxicity can be gained.

Evaluation of Data-dependent Versus Targeted Shotgun Proteomic Approaches for Monitoring Transcription Factor Expression in Breast Cancer

In breast cancer, there is a significant degree of molecular diversity among tumors. Multiple perturbations in signal transduction pathways impinge on transcriptional networks that in turn dictate malignant transformation and metastatic progression. Detailed knowledge of the sequence-specific transcription factors that become activated or repressed within a tumor, and comparison of their relative levels of expression in cancer versus normal tissue, should therefore provide insight into disease mechanisms, improving patient stratification and facilitating personalized treatment. While high-throughput tandem mass spectrometry methods for global proteome profiling have been developed, existing approaches have limited sensitivity and are often unable to detect low-abundance transcription factors in a complex biological specimen such as a biopsy or tumor cell extract. To address this, we have undertaken a systematic comparative evaluation of three MS/MS methods for their ability to detect reference transcription factors spiked in known amounts into a cell-free breast cancer nuclear extract: Data-Dependent Acquisition (DDA), in which precursor ion intensity dictates selection for fragmentation; Targeted Peptide Monitoring (TPM), a directed approach using successive isolation and fragmentation of predefined m/z ratios; and Multiple Reaction Monitoring (MRM), in which specific precursor-ion-to-product-ion transitions are selectively monitored. Through a series of controlled, parallel benchmarking experiments, we have determined the relative figures of merit of each approach and have established that prior knowledge of signature proteotypic peptides markedly improves overall detection sensitivity, reliability, and quantification.
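
TPM and MRM both depend on knowing in advance which m/z values to isolate: the precursor ion m/z for TPM, and precursor-to-product transitions for MRM. As an illustration, this sketch computes the monoisotopic precursor m/z of a peptide at charge z, plus its singly charged b- and y-type fragment ions, a common source of product ions. It assumes unmodified residues (no cysteine alkylation or other modifications), and "PEPTIDE" is a toy sequence, not a signature peptide from the study.

```python
from typing import Dict, List

# Monoisotopic residue masses (Da) of the 20 standard amino acids, unmodified.
RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once for the peptide's free termini
PROTON = 1.007276   # mass of each charge-carrying proton


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def precursor_mz(seq: str, charge: int) -> float:
    """m/z of the [M + zH]z+ precursor ion."""
    return (peptide_mass(seq) + charge * PROTON) / charge


def b_ions(seq: str) -> List[float]:
    """Singly charged b1..b(n-1) fragment m/z values (N-terminal fragments)."""
    out, running = [], 0.0
    for aa in seq[:-1]:
        running += RESIDUE_MASS[aa]
        out.append(running + PROTON)
    return out


def y_ions(seq: str) -> List[float]:
    """Singly charged y1..y(n-1) fragment m/z values (C-terminal fragments)."""
    out, running = [], WATER
    for aa in reversed(seq[1:]):
        running += RESIDUE_MASS[aa]
        out.append(running + PROTON)
    return out


# MRM-style transitions for a toy peptide: (precursor m/z at 2+, y3..y6 m/z at 1+).
peptide = "PEPTIDE"
q1 = precursor_mz(peptide, 2)
transitions = [(round(q1, 4), round(y, 4)) for y in y_ions(peptide)[2:]]
print(transitions)
```

Longer y ions are used here only because they are typically more specific than y1 or y2; choosing which transitions actually perform best is an empirical step that this sketch does not model.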

Chaperone Control of the Activity and Specificity of the Histone H3 Acetyltransferase Rtt109

Acetylation of Saccharomyces cerevisiae histone H3 on K56 by the histone acetyltransferase (HAT) Rtt109 is important for repairing replication-associated lesions. Rtt109 purifies from yeast in complex with the histone chaperone Vps75, which stabilizes the HAT in vivo. A whole-genome screen for genes whose deletions have synthetic genetic interactions with rtt109Δ suggests that Rtt109 has functions beyond DNA repair. We show that, in addition to its known H3-K56 acetylation activity, Rtt109 is also an H3-K9 HAT, and that Rtt109 and Gcn5 are the only H3-K9 HATs in vivo. Vps75 strongly enhances the H3-K9 acetylation activity of Rtt109 in vitro. Both Vps75 and another histone chaperone, Asf1, are required for Rtt109-mediated acetylation of lysine 9 on H3 (H3-K9ac) in vivo, whereas H3-K56ac in vivo requires only Asf1. Asf1 also physically interacts with the nuclear Hat1/Hat2/Hif1 complex that acetylates H4-K5 and H4-K12. We suggest that Asf1 can assemble H3-H4 dimers diacetylated on both H4-K5/12 and H3-K9/56 into chromatin.

The Extensive and Condition-dependent Nature of Epistasis Among Whole-genome Duplicates in Yeast

Since complete redundancy between extant duplicates (paralogs) is evolutionarily unfavorable, some degree of functional congruency is eventually lost. However, in budding yeast, experimental evidence from studies of duplicated metabolic enzymes and from global physical interaction surveys has suggested widespread functional overlap between paralogs. While maintained functional overlap is thought to confer robustness against genetic mutation and to facilitate environmental adaptability, it has yet to be determined which properties define paralogs that can compensate for the phenotypic consequences of deleting a sister gene, how extensive this epistasis is, and how adaptable it is to alternative environmental states. To this end, we performed a comprehensive experimental analysis of epistasis, as indicated by aggravating genetic interactions, between paralogs resulting from an ancient whole-genome duplication (WGD) event in the budding yeast Saccharomyces cerevisiae. Because all such pairs share the same evolutionary time since divergence, we could compare the properties of large numbers of epistatic and non-epistatic paralogs. We found that more than one-third (140) of the 399 examinable WGD paralog pairs were epistatic under standard laboratory conditions, and that additional cases of epistasis became apparent only in media designed to induce cellular stress. Despite a significant increase in within-species sequence co-conservation, analysis of protein interactions revealed that paralogs epistatic under standard laboratory conditions were not more functionally overlapping than non-epistatic paralogs. Because experimental conditions affected the functional categorization of paralogs deemed epistatic, and only a fraction of potential stress conditions were interrogated here, we hypothesize that many epistatic relationships remain unresolved.

Interpretation of Large-scale Quantitative Shotgun Proteomic Profiles for Biomarker Discovery

Large-scale quantitative shotgun tandem mass spectrometry serves as a flexible proteomic platform for the systematic investigation of molecular processes perturbed by disease and the potential discovery of clinically relevant protein biomarkers associated with a particular pathology. Multiple profiling techniques have been introduced with the aim of comprehensively identifying and quantifying the protein complements of human tissues and blood and enabling the systematic comparison of clinically relevant specimens. This review explores the new computational methods developed to maximize the information inferred from raw spectral datasets, including database search programs for more complete and confident protein identification, improved normalization and data processing techniques to enhance statistical accuracy, and algorithms for better pattern discrimination and biomarker candidate ranking. Ultimately, integrative biomarker analyses that combine the protein expression data generated by these approaches with data from high-throughput functional genomics may help identify the mechanisms and causative factors underlying complex human diseases, thereby improving both clinical outcomes and personalized treatment options.
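
As one concrete example of a spectral-count normalization (not necessarily one of the specific methods this review covers), the normalized spectral abundance factor (NSAF) divides each protein's spectral count by its length and then rescales so that the values in a sample sum to 1. This corrects for longer proteins yielding more peptides. The protein names, counts, and lengths below are hypothetical.

```python
def nsaf(spectral_counts, lengths):
    """Normalized spectral abundance factor for each protein in one sample.

    spectral_counts: protein -> number of MS/MS spectra assigned to it
    lengths:         protein -> protein length in residues
    """
    # Length-normalized spectral abundance factor (SAF) per protein.
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    # Rescale so that values are comparable across samples (they sum to 1).
    return {p: v / total for p, v in saf.items()}


# Hypothetical: equal spectral counts, but B is four times longer than A.
print(nsaf({"A": 10, "B": 10}, {"A": 100, "B": 400}))
```

Although both proteins have 10 spectra, A receives the larger share (about 0.8) because its counts come from a shorter sequence.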

ESGA: E. Coli Synthetic Genetic Array Analysis

Physical and functional interactions define the molecular organization of the cell. Genetic interactions, or epistasis, tend to occur between gene products involved in parallel pathways or interlinked biological processes. High-throughput experimental systems to examine genetic interactions on a genome-wide scale have been devised for Saccharomyces cerevisiae, Schizosaccharomyces pombe, Caenorhabditis elegans and Drosophila melanogaster, but have not been reported previously for prokaryotes. Here we describe the development of a quantitative screening procedure for monitoring bacterial genetic interactions based on conjugation of Escherichia coli deletion or hypomorphic strains to create double mutants on a genome-wide scale. The patterns of synthetic sickness and synthetic lethality (aggravating genetic interactions) we observed for certain double mutant combinations provided information about functional relationships and redundancy between pathways and enabled us to group bacterial gene products into functional modules.
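
Aggravating (synthetic sick or lethal) interactions of this kind are often scored against a multiplicative null model, in which the expected double-mutant fitness is the product of the two single-mutant fitnesses. The sketch below assumes that convention, fitness values normalized to wild type = 1, and a hypothetical cutoff of 0.1; it is not necessarily the scoring scheme used in the ESGA screens.

```python
def epistasis(w_a, w_b, w_ab):
    """Deviation of observed double-mutant fitness from the multiplicative expectation."""
    return w_ab - w_a * w_b


def classify(eps, threshold=0.1):
    """Label an interaction score using a hypothetical symmetric cutoff."""
    if eps <= -threshold:
        return "aggravating"   # double mutant sicker than expected
    if eps >= threshold:
        return "alleviating"   # double mutant healthier than expected
    return "neutral"


# Single mutants grow at 0.9 and 0.8 of wild type, so 0.72 is expected.
score = epistasis(0.9, 0.8, 0.3)
print(round(score, 2), classify(score))
```

A double mutant growing at 0.3 instead of the expected 0.72 scores about -0.42 and would be called aggravating.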

Biosynthesis of the Respiratory Formate Dehydrogenases from Escherichia Coli: Characterization of the FdhE Protein

Escherichia coli can perform two modes of formate metabolism. Under respiratory conditions, two periplasmically located formate dehydrogenase isoenzymes couple formate oxidation to the generation of a transmembrane electrochemical gradient, whereas under fermentative conditions a third, cytoplasmic isoenzyme is involved in the disproportionation of formate to CO2 and H2. The respiratory formate dehydrogenases are redox enzymes that comprise three subunits: a molybdenum cofactor- and FeS cluster-containing catalytic subunit; an electron-transferring ferredoxin; and a membrane-integral cytochrome b. The catalytic subunit and its ferredoxin partner are targeted to the periplasm as a complex by the twin-arginine transport (Tat) pathway. Biosynthesis of these enzymes is controlled by an accessory protein termed FdhE. In this study, we show that E. coli FdhE interacts with the catalytic subunits of the respiratory formate dehydrogenases. Purification of recombinant FdhE demonstrates that the protein is an iron-binding rubredoxin that can adopt monomeric and homodimeric forms. Bacterial two-hybrid analysis suggests that the homodimeric form of FdhE is stabilized by anaerobiosis. Site-directed mutagenesis shows that conserved cysteine motifs are essential for the physiological activity of FdhE and are also involved in iron ligation.

Sequential Interval Motif Search: Unrestricted Database Surveys of Global MS/MS Data Sets for Detection of Putative Post-translational Modifications

Tandem mass spectrometry is the prevailing approach for large-scale peptide sequencing in high-throughput proteomic profiling studies. Effective database search engines have been developed to identify peptide sequences from MS/MS fragmentation spectra. However, since proteins are polymorphic and subject to post-translational modifications (PTMs), computational methods for detecting unanticipated variants are also needed to achieve true proteome-wide coverage. Unlike existing "unrestrictive" search tools, our novel algorithm, termed SIMS (Sequential Interval Motif Search), interprets pairs of product ion peaks, representing potential amino acid residues or "intervals", as a means of mapping PTMs or substitutions in a blind database search mode. We also developed an effective heuristic software program to evaluate, rank, and filter optimal combinations of relevant intervals in order to identify candidate sequences, and any associated PTM or polymorphism, from large collections of MS/MS spectra. The prediction performance of SIMS was benchmarked extensively against annotated reference spectral data sets; it compared favorably with, and was complementary to, current state-of-the-art methods. An exhaustive discovery screen using SIMS also revealed thousands of previously overlooked putative PTMs in a compendium of yeast protein complexes and in a proteome-wide map of adult mouse cardiomyocytes. We demonstrate that SIMS, freely accessible for academic research use, addresses gaps in current proteomic data interpretation pipelines, improving overall detection coverage and facilitating comprehensive investigation of the fundamental multiplicity of the expressed proteome.
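
The interval idea can be illustrated with a small sketch: take every pair of product ion peaks, compute their mass difference, and compare it with monoisotopic amino acid residue masses. Differences that match no residue are kept as candidate mass shifts that a blind search could then evaluate as possible PTMs or substitutions. This is a simplified illustration under assumed parameters (0.02 Da tolerance, 250 Da maximum interval) and a hypothetical peak list; it does not reproduce the SIMS scoring or filtering heuristics.

```python
from itertools import combinations

# Monoisotopic residue masses (Da); Leu and Ile are isobaric.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L/I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def interpret_intervals(peaks, tol=0.02, max_gap=250.0):
    """Split peak-pair intervals into residue matches and unexplained mass shifts."""
    matches, unexplained = [], []
    for lo, hi in combinations(sorted(peaks), 2):
        gap = hi - lo
        if gap > max_gap:
            continue  # too wide to be a single (possibly modified) residue
        hits = [aa for aa, m in RESIDUE_MASS.items() if abs(gap - m) <= tol]
        if hits:
            matches.append((lo, hi, hits))
        else:
            unexplained.append((lo, hi, round(gap, 4)))
    return matches, unexplained


# Hypothetical y-ion ladder: y1 (Arg), then +Ala, then +phospho-Ser.
peaks = [175.11895, 246.15606, 413.15442]
matches, unexplained = interpret_intervals(peaks)
print(matches)
print(unexplained)
```

In this ladder, one unexplained interval of about 166.998 Da equals Ser (87.032 Da) plus a phosphate group (79.966 Da), the kind of shift a blind search is meant to surface.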

High-resolution Biomarker Discovery: Moving from Large-scale Proteome Profiling to Quantitative Validation of Lead Candidates

Diverse proteomic techniques based on protein MS have been introduced to systematically characterize protein perturbations associated with disease. Progress in clinical proteomics is essential for personalized medicine, in which treatments will be tailored to individual needs based on patient stratification, using noninvasive disease monitoring procedures to reveal the most appropriate therapeutic targets. However, breakthroughs await the successful development and application of a robust proteomic pipeline capable of identifying multiple candidate proteins and rigorously assessing their relevance as informative diagnostic and prognostic indicators or as suitable drug targets involved in a pathological process. While steady progress has been made toward more comprehensive proteome profiling, the emphasis must now shift from in-depth screening of reference samples to stringent quantitative validation of selected lead candidates in a broader clinical context. Here, we present an overview of emerging proteomic strategies for high-throughput protein detection, focusing primarily on targeted MS/MS as the basis for biomarker verification in large clinical cohorts. We discuss the conceptual promise and practical pitfalls of these methods in terms of achieving higher dynamic range, higher throughput, and more reliable quantification, highlighting research avenues that merit additional inquiry.
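
Targeted verification assays commonly quantify a peptide against a spiked heavy isotope-labeled internal standard of known amount. As a minimal sketch of that arithmetic, assuming equal response for the light and heavy forms and co-eluting pairs (the peak areas and the 50 fmol spike below are hypothetical), the endogenous amount is the light-to-heavy area ratio, averaged over transitions, multiplied by the spiked amount.

```python
from statistics import mean


def endogenous_amount(light_areas, heavy_areas, spiked_amount):
    """Stable isotope dilution estimate from paired transition peak areas."""
    if len(light_areas) != len(heavy_areas) or not light_areas:
        raise ValueError("need one heavy area for each light area")
    # Light/heavy ratio per transition; the heavy standard's amount is known.
    ratios = [light / heavy for light, heavy in zip(light_areas, heavy_areas)]
    return mean(ratios) * spiked_amount


# Two hypothetical transitions, 50 fmol of heavy peptide spiked in.
print(endogenous_amount([2.0e5, 1.0e5], [4.0e5, 2.0e5], 50.0))  # fmol
```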

Computational and Experimental Approaches to Chart the Escherichia Coli Cell-envelope-associated Proteome and Interactome

The bacterial cell envelope consists of a complex arrangement of lipids, proteins and carbohydrates that serves as the interface between a microorganism and its environment or, for pathogens, a human host. Escherichia coli has long been investigated as a leading model system to elucidate the fundamental mechanisms underlying microbial cell-envelope biology. This includes extensive descriptions of the molecular identities, biochemical activities and evolutionary trajectories of integral transmembrane proteins, many of which play critical roles in infectious disease and antibiotic resistance. Strikingly, however, only half of the approximately 1,200 putative cell-envelope-related proteins of E. coli currently have experimentally attributed functions, indicating an opportunity for discovery. In this review, we summarize the state of the art in computational and proteomic approaches for determining the components of the E. coli cell-envelope proteome, and we explore the physical and functional interactions that underlie its biogenesis and functionality. We also provide a comprehensive comparative benchmarking analysis of the performance of different bioinformatic and proteomic methods commonly used to determine the subcellular localization of bacterial proteins.

Interaction of the Deubiquitinating Enzyme Ubp2 and the E3 Ligase Rsp5 is Required for Transporter/receptor Sorting in the Multivesicular Body Pathway

Protein ubiquitination is essential for many events linked to intracellular protein trafficking. We sought to elucidate the possible involvement of the S. cerevisiae deubiquitinating enzyme Ubp2 in transporter and receptor trafficking after we (in this study) and others established that affinity-purified Ubp2 interacts stably with the E3 ubiquitin ligase Rsp5 and with the ubiquitin-associated (UBA) domain-containing protein Rup1. UBP2 interacts genetically with RSP5, while Rup1 facilitates the tethering of Ubp2 to Rsp5 via a PPPSY motif. Using the uracil permease Fur4 as a model reporter system, we establish a role for Ubp2 in membrane protein turnover. Similar to hypomorphic rsp5 alleles, cells deleted for UBP2 exhibited a temporal stabilization of Fur4 at the plasma membrane, indicative of perturbed protein trafficking. This defect was ubiquitin dependent, as a Fur4 N-terminal ubiquitin fusion construct bypassed the block and restored sorting in the mutant. Moreover, the defect was absent under conditions in which recycling did not occur, implicating Ubp2 in sorting at the multivesicular body. Taken together, our data suggest a previously overlooked role for Ubp2 as a positive regulator of Rsp5-mediated membrane protein trafficking subsequent to endocytosis.

Conserved Network of Proteins Essential for Bacterial Viability

The yjeE, yeaZ, and ygjD genes are highly conserved in the genomes of eubacteria, and ygjD orthologs are also found throughout the Archaea and eukaryotes. In this study, we have constructed conditional expression strains for each of these genes in the model organism Escherichia coli K12. We show that each gene is essential for the viability of E. coli under laboratory growth conditions. Growth of the conditional strains under nonpermissive conditions results in dramatic changes in cell ultrastructure. Deliberate repression of the expression of yeaZ results in cells with highly condensed nucleoids, while repression of yjeE and ygjD expression results in at least a proportion of very enlarged cells with an unusual peripheral distribution of DNA. Each of the three conditional expression strains can be complemented by multicopy clones harboring the rstA gene, which encodes a two-component-system response regulator, strongly suggesting that these proteins are involved in the same essential cellular pathway. The results of bacterial two-hybrid experiments show that YeaZ can interact with both YjeE and YgjD but that YgjD is the preferred interaction partner. The results of in vitro experiments indicate that YeaZ mediates the proteolysis of YgjD, suggesting that YeaZ and YjeE act as regulators to control the activity of this protein. Our results are consistent with these proteins forming a link between DNA metabolism and cell division.

Global Quantitative Proteomic Profiling Through 18O-labeling in Combination with MS/MS Spectra Analysis

Several stable-isotope-based peptide labeling methods have been developed to support large-scale relative quantitation, through mass spectrometry, of proteins present in two different biological samples. In one of these, trypsin-catalyzed 18O-based labeling, quantitation is typically performed at the full scan (MS) level by comparing the peak intensities of sister precursor ions corresponding to the labeled and unlabeled forms of an intact peptide as they co-elute during liquid chromatography (LC) separations. We show here that measuring relative abundance at the product ion (MS/MS) level after fragmentation provides excellent accuracy, sensitivity and signal-to-noise, while combining quantitation with global shotgun protein identification. To facilitate routine data analysis using this approach, we have developed two specialized software programs, ySelect and yRatios, which draw upon database search results for 18O-based data sets and combine fragmentation spectra peak lists to (1) accurately determine protein ratios between two samples while applying a correction for incomplete labeling and (2) tabulate these results in both intuitive summary reports and in formats amenable to systematic pathway level analysis. To validate our process, we subjected simple and complex test protein mixtures to single-step and multistep LC-MS/MS profiling experiments. Ratio distributions approached the expected means, allowing empirical derivation of confidence level cutoffs for determining statistically significant fold-changes in protein abundance. A set of stringent criteria for detecting spurious ratios based on consistency checking between unlabeled and labeled y-ion pairs was found to highlight putative false positive identifications. In summary, this toolkit facilitates comparative proteomic quantitation under conditions that are optimized for making reliable protein inferences.
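
The product-ion idea can be sketched in a few lines: for each y-ion, compare the unlabeled peak with the labeled peaks, combine the per-ion log ratios into a peptide ratio, and flag the peptide when the y-ion pairs disagree. This is a simplified illustration, not the ySelect/yRatios implementation. It assumes that summing the singly (+2 Da) and doubly (+4 Da) 18O-labeled forms is an adequate stand-in for the incomplete-labeling correction, ignores natural-isotope overlap with the +2 peak, and uses a hypothetical 1.0 log2-unit spread as the consistency cutoff.

```python
import math
from statistics import median


def pair_log2_ratio(i0, i2, i4):
    """log2(labeled/unlabeled) for one y-ion.

    i0: unlabeled y-ion intensity; i2, i4: intensities of the +2 Da and +4 Da
    forms (one or two 18O atoms at the peptide C-terminus, carried by y-ions).
    """
    return math.log2((i2 + i4) / i0)


def peptide_ratio(y_ions, max_spread=1.0):
    """Median ratio over y-ion pairs plus a simple consistency flag."""
    logs = [pair_log2_ratio(*y) for y in y_ions]
    center = median(logs)
    consistent = (max(logs) - min(logs)) <= max_spread
    return 2 ** center, consistent


# Hypothetical (i0, i2, i4) triples for three y-ions of one peptide.
good = [(1000, 100, 900), (2000, 300, 1700), (500, 50, 450)]
print(peptide_ratio(good))
```

Taking the median keeps a single aberrant y-ion from dominating the estimate, while the spread check surfaces peptides whose pairs disagree, which in the spirit of the approach above may point to a false positive identification.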

Global Functional Atlas of Escherichia Coli Encompassing Previously Uncharacterized Proteins

One-third of the 4,225 protein-coding genes of Escherichia coli K-12 remain functionally unannotated (orphans). Many map to distant clades such as Archaea, suggesting involvement in basic prokaryotic traits, whereas others appear restricted to E. coli, including pathogenic strains. To elucidate the orphans' biological roles, we performed an extensive proteomic survey using affinity-tagged E. coli strains and generated comprehensive genomic context inferences to derive a high-confidence compendium for virtually the entire proteome consisting of 5,993 putative physical interactions and 74,776 putative functional associations, most of which are novel. Clustering of the respective probabilistic networks revealed putative orphan membership in discrete multiprotein complexes and functional modules together with annotated gene products, whereas a machine-learning strategy based on network integration implicated the orphans in specific biological processes. We provide additional experimental evidence supporting orphan participation in protein synthesis, amino acid metabolism, biofilm formation, motility, and assembly of the bacterial cell envelope. This resource provides a "systems-wide" functional blueprint of a model microbe, with insights into the biological and evolutionary significance of previously uncharacterized proteins.

Systematic Characterization of the Protein Interaction Network and Protein Complexes in Saccharomyces Cerevisiae Using Tandem Affinity Purification and Mass Spectrometry

Defining protein complexes is a vital aspect of cell biology, because cellular processes are often carried out by stable protein complexes whose characterization often provides insight into their function. The interacting proteins in macromolecular complexes are most accurately identified after purification to near homogeneity. To this end, the tandem affinity purification (TAP) system, with subsequent protein identification by high-throughput mass spectrometry, was developed (1, 2) to systematically characterize native protein complexes and transient protein interactions under near-physiological conditions. The TAP tag, which contains two adjacent affinity purification tags (calmodulin-binding peptide and Staphylococcus aureus protein A) separated by a tobacco etch virus (TEV) protease cleavage site, is fused to the open reading frame of interest. Using homologous recombination, a fusion library was constructed for the yeast Saccharomyces cerevisiae (3) in which the carboxy-terminal end of each predicted open reading frame is individually tagged in the chromosome, so that the resulting fusion proteins are expressed under the control of their natural promoters (3). In this chapter, an optimized protocol for systematic protein purification and subsequent mass spectrometry-based protein identification is described in detail for the protein complexes of S. cerevisiae (4-6).

An Atlas of Chaperone-protein Interactions in Saccharomyces Cerevisiae: Implications to Protein Folding Pathways in the Cell

Molecular chaperones are known to be involved in many cellular functions; however, a detailed and comprehensive overview of the interactions between chaperones and their cofactors and substrates is still lacking. We carried out a systematic analysis of the physical, TAP-tag-based protein-protein interactions of all 63 known chaperones in Saccharomyces cerevisiae. These chaperones include seven small heat-shock proteins, three members of the AAA+ family, eight members of the CCT/TRiC complex, six members of the prefoldin/GimC complex, 22 Hsp40s, 1 Hsp60, 14 Hsp70s, and 2 Hsp90s. Our analysis provides a clear distinction between chaperones that are functionally promiscuous and chaperones that are functionally specific. We found that a given protein can interact with up to 25 different chaperones during its lifetime in the cell. The number of interacting chaperones was found to increase with the average number of hydrophobic stretches of length between one and five in a given protein. Importantly, cellular hot spots of chaperone interactions are elucidated. Our data suggest the presence of endogenous multicomponent chaperone modules in the cell.
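
The reported trend relies on counting hydrophobic stretches of a given length in each protein. A minimal way to count them is sketched below. Which residues count as hydrophobic is an assumption here (Ala, Val, Ile, Leu, Met, Phe, Trp) and may differ from the definition used in the study; the example sequence is hypothetical.

```python
import re

# Assumed hydrophobic residue set; the study's exact definition may differ.
HYDROPHOBIC = "AVILMFW"


def hydrophobic_stretches(seq, min_len=1, max_len=5):
    """Count maximal runs of hydrophobic residues whose length is in [min_len, max_len]."""
    runs = re.findall(f"[{HYDROPHOBIC}]+", seq.upper())
    return sum(1 for run in runs if min_len <= len(run) <= max_len)


# Runs here are "M" (1), "LLV" (3) and "AAAAAA" (6); only the first two count.
print(hydrophobic_stretches("MKLLVDEAAAAAAGK"))
```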

Sequential Peptide Affinity Purification System for the Systematic Isolation and Identification of Protein Complexes from Escherichia Coli

Biochemical purification of affinity-tagged proteins in combination with mass spectrometry methods is increasingly seen as a cornerstone of systems biology, as it allows for the systematic genome-scale characterization of macromolecular protein complexes, representing demarcated sets of stably interacting protein partners. Accurate and sensitive identification of both the specific and shared polypeptide components of distinct complexes requires purification to near homogeneity. To this end, a sequential peptide affinity (SPA) purification system was developed to enable the rapid and efficient isolation of native Escherichia coli protein complexes (J Proteome Res 3:463-468, 2004). SPA purification makes use of a dual-affinity tag, consisting of three modified FLAG sequences (3X FLAG) and a calmodulin binding peptide (CBP), spaced by a cleavage site for tobacco etch virus (TEV) protease (J Proteome Res 3:463-468, 2004). Using the lambda-phage Red homologous recombination system (PNAS 97:5978-5983, 2000), a DNA cassette, encoding the SPA-tag and a selectable marker flanked by gene-specific targeting sequences, is introduced into a selected locus in the E. coli chromosome so as to create a C-terminal fusion with the protein of interest. This procedure aims for near-endogenous levels of tagged protein production in the recombinant bacteria to avoid spurious, non-specific protein associations (J Proteome Res 3:463-468, 2004). In this chapter, we describe a detailed, optimized protocol for the tagging, purification, and subsequent mass spectrometry-based identification of the subunits of even low-abundance bacterial protein complexes isolated as part of an ongoing large-scale proteomic study in E. coli (Nature 433:531-537, 2005).
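
The targeting step amounts to flanking the tag cassette with gene-specific homology so that recombination places the tag in frame just before the stop codon. The sketch below builds such primers under stated assumptions: 45-nt homology arms (a hypothetical choice), a downstream arm starting immediately after the native stop codon, and placeholder annealing sequences (`fwd_anneal`, `rev_anneal`) standing in for the real SPA-cassette priming sites, which are not given here. The ORF and downstream sequences are toy examples.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def tagging_primers(orf, downstream, fwd_anneal, rev_anneal, arm=45):
    """Hypothetical primers for a C-terminal tag fusion by homologous recombination.

    Forward: last `arm` nt of the ORF (stop codon removed, so the tag stays in
    frame) followed by a sequence annealing to the 5' end of the tag cassette.
    Reverse: reverse complement of the first `arm` nt downstream of the stop
    codon, followed by a sequence annealing to the 3' end of the cassette.
    """
    orf, downstream = orf.upper(), downstream.upper()
    if len(orf) % 3 != 0 or orf[-3:] not in STOP_CODONS:
        raise ValueError("ORF must be in frame and end with a stop codon")
    if len(orf) - 3 < arm or len(downstream) < arm:
        raise ValueError("sequence too short for the requested homology arm")
    upstream_arm = orf[:-3][-arm:]
    downstream_arm = downstream[:arm]
    return upstream_arm + fwd_anneal, revcomp(downstream_arm) + rev_anneal


# Toy sequences and placeholder annealing regions (not real SPA primers).
orf = "ATG" + "GCT" * 20 + "TAA"
downstream = "CCGGAATT" * 8
fwd, rev = tagging_primers(orf, downstream, "AAAA", "TTTT")
print(fwd)
print(rev)
```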

Systems-level Approaches for Identifying and Analyzing Genetic Interaction Networks in Escherichia Coli and Extensions to Other Prokaryotes

Molecular interactions define the functional organization of the cell. Epistatic (genetic, or gene-gene) interactions, one of the most informative and commonly encountered forms of functional relationships, are increasingly being used to map process architecture in model eukaryotic organisms. In particular, 'systems-level' screens in yeast and worm aimed at elucidating genetic interaction networks have led to the generation of models describing the global modular organization of gene products and protein complexes within a cell. However, comparable data for prokaryotic organisms have not been available. Given its ease of growth and genetic manipulation, the Gram-negative bacterium Escherichia coli appears to be an ideal model system for performing comprehensive genome-scale examinations of genetic redundancy in bacteria. In this review, we highlight emerging experimental and computational techniques that have been developed recently to examine functional relationships and redundancy in E. coli at a systems-level, and their potential application to prokaryotes in general. Additionally, we have scanned PubMed abstracts and full-text published articles to manually curate a list of approximately 200 previously reported synthetic sick or lethal genetic interactions in E. coli derived from small-scale experimental studies.

A Systematic Characterization of Cwc21, the Yeast Ortholog of the Human Spliceosomal Protein SRm300

Cwc21 (complexed with Cef1 protein 21) is a 135 amino acid yeast protein that shares homology with the N-terminal domain of human SRm300/SRRM2, a large serine/arginine-repeat protein shown previously to associate with the splicing coactivator and 3'-end processing stimulatory factor, SRm160. Proteomic analysis of spliceosomal complexes has suggested a role for Cwc21 and SRm300 at the core of the spliceosome. However, specific functions for these proteins have remained elusive. In this report, we employ quantitative genetic interaction mapping, mass spectrometry of tandem affinity-purified complexes, and microarray profiling to investigate genetic, physical, and functional interactions involving Cwc21. Combined data from these assays support multiple roles for Cwc21 in the formation and function of splicing complexes. Consistent with a role for Cwc21 at the core of the spliceosome, we observe strong genetic, physical, and functional interactions with Isy1, a protein previously implicated in the first catalytic step of splicing and splicing fidelity. Together, the results suggest multiple functions for Cwc21/SRm300 in the splicing process, including an important role in the activation of splicing in association with Isy1.

An Acetylated Form of Histone H2A.Z Regulates Chromosome Architecture in Schizosaccharomyces Pombe

Histone variant H2A.Z has a conserved role in genome stability, although it remains unclear how this is mediated. Here we demonstrate that the fission yeast Swr1 ATPase inserts H2A.Z (Pht1) into chromatin and Kat5 acetyltransferase (Mst1) acetylates it. Deletion or an unacetylatable mutation of Pht1 leads to genome instability, primarily caused by chromosome entanglement and breakage at anaphase. This leads to the loss of telomere-proximal markers, though telomere protection and repeat length are unaffected by the absence of Pht1. Strikingly, the chromosome entanglement in pht1Δ anaphase cells can be rescued by forcing chromosome condensation before anaphase onset. We show that the condensin complex, required for the maintenance of anaphase chromosome condensation, prematurely dissociates from chromatin in the absence of Pht1. This and other findings suggest an important role for H2A.Z in the architecture of anaphase chromosomes.

Recent Advances and Method Development for Drug Target Identification

Although it is commonly recognized that most drugs cause inhibition or activation of function by physically binding to one or more gene products, the direct interactions of bioactive small molecules with specific gene products, or targets, are often not well characterized. From a therapeutic perspective, it is nevertheless essential to know a drug's binding partner(s) to understand its mechanism of action and to anticipate possible side effects, thereby avoiding costly clinical failures. This knowledge is increasingly important as polypharmacy expands to include drugs that engage multiple targets. This review provides a succinct overview of several recent approaches that employ genetics, proteomics, expression profiling or bioinformatics for the systematic characterization of the targets of bioactive compounds. We critically discuss the continuing improvement of existing technologies and offer a perspective on the future of emerging next-generation technologies.

Predicting Protein Functions by Relaxation Labelling Protein Interaction Network

One of the key issues in the post-genomic era is assigning functions to uncharacterized proteins. Proteins seldom act alone; rather, they interact with other biomolecular units to execute their functions. Thus, the functions of unknown proteins may be discovered by studying their interactions with proteins of known function. Although many approaches have been developed for this purpose, a main limitation of most of them is that they do not take into account the dependence among functional terms.
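
Relaxation labelling iteratively updates each protein's probability distribution over functions using support from its interaction partners, weighted by a label-compatibility matrix `r`; off-diagonal entries of `r` are one place where dependence between functional terms can enter. The sketch below follows the classical update p(a) ∝ p(a)(1 + q(a)) and is not the paper's exact formulation. The network, the 2×2 compatibility values, and the choice to hold annotated proteins fixed are hypothetical.

```python
def relaxation_labelling(p, edges, r, fixed, iterations=20):
    """Classical relaxation labelling on an undirected interaction network.

    p:     protein -> list of probabilities over k functional labels (sums to 1)
    edges: iterable of (protein, protein) interactions
    r:     k x k compatibility matrix with entries in [-1, 1]
    fixed: proteins with known annotation whose distributions are not updated
    """
    adj = {n: [] for n in p}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    k = len(r)
    p = {n: list(v) for n, v in p.items()}  # do not mutate the caller's data
    for _ in range(iterations):
        new = {}
        for i, probs in p.items():
            if i in fixed or not adj[i]:
                new[i] = probs
                continue
            # Support q(a), averaged over neighbors so it stays within [-1, 1].
            q = [
                sum(r[a][b] * p[j][b] for j in adj[i] for b in range(k)) / len(adj[i])
                for a in range(k)
            ]
            w = [probs[a] * (1 + q[a]) for a in range(k)]
            z = sum(w)
            new[i] = [x / z for x in w] if z > 0 else probs
        p = new  # synchronous update
    return p


# Unknown protein "u" interacts with two proteins annotated with label 0.
p0 = {"u": [0.5, 0.5], "k1": [1.0, 0.0], "k2": [1.0, 0.0]}
r = [[1.0, -1.0], [-1.0, 1.0]]  # same label compatible, different labels not
print(relaxation_labelling(p0, [("u", "k1"), ("u", "k2")], r, fixed={"k1", "k2"})["u"])
```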

Pathway Analysis of Dilated Cardiomyopathy Using Global Proteomic Profiling and Enrichment Maps

Global protein expression profiling can potentially uncover perturbations associated with common forms of heart disease. We have used shotgun MS/MS to monitor the state of biological systems in cardiac tissue correlating with disease onset, cardiac insufficiency and progression to heart failure in a time-course mouse model of dilated cardiomyopathy. However, interpreting the functional significance of the hundreds of differentially expressed proteins has been challenging. Here, we utilize improved enrichment statistical methods and an extensive collection of functionally related gene sets, gaining a more comprehensive understanding of the progressive alterations associated with functional decline in dilated cardiomyopathy. We visualize the enrichment results as an Enrichment Map, where significant gene sets are grouped based on annotation similarity. This approach vastly simplifies the interpretation of the large number of enriched gene sets found. For pathways of specific interest, such as Apoptosis and the MAPK (mitogen-activated protein kinase) cascade, we performed a more detailed analysis of the underlying signaling network, including experimental validation of expression patterns.
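
An enrichment map combines two calculations: a per-gene-set enrichment test, and edges between significant gene sets whose memberships overlap strongly. The sketch below uses a hypergeometric test and the overlap coefficient |A ∩ B| / min(|A|, |B|); the 0.01 p-value cutoff, the 0.5 overlap cutoff, and the gene sets are hypothetical, and the study's own statistics may differ.

```python
from itertools import combinations
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric(M genes total, n in the set, N in the query)."""
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, min(n, N) + 1)) / comb(M, N)


def overlap_coefficient(a, b):
    return len(a & b) / min(len(a), len(b))


def enrichment_map(gene_sets, query, universe_size, p_cut=0.01, overlap_cut=0.5):
    """Return significant gene sets (nodes with p-values) and similarity edges."""
    query = set(query)
    nodes = {}
    for name, genes in gene_sets.items():
        k = len(genes & query)
        p = hypergeom_sf(k, universe_size, len(genes), len(query))
        if p <= p_cut:
            nodes[name] = p
    edges = [
        (a, b)
        for a, b in combinations(sorted(nodes), 2)
        if overlap_coefficient(gene_sets[a], gene_sets[b]) >= overlap_cut
    ]
    return nodes, edges


# Hypothetical gene sets in a 1,000-gene universe and 25 differentially expressed genes.
sets = {
    "set_A": {f"g{i}" for i in range(20)},
    "set_B": {f"g{i}" for i in range(10, 30)},
    "set_C": {f"g{i}" for i in range(500, 520)},
}
nodes, edges = enrichment_map(sets, {f"g{i}" for i in range(25)}, 1000)
print(sorted(nodes), edges)
```

Grouping set_A and set_B by their shared members is what lets the map collapse many related enriched gene sets into a few interpretable clusters.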

Single Amino Acid Resolution of Proteolytic Fragments Generated in Individual Cells

The complex nature of enzyme regulation mandates that enzyme activity profiles be measured in the context of the intact cell. Single-cell capillary electrophoresis (CE) coupled with laser-induced fluorescence is a powerful approach for the separation and quantitation of analytes present in small samples and single live cells; however, it does not allow for definitive identification of the reaction products. Conversely, mass spectrometry (MS) can identify analytes but still lacks the sensitivity required for most single-cell applications. Thus, by using CE to determine the relative amounts of reaction products generated in single cells, and MS to identify those products after producing them in larger quantities in bulk cell populations, it is possible to determine enzyme activity profiles in single cells. In this study, the applicability of this approach was demonstrated by examining the intracellular fate of a protease substrate derived from the beta-amyloid precursor protein (beta-APP). In single live TF-1 cells, three distinct fragments were generated from the beta-APP peptide, which differed by a single uncharged amino acid. The CE measurements indicated that the proteolytic fragment profiles (i.e., the relative amounts of each fragment) were consistent from cell to cell but differed from those obtained in cell lysates. Furthermore, single-cell measurements revealed a modest but statistically significant negative correlation between the total amount of beta-APP peptide loaded into cells and the fraction of peptide that remained intact. This study demonstrates how single-cell CE, MS, and peptide substrates can be combined to identify and measure enzyme activities in single live cells.

A Lentiviral Functional Proteomics Approach Identifies Chromatin Remodeling Complexes Important for the Induction of Pluripotency

Protein complexes and protein-protein interactions are essential for almost all cellular processes. Here, we establish a mammalian affinity purification and lentiviral expression (MAPLE) system for characterizing the subunit compositions of protein complexes. The system is flexible (i.e. multiple N- and C-terminal tags and multiple promoters), is compatible with Gateway cloning, and incorporates a reference peptide. Its major advantage is that it permits efficient and stable delivery of affinity-tagged open reading frames into most mammalian cell types. We benchmarked MAPLE with a number of human protein complexes involved in transcription, including the RNA polymerase II-associated factor, negative elongation factor, positive transcription elongation factor b, SWI/SNF, and mixed lineage leukemia complexes. In addition, MAPLE was used to identify an interaction between the reprogramming factor Klf4 and the Swi/Snf chromatin remodeling complex in mouse embryonic stem cells. We show that the SWI/SNF catalytic subunit Smarca2/Brm is up-regulated during the process of induced pluripotency and demonstrate a role for the catalytic subunits of the SWI/SNF complex during somatic cell reprogramming. Our data suggest that the transcription factor Klf4 facilitates chromatin remodeling during reprogramming.

The Evolutionary Landscape of the Chromatin Modification Machinery Reveals Lineage Specific Gains, Expansions, and Losses

Model organisms such as yeast, fly, and worm have played a defining role in the study of many biological systems, but translating this information to humans remains a significant challenge. Of critical importance is the ability to distinguish components whose functions and interactions can be reliably inferred across species from those that represent lineage-specific innovations. To address this challenge, we use chromatin modification (CM) as a model system for exploring the evolutionary properties of its components in the context of their known functions and interactions. Collating previously identified CM components from yeast, worm, fly, and human, we identified a "core" set of 50 CM genes with consistent orthologous relationships that likely retain their interactions and functions across taxa. In addition, we catalog many components that show lineage-specific expansions and losses, highlighting extensive duplication within vertebrates that may reflect an expanded repertoire of regulatory mechanisms. Placing these components in the context of a high-quality protein-protein interaction network, we find, contrary to existing views of evolutionary modularity, that CM complex components display a mosaic of evolutionary histories: a core set of highly conserved genes, together with sets displaying lineage-specific innovations. Although focused on CM, this study provides a template for distinguishing genes that are likely to retain their functions and interactions across species. As such, in addition to informing on the evolution of CM as a system, it provides a set of comparative genomic approaches that can be applied to any biological system.
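
One simple way to sort orthologous groups into "core" versus lineage-specific categories is by per-species copy number. The sketch below is a simplified rule of our own, not the procedure used in the study: a group is "core" if it has exactly one gene in each of the four species, an "expansion" if present everywhere but duplicated somewhere, and "gain or loss" if absent from at least one lineage (telling gains from losses would require a species tree, which is not modeled). The function name and example groups are hypothetical.

```python
SPECIES = ("yeast", "worm", "fly", "human")

def classify_group(copies):
    """Label an orthologous group from its per-species gene copy numbers.

    copies: dict mapping species name -> number of genes in the group.
    This three-way rule is a simplification for illustration only.
    """
    counts = [copies.get(sp, 0) for sp in SPECIES]
    if all(c == 1 for c in counts):
        return "core"            # one-to-one orthologs in every species
    if any(c == 0 for c in counts):
        return "gain or loss"    # absent from at least one lineage
    return "expansion"           # present everywhere, duplicated in some lineage

# Hypothetical groups.
print(classify_group({"yeast": 1, "worm": 1, "fly": 1, "human": 1}))  # core
print(classify_group({"yeast": 1, "worm": 1, "fly": 1, "human": 3}))  # expansion
```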

Synthetic Peptide Arrays for Pathway-level Protein Monitoring by Liquid Chromatography-tandem Mass Spectrometry

Effective methods to detect and quantify functionally linked regulatory proteins in complex biological samples are essential for investigating mammalian signaling pathways. Traditional immunoassays depend on proprietary reagents that are difficult to generate and multiplex, whereas global proteomic profiling can be tedious and can miss low-abundance proteins. Here, we report a target-driven liquid chromatography-tandem mass spectrometry (LC-MS/MS) strategy for selectively examining the levels of multiple low-abundance components of signaling pathways that are refractory to standard shotgun screening procedures and are hence underrepresented in current MS/MS repositories. Our stepwise approach consists of: (i) synthesizing microscale peptide arrays, including heavy isotope-labeled internal standards, for use as high-quality references to (ii) build empirically validated, high-density LC-MS/MS detection assays with a retention time scheduling system that can be used to (iii) identify and quantify endogenous low-abundance protein targets in complex biological mixtures with high accuracy by correlation to a spectral database using new software tools. The method offers a flexible, rapid, and cost-effective means for routine proteomic exploration of biological systems, including "label-free" quantification, while minimizing spurious interferences. As proof of concept, we examined the abundance of transcription factors and protein kinases mediating pluripotency and self-renewal in embryonic stem cell populations.
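
Retention time scheduling is what makes high-density assays feasible: each target peptide is monitored only within a window around its expected elution time, so the instrument's per-cycle load is set by how many windows overlap, not by the total number of targets. The sketch below computes that peak overlap with a sweep over window open/close events. It assumes symmetric windows of one fixed width; the function name and example values are hypothetical and do not describe the software used in the study.

```python
def max_concurrent(retention_times, window):
    """Largest number of scheduled monitoring windows open at the same time.

    retention_times: expected elution times (e.g., minutes) of target peptides.
    window: total width of each symmetric scheduling window, same units.
    """
    events = []
    for rt in retention_times:
        events.append((rt - window / 2, 1))   # window opens
        events.append((rt + window / 2, -1))  # window closes
    # At equal times, process closings (-1) before openings (+1).
    events.sort(key=lambda e: (e[0], e[1]))
    current = best = 0
    for _, delta in events:
        current += delta
        best = max(best, current)
    return best

# Four hypothetical peptides: with 2-min windows at most two are monitored at once,
# whereas monitoring all targets across the whole run would mean four at once.
print(max_concurrent([10.0, 11.0, 12.5, 30.0], 2.0))  # 2
```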

Quantifying E. Coli Proteome and Transcriptome with Single-molecule Sensitivity in Single Cells

Protein and messenger RNA (mRNA) copy numbers vary from cell to cell in isogenic bacterial populations. However, these molecules often exist in low copy numbers and are difficult to detect in single cells. We carried out quantitative system-wide analyses of protein and mRNA expression in individual cells with single-molecule sensitivity using a newly constructed yellow fluorescent protein fusion library for Escherichia coli. We found that almost all protein number distributions can be described by the gamma distribution with two fitting parameters which, at low expression levels, have clear physical interpretations as the transcription rate and protein burst size. At high expression levels, the distributions are dominated by extrinsic noise. We found that a single cell's protein and mRNA copy numbers for any given gene are uncorrelated.
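
For a gamma distribution with shape a and scale b, the mean is ab and the variance is ab², so both parameters can be recovered from the first two moments: b = variance/mean (the Fano factor) and a = mean²/variance. Following the interpretation in the abstract, at low expression a relates to the transcription rate and b to the protein burst size. The sketch below uses this method-of-moments estimate; it is an assumption for illustration, and the study may have used a different fitting procedure. The counts are hypothetical.

```python
from statistics import fmean, pvariance

def fit_gamma_moments(counts):
    """Method-of-moments gamma fit to per-cell protein counts.

    Uses mean = a * b and variance = a * b**2.
    Returns (a, b): a ~ transcription rate, b ~ burst size (low-expression regime).
    """
    mean = fmean(counts)
    var = pvariance(counts, mu=mean)
    b = var / mean      # Fano factor
    a = mean / b        # equivalently mean**2 / var
    return a, b

# Hypothetical per-cell protein counts.
a, b = fit_gamma_moments([0, 4, 8, 12])
```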

Constitutively Active Calcineurin Induces Cardiac Endoplasmic Reticulum Stress and Protects Against Apoptosis That is Mediated by Alpha-crystallin-B

Cardiac-specific overexpression of a constitutively active form of calcineurin A (CNA) leads directly to cardiac hypertrophy in the CNA mouse model. Because cardiac hypertrophy is a prominent characteristic of many cardiomyopathies, we reasoned that delineating the proteomic profile of ventricular tissue from this model might identify novel, widely applicable therapeutic targets. Proteomic analysis was carried out by subjecting fractionated cardiac samples from CNA mice and their WT littermates to gel-free liquid chromatography coupled to shotgun tandem mass spectrometry. We identified 1,918 proteins with high confidence, of which 290 were differentially expressed. Microarray analysis of the same tissue revealed alterations in the ventricular transcriptome. Because bioinformatic analyses of both the proteome and transcriptome indicated up-regulation of endoplasmic reticulum stress, we validated its occurrence in adult CNA hearts through a series of immunoblots and RT-PCR analyses. Endoplasmic reticulum stress often leads to increased apoptosis, but apoptosis was minimal in CNA hearts, suggesting that activated calcineurin might protect against apoptosis. Indeed, after serum starvation, an apoptotic trigger, the viability of cultured neonatal mouse cardiomyocytes (NCMs) from CNA mice was higher than that of NCMs from WT mice. Proteomic data identified α-crystallin B (Cryab) as a potential mediator of this protective effect, and we showed that silencing Cryab via lentivector-mediated transduction of shRNAs in NCMs led to a significant reduction in NCM viability and loss of protection against apoptosis. The identification of Cryab as a downstream effector of calcineurin-induced protection against apoptosis will permit elucidation of its role in cardiac apoptosis and its potential as a therapeutic target.

Structure of a SLC26 Anion Transporter STAS Domain in Complex with Acyl Carrier Protein: Implications for E. Coli YchM in Fatty Acid Metabolism

Escherichia coli YchM is a member of the SLC26 (SulP) family of anion transporters, with an N-terminal membrane domain and a C-terminal cytoplasmic STAS domain. Mutations in human members of the SLC26 family, including in their STAS domains, are linked to a number of inherited diseases. Herein, we describe the high-resolution crystal structure of the STAS domain from E. coli YchM isolated in complex with acyl carrier protein (ACP), an essential component of the fatty acid biosynthesis (FAB) pathway. A genome-wide genetic interaction screen showed that a ychM null mutation is synthetically lethal with mutant alleles of genes (fabBDHGAI) involved in FAB. Endogenous YchM also copurified with proteins involved in fatty acid metabolism. Furthermore, a deletion strain lacking ychM showed altered cellular bicarbonate incorporation in the presence of NaCl and impaired growth at alkaline pH. Thus, identification of the STAS-ACP complex suggests that YchM sequesters ACP to the bacterial membrane, thereby linking bicarbonate transport with fatty acid metabolism.

Enrichment Map: a Network-based Method for Gene-set Enrichment Visualization and Interpretation

Gene-set enrichment analysis is a useful technique to help functionally characterize large gene lists, such as the results of gene expression experiments. This technique finds functionally coherent gene-sets, such as pathways, that are statistically over-represented in a given gene list. Ideally, the number of resulting sets is smaller than the number of genes in the list, thus simplifying interpretation. However, the increasing number and redundancy of gene-sets used by many current enrichment analysis software works against this ideal.
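
Over-representation of a gene set in a gene list is commonly assessed with a one-sided hypergeometric test, and redundancy between two enriched sets can be measured by their gene overlap. The sketch below shows both. The Jaccard coefficient is one common overlap measure and is used here as an assumption; the abstract does not specify which similarity the Enrichment Map method uses, and the function names are hypothetical.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k): chance of seeing at least k set members in a list of n genes
    drawn from a universe of N genes, K of which belong to the gene set."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def jaccard(set_a, set_b):
    """Gene overlap between two gene sets (0 = disjoint, 1 = identical)."""
    return len(set_a & set_b) / len(set_a | set_b)

# Universe of 20 genes, a 5-gene pathway, and a 4-gene hit list containing 4 pathway genes.
p = hypergeom_sf(4, 20, 5, 4)   # 5 / 4845, about 0.001
```

Highly overlapping enriched sets (high Jaccard) are candidates to be drawn as connected nodes in a network rather than listed separately, which is the kind of redundancy the paragraph above identifies as a problem for interpretation.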

Expanding the Landscape of Chromatin Modification (CM)-related Functional Domains and Genes in Human

Chromatin modification (CM) plays a key role in regulating transcription, DNA replication, repair and recombination. However, our knowledge of these processes in humans remains very limited. Here we use computational approaches to study proteins and functional domains involved in CM in humans. We analyze the abundance and the pair-wise domain-domain co-occurrences of 25 well-documented CM domains in 5 model organisms: yeast, worm, fly, mouse and human. Results show that domains involved in histone methylation, DNA methylation, and histone variants are remarkably expanded in metazoans, reflecting the increased demand for cell type-specific gene regulation. We find that CM domains tend to co-occur with a limited number of partner domains and are hence not promiscuous. This property is exploited to identify 47 potentially novel CM domains, including 24 DNA-binding domains, whose role in CM has received little attention so far. Lastly, we use a consensus Machine Learning approach to predict 379 novel CM genes (coding for 329 proteins) in humans based on domain compositions. Several of these predictions are supported by very recent experimental studies and others are slated for experimental verification. Identification of novel CM genes and domains in humans will aid our understanding of fundamental epigenetic processes that are important for stem cell differentiation and cancer biology. Information on all the candidate CM domains and genes reported here is publicly available.
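
Domain "promiscuity" as used above can be quantified as the number of distinct partner domains a domain co-occurs with across protein architectures. The sketch below builds that partner map from domain compositions. The protein IDs and architectures are hypothetical toy data (the domain labels are only illustrative), and this is not the study's exact pipeline.

```python
from collections import defaultdict
from itertools import combinations

def domain_partners(proteins):
    """Map each domain to the set of other domains it co-occurs with.

    proteins: dict mapping protein ID -> list of domains in its architecture.
    Repeated copies of a domain within one protein are counted once.
    """
    partners = defaultdict(set)
    for domains in proteins.values():
        for a, b in combinations(sorted(set(domains)), 2):
            partners[a].add(b)
            partners[b].add(a)
    return partners

# Hypothetical architectures.
toy = {
    "P1": ["SET", "PHD"],
    "P2": ["SET", "PHD", "Bromo"],
    "P3": ["Kinase", "SH2"],
    "P4": ["Kinase", "SH3"],
    "P5": ["Kinase", "PH"],
}
p = domain_partners(toy)
# In this toy set, "Kinase" pairs with 3 distinct domains, "SET" with 2.
```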

DAnCER: Disease-annotated Chromatin Epigenetics Resource

Chromatin modification (CM) is a set of epigenetic processes that govern many aspects of DNA replication, transcription and repair. CM is carried out by groups of physically interacting proteins, and their disruption has been linked to a number of complex human diseases. CM remains largely unexplored, however, especially in higher eukaryotes such as human. Here we present the DAnCER resource, which integrates information on genes with CM function from five model organisms, including human. Currently integrated are gene functional annotations, Pfam domain architecture, protein interaction networks and associated human diseases. Additional supporting evidence includes orthology relationships across organisms, membership in protein complexes, and information on protein 3D structure. These data are available for 962 experimentally confirmed and manually curated CM genes and for over 5000 genes with predicted CM function on the basis of orthology and domain composition. DAnCER allows visual explorations of the integrated data and flexible query capabilities using a variety of data filters. In particular, disease information and functional annotations are mapped onto the protein interaction networks, enabling the user to formulate new hypotheses on the function and disease associations of a given gene based on those of its interaction partners. DAnCER is freely available at http://wodaklab.org/dancer/.

A Dual Function of the CRISPR-Cas System in Bacterial Antivirus Immunity and DNA Repair

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and the associated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes. Cas1 is a CRISPR-associated protein that is common to all CRISPR-containing prokaryotes but its function remains obscure. Here we show that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions, replication forks and 5'-flaps. The crystal structure of YgbT and site-directed mutagenesis have revealed the potential active site. Genome-wide screens show that YgbT physically and genetically interacts with key components of DNA repair systems, including recB, recC and ruvB. Consistent with these findings, the ygbT deletion strain showed increased sensitivity to DNA damage and impaired chromosomal segregation. Similar phenotypes were observed in strains with deletion of CRISPR clusters, suggesting that the function of YgbT in repair involves interaction with the CRISPRs. These results show that YgbT belongs to a novel, structurally distinct family of nucleases acting on branched DNAs and suggest that, in addition to antiviral immunity, at least some components of the CRISPR-Cas system have a function in DNA repair.

Computational Refinement of Post-translational Modifications Predicted from Tandem Mass Spectrometry

A post-translational modification (PTM) is a naturally occurring chemical modification of a protein. Many of these modifications, such as phosphorylation, are known to play pivotal roles in regulating protein function. Accordingly, PTM perturbations have been linked to diverse diseases such as Parkinson's disease, Alzheimer's disease, diabetes, and cancer. To discover PTMs on a genome-wide scale, there has been a recent surge of interest in analyzing tandem mass spectrometry data, and several unrestrictive (so-called 'blind') PTM search methods have been reported. However, these approaches are subject to noise both in the measured masses and in the predicted modification site (amino acid position) within peptides, which can result in false PTM assignments.
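
A blind search reports an arbitrary mass offset between the observed and the unmodified peptide mass. One simple form of refinement, shown below as an assumption rather than the published algorithm, is to snap that noisy offset to the nearest known PTM mass within a tolerance, accepting it only if the peptide contains a residue the PTM can occupy. The residue and PTM masses are standard monoisotopic values; the allowed-site lists are deliberately simplified, and the 0.02 Da tolerance and function names are hypothetical.

```python
# Standard monoisotopic residue masses (Da) for the residues used below.
RESIDUE = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
           "V": 99.06841, "T": 101.04768, "K": 128.09496, "M": 131.04049,
           "R": 156.10111, "Y": 163.06333}
WATER = 18.01056
# Monoisotopic mass shifts (Da) and a simplified set of residues each PTM can occupy.
KNOWN_PTMS = {"Phospho": (79.96633, "STY"), "Oxidation": (15.99491, "M"),
              "Acetyl": (42.01057, "K"), "Methyl": (14.01565, "KR")}

def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE[aa] for aa in seq) + WATER

def refine_shift(seq, observed_mass, tol=0.02):
    """Snap a blind-search mass offset to the closest plausible known PTM.

    Returns (ptm_name or None, raw_delta). A PTM is plausible only if its mass
    is within tol Da of the offset and the peptide has a residue it can modify.
    """
    delta = observed_mass - peptide_mass(seq)
    candidates = [(abs(shift - delta), name)
                  for name, (shift, sites) in KNOWN_PTMS.items()
                  if abs(shift - delta) <= tol and any(aa in sites for aa in seq)]
    return (min(candidates)[1] if candidates else None), delta

# A noisy +79.97 Da offset on a serine-containing peptide is reassigned to phosphorylation.
name, delta = refine_shift("GASPK", peptide_mass("GASPK") + 79.97)
```

The residue check rejects the same offset on a peptide with no S, T, or Y, which is one way site information can filter out false assignments.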

Ribosome-dependent ATPase Interacts with Conserved Membrane Protein in Escherichia Coli to Modulate Protein Synthesis and Oxidative Phosphorylation

Elongation factor RbbA is required for ATP-dependent deacyl-tRNA release, presumably after each peptide bond formation; however, its cellular role is unknown. Proteomic analysis in Escherichia coli revealed that RbbA reciprocally co-purified with YhjD, a conserved inner membrane protein of unknown function. Both proteins also physically associate with the 30S ribosome and with members of the lipopolysaccharide transport machinery. Genome-wide genetic screens of rbbA and yhjD deletion mutants revealed aggravating genetic interactions with mutants deficient in the electron transport chain (ETC). Cells lacking both rbbA and yhjD exhibited reduced cell division, respiration, and global protein synthesis, as well as increased sensitivity to antibiotics targeting the ETC and the accuracy of protein synthesis. Our results suggest that RbbA functions together with YhjD as part of a regulatory network that affects bacterial oxidative phosphorylation and translation efficiency.

Perinuclear Cohibin Complexes Maintain Replicative Life Span Via Roles at Distinct Silent Chromatin Domains

Heterochromatin, or silent chromatin, preferentially resides at the nuclear envelope. Telomeres and rDNA repeats are the two major perinuclear silent chromatin domains of Saccharomyces cerevisiae. The Cohibin protein complex maintains rDNA repeat stability in part through silent chromatin assembly and perinuclear rDNA anchoring. We report here a role for Cohibin at telomeres and show that functions of the complex at chromosome ends and rDNA maintain replicative life span. Cohibin binds LEM/SUN domain-containing nuclear envelope proteins and telomere-associated factors. Disruption of Cohibin or the envelope proteins abrogates telomere localization and silent chromatin assembly within subtelomeres. Loss of Cohibin limits Sir2 histone deacetylase localization to chromosome ends, disrupts subtelomeric DNA stability, and shortens life span even when rDNA repeats are stabilized. Restoring telomeric Sir2 concentration abolishes chromatin and life span defects linked to the loss of telomeric Cohibin. Our work uncovers roles for Cohibin complexes and reveals relationships between nuclear compartmentalization, chromosome stability, and aging.

Array-based Synthetic Genetic Screens to Map Bacterial Pathways and Functional Networks in Escherichia Coli

Cellular processes are carried out through a series of molecular interactions. Various experimental approaches can be used to investigate these functional relationships on a large scale. Recently, studying biological systems from the perspective of genetic (gene-gene, or epistatic) interactions has proven powerful for elucidating novel functional relationships. Examples of functionally related genes include genes that buffer each other's function or impinge on the same biological process. Genetic interactions have traditionally been investigated in bacteria by combining pairs of mutations (e.g., gene deletions) and assessing how far the phenotype of each double mutant deviates from an expected neutral (no-interaction) phenotype. Fitness is a particularly convenient phenotype to measure: when the double mutant grows faster or slower than expected, the two mutated genes are said to show alleviating or aggravating interactions, respectively. The most commonly used neutral model assumes that the fitness of the double mutant equals the product of the fitness values of the two single mutants. A striking genetic interaction is exemplified by the loss of two nonessential genes that buffer each other in performing an essential biological function: deleting only one of these genes produces no detectable fitness defect, but losing both simultaneously causes system failure, leading to synthetic sickness or lethality. Systematic large-scale genetic interaction screens have been used to generate functional maps for model eukaryotic organisms, such as yeast, describing the functional organization of gene products into pathways and protein complexes within a cell. They also reveal the modular arrangement and cross-talk of pathways and complexes within broader functional neighborhoods (Dixon et al., Annu Rev Genet 43:601-625, 2009).
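
The multiplicative neutral model described above translates directly into an interaction score: the expected double-mutant fitness is W_a × W_b, and the deviation ε = W_ab − W_a × W_b is positive for alleviating and negative for aggravating interactions. The sketch below assumes fitness values are already normalized so that wild type equals 1; the ±0.08 cutoff and the function names are hypothetical placeholders for a noise threshold, not values from the eSGA protocol.

```python
def epsilon(w_a, w_b, w_ab):
    """Genetic interaction score under the multiplicative neutral model."""
    return w_ab - w_a * w_b

def classify_interaction(eps, cutoff=0.08):
    """Label a score; the cutoff is a hypothetical noise threshold."""
    if eps > cutoff:
        return "alleviating"   # double mutant grows better than expected
    if eps < -cutoff:
        return "aggravating"   # double mutant grows worse than expected
    return "neutral"

# Single mutants at 0.9 and 0.8 predict 0.72 for the double mutant.
print(classify_interaction(epsilon(0.9, 0.8, 0.30)))  # aggravating
```

The buffering case in the paragraph above is the extreme aggravating example: two single mutants with no fitness defect (W = 1.0 each) whose double mutant does not grow (W_ab = 0) give ε = −1.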
Here, we present a high-throughput, quantitative Escherichia coli synthetic genetic array (eSGA) screening procedure, which we developed to systematically infer genetic interactions by scoring growth defects among large numbers of double mutants in a classic Gram-negative bacterium. The eSGA method exploits the rapid colony growth, ease of genetic manipulation, and natural, efficient genetic exchange via conjugation of laboratory E. coli strains. Replica pinning is used to grow and mate arrayed sets of single-gene mutant strains and to select double mutants en masse. Strain fitness, the eSGA readout, is quantified by digitally imaging the plates and then measuring and comparing single- and double-mutant colony sizes. While eSGA can be used to screen selected mutants to probe the functions of individual genes, using it more broadly to collect genetic interaction data for many gene combinations can help reconstruct a functional interaction network, revealing novel links and components of biological pathways as well as unexpected connections between pathways. A variety of bacterial systems can be investigated in which genes impinge on an essential biological process (e.g., cell wall assembly, ribosome biogenesis, chromosome replication) of interest from the perspective of drug development (Babu et al., Mol Biosyst 5(12):1439-1455, 2009). We also show how genetic interactions generated by high-throughput eSGA screens can be validated by manual small-scale genetic crosses and by genetic complementation and gene rescue experiments.
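
Turning imaged colony sizes into fitness estimates requires some normalization. A minimal sketch, assuming that most arrayed strains grow like wild type so that the plate median approximates the neutral colony size, is to divide each colony size by its plate median and average across replicate plates. Real pipelines typically also correct spatial artifacts such as faster growth at plate edges, which is not modeled here; the function names and numbers are hypothetical.

```python
from statistics import median

def relative_fitness(colony_sizes):
    """Scale one plate's colony sizes (e.g., pixel areas) by the plate median."""
    m = median(colony_sizes)
    return [size / m for size in colony_sizes]

def mean_fitness(replicate_plates, position):
    """Average normalized fitness of the strain at one array position across replicates."""
    values = [relative_fitness(plate)[position] for plate in replicate_plates]
    return sum(values) / len(values)

# Two hypothetical replicate plates of three colonies each.
print(mean_fitness([[100, 200, 300], [200, 200, 200]], 0))  # 0.75
```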

Identification of Mammalian Protein Complexes by Lentiviral-based Affinity Purification and Mass Spectrometry

Protein complexes and protein-protein interactions (PPIs) are fundamental for most biological functions. Deciphering the extensive protein interaction networks that operate within cellular contexts has become a logical extension of the human genome project. Proteome-scale interactome analysis of mammalian systems requires efficient methods for accurately detecting PPIs that take into account the intrinsic technical challenges of mammalian genome manipulation. In this chapter, we outline in detail an innovative lentiviral-based functional proteomic approach that can be used to rapidly characterize protein complexes from a broad range of mammalian cell lines. This method integrates the following key features: (1) lentiviral elements for efficient delivery of tagged constructs into mammalian cell lines; (2) site-specific Gateway™ recombination sites for easy cloning; (3) a versatile epitope-tagging system for flexible affinity purification strategies; and (4) LC-MS-based protein identification using tandem mass spectrometry.

Filtering and Interpreting Large-scale Experimental Protein-protein Interaction Data

Proteins rarely act in isolation; it is invariably their physical associations that define their biological activity, so the cellular meshwork of protein-protein interactions (PPIs) must be studied before gene function can be fully appreciated. The past few years have seen a marked expansion in both the sheer volume of high-quality interaction data and the number of organisms for which such data are available, with high-throughput interaction screening and detection techniques improving consistently in both scale and sensitivity. As techniques for large-scale PPI mapping are increasingly applied to new organisms, including human, there is a corresponding need to rigorously evaluate, benchmark, and impartially filter the results. This chapter explores methods for PPI dataset evaluation, including a survey of techniques applied by landmark studies in the field and a discussion of promising new experimental approaches. We further outline practical suggestions and useful tools for interpreting newly generated PPI data. Because most large-scale experimental data have been generated for the budding yeast S. cerevisiae, most of the techniques and datasets described are presented from the perspective of this model unicellular eukaryote; however, extensions to other organisms, including mammals, are mentioned where possible.
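
A common way to benchmark and filter a scored PPI dataset is to compare the interactions retained at a given score threshold against a reference ("gold standard") set, reporting precision and recall. The sketch below treats pairs as unordered. The function name, scores, and reference pairs are hypothetical; this is one standard evaluation, not a specific procedure from the chapter.

```python
def precision_recall_at(scored, gold, threshold):
    """Precision and recall of interactions with score >= threshold.

    scored: dict mapping (protein_a, protein_b) -> confidence score.
    gold: iterable of reference (protein_a, protein_b) pairs; order is ignored.
    """
    kept = {frozenset(pair) for pair, s in scored.items() if s >= threshold}
    reference = {frozenset(pair) for pair in gold}
    tp = len(kept & reference)
    precision = tp / len(kept) if kept else 0.0
    recall = tp / len(reference) if reference else 0.0
    return precision, recall

# Hypothetical scored interactions and reference set.
scored = {("A", "B"): 0.9, ("B", "C"): 0.8, ("C", "D"): 0.4, ("A", "D"): 0.2}
gold = [("B", "A"), ("C", "D"), ("E", "F")]
print(precision_recall_at(scored, gold, 0.5))  # (0.5, 0.333...)
```

Sweeping the threshold traces the trade-off between keeping more true interactions and admitting more noise. Reference sets are incomplete, so some "false positives" may be genuine but uncatalogued interactions.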

Genetic Interaction Maps in Escherichia Coli Reveal Functional Crosstalk Among Cell Envelope Biogenesis Pathways

As the interface between a microbe and its environment, the bacterial cell envelope has broad biological and clinical significance. While numerous biosynthesis genes and pathways have been identified and studied in isolation, how these intersect functionally to ensure envelope integrity during adaptive responses to environmental challenge remains unclear. To this end, we performed high-density synthetic genetic screens to generate quantitative functional association maps encompassing virtually the entire cell envelope biosynthetic machinery of Escherichia coli under both auxotrophic (rich medium) and prototrophic (minimal medium) culture conditions. The differential patterns of genetic interactions detected among > 235,000 digenic mutant combinations tested reveal unexpected condition-specific functional crosstalk and genetic backup mechanisms that ensure stress-resistant envelope assembly and maintenance. These networks also provide insights into the global systems connectivity and dynamic functional reorganization of a universal bacterial structure that is both broadly conserved among eubacteria (including pathogens) and an important target.

Control of the RNA Polymerase II Phosphorylation State in Promoter Regions by CTD Interaction Domain-containing Proteins RPRD1A and RPRD1B

RNA polymerase II (RNAP II) C-terminal domain (CTD) phosphorylation is important for various transcription-related processes. Here, we identify by affinity purification and mass spectrometry three previously uncharacterized human CTD-interaction domain (CID)-containing proteins, RPRD1A, RPRD1B and RPRD2, which co-purify with RNAP II and three other RNAP II-associated proteins, RPAP2, GRINL1A and RECQL5, but not with the Mediator complex. RPRD1A and RPRD1B can accompany RNAP II from promoter regions to 3'-untranslated regions during transcription in vivo, predominantly interact with phosphorylated RNAP II, and can reduce CTD S5- and S7-phosphorylated RNAP II at target gene promoters. Thus, the RPRD proteins are likely to have multiple important roles in transcription.

Nanoparticle Size and Surface Chemistry Determine Serum Protein Adsorption and Macrophage Uptake

Delivery and toxicity are critical issues facing nanomedicine research. Currently, there is limited understanding of the connection between the physicochemical properties of a nanomaterial and its interactions with a physiological system. As a result, it remains unclear how to optimally synthesize and chemically modify nanomaterials for in vivo applications. It has been suggested that the physicochemical properties of a nanomaterial after synthesis, known as its "synthetic identity", are not what a cell encounters in vivo. Adsorption of blood components and interactions with phagocytes can modify the size, aggregation state, and interfacial composition of a nanomaterial, giving it a distinct "biological identity". Here, we investigate the role of size and surface chemistry in mediating serum protein adsorption to gold nanoparticles and their subsequent uptake by macrophages. Using label-free liquid chromatography tandem mass spectrometry, we find that over 70 different serum proteins are heterogeneously adsorbed to the surface of gold nanoparticles. The relative density of each of these adsorbed proteins depends on nanoparticle size and poly(ethylene glycol) grafting density. Variations in serum protein adsorption correlate with differences in the mechanism and efficiency of nanoparticle uptake by a macrophage cell line. Macrophages contribute to the poor efficiency of nanomaterial delivery into diseased tissues, redistribution of nanomaterials within the body, and potential toxicity. This study establishes principles for the rational design of clinically useful nanomaterials.

Identification of Enzyme-converted Peptide Products from Single Cells Using Capillary Electrophoresis and Liquid Chromatography-mass Spectrometry

Single-cell analysis using chemical methods, otherwise known as chemical cytometry, promises significant advances in understanding the signaling processes that give rise to cellular behavior. Sensitive methods for chemical cytometry, such as capillary electrophoresis, can detect and quantify multiple targets; however, detected analytes must be conclusively identified for the data to be useful. Here, we demonstrate a method for determining the identity of enzyme-converted peptide products from single cells using a combination of capillary electrophoresis and liquid chromatography-mass spectrometry (LC-MS).

Target Identification by Chromatographic Co-elution: Monitoring of Drug-protein Interactions Without Immobilization or Chemical Derivatization

Bioactive molecules typically mediate their biological effects through direct physical association with one or more cellular proteins. The detection of drug-target interactions is therefore essential for characterizing a compound's mechanism of action and off-target effects, but generic label-free approaches for detecting binding events in biological mixtures have remained elusive. Here, we report a method termed target identification by chromatographic co-elution (TICC) for routinely monitoring the interaction of drugs with cellular proteins in vitro under nearly physiological conditions, based on simple liquid chromatographic separations of cell-free lysates. Correlative proteomic analysis of drug-bound protein fractions by shotgun sequencing is then performed to identify candidate targets. The method is highly reproducible, does not require immobilization or derivatization of drug or protein, and is applicable to diverse natural products and synthetic compounds. The capability of TICC to detect known drug-protein target physical interactions (Kd range: micromolar to nanomolar) is demonstrated both qualitatively and quantitatively. We subsequently used TICC to uncover the sterol biosynthetic enzyme Erg6p as a novel putative antifungal target. Furthermore, TICC identified Asc1, a core 40S ribosomal protein that represses gene expression, and Dak1, a dihydroxyacetone kinase involved in stress adaptation, as novel yeast targets of a dopamine receptor agonist.
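
The co-elution idea can be illustrated by comparing a drug's elution profile across chromatographic fractions with each protein's profile and ranking proteins by similarity. The sketch below uses Pearson correlation as the similarity score; that choice, the function names, and all profiles are assumptions for illustration, and TICC's actual scoring may differ.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def rank_candidates(drug_profile, protein_profiles):
    """Rank proteins by how closely their elution profile tracks the drug's."""
    scores = {name: pearson(drug_profile, prof)
              for name, prof in protein_profiles.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical abundances across six fractions.
drug = [0, 1, 5, 9, 4, 0]
proteins = {
    "ProtX": [0, 2, 10, 18, 8, 1],   # co-elutes with the drug
    "ProtY": [8, 6, 2, 0, 1, 5],     # elutes elsewhere
    "ProtZ": [1, 1, 1, 2, 1, 1],     # weak, partial overlap
}
ranking = rank_candidates(drug, proteins)
```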

Characterization and Evolutionary Analysis of Protein-protein Interaction Networks

While researchers have known the importance of protein-protein interactions for decades, recent innovations in large-scale screening techniques have caused a shift in the paradigm of protein function analysis. Where the focus was once on the individual protein, attention is now directed to the surrounding network of protein associations. Because protein interaction networks can provide useful insights into the potential functions of proteins and the phenotypes associated with them, the increasing availability of large-scale protein interaction data suggests that molecular biologists can extract more meaningful hypotheses by examining these large networks. Further, the increasing availability of high-quality protein interaction data in multiple species has allowed the properties of networks (e.g., the presence of hubs and modularity) to be interpreted from an evolutionary perspective. In this chapter, we discuss major previous findings derived from analyses of large-scale protein interaction data, focusing on the approaches taken by landmark assays in evaluating the structure and evolution of these networks. We then outline basic techniques for protein interaction network analysis, pointing out the benefits and potential limitations of these approaches. As the majority of large-scale protein interaction data have been generated in budding yeast, the literature described here focuses on this important model organism, with references to other species included where possible.
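To make the two network properties named above concrete, here is a minimal sketch that treats an interaction network as an undirected edge list, calls nodes with degree at or above an arbitrary cutoff "hubs", and computes Newman-Girvan modularity for a given partition into modules. The toy network, the cutoff and the partition are hypothetical; in practice the partition would come from a clustering algorithm.

```python
"""Minimal sketch: hubs (high-degree nodes) and the modularity of a given
partition of an undirected protein interaction network. The toy network
and the hub cutoff are hypothetical."""
from collections import defaultdict


def degrees(edges):
    """Number of interaction partners of each node."""
    deg = defaultdict(int)
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return dict(deg)


def hubs(edges, min_degree):
    """Nodes with at least `min_degree` partners, sorted by name."""
    return sorted(n for n, d in degrees(edges).items() if d >= min_degree)


def modularity(edges, module_of):
    """Newman-Girvan modularity Q = sum over modules c of
    L_c / m - (d_c / (2m))**2, where m is the number of edges, L_c the
    number of edges inside c and d_c the summed degree of nodes in c."""
    m = len(edges)
    internal = defaultdict(int)
    summed_degree = defaultdict(int)
    for a, b in edges:
        if module_of[a] == module_of[b]:
            internal[module_of[a]] += 1
    for node, d in degrees(edges).items():
        summed_degree[module_of[node]] += d
    return sum(internal[c] / m - (summed_degree[c] / (2 * m)) ** 2
               for c in summed_degree)


if __name__ == "__main__":
    # Two triangles joined by a single C-D edge.
    edges = [("A", "B"), ("B", "C"), ("A", "C"),
             ("D", "E"), ("E", "F"), ("D", "F"), ("C", "D")]
    module_of = {"A": 1, "B": 1, "C": 1, "D": 2, "E": 2, "F": 2}
    print(hubs(edges, 3))                          # ['C', 'D']
    print(round(modularity(edges, module_of), 3))  # 0.357
```

A positive Q indicates more within-module edges than expected for a random graph with the same degrees; the two-triangle example scores 5/14.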

Genome-scale Genetic Manipulation Methods for Exploring Bacterial Molecular Biology

Bacteria are diverse and abundant, playing key roles in human health and disease, the environment, and biotechnology. Despite progress in genome sequencing and bioengineering, much remains unknown about the functional organization of prokaryotes. For instance, roughly a third of the protein-coding genes of the best-studied model bacterium, Escherichia coli, currently lack experimental annotations. Systems-level experimental approaches for investigating the functional associations of bacterial genes and genetic structures are essential for defining the fundamental molecular biology of microbes, preventing the spread of antibacterial resistance in the clinic, and driving the development of future biotechnological applications. This review highlights recently introduced large-scale genetic manipulation and screening procedures for the systematic exploration of bacterial gene functions, molecular relationships, and the global organization of bacteria at the gene, pathway, and genome levels.

Amino Acid Starvation Induced by Invasive Bacterial Pathogens Triggers an Innate Host Defense Program

Autophagy, which targets cellular constituents for degradation, is normally inhibited in metabolically replete cells by the metabolic checkpoint kinase mTOR. Although autophagic degradation of invasive bacteria has emerged as a critical host defense mechanism, the signals that induce autophagy upon bacterial infection remain unclear. We find that infection of epithelial cells with Shigella and Salmonella triggers acute intracellular amino acid (AA) starvation due to host membrane damage. Pathogen-induced AA starvation caused downregulation of mTOR activity, resulting in the induction of autophagy. In Salmonella-infected cells, membrane integrity and cytosolic AA levels rapidly normalized, favoring mTOR reactivation at the surface of the Salmonella-containing vacuole and bacterial escape from autophagy. In addition, bacteria-induced AA starvation activated the integrated stress response, which depends on the GCN2 kinase, eukaryotic initiation factor 2α, and the transcription factor ATF3, and triggered transcriptional reprogramming. Thus, AA starvation induced by bacterial pathogens is sensed by the host to trigger protective innate immune and stress responses.

Hsp110 is Required for Spindle Length Control

Systematic affinity purification combined with mass spectrometry analysis of N- and C-terminally tagged cytoplasmic Hsp70/Hsp110 chaperones was used to identify new roles of Hsp70/Hsp110 in the cell. This allowed mapping of a chaperone-protein network consisting of 1,227 unique interactions between the 9 chaperones and 473 proteins, and highlighted roles for Hsp70/Hsp110 in 14 broad biological processes. Using this information, we uncovered an essential role for Hsp110 in spindle assembly and, more specifically, in modulating the activity of the widely conserved kinesin-5 motor Cin8. The function of the Hsp110 Sse1 as a nucleotide exchange factor for the Hsp70 chaperones Ssa1/Ssa2 was found to be required for maintaining the proper distribution of kinesin-5 motors within the spindle, which in turn was required for bipolar spindle assembly in S phase. These data suggest a model whereby the Hsp70-Hsp110 chaperone complex antagonizes Cin8 plus-end motility and prevents premature spindle elongation in S phase.

A Census of Human Soluble Protein Complexes

Cellular processes often depend on stable physical associations between proteins. Despite recent progress, knowledge of the composition of human protein complexes remains limited. To close this gap, we applied an integrative global proteomic profiling approach, based on chromatographic separation of cultured human cell extracts into more than one thousand biochemical fractions that were subsequently analyzed by quantitative tandem mass spectrometry, to systematically identify a network of 13,993 high-confidence physical interactions among 3,006 stably associated soluble human proteins. Most of the 622 putative protein complexes we report are linked to core biological processes and include both candidate disease genes and unannotated proteins, offering clues to their mechanisms. Strikingly, whereas larger multiprotein assemblies tend to be more extensively annotated and evolutionarily conserved, human protein complexes with five or fewer subunits are far more likely to be functionally unannotated or restricted to vertebrates, suggesting more recent functional innovations.
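The step from a pairwise interaction network to putative complexes, and the five-subunit size split discussed in this abstract, can be illustrated with a deliberately simple stand-in: take the connected components of the high-confidence interaction graph using union-find. This is not necessarily the clustering procedure used in the study, and the protein labels below are hypothetical.

```python
"""Hypothetical sketch: group high-confidence pairwise interactions into
candidate complexes by taking connected components (union-find), then
separate small from large assemblies. Connected components are a simple
stand-in, not necessarily the clustering used in the study."""


def connected_components(pairs):
    """Return the connected components of an undirected edge list as sets."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


def split_by_size(complexes, max_small=5):
    """Split complexes into those with <= max_small subunits and the rest."""
    small = [c for c in complexes if len(c) <= max_small]
    large = [c for c in complexes if len(c) > max_small]
    return small, large


if __name__ == "__main__":
    pairs = [("A", "B"), ("B", "C"), ("D", "E"),
             ("F", "G"), ("G", "H"), ("H", "I"), ("I", "J"), ("J", "K")]
    complexes = connected_components(pairs)
    print(sorted(len(c) for c in complexes))  # [2, 3, 6]
    small, large = split_by_size(complexes)
    print(len(small), len(large))             # 2 1
```

Connected components merge any two complexes that share a single bridging interaction, which is why dedicated graph-clustering methods are usually preferred on real data.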

Interaction Landscape of Membrane-protein Complexes in Saccharomyces cerevisiae

Macromolecular assemblies involving membrane proteins (MPs) serve vital biological roles and are prime drug targets in a variety of diseases. Large-scale affinity purification studies of soluble-protein complexes have been accomplished for diverse model organisms, but no global characterization of MP-complex membership has been described so far. Here we report a complete survey of 1,590 putative integral, peripheral and lipid-anchored MPs from Saccharomyces cerevisiae, which were affinity purified in the presence of non-denaturing detergents. The identities of the co-purifying proteins were determined by tandem mass spectrometry and subsequently used to derive a high-confidence physical interaction map encompassing 1,726 membrane protein-protein interactions and 501 putative heteromeric complexes associated with the various cellular membrane systems. Our analysis reveals unexpected physical associations underlying the membrane biology of eukaryotes and delineates the global topological landscape of the membrane interactome.

ComplexQuant: High-throughput Computational Pipeline for the Global Quantitative Analysis of Endogenous Soluble Protein Complexes Using High Resolution Protein HPLC and Precision Label-free LC/MS/MS

The experimental isolation and characterization of stable multi-protein complexes are essential to understanding the molecular systems biology of a cell. To this end, we have developed a high-throughput proteomic platform for the systematic identification of native protein complexes based on extensive fractionation of soluble protein extracts by multi-bed ion exchange high performance liquid chromatography (IEX-HPLC) combined with exhaustive label-free LC/MS/MS shotgun profiling. To support these studies, we have built a companion data analysis software pipeline, termed ComplexQuant. Proteins present in the hundreds of fractions typically collected per experiment are first identified by exhaustively interrogating MS/MS spectra using multiple database search engines within an integrative probabilistic framework, while accounting for possible post-translational modifications. Protein abundance is then measured across the fractions based on normalized total spectral counts and precursor ion intensities using a dedicated tool, PepQuant. This analysis allows co-complex membership to be inferred based on the similarity of extracted protein co-elution profiles. Each computational step has been optimized for processing large-scale biochemical fractionation datasets, and the reliability of the integrated pipeline has been benchmarked extensively. This article is part of a Special Issue entitled: Proteomics from protein structures to clinical applications (CNPN 2012).
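The core inference, that proteins with similar co-elution profiles are likely co-complex members, can be sketched as follows, assuming a protein-by-fraction matrix of raw spectral counts. Each fraction is normalized by its total counts (one simple choice; the pipeline's actual normalization may differ), and protein pairs are scored by the Pearson correlation of their profiles. This is not ComplexQuant or PepQuant code; the protein names, counts and the 0.9 cutoff are hypothetical.

```python
"""Minimal sketch: normalize spectral counts per fraction, then call protein
pairs co-eluting when the Pearson correlation of their elution profiles
meets a cutoff. Names, counts and the cutoff are hypothetical."""
from itertools import combinations
from math import sqrt


def normalize_by_fraction(counts):
    """Divide each fraction (column) by its total spectral count."""
    n_fractions = len(next(iter(counts.values())))
    totals = [sum(profile[j] for profile in counts.values())
              for j in range(n_fractions)]
    return {p: [c / t if t else 0.0 for c, t in zip(profile, totals)]
            for p, profile in counts.items()}


def pearson(x, y):
    """Pearson correlation; returns 0.0 for a constant profile."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def co_eluting_pairs(counts, cutoff=0.9):
    """All protein pairs whose normalized profiles correlate >= cutoff."""
    profiles = normalize_by_fraction(counts)
    return [(a, b) for a, b in combinations(sorted(profiles), 2)
            if pearson(profiles[a], profiles[b]) >= cutoff]


if __name__ == "__main__":
    counts = {
        "ProtA": [0, 5, 20, 5, 0],
        "ProtB": [0, 4, 18, 6, 0],
        "ProtC": [10, 2, 0, 1, 12],
    }
    print(co_eluting_pairs(counts))  # [('ProtA', 'ProtB')]
```

Scoring all pairs is quadratic in the number of proteins, which is manageable for a few thousand proteins but is one reason large-scale pipelines optimize this step.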

Application of Quantitative Proteomics Technologies to the Biomarker Discovery Pipeline for Multiple Sclerosis

Multiple sclerosis (MScl) is an inflammation-mediated demyelinating disorder most prevalent in young Caucasian adults. The various clinical manifestations of the disease present several challenges in the clinic in terms of diagnosis, monitoring disease progression and response to treatment. Advances in mass spectrometry-based proteomic technologies have revolutionized the field of biomarker research and paved the way for the identification and validation of disease-specific markers. This review focuses on the novel candidates discovered by the application of quantitative proteomics to relevant disease-affected tissues, both in humans and in the animal model of the disease known as experimental autoimmune encephalomyelitis (EAE). The role of targeted mass spectrometry approaches, such as multiple reaction monitoring (MRM), in biomarker validation studies will also be discussed.

Targeted Protein Identification, Quantification and Reporting for High-resolution Nanoflow Targeted Peptide Monitoring

Mass spectrometry-based targeted proteomic assays are experiencing a surge in awareness due to the diverse possibilities arising from the re-application of traditional LC-SRM technology. The FDA-approved quantitative LC-SRM pipeline used in drug discovery motivates its use to quantitatively validate putative proteomic biomarkers. However, the complexity of biological specimens makes it very challenging to identify, in parallel, specific peptides and proteins of interest from large lists of biomarker candidates. Methods have been devised to increase scan speeds, improve detection specificity and verify quantitative SRM features. Alternatively, high-resolution mass spectrometers could be used to improve the reliability and precision of targeted proteomics assays. Here, we present a new method for identifying, quantifying and reporting peptides in high-resolution targeted proteomics experiments performed on an Orbitrap hybrid instrument using stable isotope-labeled internal reference peptides. This high-precision targeted peptide monitoring (TPM) method has unique advantages over existing techniques, including the need to detect only the most abundant product ion of a given target for confident peptide identification, using a scoring function that evaluates assay performance based on 1) m/z mass accuracy, 2) retention time accuracy of observed species relative to prediction, and 3) retention time accuracy relative to internal reference peptides. Further, we show how multiplexed precision TPM assays can be managed using sentinel peptide standards. This article is part of a Special Issue entitled: Proteomics from protein structures to clinical applications (CNPN 2012).
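The three criteria of the scoring function listed in this abstract can be illustrated with a hypothetical scoring scheme: each criterion contributes 1 minus its deviation divided by a tolerance (clipped at zero), and the score is the mean of the three parts. The tolerances, the equal weighting and the `Observation` fields are assumptions for illustration, not the published TPM scoring function.

```python
"""Hypothetical sketch of a three-part score for one targeted peptide
observation: m/z accuracy (ppm), retention time (RT) versus prediction,
and RT versus a co-eluting stable isotope-labeled reference peptide.
Tolerances and equal weighting are illustrative assumptions."""
from dataclasses import dataclass


@dataclass
class Observation:
    observed_mz: float     # most abundant product ion, observed m/z
    theoretical_mz: float  # expected m/z of that product ion
    observed_rt: float     # minutes
    predicted_rt: float    # minutes, from a retention-time model
    reference_rt: float    # minutes, stable isotope-labeled reference peptide


def ppm_error(observed, theoretical):
    """Mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def tpm_score(obs, ppm_tol=5.0, pred_rt_tol=2.0, ref_rt_tol=0.2):
    """Mean of per-criterion parts, each 1 - |deviation| / tolerance
    clipped at 0; 1.0 means perfect agreement on all three criteria."""
    deviations = [
        (abs(ppm_error(obs.observed_mz, obs.theoretical_mz)), ppm_tol),
        (abs(obs.observed_rt - obs.predicted_rt), pred_rt_tol),
        (abs(obs.observed_rt - obs.reference_rt), ref_rt_tol),
    ]
    parts = [max(0.0, 1.0 - dev / tol) for dev, tol in deviations]
    return sum(parts) / len(parts)


if __name__ == "__main__":
    # 2 ppm mass error, 1 min off the predicted RT, co-elutes with reference.
    obs = Observation(500.001, 500.0, 31.0, 30.0, 31.0)
    print(round(tpm_score(obs), 3))  # 0.7
```

The reference-peptide tolerance is set much tighter than the predicted-RT tolerance because a labeled internal standard is expected to co-elute with its endogenous counterpart, whereas RT predictions are typically less precise.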

Membrane Proteomics by High Performance Liquid Chromatography - Tandem Mass Spectrometry: Analytical Approaches and Challenges

Membrane proteins play diverse, biologically important structural and functional roles, including molecular transport, cell communication and signal transduction. The dysfunction of many of them is linked to deleterious human diseases, making them of utmost importance in drug discovery. Membrane proteins are encoded by approximately 20-30% of all open reading frames; however, they are typically under-represented in liquid chromatography-mass spectrometry (LC-MS) proteomics experiments due to their low abundance and poor solubility. To address these analytical challenges, various membrane protein enrichment, solubilization, digestion and fractionation strategies have been employed to improve the coverage of membrane systems while maintaining compatibility with MS detection. This review discusses both established and emerging high-throughput gel-free analytical workflows in membrane proteomics, and the inherent advantages, disadvantages and orthogonality of the various approaches. Issues of critical importance for successful LC-MS/MS detection, such as detergent selection and minimizing ion suppression in detergent-based workflows, are discussed in detail. Recent studies comparing the performance of different analytical strategies are highlighted in order to provide practical insight into the choice of the most appropriate method for membrane-centric applications ranging from cell surface biomarker discovery to membrane protein interaction network mapping.
