Affinity purification of tagged proteins in combination with mass spectrometry (APMS) is a powerful method for the systematic mapping of protein interaction networks and for investigating the mechanistic basis of biological processes. Here, we describe an optimized sequential peptide affinity (SPA) APMS procedure developed for the bacterium Escherichia coli that can be used to isolate and characterize stable multi-protein complexes to near homogeneity even starting from low copy numbers per cell.
Since most cellular processes are mediated by macromolecular assemblies, the systematic identification of protein-protein interactions (PPI) and of the subunit composition of multi-protein complexes can provide insight into gene function and enhance understanding of biological systems1, 2. Physical interactions can be mapped with high confidence via large-scale isolation and characterization of endogenous protein complexes under near-physiological conditions, based on affinity purification of chromosomally-tagged proteins in combination with mass spectrometry (APMS). This approach has been successfully applied in evolutionarily diverse organisms, including yeast, flies, worms, mammalian cells, and bacteria1-6. In particular, we have generated a carboxy-terminal Sequential Peptide Affinity (SPA) dual tagging system for affinity-purifying native protein complexes from cultured Gram-negative Escherichia coli, using genetically tractable host laboratory strains that are well suited for genome-wide investigations of the fundamental biology and conserved processes of prokaryotes1, 2, 7. Our SPA-tagging system is analogous to the tandem affinity purification method originally developed for yeast8, 9. The SPA tag consists of a calmodulin binding peptide (CBP), followed by the cleavage site for the highly specific tobacco etch virus (TEV) protease and three copies of the FLAG epitope (3X FLAG), allowing two consecutive rounds of affinity enrichment. After cassette amplification, sequence-specific linear PCR products encoding the SPA tag and a selectable marker are integrated into the chromosome and expressed in frame as carboxy-terminal fusions in a DY330 background, which is induced to transiently express a highly efficient heterologous bacteriophage lambda recombination system10. Subsequent two-step purification using calmodulin and anti-FLAG affinity beads enables highly selective and efficient recovery of even low-abundance protein complexes from large-scale cultures.
Tandem mass spectrometry is then used to identify the stably co-purifying proteins with high sensitivity (low nanogram detection limits).
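Because the cassette must be integrated in frame, the homology arms of the two tagging primers come from the sequence immediately upstream of the stop codon (the last sense codons) and immediately downstream of it (reverse-complemented for the reverse primer), as in the flanking-primer design of Figure 1A. The Python sketch below illustrates that logic only. Its assumptions are labeled: `spa_tag_primers` is a hypothetical helper, the 45-nt arm length is an assumption, the `N` runs are placeholders for the cassette-specific priming sequences (not the actual pJL148 sites), and the native stop codon is omitted on the assumption that the cassette supplies its own.

```python
"""Hypothetical helper: derive recombineering primers flanking a stop codon.

Assumptions (not taken from the protocol): 45-nt homology arms and placeholder
annealing sequences for the SPA cassette template.
"""

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def spa_tag_primers(locus: str, cds_start: int, cds_end: int,
                    arm_len: int = 45,
                    fwd_anneal: str = "N" * 20,
                    rev_anneal: str = "N" * 20) -> tuple[str, str]:
    """Return (forward, reverse) primers for a C-terminal in-frame fusion.

    locus     -- genomic sequence (5'->3', coding strand) containing the gene
    cds_start -- 0-based index of the first base of the start codon
    cds_end   -- 0-based index one past the last base of the stop codon
    """
    locus = locus.upper()
    stop = locus[cds_end - 3:cds_end]
    if stop not in STOP_CODONS:
        raise ValueError(f"expected a stop codon at the end of the CDS, got {stop!r}")
    if (cds_end - cds_start) % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3")
    # Forward arm: last sense codons, stop codon excluded, so the tag is read in frame.
    up_arm = locus[max(cds_start, cds_end - 3 - arm_len):cds_end - 3]
    # Reverse arm: sequence just downstream of the stop codon, reverse-complemented.
    down_arm = locus[cds_end:cds_end + arm_len]
    forward = up_arm + fwd_anneal
    reverse = reverse_complement(down_arm) + rev_anneal
    return forward, reverse
```

For example, `spa_tag_primers(genome_slice, start, end)` returns a primer pair whose 5' ends match the chromosome on either side of the stop codon; the placeholder annealing sequences must be replaced before use.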
Here, we describe detailed step-by-step procedures we commonly use for systematic protein tagging, purification, and mass spectrometry-based analysis of soluble protein complexes from E. coli. These procedures can be scaled up and potentially tailored to other bacterial species, including certain opportunistic pathogens that are amenable to recombineering. The resulting physical interactions often reveal unexpected components and connections that suggest novel mechanistic links. Integrating the PPI data with other molecular association data, such as genetic (gene-gene) interactions and genomic-context (GC) predictions, can help elucidate the global organization of multi-protein complexes within biological pathways. The networks generated for E. coli can also provide insight into the functional architecture of orthologous gene products in other microbes for which functional annotations are currently lacking.
1. Construction of Gene-specific SPA-tagged Strains in E. coli DY330
2. Culturing and Sonication
3. Affinity Purification
4. Silver Staining
5. Proteolysis and Sample Preparation for Mass Spectrometry
6. Protein Identification Using an LTQ Orbitrap Velos Mass Spectrometer
The polypeptide components of the isolated complexes are identified using an LTQ Orbitrap Velos hybrid tandem mass spectrometer. The Orbitrap analyzer offers high resolving power (>60,000 full width at half maximum, FWHM) and mass accuracy (<2 ppm), which minimizes MS/MS sampling of irrelevant, non-specific background contaminants detected in control purifications, while the high-speed Velos ion trap can detect and fragment low-abundance peptides using both electron transfer dissociation and collision-induced dissociation modes. The resulting MS/MS spectra are matched to reference E. coli protein sequences using a database search algorithm such as SEQUEST, and each match is evaluated with a probability algorithm such as STATQUEST11 to retain high-confidence assignments. To achieve a low empirical false-discovery rate, the total spectral count, peptide sequence uniqueness, and common background contaminants detected in negative control (i.e. mock) purifications are all taken into account.
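Mass accuracy is conventionally reported as the relative deviation of an observed m/z from its theoretical value in parts per million, ppm = (m_obs - m_theo) / m_theo × 10^6, and resolving power as R = m / Δm, where Δm is the peak width at half maximum. The Python sketch below computes both. The 2 ppm default mirrors the accuracy figure quoted above; using it as a hard acceptance filter (`within_tolerance`) is an illustrative assumption, not a documented search setting.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error of an observed m/z, in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def resolving_power(mz: float, fwhm: float) -> float:
    """Resolving power R = m / delta-m, with delta-m the full width at half maximum."""
    return mz / fwhm


def within_tolerance(observed_mz: float, theoretical_mz: float,
                     tol_ppm: float = 2.0) -> bool:
    """True if the absolute mass error is within tol_ppm (assumed 2 ppm default)."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm
```

For instance, an ion observed at m/z 1000.0015 against a theoretical 1000.0000 is off by 1.5 ppm and would pass a 2 ppm window, whereas a 0.003 Da deviation at the same mass (3 ppm) would not.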
Once tagged bait proteins, expressed at endogenous levels, were affinity-purified from logarithmic-phase cultures, one portion of each sample was run on a silver-stained gel to visualize the individual polypeptide components of the isolated stable complexes. We also subjected a second portion of each affinity-purified sample to gel-free tandem mass spectrometry (LC-MS) to identify the corresponding polypeptide sequences. The effectiveness of this APMS procedure is illustrated by a representative SDS-PAGE analysis of the components of the RNA polymerase (RNAP) and SufBCD iron-sulfur (Fe-S) protein complexes affinity-purified from E. coli (Figure 1C and D). Both SPA-tagged RNAP σ70 (RpoD) and YacL, a protein of unknown function, co-purified specifically with the core RNAP enzyme, i.e., the α (RpoA), β (RpoB), and β’ (RpoC) subunits, and with the RNAP recycling factor HepA. Unlike tagged RpoD, YacL additionally bound the essential transcription termination/anti-termination factor NusG, suggesting a specialized function in transcription. LC-MS also detected several smaller co-purifying proteins that were not apparent on the gel, including the RNAP ω subunit (RpoZ) and the transcription termination/anti-termination factors NusA and NusD, which co-purified with tagged RpoD and YacL, respectively. In an independent experiment, tagged SufB, SufC, and SufD co-purified with one another, indicating joint participation in Fe-S cluster biosynthesis as a single scaffolding complex. This representative example highlights that even well-studied, extensively annotated bacterial multi-protein complexes participating in essential biological processes often have novel associated components that can be efficiently identified using this approach.
To define the composition of stable multi-protein complexes, we assign co-purification scores2 to high-confidence interactions, taking into account the uniqueness of bait-prey, bait-bait, and prey-prey relationships. We then apply a graph clustering procedure such as the Markov Clustering (MCL) algorithm2 to partition the resulting probabilistic PPI network into discrete protein clusters. Putative interaction networks can be visualized using Cytoscape12.
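The clustering step can be illustrated with a compact, dependency-free version of the standard MCL loop: expansion by squaring the column-stochastic matrix, then inflation by raising entries to a power and re-normalizing each column, repeated until the matrix stops changing. This is a generic textbook sketch, not the pipeline used in the cited work. The edge weights stand in for co-purification scores, and the self-loop weight, inflation value of 2.0, and read-out cutoff are assumptions; for real networks, the reference `mcl` program would be the more practical choice.

```python
def _normalize_columns(m):
    """Scale each column of the square matrix m to sum to 1 (in place)."""
    n = len(m)
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                m[i][j] /= total


def _square(m):
    """Expansion step: return m @ m for a square list-of-lists matrix."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl_clusters(nodes, edges, inflation=2.0, max_iter=100, tol=1e-9, cutoff=1e-6):
    """Cluster an undirected weighted graph with the Markov Clustering algorithm.

    nodes -- list of protein names
    edges -- {(a, b): weight}; weights are assumed to be non-negative scores
    """
    n = len(nodes)
    if n == 0:
        return []
    idx = {name: i for i, name in enumerate(nodes)}
    # Adjacency matrix with self-loops of weight 1.0 (an assumption).
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for (a, b), w in edges.items():
        m[idx[a]][idx[b]] = m[idx[b]][idx[a]] = float(w)
    _normalize_columns(m)

    for _ in range(max_iter):
        prev = m
        m = _square(m)                                    # expansion
        m = [[v ** inflation for v in row] for row in m]  # inflation
        _normalize_columns(m)
        if max(abs(m[i][j] - prev[i][j]) for i in range(n) for j in range(n)) < tol:
            break

    # Each attractor row lists the members of one cluster; duplicate rows are merged.
    clusters, seen = [], set()
    for i in range(n):
        members = tuple(nodes[j] for j in range(n) if m[i][j] > cutoff)
        if members and members not in seen:
            seen.add(members)
            clusters.append(list(members))
    return clusters
```

With a toy graph such as a triangle A-B-C, an isolated pair D-E, and a singleton F, the function returns those three groups as separate clusters. Higher inflation values generally yield smaller, tighter clusters.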
Although the proteomics approach described above can reliably predict physical interactions among bacterial proteins, it may not necessarily reveal the actual functional relationship. Thus, combining proteomics data with additional functional association evidence inferred by genomics methods can be used to investigate novel mechanistic roles of previously unannotated E. coli proteins (Figure 2). For example, using an integrative machine learning approach, we have defined sub-networks of protein complexes and functional modules that participate as broader functional neighborhoods in E. coli2.
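A simple, widely used way to combine independent lines of evidence, for example APMS co-purification and genomic-context scores, is to sum per-source log-likelihood ratios (LLRs) under a naive-Bayes independence assumption and convert the total into a posterior probability. The Python sketch below shows only that arithmetic. It is a generic stand-in, not the integrative machine learning method of the cited study, and the evidence names, LLR values, and prior are hypothetical.

```python
import math


def combined_llr(evidence_llrs):
    """Sum natural-log likelihood ratios; valid only if the sources are independent."""
    return sum(evidence_llrs.values())


def posterior_probability(llr, prior):
    """Convert a total LLR and a prior probability into a posterior probability."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * math.exp(llr)
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical evidence for one protein pair (illustrative values only).
pair_evidence = {"apms_copurification": 2.3, "genomic_context": 1.1}
print(round(posterior_probability(combined_llr(pair_evidence), prior=0.01), 3))
```

The independence assumption is the main weakness of this scheme: correlated evidence sources are double-counted, which is one reason learned integrative models are preferred in practice.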
Figure 1. Schematic overview of gene-specific SPA-tagging by recombineering and the tandem protein affinity purification procedure. Panel A: A tagging cassette carrying a drug-selectable marker is amplified by PCR using primers (1, 2) homologous to the sequences flanking either side of the target gene’s translational termination codon (3). The cassette is introduced by transformation and integrated into the E. coli chromosome using a transiently induced ectopic λ-Red recombination system. SPA, Sequential Peptide Affinity system; Kan, kanamycin resistance cassette; *, stop codon. The figure is adapted from Babu et al.14 Panel B: Overview of the SPA-tag tandem protein affinity purification procedure. For membrane protein purifications, detergents are included in all buffer solutions except the final wash and elution buffers to avoid interference with mass spectrometry detection. Panel C: Silver-stained SDS polyacrylamide gel showing the components purified with SPA-tagged RNA polymerase σ70 (RpoD), a protein of unknown function (YacL), and the Fe-S scaffold proteins (SufBCD) from E. coli. Panel D: Identification by tandem mass spectrometry of RNAP subunits and transcription termination/anti-termination factors co-purified with endogenous affinity-tagged RpoD (panel I) and YacL (panel II). Other proteins that co-purified with tagged RpoD and YacL are not shown.
Figure 2. Putative E. coli functional neighborhoods showing functional assignments for cytosolic proteins of previously unknown function (adapted from Hu et al.2). The main hierarchical ‘clustergram’ (orange) shows the patterns of functional predictions and existing annotations (x-axis) for functionally orphan and characterized E. coli genes (y-axis). Yellow and blue represent existing (curated database) and inferred (Hu et al.2) functions, respectively, while shade intensity reflects confidence scores. Biological processes associated with the various neighborhoods are indicated on the right. Insets (aquamarine, pink) show the individual components of a representative neighborhood, based on similarity scores from the integrated network of physical (SPA-tagging) and functional (comparative GC methods) interactions.
A key aspect of the SPA-based APMS approach described here is that tagging is performed within the natural chromosomal context, so that normal gene regulation is maintained (i.e. the native bait promoter is preserved, and expression levels are therefore not perturbed) and native, stably associated protein complexes are recovered at near-endogenous levels. Operon polarity issues are also avoided by including an outwardly oriented promoter in the selectable marker. The approach is sensitive enough to purify the components of low-abundance complexes, including membrane-associated assemblies, to near homogeneity, even when the subunits are expressed at only a few molecules per bacterial cell. Overall, this protocol can readily be scaled up to purify large sets of tagged soluble proteins in E. coli, or in principle in any other bacterial species that is amenable to tagging via recombineering or that possesses naturally high transformation capability, such as Acinetobacter, an emerging workhorse of bacterial genetics that has been used, for example, to build a strain library of targeted gene knockouts13.
A major concern inherent to large-scale proteomic approaches is the identification of promiscuous, non-specific interactors and spurious trace contaminants. Despite two rounds of enrichment, our routine SPA purifications regularly detect frequent contaminants, usually high-abundance housekeeping proteins such as ribosomal proteins and chaperones, which bind non-specifically to the chromatographic resin and/or the bait proteins. This problem can be mitigated in two ways. First, to minimize false-positive associations, the total spectral count and uniqueness of each protein identified with a particular bait are evaluated relative to a broader set of purifications of multiple unrelated baits and to purifications from untagged (i.e. wild-type) negative control strains (mock affinity purifications). Second, all candidate interactions should be validated by reciprocal tagging and purification of the putative binding partner; independently confirming each protein-protein interaction guards against the rare cases in which the SPA tag perturbs incorporation of native proteins into the endogenous protein assembly.
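The first mitigation step can be sketched as a frequency filter: a prey that appears in a large fraction of unrelated purifications, or in any mock purification, is treated as background. The Python sketch below assumes spectral counts organized as `{bait: {prey: count}}`. The 20% frequency cut-off and the outright exclusion of mock-detected proteins are illustrative assumptions, not the scoring used in this protocol, which also weighs spectral counts and uniqueness.

```python
from collections import Counter


def filter_background(purifications, mock_preys, max_frequency=0.2):
    """Drop preys seen in mock controls or in too many unrelated purifications.

    purifications -- {bait: {prey: spectral_count}}
    mock_preys    -- set of proteins detected in untagged (mock) purifications
    max_frequency -- hypothetical cut-off on the fraction of baits recovering a prey
    """
    n_baits = len(purifications)
    # Count how many bait purifications recover each prey (the bait itself excluded).
    frequency = Counter()
    for bait, preys in purifications.items():
        frequency.update(p for p, count in preys.items() if count > 0 and p != bait)

    kept = {}
    for bait, preys in purifications.items():
        kept[bait] = {
            prey: count
            for prey, count in preys.items()
            if count > 0
            and prey != bait
            and prey not in mock_preys
            and frequency[prey] / n_baits <= max_frequency
        }
    return kept
```

In practice the cut-off would be tuned against the size and diversity of the bait collection; with only a handful of baits, a fixed fraction is a crude criterion.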
APMS-based proteomic surveys have limited sensitivity for transient or sub-stoichiometric interactions. In such cases, single-step purification procedures may be used to enhance detection of weaker interaction partners. Over the course of our ongoing, decade-long study of the E. coli interactome, we have seen the detection capabilities of new-generation mass spectrometers improve steadily. Older instrumentation is biased toward recurrent identification of highly abundant proteins, often precluding detection of less abundant but potentially important interacting proteins. By contrast, the Orbitrap Velos hybrid mass spectrometer we currently use has markedly improved resolution, mass accuracy, and sensitivity, allowing far more efficient, high-throughput detection of low-abundance proteins, including transient interactors. Finally, if the nature of the interaction is completely unknown, we suggest that researchers: (i) apply stringent criteria when assigning confidence scores to all putative database search matches; proteins with two or more unique peptides are usually considered positive, provided that each match passes a minimum likelihood threshold of 99% probability or greater; (ii) examine the specificity of candidate interactors recovered with the tagged bait by comparison with results from negative control untagged strains (mock experiments) or unrelated protein baits; and (iii) reciprocally confirm potential interactions involving an unknown protein of interest, either by creating the corresponding SPA-tagged bait fusion for large-scale purification or by validating pairwise interactions with traditional co-immunoprecipitation using protein-specific or tag-specific antibodies.
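Criteria (i) and (iii) can be expressed in a few lines. The Python sketch below keeps proteins supported by at least two distinct peptide sequences that each pass a 0.99 probability threshold, and reports bait-prey pairs recovered in both tagging directions. It assumes peptide matches have already been produced by a search engine and probability tool; the `PeptideMatch` record is a hypothetical structure, and "unique" is treated simply as distinct sequences, without resolving peptides shared between proteins.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PeptideMatch:
    """Hypothetical record for one peptide-spectrum assignment."""
    protein: str
    peptide: str
    probability: float


def confident_proteins(matches, min_unique_peptides=2, min_probability=0.99):
    """Proteins backed by >= min_unique_peptides distinct peptides passing the threshold."""
    peptides = defaultdict(set)
    for match in matches:
        if match.probability >= min_probability:
            peptides[match.protein].add(match.peptide)
    return {protein for protein, seqs in peptides.items()
            if len(seqs) >= min_unique_peptides}


def reciprocal_pairs(interactions):
    """Return unordered pairs in which each protein recovered the other as a bait.

    interactions -- {bait: set of confidently identified preys}
    """
    pairs = set()
    for bait, preys in interactions.items():
        for prey in preys:
            if bait in interactions.get(prey, set()):
                pairs.add(tuple(sorted((bait, prey))))
    return pairs
```

A pair that appears in only one direction is not necessarily false; the partner may simply not have been tagged yet, or its tag may interfere with complex assembly, so such pairs are better flagged for follow-up than discarded.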
To date, using this systematic approach, we have tagged and purified around two-thirds of the ~2,750 soluble E. coli proteins that are detectably expressed in cells cultured in rich medium1, 2. We have also adapted the same basic method to systematically isolate membrane-associated protein complexes, and are now attempting to purify each of the remaining ~1,000 E. coli proteins predicted to be membrane-bound. Membrane proteins pose unique challenges because they are often not solubilized efficiently by the extraction buffer we normally use for the SPA method; however, adding non-ionic detergents to our buffers has enabled the solubilization and purification of the majority of the E. coli membrane proteins we have attempted to date (Babu et al., unpublished data).
The authors have nothing to disclose.
This work was supported by funds from the Canada Foundation for Innovation, Genome Canada, the Ontario Genomics Institute, the Ontario Ministry of Innovation, and a Canadian Institutes of Health Research grant to J.G. and A.E. The Red-expressing E. coli strain DY330 was a kind gift from Donald L. Court (National Cancer Institute, Frederick, MD).
Material | Vendor | Catalog Number | |
1. Antibiotics | |||
Kanamycin | Bioshop | #KAN201 | |
Ampicillin | Bioshop | #AMP201 | |
2. Terrific-Broth medium | |||
Bio-Tryptone | Bioshop | #TRP 402 | |
Yeast extract | Bioshop | #YEX 555 | |
Glycerol | Bioshop | #GLY 002 | |
K2HPO4 | Bioshop | #PPM 302 | |
KH2PO4 | Bioshop | #PPM 303 | |
3. Bacterial Strain and Plasmid | |||
DY330 | Yu et al. (2000)10 | ||
pJL148 | Zeghouf et al. (2004)7 | ||
4. PCR and Electrophoresis Reagents | |||
Taq DNA polymerase | Fermentas | # EP0281 | |
10 X PCR buffer | Fermentas | # EP0281 | |
10 mM dNTPs | Fermentas | # EP0281 | |
25 mM MgCl2 | Fermentas | # EP0281 | |
Agarose | Bioshop | # AGA002 | |
Loading dye | NEB | #B7021S | |
Ethidium bromide | Bioshop | # ETB444 | |
10X TBE buffer | Thermo Scientific | #28355 | |
Tris Base | Bioshop | #TRS001 | |
Boric acid | Bioshop | #BOR001 | |
0.5 M EDTA (pH 8.0) | Sigma | # E6768 | |
DNA ladder | NEB | #N3232L | |
5. Plasmid isolation and Clean-up Kits | |||
Plasmid Midi kit | Qiagen | #12143 | |
QIAquick PCR purification kit | Qiagen | #28104 | |
6. PCR and Transformation Equipment | |||
Thermal cycler | BioRad | iCycler | |
Agarose gel electrophoresis | BioRad | ||
Electroporator | Bio-Rad | GenePulser II | |
0.2 cm electroporation cuvette | Bio-Rad | ||
42 °C water bath shaker | Innova 3100 | ||
Beckman Coulter TJ-25 centrifuge | Beckman Coulter | TS-5.1-500 | |
32 °C Shaker | New Brunswick Scientific, USA | ||
32 °C large Shaker | New Brunswick Scientific, USA | ||
32 °C plate incubator | Fisher Scientific | ||
7. Electrophoresis and Western blotting | |||
Acrylamide monomer, N,N’- methylenebis-acrylamide | Bio-Rad | #161-0125 | |
Ammonium persulfate | Bioshop | # AMP001 | |
n-butanol | Sigma | # B7906 | |
TEMED | Bioshop | #TEM001 | |
Whatman No. 1 filter paper | Fisher Scientific | #09-806A | |
Mini protean 3 cell | Bio-Rad | #165-3301 | |
iBlot gel transfer device | Invitrogen | #IB1001 | |
Nitrocellulose membranes | Bio-Rad | #162-0115 | |
Monoclonal Anti-Flag M2 antibody | Sigma | #F3165 | |
HRP-conjugated anti-mouse IgG secondary antibody | Amersham | #NA931V | |
Pre-stained protein molecular weight standards | Bio-Rad | #161-0363 | |
Chemiluminescence reagent | PIERCE | #1856136 | |
Autoradiography film | Clonex Corp | #CLEC810 | |
Quick Draw blotting paper | Sigma | #P7796 | |
C2 platform rocking shaker | New Brunswick Scientific, USA | ||
8. Sonication Equipment and Reagents | |||
Sonicator | Branson Ultrasonic | #23395 | |
NaCl | Bioshop | #SOD001 | |
Protease inhibitors | Roche | #800-363-5887 | |
0.5 mM TCEP-HCl | Thermo Scientific | #20490 | |
9. Affinity Purification Reagents and Equipment | |||
0.8 x 4 cm Bio-Rad polypropylene column | Bio-Rad | #732-6008 | |
Benzonase nuclease | Novagen | #70746 | |
Anti-FLAG M2 agarose beads | Sigma | #A2220 | |
Calmodulin-sepharose beads | GE Healthcare | #17-0529-01 | |
TEV protease | Invitrogen | #12575-015 | |
Triton X-100 | Sigma | #T9284 | |
CaCl2 | Sigma | #C2661 | |
EGTA | Sigma | #E3889 | |
LabQuake Shaker | Thermolyne | #59558 | |
10. Silver Staining Reagents | |||
Methanol | Bioshop | #MET302 | |
Acetic acid | Bioshop | #ACE222 | |
Sodium-thiosulfate | Sigma | #S-7143 | |
Silver nitrate | Fisher Scientific | #S181-100 | |
Formaldehyde | Bioshop | #FOR201 | |
Sodium carbonate | Bioshop | #SOC512 | |
11. Reagents and Equipment for Protein Identification | |||
Trypsin Gold, Mass Spectrometry Grade | Promega | # V5280 | |
50 mM NH4HCO3 | Bioshop | #AMC555 | |
1 mM CaCl2 | Bioshop | #CCL302 | |
Acetonitrile | Sigma | #A998-4 | |
Formic acid | Sigma | #F0507 | |
HPLC grade water | Sigma | #95304 | |
Iodoacetamide | Sigma | #I6125 | |
Millipore Zip-Tip | Millipore | # ZTC18M960 | |
~10 cm of 3 μm Luna-C18 resin | Phenomenex | ||
Proxeon nano HPLC pump | Thermo Fisher Scientific | ||
LTQ Orbitrap Velos mass spectrometer | Thermo Fisher Scientific | ||
12. Labware | |||
4 liter conical flasks | VWR | #89000-372 | |
50 ml polypropylene falcon tubes | Any Vendor | ||
1.5 ml micro-centrifuge tubes | Any Vendor | ||
250 ml conical flasks | VWR | #29140-045 | |
15 ml sterile culture tubes | Thermo Scientific | #366052 | |
Cryogenic vials | VWR | #479-3221 | |
-80 °C freezer | Fisher Scientific | #13-990-14 | |
Speed vacuum system | Thermo Scientific | ||
Buffers and Solutions
1. 1 liter Terrific Broth (TB) medium: 11 g Bio-Tryptone
2. Potassium salt stock solution: 1.5 M K2HPO4
3. Sonication buffer: 20 mM Tris-HCl (pH 7.9)
4. AFC buffer: 30 mM Tris-HCl (pH 7.9)
5. TEV cleavage buffer: 30 mM Tris-HCl (pH 7.9)
6. Calmodulin binding buffer: 30 mM Tris-HCl (pH 7.9)
7. Calmodulin wash buffer: 30 mM Tris-HCl (pH 7.9)
8. Calmodulin elution buffer: 30 mM Tris-HCl (pH 7.9)
9. Developing solution (1 L): 37% formaldehyde
10. Digestion buffer: 50 mM NH4HCO3
11. Wetting and equilibration solution: 70% acetonitrile (ACN) in 0.1% formic acid
12. Washing solution: 100% H2O in 0.1% formic acid