This video article describes an in vitro microarray based method to determine the gene targets and binding sites for two component system response regulators.
In vivo methods such as ChIP-chip are well-established techniques used to determine global gene targets for transcription factors. However, they are of limited use in exploring bacterial two component regulatory systems with uncharacterized activation conditions. Such systems regulate transcription only when activated in the presence of unique signals. Since these signals are often unknown, the in vitro microarray based method described in this video article can be used to determine gene targets and binding sites for response regulators. This DNA-affinity-purified-chip method may be used for any purified regulator in any organism with a sequenced genome. The protocol involves allowing the purified tagged protein to bind to sheared genomic DNA and then affinity purifying the protein-bound DNA, followed by fluorescent labeling of the DNA and hybridization to a custom tiling array. Preceding steps that may be used to optimize the assay for specific regulators are also described. The peaks generated by the array data analysis are used to predict binding site motifs, which are then experimentally validated. The motif predictions can be further used to determine gene targets of orthologous response regulators in closely related species. We demonstrate the applicability of this method by determining the gene targets and binding site motifs and thus predicting the function for a sigma54-dependent response regulator DVU3023 in the environmental bacterium Desulfovibrio vulgaris Hildenborough.
The ability of bacteria to survive and thrive is critically dependent on how well they are able to perceive and respond to perturbations in their environments, and this in turn is dependent on their signal transduction systems. The number of signaling systems a bacterium encodes has been called its “microbial IQ” and can be an indication of both the variability of its environment and its ability to sense multiple signals and fine-tune its response1. Two component signal transduction systems (TCS) are the most prevalent signaling systems used by bacteria, and they consist of a histidine kinase (HK) that senses the external signal and transmits it via phosphorylation to an effector response regulator (RR)2. RRs can have a variety of output domains and thus different effector modes, but the most common response is transcriptional regulation via a DNA binding domain1. The signals sensed and the corresponding functions of the vast majority of TCSs remain unknown.
Although in vivo methods such as ChIP-chip are routinely used for determination of genomic binding sites of transcription factors3, they can only be used for bacterial two component system RRs if the activating conditions or signals are known. Often the environmental cues that activate a TCS are harder to determine than their gene targets. The in vitro microarray based assay described here can be used to effectively and rapidly determine the gene targets and predict functions of TCSs. This assay takes advantage of the fact that RRs can be phosphorylated and thus activated in vitro using small molecule donors like acetyl phosphate4.
In this method, named DAP-chip for DNA-affinity-purified-chip (Figure 1), the RR gene of interest is cloned with a His-tag in E. coli, and the subsequently purified tagged protein is allowed to bind to sheared genomic DNA. The protein-bound DNA is then enriched by affinity purification; the enriched and input DNA are amplified, fluorescently labeled, pooled, and hybridized to a tiling array custom made for the organism of interest (Figure 1). Microarray experiments are subject to artifacts, and therefore additional steps are employed to optimize the assay. One such step is to attempt to determine one target for the RR under study using electrophoretic mobility shift assays (EMSA) (see workflow in Figure 2). Then, following binding to genomic DNA and the DAP steps, the protein-bound and input DNA are examined by qPCR to see if the positive target is enriched in the protein-bound fraction relative to the input fraction, thus confirming optimal binding conditions for the RR (Figure 2). After array hybridization, the data are analyzed to find peaks of higher intensity signal indicating genomic loci where the protein had bound. Functions may be predicted for the RR based on the gene targets obtained. The target genomic loci are used to predict binding site motifs, which are then experimentally validated using EMSAs (Figure 2). The functional predictions and gene targets for the RR may then be extended to closely related species that encode orthologous RRs by scanning those genomes for similar binding motifs (Figure 2). The DAP-chip method can provide a wealth of information for a TCS where previously there was none. The method can also be used for any transcriptional regulator if the protein can be purified and DNA binding conditions can be determined, and for any organism of interest with a genome sequence available.
Figure 1. The DNA-affinity-purified-chip (DAP-chip) strategy7. The RR gene from the organism of interest is cloned with a carboxy-terminal His-tag into an E. coli expression strain. Purified His-tagged protein is activated by phosphorylation with acetyl phosphate, and mixed with sheared genomic DNA. An aliquot of the binding reaction is saved as input DNA, while the rest is subjected to affinity purification using Ni-NTA resin. The input and the RR-bound DNA are whole genome amplified, and labeled with Cy3 and Cy5, respectively. The labeled DNA is pooled together and hybridized to a tiling array, which is then analyzed to determine the gene targets. Figure modified and reprinted using the creative commons license from7.
Figure 2. Summary of workflow. For any purified tagged protein, begin by determining a target using EMSA. Allow protein to bind genomic DNA and then DNA-affinity-purify (DAP) and whole genome amplify (WGA) the enriched and input DNA. If a gene target is known, use qPCR to ensure that the known target is enriched in the protein-bound fraction. If no target could be determined, proceed directly to DNA labeling and array hybridization. If enrichment by qPCR could not be observed, then repeat the protein-gDNA binding and DAP-WGA steps using different protein amounts. Use array analysis to find peaks and map them to target genes. Use the upstream regions of target genes to predict binding site motifs. Validate the motifs experimentally using EMSAs. Use the motif to scan the genomes of related species encoding orthologs of the RR under study, and predict genes targeted in those species as well. Based on the gene targets obtained, the physiological function of the RR and its orthologs may be predicted. Figure modified and reprinted using the creative commons license from7.
Note: The protocol below is tailored for determination of gene targets of the RR DVU3023 from the bacterium Desulfovibrio vulgaris Hildenborough. It can be adapted to any other transcriptional regulator of interest.
1. Clone and Purify RR
2. Determine Gene Target for RR Using Electrophoretic Mobility Shift Assay (EMSA)
3. Verify Target Enrichment after Genomic DNA-protein Binding
4. DNA Labeling and Array Hybridization
5. Binding Site Motif Prediction and Validation
6. Conservation of Motif in Other Related Bacterial Species
The above method was applied to determine the global gene targets of the RRs in the model sulfate reducing bacterium Desulfovibrio vulgaris Hildenborough7. This organism has a large number of TCSs represented by over 70 RRs, indicating the wide variety of possible signals that it senses and responds to. In vivo analyses on the functions of these signaling systems are hard to perform since their signals and thus their activating conditions are unknown. Here the DAP-chip method was used to determine the gene targets and thus predict possible functions for a representative RR DVU3023.
DVU3023 is a sigma54-dependent RR encoded in an operon with its cognate HK (Figure 3A). The gene was cloned with a C-terminal His-tag and the protein was purified from E. coli. For initial target determination, the purified RR was tested for binding to the operon immediately downstream, a ten-gene operon (DVU3025-3035) consisting of lactate uptake and oxidation genes. RR DVU3023 shifted the upstream region of DVU3025 (Figure 3B). Next, RR DVU3023 was allowed to bind to sheared D. vulgaris Hildenborough genomic DNA. Although phosphorylation was not required for binding to the promoter region of DVU3025, acetyl phosphate was added to the reaction in case it was required for binding to other promoters. Following affinity purification of the protein-bound DNA fraction, qPCR was used to show that the upstream region of DVU3025 is enriched (8.45-fold) in the protein-bound fraction (Ct = 6.9) relative to the input DNA (Ct = 9.98) (Figure 3C), thus indicating that the binding conditions used were appropriate for RR DVU3023. The promoter region of a randomly chosen gene (DVU0013) served as a negative control and showed no enrichment.
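The enrichment value reported for DVU3025 follows from the difference in Ct between the input and protein-bound fractions. As a minimal sketch, assuming perfect doubling per PCR cycle and equal template amounts in both reactions (neither assumption is stated in the protocol, and the function name and `efficiency` parameter are illustrative), the calculation can be written as:

```python
def fold_enrichment(ct_input: float, ct_bound: float, efficiency: float = 2.0) -> float:
    """Fold enrichment of a locus in the protein-bound fraction over input.

    Assumes the same amount of template in both reactions and a fixed
    amplification factor per cycle (2.0 = perfect doubling); both are
    illustrative assumptions, not values taken from the protocol.
    """
    return efficiency ** (ct_input - ct_bound)


# Ct values reported in the text for the DVU3025 upstream region.
dvu3025_fold = fold_enrichment(ct_input=9.98, ct_bound=6.9)

# The article treats greater than 3-fold enrichment as sufficient to proceed.
proceed_to_hybridization = dvu3025_fold > 3.0

print(f"DVU3025 fold enrichment: {dvu3025_fold:.2f}")
print(f"Proceed to hybridization: {proceed_to_hybridization}")
```

With the reported Ct values this gives roughly 8.5-fold, in line with the 8.45-fold enrichment stated in the text; the small difference presumably reflects rounding of the Ct values.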
The protein-bound and input DNA samples were then labeled fluorescently and hybridized to a D. vulgaris Hildenborough tiling array that had a high probe density in the intergenic regions. The top four peaks were chosen as the most likely targets for DVU3023 (Figure 3D). These four peaks were followed by several others, which were also identified in DAP-chip analyses for several other RRs and hence appear to be sticky DNA (Table 1). Figure 3E is a schematic representation of the genes regulated by DVU3023. The positive target DVU3025 was the first peak obtained, with the highest score. Two other gene targets are singly encoded lactate permeases (DVU2451 and DVU3284). The fourth target does not lie in an upstream region, but in the intergenic region between two convergently transcribed genes/operons (DVU0652 and DVU0653). This is a large intergenic region that also contains a predicted sigma54-dependent promoter. It is possible that there is an undiscovered sRNA encoded in this region that is regulated by RR DVU3023.
Using the upstream regions of the targets obtained by DAP-chip, MEME was used to predict a binding site motif (Figure 3F, Table 2). EMSA substrates carrying the specific motif upstream of DVU3025 were designed to confirm that RR DVU3023 recognizes and binds the predicted motif. The motif was further validated by making substitutions in the conserved bases within the motif, which eliminated the binding shift (Figure 3G). This validated motif was then used in scripts written in Perl to scan the genome sequences of other closely related sulfate reducing bacteria that had orthologs of DVU3023. Loci were chosen as possible gene targets when the motif was located in upstream regions of open reading frames (Table 3). Using the motif sequences predicted for the orthologous RRs, a consensus binding site motif was generated (Figure 3F), which closely resembled the one obtained for D. vulgaris Hildenborough alone.
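The Perl scripts used for the genome scan are not reproduced in this article. As an illustration only, the following Python sketch shows one common way such a scan can be done: build a log-odds position weight matrix from aligned binding sites and score every window on both strands of an upstream region. The site sequences, the uniform background, the pseudocount, and the 80% threshold are hypothetical placeholders (they are not the sequences in Table 2), and the scoring scheme is an assumption rather than the one used by the authors.

```python
import math

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds position weight matrix from equal-length aligned sites."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def score(pwm, window):
    """Sum of per-position log-odds scores for one window."""
    return sum(col[base] for col, base in zip(pwm, window))


def scan(pwm, seq, threshold):
    """Return (0-based position, strand, score) for windows at or above threshold."""
    seq = seq.upper()
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if set(window) - set("ACGT"):  # skip windows with ambiguous bases
            continue
        for strand, candidate in (("+", window), ("-", revcomp(window))):
            s = score(pwm, candidate)
            if s >= threshold:
                hits.append((i, strand, s))
    return hits


# Hypothetical aligned sites and upstream region, for illustration only.
example_sites = ["TGTCAGCA", "TGTCAGCT", "TGTCTGCA", "TGTCAGCA"]
example_upstream = "AAGGTGTCAGCACCTT"

pwm = build_pwm(example_sites)
best_possible = sum(max(col.values()) for col in pwm)
hits = scan(pwm, example_upstream, threshold=0.8 * best_possible)

for position, strand, s in hits:
    print(position, strand, round(s, 2))
```

In practice each hit would then be kept only if it falls in the upstream region of an open reading frame, as described above. Expressing the threshold as a fraction of the best possible score is a design convenience for the sketch; a real scan would calibrate it against the validated sites.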
Figure 3. Determining genomic targets for D. vulgaris response regulator DVU3023. A. DVU3023 is encoded in an operon with its cognate HK. The downstream operon has a sigma54-dependent promoter (bent black arrow) and was used as a candidate target gene. B. Purified RR DVU3023 bound and shifted the upstream region of DVU3025 (ref. 7). C. qPCR of upstream regions of DVU3025 (positive target) and DVU0013 (chosen as a negative control). E is the protein-bound enriched DNA fraction; I is the input DNA. D. Top four peaks obtained after DAP-chip analysis. Start and End refer to DNA coordinates at the start and end of the peaks; score refers to the log2R ratio of the fourth highest probe in the peak; Fdr = false discovery rate; cutoff_p is the cutoff percentage at which the peak was identified. E. Schematic representation of the gene targets for DVU3023 based on the DAP-chip peaks. Numbers in boxes indicate the peak number in D. The HK-RR genes are in green, target genes in blue and other genes in grey. Black bent arrows are sigma54-dependent promoters; green filled circles are predicted binding site motifs. Gene names are as follows: por – pyruvate ferredoxin oxidoreductase; llp – lactate permease; glcD – glycolate oxidase; glpC – Fe-S cluster binding protein; pta – phosphotransacetylase; ack – acetate kinase; lldE – lactate oxidase subunit; lldF/G – lactate oxidase subunit; MCP – methyl-accepting chemotaxis protein. F. Weblogo8 images of the predicted binding site motif. Top – derived from DAP-chip targets; Bottom – derived from binding sites present in genomes with orthologs of DVU3023 (ref. 7). G. Validation of predicted binding site motif using EMSA. DVU3023 shifted the wild type motif (w) but not the modified motif (m). Sequences for the w and m motifs are shown on the right7. Figure modified and reprinted using the creative commons license from Rajeev et al7.
Table 1. Top 20 peaks from the DAP-chip array analysis. The table is divided into three sections. Peak attributes show details of the peaks as generated by array analysis software, where Location refers to whether the peak was found in the genome or the extrachromosomal plasmid, Start and End refer to the start and end loci for the peak, Score refers to the log2R ratio for the fourth highest probe in the peak, Fdr refers to the false discovery rate value, and cutoff_p is the cutoff percentage at which the peak was identified. The other two sections, Start coordinate mapping and End coordinate mapping, map the start and end loci, respectively, of the peak to the gene. In these sections Gene strand refers to the strand coding the gene, Offset indicates distance from the start of the gene (positive values indicate that the locus is upstream of the gene, while negative values indicate that the locus is within the gene), Overlap gene value is TRUE if the locus overlaps a gene, DVU refers to the DVU# of the gene that the coordinates map to, and Description indicates the gene annotation. Table modified and reprinted using the creative commons license from Rajeev et al7.
Table 2. Sequences used to build the consensus DAP-chip target based motif in Figure 3. Table modified and reprinted using the creative commons license from Rajeev et al7.
Table 3. Binding site motifs for DVU3023 orthologs present in other sequenced Desulfovibrio and related species. For each genome scanned, the organism name is indicated, followed by the locus tag for the DVU3023 ortholog and its percent identity to DVU3023 in parentheses. Score indicates the value assigned by the Perl program based on similarity to the input sequences. Description states the gene annotations for the genes in the target operon. Table modified and reprinted using the creative commons license from Rajeev et al7.
The DAP-chip method described here was successfully used to determine the gene targets for several RRs in Desulfovibrio vulgaris Hildenborough7, of which one is shown here as a representative result. For RR DVU3023, choosing a candidate gene target was straightforward: DVU3025 is located immediately downstream of the RR gene, the RR and target genes are conserved in several Desulfovibrio species, and DVU3025 has a predicted sigma54-dependent promoter. The EMSA provides a simple method to rapidly test the RR for binding to the candidate target gene, and also allows assessment of the activity of the purified protein sample and determination of optimal protein-DNA binding conditions.
The DNA binding activity can also be tested in the presence and absence of acetyl phosphate to see if phosphorylation is necessary for DNA binding. Not all RRs are phosphorylated by small molecule donors9; in these cases the purified cognate sensor kinase, if available, may be used to activate the protein. There are also atypical RRs known that lack key active site residues in the receiver domain and are not activated by phosphorylation10. For the majority of RRs studied, phosphorylation stimulates DNA binding11. However, there are examples where phosphorylation does not affect in vitro DNA binding12, examples where phosphorylation is required for binding13, and cases where the subset of promoters bound increases with phosphorylation14. The DAP-chip method may be performed with and without phosphorylation to determine if there are differences in the set of promoters bound by the RR. It should also be noted that some RRs may be purified in a functionally phosphorylated form from E. coli13.
The protocol described here is for a His-tagged protein such that the protein-bound DNA is affinity purified using Ni-NTA agarose resin, but it can easily be adapted for any kind of tagged protein by using the appropriate affinity resin for pull-down. Following the DAP-WGA steps, the qPCR step provides an additional control to ensure that the protein-gDNA binding conditions were appropriate and that the protein-bound DNA was successfully enriched by the affinity purification. Greater than 3-fold enrichment is sufficient to proceed with the hybridization step. Unlike the EMSA reaction, the gDNA binding reaction contains no non-specific competitor DNA such as poly-dI.dC, since it interferes with the WGA step. As a result, if the protein sample has non-specific DNA binding activity, clear enrichment of the confirmed target DNA will not be observed by qPCR. Thus the qPCR control step can be used to optimize the amount of protein used in the gDNA binding reaction. Commercially available kits for whole genome amplification claim to introduce no amplification bias when their guidelines for minimum starting DNA material and low number of amplification cycles are followed. The amplification bias may be checked using qPCR with various primer sets on amplified versus unamplified DNA. Any effect on downstream array hybridization may also be determined by differentially labeling and hybridizing pooled amplified and unamplified genomic DNA.
The list of peaks generated by array data analysis is usually long, with several peaks having a false discovery rate value of 0. Therefore a combination of other peak attributes, such as high log2R scores and cutoff_p values (as generated by the array analysis software used in this study), is used to cull the list down to the highest confidence gene targets. The presence of the pre-determined gene target among the top five hits greatly strengthens the confidence in the data set. Performing this assay for a number of regulatory proteins from the same organism will also help to identify “sticky” DNA sequences that appear in several data sets7. Additionally, if a binding site motif can be predicted and validated, then the presence of the motif in peaks lower down in the list may also be used to choose a conservative target gene list. Enrichment of DNA sequences other than promoter regions may be artifacts of array hybridization or may indicate the regulation of previously unidentified open reading frames or small RNAs7. For regulators where a gene target could not be predetermined using EMSA, the DAP-chip assay may be performed “blind”7. In such cases too, identification of a binding site motif will improve the selection of gene targets. The protein concentration to be used in the assay will depend on the non-specific binding activity of the protein, which may be assessed by EMSA using randomly chosen DNA substrates. Lower concentrations work better for proteins with high non-specific binding activity. The reliability of a blind DAP-chip may be improved by performing replicate assays with different protein concentrations. For RRs with few targets, two assays or even a single assay may be sufficient to generate a target list, which may be validated using subsequent EMSAs. The DAP-chip data for such RRs usually show a clear jump in the log2R scores or cutoff_p values beyond the initial few peaks. For RRs with several targets, data from three replicates may be analyzed to generate a list of common peaks, some of which may be selected for EMSA validation.
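The culling strategy described in this paragraph can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the analysis pipeline used in the study: the `Peak` fields mirror the peak attributes named in Table 1, but the numeric thresholds, the example peaks, and the rule that any coordinate overlap counts as "the same peak" are all hypothetical choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    location: str    # e.g. chromosome or plasmid
    start: int
    end: int
    score: float     # log2R of the fourth highest probe in the peak
    fdr: float
    cutoff_p: float


def overlaps(a, b):
    """True if two peaks share at least one coordinate on the same replicon."""
    return a.location == b.location and a.start <= b.end and b.start <= a.end


def cull(peaks, sticky=(), min_score=1.0, max_fdr=0.0, min_cutoff_p=50.0):
    """Keep high-confidence peaks that do not overlap known sticky regions.

    All thresholds are hypothetical defaults chosen for illustration.
    """
    kept = [
        p for p in peaks
        if p.fdr <= max_fdr
        and p.score >= min_score
        and p.cutoff_p >= min_cutoff_p
        and not any(overlaps(p, s) for s in sticky)
    ]
    return sorted(kept, key=lambda p: p.score, reverse=True)


def common_peaks(replicates):
    """Peaks from the first replicate that overlap a peak in every other one."""
    first, *rest = replicates
    return [p for p in first if all(any(overlaps(p, q) for q in rep) for rep in rest)]


# Hypothetical peaks from two replicate assays, plus one region that also
# appeared in data sets for unrelated regulators ("sticky" DNA).
replicate_1 = [
    Peak("chromosome", 100, 300, 3.2, 0.0, 90.0),
    Peak("chromosome", 1000, 1200, 2.8, 0.0, 85.0),
    Peak("chromosome", 5000, 5100, 0.6, 0.0, 20.0),   # weak peak
    Peak("chromosome", 8000, 8200, 2.5, 0.0, 80.0),   # overlaps sticky region
]
replicate_2 = [
    Peak("chromosome", 150, 320, 3.0, 0.0, 88.0),
    Peak("chromosome", 1050, 1250, 2.5, 0.0, 80.0),
]
sticky_regions = [Peak("chromosome", 7900, 8300, 2.0, 0.0, 70.0)]

culled_rep1 = cull(replicate_1, sticky=sticky_regions)
culled_rep2 = cull(replicate_2, sticky=sticky_regions)
shortlist = common_peaks([culled_rep1, culled_rep2])

for peak in shortlist:
    print(peak.location, peak.start, peak.end, peak.score)
```

Fixed thresholds are used here only to keep the sketch short; as the paragraph notes, for RRs with few targets the cutoff is more naturally placed at the visible jump in log2R scores or cutoff_p values.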
The ability to predict binding site motifs based on the DAP-chip targets adds to the value of this method and vastly increases the information gained. EMSA again provides an efficient protocol to experimentally verify the predictions. Software scripts in Perl or another programming language may be used to rapidly search through available genome sequences. The results will identify both orthologous target genes and target genes uniquely regulated in the genomes searched. In the representative result shown here, three of the four target regions identified for DVU3023 have annotated gene functions related to lactate uptake and oxidation. Additionally, since the binding site motif for DVU3023 was validated, other genomes could be searched for similar motif sites. Together the results indicate that the DVU3022-3023 TCS is well conserved in Desulfovibrio and other related sulfate-reducing bacteria, and that it regulates genes for lactate transport and oxidation to acetate. Since lactate is the primary carbon and electron source used by these organisms, DVU3023 likely plays a key role in their physiology. Among the top DAP-chip peaks for RR DVU3023, there was also an intergenic region. Although a binding site motif was not found in this region, the presence of a predicted sigma54-dependent promoter suggests that there may be an unidentified ORF or sRNA encoded there, and that it could be a true target for DVU3023. This finding highlights the value of the experimental DAP-chip approach combined with binding site predictions as opposed to determining target sites based on computational predictions alone.
In vitro methods similar to the one described here have been used in only a few cases15-17 and very rarely for a previously unstudied regulator. The optimization EMSA and qPCR steps prior to the array hybridization will substantially aid in analyzing the results when the method is to be used for a novel regulator. If array printing becomes an impediment, the DAP steps may be combined with next generation sequencing to obtain binding sites18,19. As array-based methods are increasingly replaced by sequencing-based strategies, this adaptation of the method circumvents the need to design custom tiling microarrays for the organism of interest. ChIP-Seq technologies result in greater sensitivity and specificity of detection of peaks when compared to ChIP-chip methods, and are also rapidly becoming more cost-effective20. Although this article focused on two component response regulators, this method can be used for any prokaryotic or eukaryotic transcription factor15. Nucleosome occupancy often affects which targets are regulated, and comparing the target binding sites for eukaryotic transcription factors by in vivo and in vitro analyses can reveal these effects21.
The authors have nothing to disclose.
We thank Amy Chen for her help in preparing for the video shoot and for demonstrating the technique. This work conducted by ENIGMA: Ecosystems and Networks Integrated with Genes and Molecular Assemblies (http://enigma.lbl.gov), a Scientific Focus Area Program at Lawrence Berkeley National Laboratory, was supported by the Office of Science, Office of Biological and Environmental Research, of the U. S. Department of Energy under Contract No. DE-AC02-05CH11231.
Name of Material/Equipment | Company | Catalog number | Comments |
HisTrapFF column (Ni-Sepharose column) | GE Life Sciences, Pittsburgh, PA, USA | 17-5255-01 | |
Akta explorer (FPLC instrument) | GE Life Sciences, Pittsburgh, PA, USA | | |
HiPrep 26/10 Desalting column | GE Life Sciences, Pittsburgh, PA, USA | 17-5087-01 | |
Qiaquick Gel extraction kit | Qiagen Inc, Valencia, CA, USA | 28704 | |
Biotin-labeled oligonucleotides | Integrated DNA Technologies | N/A | |
6% polyacrylamide-0.5X TBE precast mini DNA retardation gel | Life Technologies, Grand Island, NY, USA | EC63652BOX | Alternately, you can pour your own gel. |
Nylon membrane | EMD Millipore, Billerica, MA, USA | INYC00010 | |
Trans-Blot SD Semi-dry electrophoretic transfer cell | Biorad, Hercules, CA, USA | 170-3940 | |
Extra thick blot paper, 8 x 13.5 cm | Biorad, Hercules, CA, USA | 170-3967 | |
UV crosslinker Model XL-1000 | Fisher Scientific | 11-992-89 | |
Nucleic Acid chemiluminescent detection kit (Pierce) | Thermo Fisher Scientific, Rockford, IL, USA | 89880 | |
Ni-NTA agarose resin | Qiagen Inc, Valencia, CA, USA | 30210 | |
GenomePlex Whole genome amplification kit (Fragmentation buffer, library preparation buffer, library stabilization solution, library preparation enzyme, 10X amplification master mix, WGA polymerase ) | Sigma-Aldrich, St. Louis, MO, USA | WGA2-50RXN | |
Nanodrop ND-1000 | Thermo Scientific, Wilmington, DE, USA | | For quantitation of DNA |
Perfecta Sybr Green SuperMix, with ROX | Quanta biosciences | 95055-500 | Any Sybr Green PCR mix may be used |
PlateMax Ultra clear heat sealing film for qPCR | Axygen | | |
96 well clear low profile PCR microplate | Life Technologies, Grand Island, NY, USA | PCR-96-LP-AB-C | |
Applied Biosystems StepOne Plus Real time PCR system | Life Technologies, Grand Island, NY, USA | 4376600 | Any real time PCR system may be used |
Qiaquick PCR purification kit | Qiagen Inc, Valencia, CA, USA | 28104 | Any PCR clean up kit may be used |
Cy3/Cy5-labeled nonamers | Trilink biotechnologies, San Diego, CA, USA | N46-0001, N46-0002 | |
Klenow polymerase 50,000U/ml, 3'-5' exo- | New England Biolabs, Ipswich, MA | M0212M | |
Hybridization system | Roche-Nimblegen, Madison, WI, USA | N/A | This company no longer makes arrays or related items, so alternate sources such as Agilent or Affymetrix will need to be used. |
Custom printed microarrays and mixers | Roche-Nimblegen, Madison, WI, USA | N/A | |
Hybridization kit (2X Hybridization buffer, Hybridization component A, Alignment oligo) | Roche-Nimblegen, Madison, WI, USA | N/A | |
Wash buffer kit (10X Wash buffer I, II, III, 1 M DTT) | Roche-Nimblegen, Madison, WI, USA | N/A | |
GenePix 4200A microarray scanner | Molecular Devices, Sunnyvale, CA, USA | | This model has been replaced by superior ones |
GenePix Pro microarray software | Molecular Devices, Sunnyvale, CA, USA | | |
Nimblescan v.2.4, ChIP-chip analysis software | Roche-Nimblegen, Madison, WI, USA | N/A | |