Identifying the direct targets of genome-targeting molecules remains a major challenge. To understand how DNA-binding molecules engage the genome, we developed a method that relies on crosslinking of small molecules to isolate chromatin (COSMIC).
The genome is the target of some of the most effective chemotherapeutics, but most of these drugs lack DNA sequence specificity, which leads to dose-limiting toxicity and many adverse side effects. Targeting the genome with sequence-specific small molecules may yield drugs with an increased therapeutic index and fewer off-target effects. N-methylpyrrole/N-methylimidazole polyamides are molecules that can be rationally designed to target specific DNA sequences with exquisite precision. Unlike most natural transcription factors, polyamides can bind to methylated and chromatinized DNA without a loss in affinity. The sequence specificity of polyamides has been extensively studied in vitro with cognate site identification (CSI) and with traditional biochemical and biophysical approaches, but a means to study polyamide binding to genomic targets in cells has remained elusive. Here we report a method, the crosslinking of small molecules to isolate chromatin (COSMIC), that identifies polyamide binding sites across the genome. COSMIC is similar to chromatin immunoprecipitation (ChIP), but differs in two important ways: (1) a photocrosslinker is employed to enable selective, temporally-controlled capture of polyamide binding events, and (2) a biotin affinity handle is used to purify polyamide–DNA conjugates under semi-denaturing conditions to reduce the recovery of non-covalently bound DNA. COSMIC is a general strategy that can be used to reveal the genome-wide binding events of polyamides and other genome-targeting chemotherapeutic agents.
The information to make each cell in the human body is encoded in DNA. The selective use of that information governs the fate of a cell. Transcription factors (TFs) are proteins that bind to specific DNA sequences to express a particular subset of the genes in the genome, and the malfunction of TFs is linked to the onset of a wide array of diseases, including developmental defects, cancer, and diabetes.1,2 We have been interested in developing molecules that can selectively bind to the genome and modulate gene regulatory networks.
Polyamides composed of N-methylpyrrole and N-methylimidazole are rationally-designed molecules that can target DNA with specificities and affinities that rival natural transcription factors.3-6 These molecules bind to specific sequences in the minor groove of DNA.4,5,7-11 Polyamides have been employed to both repress and activate the expression of specific genes.4,12-19 They also have interesting antiviral20-24 and anticancer12,13,25-30 properties. One attractive feature of polyamides is their ability to access DNA sequences that are methylated31,32 and wrapped around histone proteins9,10,33.
To measure the comprehensive binding specificities of DNA-binding molecules, our lab created the cognate site identifier (CSI) method.34-39 The predicted occurrence of binding sites based on in vitro specificities (genomescapes) can be displayed on the genome, because the in vitro binding intensities are directly proportional to association constants (Ka).34,35,37 These genomescapes provide insight into polyamide occupancy across the genome, but measuring polyamide binding in live cells has been a challenge. DNA is tightly packaged in the nucleus, which could influence the accessibility of binding sites. The accessibility of these chromatinized DNA sequences to polyamides remains a mystery.
Recently, many methods to study interactions between small molecules and nucleic acids have emerged.40-48 Chemical affinity capture and massively parallel DNA sequencing (chem-seq) is one such technique. Chem-seq uses formaldehyde to crosslink a small molecule to its genomic target and a biotinylated derivative of that small molecule to capture the ligand–target interaction.48,49
Formaldehyde crosslinking leads to indirect interactions that can produce false positives.50 We developed a new method, the crosslinking of small molecules to isolate chromatin (COSMIC),51 with a photocrosslinker to eliminate these so-called “phantom” peaks.50 To begin, we designed and synthesized trifunctional derivatives of polyamides. These molecules contained a DNA-binding polyamide, a photocrosslinker (psoralen), and an affinity handle (biotin, Figure 1). With trifunctional polyamides, we can covalently capture polyamide–DNA interactions with 365 nm UV irradiation, a wavelength that does not damage DNA or induce non-psoralen-based crosslinking.51 Next, we fragment the genome and purify the captured DNA under stringent, semi-denaturing conditions to decrease DNA that is non-covalently bound. Thus, we view COSMIC as a method related to chem-seq, but with a more direct readout of DNA targeting. Importantly, the weak (Ka 10³–10⁴ M⁻¹) affinity of psoralen for DNA does not detectably impact polyamide specificity.51,52 The enriched DNA fragments can be analyzed by either quantitative polymerase chain reaction51 (COSMIC-qPCR) or by next-generation sequencing53 (COSMIC-seq). These data enable an unbiased, genome-guided design of ligands that interact with their desired genomic loci and minimize off-target effects.
Figure 1. Bioactive polyamides and COSMIC scheme. (A) Hairpin polyamides 1–2 target the DNA sequence 5′-WACGTW-3′. Linear polyamides 3–4 target 5′-AAGAAGAAG-3′. Rings of N-methylimidazole are bolded for clarity. Open and filled circles represent N-methylpyrrole and N-methylimidazole, respectively. Square represents 3-chlorothiophene, and diamonds represent β-alanine. Psoralen and biotin are denoted by P and B, respectively. (B) COSMIC scheme. Cells are treated with trifunctional derivatives of polyamides. After crosslinking with 365 nm UV irradiation, cells are lysed and genomic DNA is sheared. Streptavidin-coated magnetic beads are added to capture polyamide–DNA adducts. The DNA is released and can be analyzed by quantitative PCR (qPCR) or by next-generation sequencing (NGS).
1. Crosslinking in Live Cells
2. Isolation of Chromatin
3. Capture of Ligand–DNA Crosslinks
4. Isolation of Affinity-purified DNA
To account for non-uniform genome fragmentation and other variables, the purified DNA should always be normalized against a reference of Input DNA. Primers specific to a locus of interest can be used. It is helpful to also analyze a locus where the molecule is not expected to bind, as a negative control. We see a >100-fold increase in polyamide occupancy upon irradiation with 365-nm light (Figure 2).
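The normalization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the 1% input aliquot, and the Ct values are hypothetical, and the calculation assumes the same amplification efficiency for both samples (perfect doubling by default).

```python
def fraction_of_input(ct_ap, ct_input, input_fraction=0.01, efficiency=2.0):
    """Return the affinity-purified (AP) signal as a fraction of total input DNA.

    ct_ap          -- Ct of the affinity-purified sample at a given locus
    ct_input       -- Ct of the input aliquot at the same locus
    input_fraction -- share of the chromatin set aside as input (assumed 1%)
    efficiency     -- amplification factor per cycle (2.0 = perfect doubling)
    """
    # AP DNA relative to the input aliquot, rescaled to the whole input.
    return input_fraction * efficiency ** (ct_input - ct_ap)


def fold_change(fraction_a, fraction_b):
    """Ratio of two AP/input fractions, e.g. irradiated vs. non-irradiated."""
    return fraction_a / fraction_b


# Made-up Ct values for one locus, with and without 365 nm irradiation.
with_uv = fraction_of_input(ct_ap=24.0, ct_input=25.0)
without_uv = fraction_of_input(ct_ap=31.0, ct_input=25.0)
print(fold_change(with_uv, without_uv))  # 2**7 = 128-fold for these values
```

The same two functions can be applied to the negative-control locus; a ratio near 1 there would indicate that the enrichment at the locus of interest is specific.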
Enriched DNA can also be analyzed by next-generation sequencing. DNA is prepared for sequencing in the same manner as ChIP DNA is prepared (e.g., with a commercial sample prep kit, see Materials List). Even with next-generation sequencing, we still use qPCR to confirm enrichment of DNA at loci where we expect the polyamide to be bound. Once sequenced, raw DNA reads are aligned to the genome and density traces are prepared with the same software designed to analyze ChIP-seq data (Figure 3). Based on our analysis of polyamide distributions in cells, we found that clustered binding sites, spanning a broad range of affinities, best predict occupancy in cells. We developed an algorithm to score the entire genome for binding with our in vitro CSI data (Figure 4A). This scoring method revealed that different genomic loci with similar predicted binding scores exhibit the diverse clustering of multiple sites of varying affinities (Figure 4B).
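To make the idea of scoring clustered sites concrete, the following Python sketch sums per-site in vitro intensities over a sliding window. It is not the published algorithm: the function name, the window length, the intensity values, and the choice to score only the forward strand are all assumptions made for illustration.

```python
def window_scores(sequence, kmer_intensity, k, window):
    """Sum in vitro k-mer intensities over every window of `window` bases.

    sequence       -- DNA sequence (forward strand only in this sketch)
    kmer_intensity -- dict mapping k-mers to CSI-style intensities (hypothetical)
    k              -- binding-site length
    window         -- window length in bases
    """
    seq = sequence.upper()
    # Intensity of the k-mer starting at each position; unknown k-mers score 0.
    site = [kmer_intensity.get(seq[i:i + k], 0.0)
            for i in range(len(seq) - k + 1)]
    n_sites = window - k + 1  # number of k-mers fully contained in one window
    if n_sites < 1 or len(site) < n_sites:
        return []
    running = sum(site[:n_sites])
    scores = [running]
    # Slide by one base: add the site entering and drop the site leaving.
    for start in range(1, len(site) - n_sites + 1):
        running += site[start + n_sites - 1] - site[start - 1]
        scores.append(running)
    return scores


# Toy intensities: one strong site (AAG) and one weaker site (AGA).
toy = {"AAG": 1.0, "AGA": 0.25}
print(window_scores("AAGAAGAAG", toy, k=3, window=6))
```

Because the score is a sum, a window containing several low- or medium-intensity sites can reach the same total as a window with a single high-intensity site, which is the behavior illustrated in Figure 4.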
Figure 2. Results of COSMIC-qPCR. Effect of 365 nm UV irradiation on the fraction of AP/input from HEK293 cells. AP, affinity-purified. Results are plotted on log scale and represent mean ± s.e.m.
Figure 3. Results of COSMIC-seq in H1 cells. COSMIC-seq tag density track for linear polyamide 4 designed to target AAG repeats. Tag density was normalized to 10⁷ tags and displayed with the Integrated Genome Viewer.
Figure 4. Model of polyamide binding. (A) Violin plots of predicted scores for 2 and 4 binding across the entire genome. Representative genomescapes for 4 are also shown. With the bioinformatic scoring method employed, genomic loci can sum to the same predicted score in different ways. (B) Loci with multiple low- and medium-affinity sequences show similar polyamide occupancy to loci with few high-affinity sequences. Reprinted with permission.51
One of the primary challenges with conventional ChIP is the identification of suitable antibodies. ChIP depends heavily upon the quality of the antibody, and most commercial antibodies are unacceptable for ChIP. In fact, the Encyclopedia of DNA Elements (ENCODE) consortium found only 20% of commercial antibodies to be suitable for ChIP assays.50 With COSMIC, antibodies are replaced by streptavidin. Because polyamides are functionalized with biotin, streptavidin is used in place of an antibody to capture polyamide–DNA crosslinks. The interaction between biotin and streptavidin is one of the strongest known in nature and one thousand to one million times stronger than the interaction of an antibody with its ligand.62 We harness this robust interaction, and the intrinsic stability of streptavidin, by subjecting captured polyamide–DNA crosslinks to harsh washes that would denature an antibody. These washes reduce the background signal of COSMIC. Furthermore, biotinylated proteins such as histones unfold and dissociate from DNA under these harsh conditions.63,64
The addition of the biotin moiety increases the size of the molecule, which could be reduced by substitution with a ‘clickable’ handle such as an alkyne.43-46 This handle could be used to introduce the biotin moiety after the cells have been lysed. Taken together, COSMIC avoids one of the major sources of unreliability that plagues many ChIP experiments.
Although formaldehyde will crosslink protein–DNA interactions, formaldehyde also crosslinks protein–protein interactions.65,66 Thus, formaldehyde can lead to the misidentification of DNA-binding events that are indirect or transient.65,66 This issue is especially prominent in highly transcribed regions.67 COSMIC uses a photocrosslinker (psoralen) instead of formaldehyde to avoid the issues of formaldehyde-based crosslinking. Because a light-sensitive crosslinker is used, the sample must be shielded from light to prevent premature crosslinking.
We have performed COSMIC in multiple conditions and cell types. The cell type, concentration of polyamide, and duration of treatment can all be varied, but it is important to confirm that polyamide treatment does not induce cellular toxicity. It is also critical to empirically determine a sonication time that produces sheared DNA with a size between 100 and 500 bp. If the COSMIC signal is low, optimize the number of cells, incubation time, concentration of polyamide, and UV crosslinking time.
The bioinformatic methods to analyze ChIP-seq data are compatible with COSMIC-seq data. Similar to ChIP, COSMIC requires normalization to a reference sample of fragmented DNA that has not been enriched with an antibody or streptavidin. This sample, normally called input DNA, is designed to control for many possible sources of bias including sonication efficiency, PCR bias, sequencing bias, and mappability of DNA fragments. We have found success with the ENCODE guidelines to identify peaks in our COSMIC-seq datasets.50,68
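As a rough illustration of input normalization for sequencing data, the sketch below scales binned tag counts to 10⁷ total tags and then divides the affinity-purified signal by the input signal. The function names, bin counts, and pseudocount are hypothetical; actual peak identification should rely on established ChIP-seq software rather than this simplified ratio.

```python
def normalize_tags(bin_counts, total_tags, scale=1e7):
    """Rescale per-bin tag counts so the library totals `scale` tags."""
    return [count * scale / total_tags for count in bin_counts]


def enrichment(ap_counts, ap_total, input_counts, input_total, pseudocount=1.0):
    """Per-bin ratio of normalized AP signal to normalized input signal.

    The pseudocount (an assumption of this sketch) avoids division by zero
    in bins with no input tags.
    """
    ap = normalize_tags(ap_counts, ap_total)
    inp = normalize_tags(input_counts, input_total)
    return [(a + pseudocount) / (i + pseudocount) for a, i in zip(ap, inp)]


# Made-up counts for two genomic bins from libraries of 1e7 tags each.
print(enrichment([30, 10], 1e7, [10, 10], 1e7))
```

Bins with a ratio well above 1 are candidates for polyamide occupancy, while a ratio near 1 indicates signal that is explained by the input alone.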
Some of the most successful therapeutic agents are molecules that bind to DNA and interfere with an array of genomic transactions.43,69,70 COSMIC can be readily applied to small molecules other than polyamides. The crosslinker can be UV active, as we have chosen, or it could also be DNA-reactive.69,70 In summary, COSMIC reveals the binding sites of polyamides and other ligands throughout the genome and uncovers new strategies to selectively target DNA.
The authors have nothing to disclose.
We thank members of the Ansari lab and Prof. Parameswaran Ramanathan for helpful discussions. This work was supported by NIH grants CA133508 and HL099773, the H. I. Romnes faculty fellowship, and the W. M. Keck Medical Research Award to A.Z.A. G.S.E. was supported by a Peterson Fellowship from the Department of Biochemistry and Molecular Biosciences Training Grant NIH T32 GM07215. A.E. was supported by the Morgridge Graduate Fellowship and the Stem Cell and Regenerative Medicine Center Fellowship, and D.B. was supported by the NSEC grant from NSF.
Name | Company | Catalog Number | Comments |
Phenylmethylsulfonyl fluoride (PMSF) | any source | ||
Benzamidine | any source | ||
Pepstatin | any source | ||
Proteinase K | any source | ||
Dynabeads MyOne Streptavidin C1 | Life Technologies | 65001 | |
PBS, pH 7.4 | Life Technologies | 10010-023 | Other sources can be used |
StemPro® Accutase® Cell Dissociation Reagent | Life Technologies | A1110501 | |
QIAquick PCR Purification Kit | Qiagen | 28104 | We have tried other manufacturers of DNA columns with success. |
TruSeq ChIP Sample Prep Kit | Illumina | IP-202-1012 | This Kit can be used to prepare COSMIC DNA for next-generation sequencing |
Matrigel Basement Membrane Matrix | BD Biosciences | 356231 | Used to coat plates in order to grow H1 ESCs |
pH paper | any source | ||
amber microcentrifuge tubes | any source | ||
microcentrifuge tubes | any source | ||
pyrex filter | any source | | Pyrex baking dishes are suitable |
qPCR master mix | any source | ||
RNase | any source | ||
HCl (6 N) | any source | ||
10-cm tissue culture dishes | any source | ||
Serological pipettes | any source | ||
Pasteur pipettes | any source | ||
Pipette tips | any source | ||
15-mL conical tubes | any source | ||
centrifuge | any source | ||
microcentrifuge | any source | ||
nutator | any source | ||
Magnetic separation rack | any source | ||
UV source | CalSun | B001BH0A1A | Other UV sources can be used, but crosslinking time must be optimized empirically |
Misonix Sonicator | Qsonica | S4000 with 431C1 cup horn | Other sonicators can be used, but sonication conditions must be optimized empirically |
Humidified CO2 incubator | any source | ||
Biological safety cabinet with vacuum outlet | any source |