The Hi-C method allows unbiased, genome-wide identification of chromatin interactions (1). Hi-C couples proximity ligation and massively parallel sequencing. The resulting data can be used to study genomic architecture at multiple scales: initial results identified features such as chromosome territories, segregation of open and closed chromatin, and chromatin structure at the megabase scale.
The three-dimensional folding of chromosomes compartmentalizes the genome and can bring distant functional elements, such as promoters and enhancers, into close spatial proximity (2-6). Deciphering the relationship between chromosome organization and genome activity will aid in understanding genomic processes such as transcription and replication. However, little is known about how chromosomes fold. Microscopy is unable to distinguish large numbers of loci simultaneously or at high resolution. To date, the detection of chromosomal interactions using chromosome conformation capture (3C) and its subsequent adaptations has required the choice of a set of target loci, making genome-wide studies impossible (7-10).
We developed Hi-C, an extension of 3C that is capable of identifying long-range interactions in an unbiased, genome-wide fashion. In Hi-C, cells are fixed with formaldehyde, causing interacting loci to be bound to one another by means of covalent DNA-protein cross-links. When the DNA is subsequently fragmented with a restriction enzyme, these loci remain linked. A biotinylated residue is incorporated as the 5′ overhangs are filled in. Next, blunt-end ligation is performed under dilute conditions that favor ligation events between cross-linked DNA fragments. This results in a genome-wide library of ligation products, corresponding to pairs of fragments that were originally in close proximity to each other in the nucleus. Each ligation product is marked with biotin at the site of the junction. The library is sheared, and the junctions are pulled down with streptavidin beads. The purified junctions can subsequently be analyzed using a high-throughput sequencer, resulting in a catalog of interacting fragments.
Direct analysis of the resulting contact matrix reveals numerous features of genomic organization, such as the presence of chromosome territories and the preferential association of small gene-rich chromosomes. Correlation analysis can be applied to the contact matrix, demonstrating that the human genome is segregated into two compartments: a less densely packed compartment containing open, accessible, and active chromatin and a more dense compartment containing closed, inaccessible, and inactive chromatin regions. Finally, ensemble analysis of the contact matrix, coupled with theoretical derivations and computational simulations, shows that at the megabase scale the Hi-C data are consistent with a fractal globule conformation.
This method was used in the research reported in Lieberman-Aiden et al., Science 326, 289-293 (2009).
I. Crosslinking, Digestion, Marking of DNA Ends, and Blunt-end Ligation
II. Shearing and Size Selection
III. Biotin Pull-down and Paired-end Sequencing
IV. Representative Hi-C Results
Figure 1. Hi-C overview. Cells are cross-linked with formaldehyde, resulting in covalent links between spatially adjacent chromatin segments (DNA fragments: dark blue, red; Proteins, which can mediate such interactions, are shown in light blue and cyan). Chromatin is digested with a restriction enzyme (here, HindIII; restriction site: dashed line, see inset). The resulting sticky ends are filled in with nucleotides, one of which is biotinylated (purple dot). Ligation is performed under extremely dilute conditions favoring intramolecular ligation events; the HindIII site is lost and an NheI site is created (inset). DNA is purified and sheared, and biotinylated junctions are isolated using streptavidin beads. Interacting fragments are identified by paired-end sequencing.
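The change of restriction site described in the Figure 1 inset can be checked at the sequence level. The sketch below is illustrative only: the function names and the flanking sequences are hypothetical, and it models only the top strand of a HindIII (A^AGCTT) junction after fill-in and blunt-end ligation, compared with a conventional sticky-end 3C junction.

```python
HINDIII_SITE = "AAGCTT"  # HindIII cuts A^AGCTT, leaving a 5' AGCT overhang
NHEI_SITE = "GCTAGC"     # NheI recognition sequence


def hic_junction(left_flank: str, right_flank: str) -> str:
    """Top strand of a filled-in, blunt-ligated HindIII junction (Hi-C)."""
    # Left fragment: the top strand ends in 'A' after the cut; fill-in copies
    # the AGCT overhang of the bottom strand, giving '...AAGCT'.
    left_end = left_flank + "AAGCT"
    # Right fragment: the top strand already starts with the overhang, 'AGCTT...'.
    right_end = "AGCTT" + right_flank
    return left_end + right_end


def three_c_junction(left_flank: str, right_flank: str) -> str:
    """Top strand of a sticky-end ligated HindIII junction (conventional 3C)."""
    # Annealing of the complementary overhangs regenerates the HindIII site.
    return left_flank + HINDIII_SITE + right_flank


if __name__ == "__main__":
    # Hypothetical flanking sequences, chosen only for illustration.
    hic = hic_junction("GGG", "CCC")
    c3 = three_c_junction("GGG", "CCC")
    print(hic, NHEI_SITE in hic, HINDIII_SITE in hic)
    print(c3, NHEI_SITE in c3, HINDIII_SITE in c3)
```

The Hi-C junction contains the duplicated overhang (AAGCTAGCTT), which includes an NheI site and no longer contains a HindIII site. This is the basis of the digest control described in Figure 2B.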
Figure 2. Hi-C library quality controls. (A) Increasing amounts of a 3C control and a Hi-C library were resolved on a 0.8% agarose gel. Both libraries run as a rather tight band larger than 10 kb. Typical ligation efficiency in a Hi-C library is slightly lower than what is observed in a 3C template, and is indicated by the smear in the Hi-C lanes. (B) PCR digest control. A ligation junction formed by two nearby fragments is amplified using standard 3C PCR conditions. Hi-C ligation products can be distinguished from those produced in conventional 3C by digestion of the ligation site. Hi-C junctions are cut by NheI, not HindIII; the reverse is true for 3C junctions. 70% of Hi-C amplicons were cut by NheI, confirming efficient marking of ligation junctions. Two replicates were performed to ensure reliable quantification.
Figure 3. Hi-C read quality controls. (A) Reads from fragments corresponding to both intrachromosomal (blue) and interchromosomal (red) interactions align significantly closer to HindIII restriction sites as compared to randomly generated reads (green). Both the intrachromosomal reads and interchromosomal reads curves decrease rapidly as the distance from the HindIII site increases until a plateau is reached at a distance of ~500 bp. This corresponds to the maximum fragment size used for sequencing. (B) Typically, 55% of the alignable read pairs represent interchromosomal interactions. Fifteen percent represent intrachromosomal interactions between fragments less than 20 kb apart and 30% are intrachromosomal read pairs that are more than 20 kb apart. This distribution may be sampled prior to high-throughput sequencing, as a form of quality control; cloning and Sanger sequencing of about 100 clones is usually sufficient.
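The read-pair breakdown in Figure 3B can be tallied with a few lines of code. This is a minimal sketch and not the authors' pipeline: the tuple layout `(chrom1, pos1, chrom2, pos2)`, the function names, and the choice to count a pair exactly 20 kb apart as long-range are all assumptions made for illustration.

```python
from collections import Counter

CATEGORIES = ("inter", "intra_short", "intra_long")


def classify_pair(chrom1, pos1, chrom2, pos2, cutoff=20_000):
    """Assign one aligned read pair to a Figure 3B category."""
    if chrom1 != chrom2:
        return "inter"
    # Assumption: a separation of exactly `cutoff` bp is counted as long-range.
    return "intra_short" if abs(pos1 - pos2) < cutoff else "intra_long"


def pair_fractions(pairs, cutoff=20_000):
    """Return the fraction of read pairs in each category."""
    counts = Counter(classify_pair(*pair, cutoff=cutoff) for pair in pairs)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no read pairs supplied")
    return {name: counts[name] / total for name in CATEGORIES}


if __name__ == "__main__":
    # Hypothetical read pairs, for illustration only.
    toy_pairs = [
        ("chr1", 1_000, "chr2", 5_000),
        ("chr1", 1_000, "chr1", 9_000),
        ("chr1", 1_000, "chr1", 900_000),
        ("chr3", 50_000, "chr7", 10_000),
    ]
    print(pair_fractions(toy_pairs))
```

Applied to a small sample of sequenced clones, the resulting fractions can be compared against the typical distribution quoted in the legend.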
Figure 4. Correlation analysis demonstrates that the nucleus is segregated into two compartments. (A) Heatmap corresponding to intrachromosomal interactions on chromosome 14. Each pixel represents all interactions between a 1-Mb locus and another 1-Mb locus; intensity corresponds to the total number of reads (range: 0-200 reads). Tick marks appear every 10 Mb. The heatmap exhibits substructure in the form of an intense diagonal and a constellation of large blocks. (Chromosome 14 is acrocentric; the short arm is not shown.) Using the Hi-C dataset to compute the average contact probability for a pair of loci at a given genomic distance, an expectation matrix is produced (B) corresponding to what would be observed if there were no long-range structures. The quotient of these two matrices is an observed/expected matrix (C) where depletion is shown in blue and enrichment in red [range: 0.2 (blue) to 5 (red)]. The block pattern becomes more evident. The correlation matrix (D) illustrates the correlation [range: -1 (blue) to 1 (red)] between the intrachromosomal interaction profiles of every pair of loci along chromosome 14. The striking plaid pattern indicates the presence of two compartments within the chromosome.
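The three matrix transformations in Figure 4 (expected, observed/expected, correlation) can be sketched with NumPy. This is an illustrative reconstruction under stated assumptions, not the authors' code: the function names are hypothetical, the input is assumed to be a symmetric intrachromosomal contact matrix with equally sized bins, the expected value is taken as the mean of each diagonal, and the interaction profile of a locus is assumed to be its row of the observed/expected matrix.

```python
import numpy as np


def expected_matrix(observed: np.ndarray) -> np.ndarray:
    """Expected contacts: the mean observed count at each genomic distance."""
    n = observed.shape[0]
    expected = np.zeros((n, n), dtype=float)
    for d in range(n):
        mean_d = np.diagonal(observed, offset=d).mean()
        idx = np.arange(n - d)
        expected[idx, idx + d] = mean_d  # upper diagonal at distance d
        expected[idx + d, idx] = mean_d  # mirrored lower diagonal
    return expected


def observed_over_expected(observed: np.ndarray) -> np.ndarray:
    """Element-wise quotient; bins with zero expectation are set to 0."""
    expected = expected_matrix(observed)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(expected > 0, observed / expected, 0.0)


def correlation_matrix(oe: np.ndarray) -> np.ndarray:
    """Pearson correlation between the rows (interaction profiles) of `oe`."""
    # Rows with no variation yield NaN; real data may need such bins masked.
    return np.corrcoef(oe)


if __name__ == "__main__":
    # Toy 8-bin chromosome with two alternating blocks (hypothetical data).
    labels = np.array([0, 0, 1, 1, 0, 0, 1, 1])
    dist = np.abs(np.subtract.outer(np.arange(8), np.arange(8)))
    same = labels[:, None] == labels[None, :]
    observed = 100.0 / (1 + dist) * np.where(same, 2.0, 1.0)
    corr = correlation_matrix(observed_over_expected(observed))
    print(np.round(corr, 2))
```

In the toy example, bins that share a block label are positively correlated and bins from different blocks are negatively correlated, which is the plaid pattern described for panel D.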
Figure 5. The presence and organization of chromosome territories. (A) Probability of contact decreases as a function of genomic distance on chromosome 1, eventually reaching a plateau at ~90 Mb (blue). The level of interchromosomal contact (black dashes) differs for different pairs of chromosomes; loci on chromosome 1 are most likely to interact with loci on chromosome 10 (green dashes) and least likely to interact with loci on chromosome 21 (red dashes). Interchromosomal interactions are depleted relative to intrachromosomal interactions. (B) Observed/expected number of interchromosomal contacts between all pairs of chromosomes. Red indicates enrichment, and blue indicates depletion [range: 0.5 (blue) to 2 (red)]. Small, gene-rich chromosomes tend to interact more with one another.
Figure 6. The local packing of chromatin is consistent with the behavior of a fractal globule. (A) Contact probability as a function of genomic distance, averaged across the genome (blue). A prominent power law scaling is seen between 500 kb and 7 Mb (shaded region) with a slope of -1.08 (fit shown in cyan). (B) Simulation results for contact probability as a function of distance for equilibrium (red) and fractal (blue) globules. The slope for a fractal globule is very nearly -1 (cyan), confirming our novel theoretical prediction (1). The slope for an equilibrium globule is -3/2, which matches prior theoretical expectations. The slope for the fractal globule closely resembles the slope observed in the Hi-C results, whereas the slope for an equilibrium globule is not seen in the Hi-C data. (C) Top: An unfolded polymer chain, 4000 monomers long. Coloration corresponds to distance from one endpoint, ranging from blue to cyan, green, yellow, orange, and red. Middle: Typical example of a fractal globule drawn from our ensemble. Fractal globules lack entanglements. Loci that are nearby along the contour tend to be nearby in 3D, leading to the presence of large monochromatic blocks that are apparent on the surface and in cross-section. Bottom: An equilibrium globule. The structure is highly entangled; loci that are nearby along the contour (similar color) need not be nearby in 3D.
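A power-law slope such as the one reported in Figure 6A is commonly estimated as a straight-line fit in log-log space. The sketch below assumes that approach; the function name and the synthetic input are hypothetical, and the fitting procedure actually used by the authors is not described in this text.

```python
import numpy as np


def fit_power_law_slope(distances, contact_prob,
                        d_min=500_000, d_max=7_000_000):
    """Least-squares slope of log10(probability) versus log10(distance).

    Only points with d_min <= distance <= d_max and positive probability
    are used; the default window is the shaded region of Figure 6A.
    """
    distances = np.asarray(distances, dtype=float)
    contact_prob = np.asarray(contact_prob, dtype=float)
    mask = (distances >= d_min) & (distances <= d_max) & (contact_prob > 0)
    if mask.sum() < 2:
        raise ValueError("need at least two points inside the fitting window")
    slope, _intercept = np.polyfit(np.log10(distances[mask]),
                                   np.log10(contact_prob[mask]), 1)
    return slope


if __name__ == "__main__":
    # Synthetic curve that follows an exact power law (illustration only).
    s = np.logspace(5, 7.5, 50)   # genomic distances in bp
    p = 3.0 * s ** -1.08          # contact probability, arbitrary units
    print(round(fit_power_law_slope(s, p), 2))
```

A fitted slope near -1 would be in line with the fractal globule simulation, and a slope near -3/2 with the equilibrium globule.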
We present a method of studying the three-dimensional architecture of the genome by mapping chromatin interactions in an unbiased, genome-wide manner. The most critical experimental step, and the one that sets this technology apart from previous work, is the incorporation of biotinylated nucleotides at the restriction ends of crosslinked fragments before blunt-end ligation. Performing this step successfully enables deep sequencing of all ligation junctions, and gives Hi-C its scope and power.
The number of reads will ultimately determine the resolution of the interaction maps. Here, a 1 Mb interaction map for the human genome is presented using ~30 million alignable reads. In order to increase ‘all-purpose’ resolution by a factor of n, the number of reads must be increased by a factor of n².
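One way to see the quadratic scaling is that shrinking the bin size by a factor of n multiplies the number of pixels in the two-dimensional contact matrix by n², so the same average coverage per pixel requires n² times as many reads. The helper below is a hypothetical back-of-the-envelope calculator built on that assumption, using the ~30 million reads at 1 Mb quoted above as the baseline.

```python
def reads_for_resolution(baseline_reads: float,
                         baseline_bin_bp: float,
                         target_bin_bp: float) -> float:
    """Reads needed to keep the per-pixel coverage of a baseline map."""
    # n is the factor by which the resolution increases (bin size shrinks).
    n = baseline_bin_bp / target_bin_bp
    # The pixel count of the contact matrix grows as n**2.
    return baseline_reads * n ** 2


if __name__ == "__main__":
    # Going from 1 Mb to 100 kb bins is a 10-fold increase in resolution,
    # which implies roughly 100 times the baseline number of reads.
    needed = reads_for_resolution(30e6, 1_000_000, 100_000)
    print(f"{needed:.1e} reads")
```

This estimate ignores the fact that contacts are distributed unevenly (they are far more frequent at short genomic distances), so it should be read as a rough guide only.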
The Hi-C technique may be readily combined with other techniques, such as hybrid capture after library generation (to target specific parts of the genome) and chromatin immunoprecipitation after ligation (to examine the chromatin environment of regions associated with specific proteins).
The authors have nothing to disclose.
We thank A. Kosmrlj for discussions and code; A. P. Aiden, X. R. Bao, M. Brenner, D. Galas, W. Gosper, A. Jaffer, A. Melnikov, A. Miele, G. Giannoukos, C. Nusbaum, A. J. M. Walhout, L. Wood, and K. Zeldovich for discussions; and L. Gaffney and B. Wong for help with visualization.
Supported by a Fannie and John Hertz Foundation graduate fellowship, a National Defense Science and Engineering graduate fellowship, an NSF graduate fellowship, the National Space Biomedical Research Institute, and grant no. T32 HG002295 from the National Human Genome Research Institute (NHGRI) (E.L.); i2b2 (Informatics for Integrating Biology and the Bedside), the NIH-supported Center for Biomedical Computing at Brigham and Women's Hospital (L.A.M.); grant no. HG003143 from the NHGRI, and a Keck Foundation distinguished young scholar award (J.D.). Raw and mapped Hi-C sequence data have been deposited at the GEO database (www.ncbi.nlm.nih.gov/geo/), accession no. GSE18199. Additional visualizations are available at http://hic.umassmed.edu.
Material Name | Company | Catalogue Number | Comment |
---|---|---|---|
Protease inhibitors | Sigma | P8340-5ml | Step 1.2 |
biotin-14-dCTP | Invitrogen | 19518-018 | Step 1.6 |
Klenow | NEB | M0210 | Steps 1.6 and 2.2 |
T4 DNA ligase | Invitrogen | 15224 | Step 1.9 |
T4 DNA polymerase | NEB | M0203 | Steps 1.17 and 2.2 |
10x ligation buffer | NEB | B0202 | Steps 2.2 and 3.4 |
T4 PNK | NEB | M0201 | Step 2.2 |
Klenow (exo-) | NEB | M0212 | Step 2.3 |
Dynabeads MyOne Streptavidin C1 Beads | Invitrogen | 650.01 | Step 3.2 |
T4 DNA ligase HC | Enzymatics | L603-HC-L | Step 3.5 |
Phusion HF mastermix | NEB | F531 | Step 3.8 |
Ampure beads | Beckman Coulter | A2915 | Step 3.9 |