Methyl-binding DNA capture Sequencing for Patient Tissues

Rohit R. Jadhav; Yao V. Wang; Ya-Ting Hsu; Joseph Liu; Dawn Garcia; Zhao Lai; Tim H. M. Huang; Victor X. Jin

doi:10.3791/54131

Genetics

Published: October 31, 2016 doi: 10.3791/54131

Rohit R. Jadhav¹, Yao V. Wang¹, Ya-Ting Hsu¹, Joseph Liu¹, Dawn Garcia², Zhao Lai², Tim H. M. Huang¹, Victor X. Jin¹

¹Department of Molecular Medicine, University of Texas Health Science Center at San Antonio, ²Greehey Children's Cancer Research Institute, University of Texas Health Science Center at San Antonio

Summary

Here we present a protocol to investigate genome-wide DNA methylation in large-scale clinical patient screening studies using the Methyl-Binding DNA Capture sequencing (MBDCap-seq or MBD-seq) technology and the subsequent bioinformatics analysis pipeline.

Abstract

Methylation is one of the essential epigenetic modifications to the DNA, which is responsible for the precise regulation of genes required for stable development and differentiation of different tissue types. Dysregulation of this process is often the hallmark of various diseases like cancer. Here, we outline one of the recent sequencing techniques, Methyl-Binding DNA Capture sequencing (MBDCap-seq), used to quantify methylation in various normal and disease tissues for large patient cohorts. We describe a detailed protocol of this affinity enrichment approach along with a bioinformatics pipeline to achieve optimal quantification. This technique has been used to sequence hundreds of patients across various cancer types as a part of the 1,000 methylome project (Cancer Methylome System).

Introduction

Epigenetic regulation of genes through DNA methylation is one of the essential mechanisms required to determine the cell fate by stable differentiation of different tissue types in the body¹. Dysregulation of this process has been known to cause various diseases including cancer².

This process mainly involves the addition of methyl groups to the cytosine residues in the CpG dinucleotides of DNA³. Several techniques are currently used to investigate this mechanism, each with its own advantages, as outlined in many studies²⁻⁸. Here we discuss one of these techniques, Methyl-Binding DNA Capture sequencing (MBDCap-seq), which uses affinity enrichment to identify methylated regions of the DNA. This technique builds upon the methyl-binding ability of the MBD2 protein to enrich for genomic DNA fragments containing methylated CpG sites. We utilize a commercial methylated DNA enrichment kit to isolate these methylated regions. Our laboratory has screened hundreds of patient samples using this technique, and here we provide a comprehensive, optimized protocol that can be used to investigate large patient cohorts.

As with any next-generation sequencing technology, MBDCap-seq requires a specific bioinformatics approach to accurately quantify the levels of methylation across samples. Many recent studies have sought to optimize the normalization and analysis of the sequencing data⁹,¹⁰. In this protocol, we demonstrate one of these methods, which implements a unique read recovery approach (LONUT) followed by linear normalization of each sample to enable unbiased comparisons across a large number of patient samples.

Protocol

All tissues are obtained following approval of the Institutional Review Board committee and when all participants consented to both molecular analyses and follow-up studies. The protocols are approved by the Human Studies Committee at University of Texas Health Science Center at San Antonio.

1. Methyl-binding DNA Capture (MBDCap)

1.1 Sample collection and DNA isolation
1. Collect bulk tumor or normal tissue from paraffin-embedded patient tissue samples.
2. Use a commercial DNA mini kit to isolate genomic DNA from the paraffin-embedded tissue samples according to manufacturer's protocol.
3. Use a methylated DNA enrichment kit (See Materials Table) for the procedure described below, following the manufacturer's protocol.
1.2 Initial bead wash
Note: The beads are used to couple with MBD-Biotin protein, which detects DNA methylation on the genome. Beads are usually suspended in a stock solution. Use these steps to wash off this liquid.
1. Resuspend the stock of streptavidin beads (See Materials Table) by gently pipetting up and down to obtain a homogeneous suspension. Do not mix the beads by vortexing.
2. For one µg of input DNA, add 10 µL of beads to a tube with 90 µL of 1x Bind/Wash Buffer. Rotate for 1 min. Do not mix by vortexing.
3. Place the tube(s) on a magnetic rack for 1 min.
4. Remove the liquid with a pipette and discard the liquid.
5. Add 250 µL of 1x Bind/Wash Buffer to the beads and rotate for 1 min.
6. Repeat steps 1.2.3 - 1.2.5 twice.
1.3 Coupling the MBD-Biotin protein to the beads
1. For 1.2 µg of input DNA, add 7 µL (3.5 µg) of MBD-Biotin protein to each tube.
2. Add 93 µL of 1x Bind/Wash Buffer to the protein to make a final volume of 100 µL.
3. Mix the beads-protein mixture on a rotating mixer at RT for 1 h.
1.4 Fragmented DNA (input)
1. Sonicate DNA by adding 1.2 µg of total genomic DNA in 96 µL of TE buffer and 24 µL of 5x Bind/Wash Buffer to make a 120 µL solution. Use 30 s power ON and 30 s power OFF during sonication, 20 - 25 times.
2. Run 1 µL of the sonicated DNA solution on a bioanalyzer system with a commercial high sensitivity DNA kit, according to manufacturer's protocol, to confirm that the fragment sizes are in the range of 150 - 350 bp.
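
The volumes in the sonication step can be checked with a short calculation. The following is a minimal sketch, not part of the published pipeline; the function name and defaults are illustrative and simply restate the quantities given above (1.2 µg DNA, 96 µL TE buffer, 24 µL of 5x buffer).

```python
def sonication_mix(dna_ug=1.2, te_ul=96.0, buffer_ul=24.0, buffer_stock_x=5.0):
    """Return total volume (µL), final buffer strength (x) and DNA concentration (ng/µL)."""
    total_ul = te_ul + buffer_ul                      # DNA is already dissolved in the TE buffer
    final_x = buffer_stock_x * buffer_ul / total_ul   # C1 * V1 = C2 * V2
    dna_ng_per_ul = dna_ug * 1000.0 / total_ul        # convert µg to ng
    return total_ul, final_x, dna_ng_per_ul


total_ul, final_x, dna_ng_per_ul = sonication_mix()
print(f"{total_ul:.0f} µL total, {final_x:.1f}x Bind/Wash Buffer, {dna_ng_per_ul:.1f} ng/µL DNA")
```

With the protocol's volumes, the 5x stock is diluted to 1x in the final 120 µL, which matches the 1x Bind/Wash Buffer used in the bead washes.
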
1.5 Wash the MBD-beads
1. Place the tube containing MBD-beads on a magnetic rack for 1 min.
2. Remove and discard the liquid with a pipette without touching the beads.
3. Resuspend the beads with 250 µL of 1x Bind/Wash Buffer.
4. Mix the beads on a rotating mixer at RT for 5 min.
5. Repeat steps 1.5.1 - 1.5.4 twice.
1.6 Fragmented DNA capture reaction (1 µg input DNA)
1. Transfer the DNA/Buffer mixture to the tube containing the MBD-beads.
2. Mix the MBD-beads with the DNA on a rotating mixer for 1 h at RT. Alternatively, mix O/N at 4 °C.
1.7 Removing non-captured DNA from the Beads
1. After mixing the DNA and MBD-beads, place the tube on the magnetic rack for 1 min to concentrate all of the beads on the inner wall of the tube.
2. Remove the supernatant liquid with a pipette and save it in a clean DNase-free microcentrifuge tube as nonmethylated DNA. Store this sample on ice.
3. Add 200 µL of 1x Bind/Wash Buffer to the beads to wash the beads.
4. Mix the beads on a rotating mixer for 3 min.
5. Place the tube on the magnetic rack for 1 min.
6. Remove the liquid with a pipette and save it in a microcentrifuge tube.
7. For capture reactions of ≤1 µg of input DNA, repeat steps 1.7.3 - 1.7.6 for 2 more wash fractions in total.
1.8 Single fraction elution
1. Mix low-salt buffer with high-salt buffer (1:1) to get elution buffer (1,000 mM NaCl).
2. Resuspend the beads in 200 µL of elution buffer (1,000 mM NaCl).
3. Incubate the beads on a rotating mixer for 3 min.
4. Place the tube on the magnetic rack for 1 min.
5. With the tube in place on the magnetic rack, remove the liquid with a pipette without touching the beads with the pipette tip, and save it in a clean DNase-free 1.5 mL microcentrifuge tube. Store this sample on ice.
6. Repeat steps 1.8.1 - 1.8.4 once, collecting the second sample in the same tube (total volume will be 400 µL). Store the pooled sample on ice.
1.9 Ethanol precipitation (DNA cleanup)
1. To each noncaptured, wash, and elution fraction from the previous steps, add 1 µL glycogen (20 µg/µL, included in kit), 1/10 sample volume of 3 M sodium acetate, pH 5.2 (e.g., 40 µL per 400 µL of sample) and 2 sample volumes of 100% ethanol (e.g., 800 µL per 400 µL of sample). For a 400 µL sample, the total volume is 1,241 µL.
2. Mix well and incubate at -80 °C for at least 2 h.
3. Centrifuge the tube for 15 min at 11,363 x g at 4 °C.
4. Carefully discard the supernatant without disturbing the pellet.
5. Add 500 µL of cold 70% ethanol.
6. Centrifuge the tube for 5 min at 11,363 x g at 4 °C.
7. Carefully discard the supernatant without disturbing the pellet.
8. Repeat steps 1.9.6 - 1.9.7 once and remove any remaining residual supernatant.
9. Air-dry the pellet for ~5 min. Do not completely dry the pellet.
10. Resuspend the DNA pellet in 37.5 µL of DNase-free water or other appropriate volume of buffer.
11. Place the DNA on ice or store the DNA at -20 °C or below until further use.
12. Check that the total amount of DNA is over 20 ng (e.g., use a fluorometric quantitation system, see Materials Table).
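
Because the noncaptured, wash, and elution fractions have different volumes, the reagent volumes for the ethanol precipitation scale with each fraction. The sketch below only restates the ratios given in the ethanol precipitation step (1 µL glycogen, 1/10 volume of sodium acetate, 2 volumes of ethanol); the function name is illustrative and not part of the kit or the supplementary scripts.

```python
def precipitation_volumes(sample_ul, glycogen_ul=1.0):
    """Reagent volumes (µL) for ethanol precipitation of one fraction."""
    acetate_ul = sample_ul / 10.0   # 1/10 sample volume of 3 M sodium acetate, pH 5.2
    ethanol_ul = 2.0 * sample_ul    # 2 sample volumes of 100% ethanol
    total_ul = sample_ul + glycogen_ul + acetate_ul + ethanol_ul
    return {
        "glycogen": glycogen_ul,
        "sodium_acetate": acetate_ul,
        "ethanol": ethanol_ul,
        "total": total_ul,
    }


# Pooled elution fraction (400 µL) and a single wash fraction (200 µL)
for volume in (400.0, 200.0):
    print(volume, precipitation_volumes(volume))
```

For the pooled 400 µL elution this reproduces the 1,241 µL total quoted in the protocol.
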

2. Sequencing

2.1 Use the DNA fragments recovered from the MBDCap procedure for sequencing. For each sample to be sequenced, a total of 20 - 40 ng of DNA is required for library preparation.
2.2 For DNA-seq library preparation, use a fully automated library construction system (See Materials Table) and follow the standard protocol supplied by the manufacturer. Select DNA fragments of size 200 - 400 bp for library preparation. The automated library preparation system standardizes the procedure and minimizes the variation across samples that manual preparation introduces.
2.3 Use adaptor primers and dilute to 100x concentration. Of this, use 10 µL for each sample.
2.4 Following step 2.2, take out the reaction, and quantify the DNA-seq library using a fluorometric quantitation system (See Materials Table).
1. Set up the PCR amplification procedure. Choose a PCR master mix that improves PCR efficiency in the GC-enriched regions of the genome. In a PCR tube, add 15 µL DNA-seq library, 25 µL PCR master mix (See Materials Table), 2 µL PCR primer mix (See Materials Table), and 8 µL H₂O.
2.5 Follow the PCR recommendations from the manufacturer of the PCR master mix, altering cycling times and temperatures for differing starting materials: 98 °C for 45 s, X cycles of 98 °C for 15 s, 65 °C for 30 s, 72 °C for 30 s, and then 72 °C for 1 min. Hold at 4 °C.
1. Perform enough cycles to end up with 2 - 10 nM of library DNA (8 - 10 cycles).
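
The 2 - 10 nM target is a molar concentration, whereas a fluorometric system reports mass concentration (ng/µL). A minimal conversion sketch follows. It assumes the commonly used average mass of 660 g/mol per base pair of double-stranded DNA and a mean library fragment length taken from a bioanalyzer trace; the function names and the example values are hypothetical.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of double-stranded DNA (standard approximation)


def library_nM(conc_ng_per_ul, mean_fragment_bp):
    """Convert a dsDNA library concentration from ng/µL to nM."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * mean_fragment_bp)


def in_target_range(conc_nM, low=2.0, high=10.0):
    """True when the library falls in the 2 - 10 nM range named in the protocol."""
    return low <= conc_nM <= high


# Hypothetical reading: 1.0 ng/µL at a mean fragment length of 300 bp
conc = library_nM(1.0, 300)
print(f"{conc:.2f} nM, within target: {in_target_range(conc)}")
```

A library that falls below the range after the first run would suggest using a cycle number at the upper end of the 8 - 10 cycles given above.
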
2.6 Clean up PCR reaction with 1:1 ratio of PCR purification beads according to manufacturer's protocol (See Materials Table).
2.7 Elute with 20 - 30 µL elution buffer.
2.8 Barcode each sample and pool 4 samples together into one lane of a flow cell. Use the standard 50 bp single read protocol for sequencing (See Materials Table).

3. Bioinformatics Analysis

Note: Process the raw fastq files obtained from sequencing to perform quality control and to map the short DNA sequences (reads) to the genome.

3.1 Use the Bowtie short read aligner on each sample's fastq file to map the reads to the reference genome. Many read mapping tools are available for this step and can be used based on individual preference.
1. Allow up to 2 mismatches across the whole read while mapping to the reference genome. Report up to 20 best alignments for each read. For example, use bowtie -v 2 --best -k 20 --chunkmbs 200 <ebwt> <fastqFile> <alignFile>.
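
When many samples are mapped, the Bowtie call above can be assembled programmatically so that every sample uses identical settings. This is a minimal sketch: the flags are those of the example command, while the function name, index name, and file names are hypothetical placeholders.

```python
import shlex


def bowtie_command(ebwt, fastq_file, align_file,
                   mismatches=2, max_alignments=20, chunk_mbs=200):
    """Build the argument list for the Bowtie call used in the protocol."""
    return [
        "bowtie",
        "-v", str(mismatches),       # up to 2 mismatches across the whole read
        "--best",                    # report alignments best-first
        "-k", str(max_alignments),   # report up to 20 alignments per read
        "--chunkmbs", str(chunk_mbs),
        ebwt, fastq_file, align_file,
    ]


# Hypothetical sample names; replace with the real index and fastq files.
for sample in ("tumor_01", "normal_01"):
    cmd = bowtie_command("hg19", f"{sample}.fastq", f"{sample}.bowtie")
    print(shlex.join(cmd))
    # To execute, with Bowtie installed: subprocess.run(cmd, check=True)
```
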
3.2 Use LONUT⁹ to recover the multiply mapped reads and improve the detection of enriched regions. For each multiply mapped read, at most one alignment is recovered, and only when it lies close enough to one of the peaks called from the uniquely mapped reads. For example, use: perl lonut.pl -g hg19 -w 250 -p 99 <alignFile> <outputPath>. LONUT performs the following steps.
1. Split alignments into uniquely mapped reads and multiply mapped reads. Call peaks on the uniquely mapped reads using the peak caller BELT¹¹. The options -w 250 and -p 99 in the example command are passed to BELT.
2. Combine the uniquely mapped reads with the reads recovered by LONUT. The combined reads, in BED format, are used in further analysis. Calculate the number of combined reads for later normalization.
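
To illustrate the first LONUT step, the sketch below separates uniquely mapped reads from multiply mapped reads. It is not LONUT's own code. It assumes Bowtie's default tab-separated output, in which the first column is the read name and each reported alignment of a read occupies its own line; the example lines are made up.

```python
from collections import defaultdict


def split_alignments(lines):
    """Split alignment lines into uniquely and multiply mapped reads by read name."""
    by_read = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        read_name = line.split("\t", 1)[0]
        by_read[read_name].append(line)
    # One reported alignment: uniquely mapped. More than one: multiply mapped.
    unique = [hits[0] for hits in by_read.values() if len(hits) == 1]
    multi = {name: hits for name, hits in by_read.items() if len(hits) > 1}
    return unique, multi


# Made-up alignment lines: name, strand, chromosome, offset, sequence, qualities, ...
example = [
    "read1\t+\tchr1\t100\tACGT\tIIII\t0\t",
    "read2\t+\tchr1\t500\tACGT\tIIII\t1\t",
    "read2\t-\tchr7\t900\tACGT\tIIII\t1\t",
    "read3\t-\tchr2\t300\tACGT\tIIII\t0\t",
]
unique, multi = split_alignments(example)
print(len(unique), "uniquely mapped;", len(multi), "multiply mapped")
```

The total used later for normalization would then be the number of uniquely mapped reads plus the number of multiply mapped reads for which LONUT recovers an alignment.
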
3.3 Bin the reads. Extract reads in the region of interest using the perl script: perl methyPipeline.pl <sampleInfoFile> <refGeneFile> -up <upstream extending length> -dn <downstream extending length>. For each BED file listed in the sampleInfoFile, the perl script performs the following tasks.
Note: See supplementary coding file for the perl script.
1. Bin the reads into fixed bin size, e.g., 100 bp, for 24 chromosomes.
2. For each gene specified in refGeneFile, extract the bins around transcription start site (TSS), up to the extending length specified by option -up and -dn. For example, extract the data in the region that is up and downstream 4 kb of TSS. At bin size 100 bp, this extracted data have 81 bins, and the center bin corresponds to the TSS.
3. For the gene on the antisense strand, flip the extracted region from left to right so that upstream bins are always placed on the left with the perl script.
4. Further split the TSS ±4 kb region into three sub regions: Left, upstream 4 kb to upstream 2 kb; Middle, up and downstream 2 kb; Right, downstream 2 kb to 4 kb.
5. For each sample, divide the count of reads in each bin by the total number of combined mapped reads calculated after the LONUT read recovery step. This normalization removes sample-specific differences in read depth and allows reads to be compared across samples.
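
The binning, TSS window extraction, strand flip, region split, and normalization described above can be sketched in a few functions. This is an illustration written for this text, not the supplementary perl script: the function names are hypothetical, read positions are assumed to be 0-based start coordinates on a single chromosome, and the assignment of the boundary bins to the Middle region is an assumption.

```python
from collections import Counter

BIN_SIZE = 100  # bp, as in the protocol example


def bin_reads(read_starts, bin_size=BIN_SIZE):
    """Count reads per fixed-size bin on one chromosome."""
    return Counter(pos // bin_size for pos in read_starts)


def tss_window(bin_counts, tss, strand, up=4000, dn=4000, bin_size=BIN_SIZE):
    """Return bin counts around a TSS, ordered upstream to downstream."""
    center = tss // bin_size
    n_up, n_dn = up // bin_size, dn // bin_size
    if strand == "+":
        bins = range(center - n_up, center + n_dn + 1)
    else:
        # Antisense gene: upstream lies at higher coordinates, so walk backwards.
        bins = range(center + n_up, center - n_dn - 1, -1)
    return [bin_counts.get(b, 0) for b in bins]


def split_regions(window, up=4000, inner=2000, bin_size=BIN_SIZE):
    """Split a TSS window into Left, Middle (TSS +/- inner) and Right sub-regions."""
    center = up // bin_size
    half = inner // bin_size
    return (window[:center - half],
            window[center - half:center + half + 1],
            window[center + half + 1:])


def normalize(window, total_mapped_reads):
    """Divide each bin count by the sample's total number of combined mapped reads."""
    return [count / total_mapped_reads for count in window]


# Hypothetical reads around a TSS at position 1,000,000
counts = bin_reads([1000050, 1000060, 998000, 1003950])
plus = tss_window(counts, tss=1000000, strand="+")
minus = tss_window(counts, tss=1000000, strand="-")
left, middle, right = split_regions(plus)
print(len(plus), len(left), len(middle), len(right))
```

With 4 kb on each side and 100 bp bins, the window holds 81 bins with the TSS bin at the center, as stated in the protocol.
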
3.4 Run the bash script (as indicated in the script file) to perform a statistical test on the normalized reads from the left, middle, and right regions and detect differential methylation between the normal and case groups in each region.
Note: See the supplementary coding file for the bash script. Based on the regions that are differentially methylated, there are seven combinations:
Whole region: differential methylation throughout the TSS ±4 kb region
MiddleLeft: differential methylation in both middle and left region
MiddleRight: differential methylation in both middle and right region
Flank: differential methylation in both left and right region
Left: differential methylation in left region only
Right: differential methylation in right region only
Middle: differential methylation in middle region only
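
The seven combinations listed above follow from three yes/no outcomes, one per sub-region. A minimal sketch of that mapping is shown below; it assumes the statistical test has already produced a differential or non-differential call for each sub-region, and the function name is hypothetical rather than taken from the supplementary bash script.

```python
PATTERNS = {
    # (left, middle, right) differential calls -> combination name
    (True, True, True): "Whole region",
    (True, True, False): "MiddleLeft",
    (False, True, True): "MiddleRight",
    (True, False, True): "Flank",
    (True, False, False): "Left",
    (False, False, True): "Right",
    (False, True, False): "Middle",
}


def region_pattern(left, middle, right):
    """Name the combination of differentially methylated sub-regions for one gene.

    Returns None when no sub-region is differentially methylated.
    """
    return PATTERNS.get((bool(left), bool(middle), bool(right)))


print(region_pattern(True, False, True))
print(region_pattern(False, False, False))
```
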
3.5 Use the same bash script to visualize the differential methylation in a tornado plot. Plot the hypermethylated and hypomethylated genes on separate panels, and sort each panel in the order of the region combinations described in 3.4.

Representative Results

We have used MBDCap-seq to study DNA methylation alterations in a large number of patients from diverse cancer types, including breast¹², endometrial¹³, prostate¹⁴, and liver cancers, among others. Here we present results from the recently published breast cancer study¹². In this instance, we used the whole genome sequencing approach to identify CpG islands that are differentially methylated in tumors relative to normal tissue across different genomic regions. Of the investigated CpG islands located in gene promoters (n = 13,081), 19.5% showed differential methylation in tumors compared to normal tissue. Similarly, 55.2% of 6,959 intragenic CpG islands and 28.1% of 4,847 intergenic CpG islands showed differential methylation, and 1.8% of the 5,454 investigated gene promoters without CpG islands showed differential DNA methylation (Figure 1A). Figure 1B shows representative examples of these regions. The heatmap depicts the methylated regions across 77 breast tumors, 10 normal breast tissues, and 38 breast cancer cell lines.

The quantification of methylation described in the bioinformatics protocol enables us to perform various statistical tests across these patient samples to identify significant methylation differences in the whole population or a subpopulation. This analysis provides testable targets for further investigation of the epigenetic mechanisms in these patient samples. Once these targets are identified, the levels of methylation can be further quantified and validated using pyrosequencing. MBDCap-seq provides a genomic resolution of about 100 - 200 bp for the targets. To obtain finer resolution, individual CpG locations within these regions can be selected for pyrosequencing quantification. We use this approach to quantify and validate the methylation differences observed in the MBDCap-seq data at a greater resolution and to compare individual patient groups. Figure 2 shows the quantification of methylation in 2 endometrial cancer subgroups, nonrecurrent (NR) and recurrent (R). The figure depicts the level of DNA methylation for each patient in the 2 groups at different CpG sites in the promoter CpG island of the identified target gene SFRP1.

Figure 1: DNA hypermethylation in breast cancer samples relative to normal breast tissue in promoter and non-promoter CpG islands. Methyl capture sequencing (MBD-seq) was used to generate DNA methylation profiles of the genomes of breast tumors (n = 77) and normal breast tissue (n = 10). (A) Pie charts demonstrate differential methylation in promoter, intragenic, and intergenic CGIs as well as non-CGI promoter regions. (B) Example loci showing promoter CGI, intragenic, intergenic, and non-CGI promoter regions. Dashed squares highlight regions corresponding to breast cancer hypermethylation.

Figure 2: Quantification of DNA methylation using pyrosequencing of CpG sites in an identified target gene. DNA methylation of SFRP 5 promoter region in recurrent and non-recurrent endometrial carcinoma patients detected by pyrosequencing. Each site represents one CpG site (R: Recurrent, n = 21 and NR: Non-Recurrent, n = 71). The plots show the mean, and error bars show the standard error of the mean (SEM).

Discussion

The MBDCap-seq technique is an affinity enrichment approach³ that is considered a cost-effective alternative when investigating cohorts with a large number of patients¹⁵. The pipeline presented here describes a comprehensive approach from sample procurement to data analysis and interpretation. One of the most important steps is setting up the PCR amplification procedure to improve PCR efficiency in the GC-enriched regions of the genome, as this is where DNA methylation occurs. It is also essential to ensure that, after sequencing, each sample has more than 20 million reads uniquely mapped to the genome. This coverage is expected to provide sufficient enrichment or sequencing depth to map the entire genome¹². If this coverage is not met during the first round of sequencing for a particular sample and more DNA for that sample is available from section 1, another round of sequencing (section 2) can be run. The resulting reads can be merged with the reads from the first round to achieve the required coverage. The overall experiment should be designed to include biological controls (e.g., normal tissue) to account for bias from factors such as copy-number variations¹⁵.

Our bioinformatics approach to investigating DNA methylation in patient samples provides possible target genes that show differential methylation enrichment in the core promoter, the promoter shore regions, and various combinations of these. Although MBDCap-seq reaches a resolution of about 100 bp at best, the division of these regions into core and shore as described in the protocol enables us to identify potential target regions for further investigation of the regulatory roles of differential methylation enrichment and for future mechanistic studies. Here we also describe a way to visualize these identified regions using tornado plots, which provide an overview of the differential enrichment patterns.

Other techniques are based on the chemical conversion of unmethylated cytosine to uracil by sodium bisulfite through deamination, and provide more comprehensive coverage of DNA methylation at single-nucleotide resolution. Usually, after bisulfite conversion, the DNA is sequenced using pyrosequencing for target-based experiments or by whole genome shotgun bisulfite sequencing (BS-seq). These techniques address the main limitation of MBDCap-seq, which reaches a resolution of about 100 bp at best. However, although these approaches are more comprehensive, they are relatively expensive, and their cost rises steeply as the depth of sequencing or the number of patients increases. On the other hand, commonly used bisulfite microarray platforms are cost effective, but provide relatively low coverage, with probes investigating regions close to gene promoters rather than the whole genome. MBDCap-seq provides a balance between these high-cost sequencing techniques and low-cost methylation arrays¹⁶. We are using this protocol to investigate DNA methylation in a large number of patient cohorts as a part of the Cancer Methylome System project¹⁰, and it can be used in future studies involving large patient cohorts.

Disclosures

This protocol is developed in the laboratories of Dr. Tim Huang and Dr. Victor Jin at the University of Texas Health Science Center at San Antonio.

Acknowledgments

The work is supported by CPRIT Research Training Award RP140105, and partially supported by US National Institutes of Health (NIH) grant R01 GM114142 and by the William & Ella Owens Medical Research Foundation.

Materials

Name	Company	Catalog Number	Comments
Methylminer DNA enrichment Kit	Invitrogen	ME10025
Dynabeads M-280 Streptavidin	Invitrogen	112-05D
Bioruptor Plus Sonication Device	diagenode	B01020001
3 M sodium acetate, pH 5.2	Sigma	S7899	100 mL
SPRIworks Fragment Library System I	Beckman Coulter	A50100	Fully automated library construction system
Adapter Primers	Bioo Scientific	514104	PCR primer mix
Qubit	Invitrogen	Q32854	Fluorometric Quantitation System
PCR master mix	KAPA scientific	KK2621	PCR master mix
AMPure XP	Beckman Coulter	A63881	PCR Purification beads
EB Buffer	Qiagen	19086
HiSeq 2000 Sequencing System	Illumina