We describe the preparation of barcoded DNA libraries and subsequent hybridization-based exon capture for detection of key cancer-associated mutations in clinical tumor specimens by massively parallel “next generation” sequencing. Targeted exon sequencing offers the benefits of high throughput, low cost, and deep sequence coverage, thus yielding high sensitivity for detecting low frequency mutations.
Efforts to detect and investigate key oncogenic mutations have proven valuable to facilitate the appropriate treatment for cancer patients. The establishment of high-throughput, massively parallel “next-generation” sequencing has aided the discovery of many such mutations. To enhance the clinical and translational utility of this technology, platforms must be high-throughput, cost-effective, and compatible with formalin-fixed paraffin embedded (FFPE) tissue samples that may yield small amounts of degraded or damaged DNA. Here, we describe the preparation of barcoded and multiplexed DNA libraries followed by hybridization-based capture of targeted exons for the detection of cancer-associated mutations in fresh frozen and FFPE tumors by massively parallel sequencing. This method enables the identification of sequence mutations, copy number alterations, and select structural rearrangements involving all targeted genes. Targeted exon sequencing offers the benefits of high throughput, low cost, and deep sequence coverage, thus conferring high sensitivity for detecting low frequency mutations.
The identification of "driver" tumor genetic events in key oncogenes and tumor suppressor genes plays an essential role in the diagnosis and treatment of many cancers1. Large-scale research efforts utilizing massively parallel "next-generation" sequencing have enabled the identification of many such cancer-associated genes in recent years2. However, these sequencing platforms typically require large quantities of DNA isolated from fresh frozen tissues, thus posing a major limitation in characterizing and analyzing DNA mutations from preserved tissues, such as formalin-fixed paraffin embedded (FFPE) tumor samples. Improved efforts to efficiently and reliably characterize "actionable" genomic information from FFPE tumor samples will enable the retrospective analysis of previously-banked specimens and further encourage individualized approaches to cancer management.
Traditionally, molecular diagnostic laboratories have relied on time-consuming, low-throughput methodologies such as Sanger sequencing and real-time PCR for DNA mutation profiling. More recently, higher-throughput methods utilizing multiplexed PCR or mass spectrometric genotyping have been developed to investigate recurrent somatic mutations in key cancer genes3-5. These approaches, however, are limited in that only predesignated "hotspot" mutations are assayed, making them unsuitable for detecting inactivating mutations in tumor suppressor genes. Massively parallel sequencing offers several advantages over these strategies including the ability to interrogate entire exons for both common and rare mutations, the ability to reveal additional classes of genomic alterations such as copy number gains and losses, and greater detection sensitivity in heterogeneous samples6, 7. Whole genome sequencing represents the most comprehensive approach for mutation discovery, though it is relatively expensive and incurs large computational demands for data analysis and storage.
For clinical applications, where only a small fraction of the genome may be of clinical interest, two particular innovations in sequencing technology have been transformative. First, through hybridization-based exon capture, one can isolate DNA corresponding to key cancer-associated genes for targeted mutation profiling8, 9. Second, through ligation of molecular barcodes (i.e., DNA sequences 6-8 nucleotides in length), one can pool hundreds of samples per sequencing run and take full advantage of the ever-increasing capacity of massively parallel sequencing instruments10. When combined, these innovations enable tumors to be profiled at lower cost and higher throughput, with smaller computational requirements11. Further, by redistributing sequence coverage to only those genes most critical to the particular application, one can achieve greater sequencing depth and therefore higher detection sensitivity for low allele frequency events.
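To illustrate how molecular barcodes allow pooled libraries to be separated after sequencing, the following is a minimal sketch of read assignment by index sequence. The barcode-to-sample mapping, the sample names, and the one-mismatch tolerance are assumptions chosen for illustration and are not part of the protocol; in practice, demultiplexing is typically handled by the sequencer vendor's software.

```python
from collections import defaultdict

def hamming(a, b):
    """Count mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

def assign_sample(index_read, barcode_to_sample, max_mismatches=1):
    """Return the unique sample whose barcode is within max_mismatches
    of index_read, or None if there is no match or more than one."""
    hits = [sample for barcode, sample in barcode_to_sample.items()
            if len(barcode) == len(index_read)
            and hamming(barcode, index_read) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None

def demultiplex(index_reads, barcode_to_sample, max_mismatches=1):
    """Count reads per sample; unassignable reads go to 'undetermined'."""
    counts = defaultdict(int)
    for read in index_reads:
        sample = assign_sample(read, barcode_to_sample, max_mismatches)
        counts[sample if sample is not None else "undetermined"] += 1
    return dict(counts)

# Hypothetical 6-nucleotide barcodes and sample names (illustration only).
barcodes = {"ATCACG": "tumor_01", "CGATGT": "normal_01"}
index_reads = ["ATCACG", "ATCACC", "CGATGT", "GGGGGG"]
result = demultiplex(index_reads, barcodes)
print(result)
```

Allowing a single mismatch recovers reads with one sequencing error in the index, while reads that match no barcode, or more than one, are set aside rather than guessed.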
Here we describe our IMPACT assay (Integrated Mutation Profiling of Actionable Cancer Targets), which utilizes exon capture on barcoded sequence library pools by hybridization using custom oligonucleotides to capture all protein-coding exons and select introns of 279 key cancer-associated genes (Table 1). This strategy enables the identification of mutations, indels, copy number alterations, and select structural rearrangements involving these 279 genes. Our method is compatible with DNA isolated from both fresh frozen and FFPE tissue as well as fine needle aspirates and other cytology specimens.
1. DNA and Reagent Preparation
Note: This protocol describes the simultaneous processing and analysis of 24 samples (e.g. 12 tumor/normal pairs) but can be adapted for smaller and larger batches. DNA samples may derive from FFPE or fresh frozen tissue, cytological specimens, or blood. Typically, both tumor and normal tissue from the same patient will be profiled together in order to distinguish somatic mutations from inherited polymorphisms. The protocol begins immediately following DNA extraction.
2. Library Preparation – End Repair Module
3. Post End Repair Clean-up
4. Library Preparation – dA-Tailing Module
5. Library Preparation – Adapter Ligation Module
6. Library Amplification
7. Roche NimbleGen SeqCap EZ Library Hybridization, Wash, and Amplification
8. Illumina HiSeq Sequencing
9. Data Analysis
One pool of 24 barcoded sequence libraries (12 tumor-normal pairs) was captured using probes corresponding to all protein-coding exons of 279 cancer genes and sequenced as 2 x 75 bp reads on a single lane of a HiSeq 2000 flow cell. Tumor and normal libraries were pooled in a 2:1 ratio. Sample performance metrics for a pool of frozen tumor DNA samples are shown in Figure 1, including alignment rate, fragment size distribution, on-target capture specificity, and mean target coverage. Example somatic mutations, insertions, deletions, and copy number alterations are shown in Figures 2-4.
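As a worked example of the 2:1 pooling described above, the sketch below estimates the share of lane reads expected for each library. It assumes, for illustration only, that the 2:1 ratio applies per library and that read yield is proportional to the amount of each library in the pool; actual yields will vary with library quality and capture efficiency.

```python
from fractions import Fraction

def expected_read_shares(n_tumor, n_normal, tumor_weight=2, normal_weight=1):
    """Return the expected fraction of reads for one tumor library and
    one normal library, assuming yield is proportional to pooled amount."""
    total = n_tumor * tumor_weight + n_normal * normal_weight
    return Fraction(tumor_weight, total), Fraction(normal_weight, total)

# 12 tumor and 12 normal libraries pooled 2:1, as in the experiment above.
tumor_share, normal_share = expected_read_shares(12, 12)
print(f"each tumor library:  {float(tumor_share):.2%} of lane reads")
print(f"each normal library: {float(normal_share):.2%} of lane reads")
```

Under these assumptions each tumor library receives 1/18 of the reads and each normal library 1/36, so tumors are sequenced roughly twice as deeply as their matched normals.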
Figure 1. Representative figures of the sequence performance metrics. a) cluster density and alignment rate, b) mean target coverage, c) insert size distribution, and d) capture specificity.
Figure 2. Integrative Genomics Viewer (IGV) image showing specificity of sequence coverage at exons of EGFR in a lung cancer (top) and matched normal tissue (bottom). Gray bars represent unique sequence reads. At right, a heterozygous T>G mutation is present in 24% of reads in the tumor and 0% of reads in the normal, indicating that it is a somatic L858R amino acid substitution.
Figure 3. Examples of indels in a colorectal cancer tumor-normal pair: a) Somatic frameshift insertion in APC. b) Somatic frameshift deletion of 7 base pairs in TP53.
Figure 4. Copy number alterations in a tumor-normal pair. Each data point represents a single exon from 279 target genes. Copy number gains and losses are inferred from increases and decreases in tumor sequence coverage.
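The inference of copy number changes from coverage, as summarized for Figure 4, can be sketched as a per-exon log2 ratio of tumor to normal coverage. This is a simplified illustration and not the analysis pipeline used here: the median normalization, the gain/loss thresholds of +0.5 and -0.5, and the coverage values are all assumptions, and every exon is assumed to have nonzero coverage in both samples.

```python
import math
from statistics import median

def log2_ratios(tumor_cov, normal_cov):
    """Per-exon log2(tumor/normal) after scaling each sample by its
    own median exon coverage to correct for sequencing depth."""
    t_med, n_med = median(tumor_cov), median(normal_cov)
    return [math.log2((t / t_med) / (n / n_med))
            for t, n in zip(tumor_cov, normal_cov)]

def call_exon(ratio, gain=0.5, loss=-0.5):
    """Label an exon using illustrative (hypothetical) thresholds."""
    if ratio >= gain:
        return "gain"
    if ratio <= loss:
        return "loss"
    return "neutral"

# Made-up per-exon coverage values for five exons.
tumor = [500, 520, 2000, 480, 125]
normal = [250, 260, 250, 240, 250]
ratios = log2_ratios(tumor, normal)
calls = [call_exon(r) for r in ratios]
print(calls)
```

Normalizing each sample by its own median means that a uniformly deeper tumor library (here sequenced at about twice the normal depth) is not mistaken for a genome-wide gain.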
Table 1: Probe Design Features for Target Capture.
Total exons | 4,535 |
Total introns | 14 |
SNPs* | 1,042 |
Total target territory | 879,966 base pairs |
Total probe territory | 1,400,415 base pairs |
Table 2. Reagents and Kits.
Name of Reagent/Material | Company | Catalog Number |
NEBNext End Repair Module | New England Biolabs | E6050L |
NEBNext dA-Tailing Module | New England Biolabs | E6053L |
NEBNext Quick Ligation Module | New England Biolabs | E6056L |
Agencourt AMPure XP | Beckman Coulter Genomics | |
NEXTflex PCR-Free Barcodes – 24 | Bioo Scientific | 514103 |
HiFi Library Amplification Kit | KAPA Biosystems | KK2612 |
COT Human DNA, Fluorometric Grade | Roche Diagnostics | 05 480 647 001 |
NimbleGen SeqCap EZ Hybridization and Wash kit | Roche NimbleGen | 05 634 261 001 |
SeqCap EZ Library Baits | Roche NimbleGen | |
QIAquick PCR Purification Kit | Qiagen | 28104 |
Qubit dsDNA Broad Range (BR) Assay Kit | Life Technologies | Q32850 |
Qubit dsDNA High Sensitivity (HS) Assay Kit | Life Technologies | Q32851 |
Agilent DNA HS Kit | Agilent Technologies | 5067-4626, 4627 |
Agilent 2100 Bioanalyzer | Agilent Technologies | |
Covaris E220 | Covaris | |
Magnetic Stand-96 | Ambion | AM10027 |
Illumina HiSeq 2000 | Illumina | |
Table 3. Index Oligo Blocker and Primer Sequences.
TS-HE Universal Blocker | AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT |
TS-INV-HE Index 1 Blocker | CAAGCAGAAGACGGCATACGAGATCGTGATGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT |
TS-INV-HE Index 2 Blocker | CAAGCAGAAGACGGCATACGAGATACATCGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT |
TS-INV-HE Index 3 Blocker | CAAGCAGAAGACGGCATACGAGATGCCTAAGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT |
TS-INV-HE Index 4 Blocker | CAAGCAGAAGACGGCATACGAGATTGGTCAGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT |
TS-INV-HE Index 5 Blocker | CAAGCAGAAGACGGCATACGAGATCACTGTGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT |
TS-INV-HE Index 6 Blocker | CAAGCAGAAGACGGCATACGAGATATTGGCGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT |
TS-INV-HE Index 7 Blocker | CAAGCAGAAGACGGCATACGAGATGATCTGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT |
TS-INV-HE Index 8 Blocker | CAAGCAGAAGACGGCATACGAGATTCAAGTGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT |
TS-INV-HE Index 9 Blocker | CAAGCAGAAGACGGCATACGAGATCTGATCGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT |
TS-INV-HE Index 10 Blocker | CAAGCAGAAGACGGCATACGAGATAAGCTAGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT |
TS-INV-HE Index 11 Blocker | CAAGCAGAAGACGGCATACGAGATGTAGCCGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT |
TS-INV-HE Index 12 Blocker | CAAGCAGAAGACGGCATACGAGATTACAAGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT |
TS-INV-HE Index 13 Blocker | CAAGCAGAAGACGGCATACGAGATTTGACTGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT |
TS-INV-HE Index 14 Blocker | CAAGCAGAAGACGGCATACGAGATGGAACTGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT |
TS-INV-HE Index 15 Blocker | CAAGCAGAAGACGGCATACGAGATTGACATGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT |
TS-INV-HE Index 16 Blocker | CAAGCAGAAGACGGCATACGAGATGGACGGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT |
TS-INV-HE Index 18 Blocker | CAAGCAGAAGACGGCATACGAGATGCGGACGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT |
TS-INV-HE Index 19 Blocker | CAAGCAGAAGACGGCATACGAGATTTTCACGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT |
TS-INV-HE Index 20 Blocker | CAAGCAGAAGACGGCATACGAGATGGCCACGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT |
TS-INV-HE Index 21 Blocker | CAAGCAGAAGACGGCATACGAGATCGAAACGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT |
TS-INV-HE Index 22 Blocker | CAAGCAGAAGACGGCATACGAGATCGTACGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT |
TS-INV-HE Index 23 Blocker | CAAGCAGAAGACGGCATACGAGATCCACTCGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT |
TS-INV-HE Index 25 Blocker | CAAGCAGAAGACGGCATACGAGATATCAGTGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT |
TS-INV-HE Index 27 Blocker | CAAGCAGAAGACGGCATACGAGATAGGAATGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT |
Primer 1 | 5′-AATGATACGGCGACCACCGAGATCT-3′ |
Primer 2 | 5′-CAAGCAGAAGACGGCATACGAGAT-3′ |
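The index blockers in Table 3 share a common layout: a constant 5′ segment identical to Primer 2, a 6-nucleotide index, and a constant 3′ segment. The sketch below extracts the index from a blocker sequence on that basis. Treating the reverse complement of the extracted index as the barcode reported in the index read is an assumption about strand orientation and should be checked against the barcode manufacturer's documentation.

```python
# Constant segments observed in every index blocker listed in Table 3.
PREFIX = "CAAGCAGAAGACGGCATACGAGAT"  # identical to Primer 2
SUFFIX = "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT"

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def index_from_blocker(blocker):
    """Return the variable index embedded between the constant segments."""
    if not (blocker.startswith(PREFIX) and blocker.endswith(SUFFIX)):
        raise ValueError("sequence does not match the expected blocker layout")
    return blocker[len(PREFIX):len(blocker) - len(SUFFIX)]

# Index 1 and Index 2 blockers, copied from Table 3.
blockers = {
    "Index 1": "CAAGCAGAAGACGGCATACGAGATCGTGATGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT",
    "Index 2": "CAAGCAGAAGACGGCATACGAGATACATCGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT",
}
for name, seq in blockers.items():
    index = index_from_blocker(seq)
    print(name, index, reverse_complement(index))
```

A check of this kind can help confirm that each blocker ordered for a pool corresponds to a barcode actually used in that pool.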
Our IMPACT assay produces a high alignment rate, high on-target rate, high target coverage, and high sensitivity for detecting mutations, indels, and copy number alterations. We have demonstrated the capability of our IMPACT assay to sequence DNA from both fresh frozen and archived FFPE samples of low DNA input. By performing targeted exon sequencing of key cancer-associated genes, one can achieve very deep sequence coverage for the exons of these most critical genes thereby maximizing the ability to detect low frequency mutations. The use of barcoding and multiplexing enables higher throughput, lower cost per sample, and lower input amounts of DNA.
A benefit of the targeted approach is that it is feasible to optimize capture probes such that all exons are covered uniformly and sufficiently for mutation detection. As shown in Table 1, our design encompasses 4,535 exons in 279 genes. Following iterative improvements to our design, a typical sequencing experiment produces no more than 2% of exons at less than 20% of the median coverage for the entire sample. Thus, for a tumor sequenced to 500x median coverage (a conservative estimate, as displayed in Figure 1), at least 98% of exons will be covered at a depth of 100x or greater (20% of 500x), providing high sensitivity for identifying low frequency mutations. This uniformity of coverage can be largely attributed to the flexibility of the NimbleGen SeqCap system, where probes may be synthesized to different lengths in regions of different nucleotide composition. We have also successfully used individually synthesized biotinylated oligonucleotides from Integrated DNA Technologies (IDT) to add new content to our panel and to boost the coverage of regions that are difficult to capture and/or sequence. By also capturing select introns of recurrently rearranged genes, we are able to identify structural rearrangements producing fusion genes even when only one fusion partner is captured.
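The uniformity criterion described above (the fraction of exons falling below 20% of the sample's median coverage) can be computed as in the following sketch. The per-exon coverage values are synthetic and chosen only to reproduce the arithmetic in the text; the function name and its default cutoff are illustrative assumptions.

```python
from statistics import median

def low_coverage_fraction(exon_cov, fraction_of_median=0.20):
    """Return (depth threshold, fraction of exons below that threshold),
    where the threshold is a fixed fraction of the median exon coverage."""
    threshold = fraction_of_median * median(exon_cov)
    n_low = sum(c < threshold for c in exon_cov)
    return threshold, n_low / len(exon_cov)

# Synthetic example: 98 exons at 500x and 2 poorly captured exons at 50x.
exon_cov = [500] * 98 + [50] * 2
threshold, frac_low = low_coverage_fraction(exon_cov)
print(f"threshold: {threshold:.0f}x, exons below threshold: {frac_low:.1%}")
```

With a median of 500x the threshold is 100x, and in this synthetic sample 2% of exons fall below it, which matches the upper bound quoted for a typical experiment.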
This targeted sequencing approach has two main limitations in its ability to detect "driver" genetic alterations in tumors. First, only a fraction of the genome is interrogated. As new cancer genes emerge from systematic genomics efforts, they may be rapidly added to a capture panel, but functionally important mutations outside the targeted regions will be missed that would be detected by whole genome or whole exome sequencing. Second, it is limited to DNA-based alterations. The detection of aberrantly expressed transcripts, novel splice isoforms, and gene fusions requires RNA-Seq or similar techniques 24. Epigenetic alterations such as chromatin modifications and DNA methylation may also play key roles in tumorigenesis and tumor progression. As sequencing technologies continue to improve, one can envision integrated strategies employing whole genome sequencing, RNA-Seq, and complementary approaches to comprehensively characterize the genetic and epigenetic make-up of individual tumors on a routine basis 25. As complete integrated strategies are currently cost- and computationally-prohibitive, targeted sequencing presents a practical alternative to capture the most salient genomic features for clinical and translational applications.
Our protocol assumes that tumors are sequenced alongside matched normal tissue taken from the same patient. The purpose of this normal control is to determine whether sequence variants detected in the tumor are inherited or somatically acquired. In the absence of a matched normal, one can infer that variants occurring at mutational hotspots (i.e. sites of recurrent somatic mutations in other cancers) are likely to be somatic. Further, analysis of copy number gains and losses can be performed with a non-genetically matched diploid normal sample. For all other novel genetic alterations, the simultaneous analysis of matched normal tissue greatly simplifies the interpretation of the resulting mutation data and is highly recommended.
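The role of the matched normal can be illustrated with a simple comparison of variant allele fractions in the tumor and normal samples. This is a deliberately simplified sketch, not the variant-calling method used in this protocol: the thresholds, the minimum normal depth, the category labels, and the read counts are all assumptions, and established somatic callers rely on statistical models rather than fixed cutoffs.

```python
def vaf(alt_reads, total_reads):
    """Variant allele fraction; 0.0 when the site has no coverage."""
    return alt_reads / total_reads if total_reads else 0.0

def classify_variant(tumor_alt, tumor_depth, normal_alt, normal_depth,
                     min_tumor_vaf=0.02, max_normal_vaf=0.02,
                     min_normal_depth=20):
    """Classify a variant with illustrative (hypothetical) cutoffs."""
    if normal_depth < min_normal_depth:
        return "undetermined"  # too little normal coverage to judge
    if vaf(tumor_alt, tumor_depth) < min_tumor_vaf:
        return "not_detected"
    if vaf(normal_alt, normal_depth) <= max_normal_vaf:
        return "somatic"
    return "germline_or_artifact"

# Hypothetical read counts: a variant at 24% in tumor and 0% in normal,
# and a variant near 50% in both tumor and normal.
print(classify_variant(120, 500, 0, 250))
print(classify_variant(240, 500, 125, 250))
print(classify_variant(120, 500, 0, 5))
```

The third call shows why adequate coverage of the normal sample matters: without it, the absence of the variant in the normal cannot be distinguished from a failure to sample it.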
The IMPACT protocol described here demonstrates consistent, high quality performance for frozen samples. We have, however, observed occasional variability when working with FFPE samples. Depending on the age and quality of the FFPE sample, the DNA may be severely degraded or chemically modified, which may render sequencing and analysis difficult 26. That being said, we have found these experimental conditions to be reliable for the large majority of FFPE samples. We have also successfully applied this protocol to characterize other clinically relevant specimens such as biopsies, fine needle aspirates, and cytologic specimens. It is this flexibility to accommodate different types of specimens of variable quality, quantity, and heterogeneity that positions this method to impact the diagnosis and treatment of cancer patients in the clinical arena.
The authors have nothing to disclose.
We thank Dr. Agnes Viale and the MSKCC Genomics Core Laboratory for technical assistance. This protocol was developed with support from the Geoffrey Beene Cancer Research Center and the Farmer Family Foundation.