gDNA enrichment for next-generation sequencing (NGS) is a simple and powerful tool for studying constitutional mutations. In this article, we present a procedure for analyzing the complete sequence of 11 genes involved in DNA damage repair.
The widespread use of NGS has opened up new avenues for cancer research and diagnosis. NGS will generate large amounts of new data on cancer, especially on cancer genetics. Current knowledge and future discoveries will make it necessary to study a large number of genes that could be involved in genetic predisposition to cancer. To this end, we developed a Nextera design to study 11 complete genes involved in DNA damage repair (ATM, BARD1, BRCA1, BRCA2, BRIP1, CHEK2, PALB2, RAD50, RAD51C, RAP80, and TP53) from promoter to 3'-UTR in 24 patients simultaneously. Based on transposase technology and gDNA enrichment, the protocol saves considerable time in genetic diagnosis thanks to sample multiplexing, and it can be used reliably with blood gDNA.
In 2010, nearly 1.5 million people (essentially women) developed breast cancer worldwide. It is estimated that 5 to 10% of these cases were hereditary. Almost 20 years ago, BRCA1 and BRCA2 were identified as being involved in hereditary breast and ovarian cancers1. For about 15 years, the coding regions of BRCA1 and BRCA2 have been sequenced to determine genetic predisposition to breast and ovarian cancer. Alterations in BRCA1 and BRCA2 are detected in only 10 to 20% of selected families2, suggesting that the analysis of these regions is not sufficient for effective screening. Recently, the analysis of non-coding sequences (promoter, introns, 3’-UTR) of BRCA1 and BRCA2 showed that new mutations/variations could be linked to a higher risk of breast cancer3-6.
BRCA1 and BRCA2 proteins are involved in Homologous Recombination Repair (HRR), which is completed by numerous partners7. While alterations in BRCA1 or BRCA2 induce defects in DNA repair, alterations in the other partners may also affect the risk of breast cancer. This hypothesis appears to have been validated since BRIP18 and PALB29 have a proven impact on cervical and breast cancer, respectively. In addition, two other “moderate-risk” breast cancer susceptibility genes, ATM and CHEK2, may also be studied routinely10.
Following on from these studies, we decided to develop a protocol to analyze 11 genes (ATM, BARD1, BRCA1, BRCA2, BRIP1, CHEK2, PALB2, RAD50, RAD51C, RAP80, and TP53) in 24 patients simultaneously, using an easy and relatively fast approach based on transposase technology, with enrichment and sequencing on a medium throughput device. With this technique, we sequenced complete genes from the start of the promoter to the end of the 3’-UTR, except for RAP80, for which an intronic region of 2,500 bp was not covered (Chr5: 176,381,588-176,390,180). This represents a total of about 1,000,300 bp studied with 2,734 probes. Usually, BRCA1 and BRCA2 exonic sequences are analyzed by Sanger sequencing, which takes 1.5 months for fewer than 20 patients. With the present protocol (Figure 1), 11 complete genes can be analyzed for more than 75 patients in the same time.
1. Assessment of gDNA (genomic DNA) Yield
2. gDNA Enrichment: Day 1, Morning
3. gDNA Enrichment: Day 1, Afternoon
4. gDNA Enrichment: Day 2, Morning
5. gDNA Enrichment: Day 2, Afternoon
6. gDNA Enrichment: Day 3
7. Data Analysis
Table 1. PCR conditions
Program | Temperature | Time | Num. of repeats |
1 | 72 °C | 3 min | 1 |
1 | 98 °C | 30 sec | 1 |
1 | 98 °C | 10 sec | 10 |
1 | 60 °C | 30 sec | |
1 | 72 °C | 30 sec | |
1 | 72 °C | 5 min | 1 |
1 | 10 °C | unlimited | |
2 | 95 °C | 10 min | 1 |
2 | 93 °C | 1 min | 1 |
2 | 91 °C | 1 min | 1 |
2 | 89 °C | 1 min | 1 |
2 | 87 °C | 1 min | 1 |
2 | 85 °C | 1 min | 1 |
2 | 83 °C | 1 min | 1 |
2 | 81 °C | 1 min | 1 |
2 | 79 °C | 1 min | 1 |
2 | 77 °C | 1 min | 1 |
2 | 75 °C | 1 min | 1 |
2 | 71 °C | 1 min | 1 |
2 | 69 °C | 1 min | 1 |
2 | 67 °C | 1 min | 1 |
2 | 65 °C | 1 min | 1 |
2 | 63 °C | 1 min | 1 |
2 | 61 °C | 1 min | 1 |
2 | 59 °C | 1 min | 1 |
2 | 58 °C | 1 min | 16-18 hr |
3 | 98 °C | 30 sec | 1 |
3 | 98 °C | 10 sec | 10 |
3 | 60 °C | 30 sec | |
3 | 72 °C | 30 sec | |
3 | 72 °C | 5 min | 1 |
3 | 10 °C | unlimited | |
4 | 95 °C | 10 min | 1 |
4 | 95 °C | 15 sec | 40 |
4 | 60 °C | 1 min | |
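As a rough planning aid, the hold times of a program in Table 1 can be summed to estimate how long the thermocycler is occupied. The sketch below encodes Program 1 only. It assumes that the 10 repeats apply to the three-step 98 °C / 60 °C / 72 °C cycle, it ignores ramp times between temperatures, and it leaves out the open-ended 10 °C hold. The names `PROGRAM_1` and `program_seconds` are illustrative and do not come from the protocol.

```python
# Program 1 from Table 1, written as blocks of (temperature in °C, hold time in seconds).
# Each block is paired with the number of times it is repeated.
# Assumption: the "10" repeats in Table 1 apply to the whole 98/60/72 °C cycle.
PROGRAM_1 = [
    ([(72, 180)], 1),
    ([(98, 30)], 1),
    ([(98, 10), (60, 30), (72, 30)], 10),
    ([(72, 300)], 1),
    # The final 10 °C hold is open-ended and therefore not counted.
]


def program_seconds(program):
    """Sum the hold times of all blocks; ramping between temperatures is ignored."""
    return sum(repeats * sum(sec for _, sec in steps) for steps, repeats in program)


total = program_seconds(PROGRAM_1)
print(f"Program 1: {total} s (about {total / 60:.1f} min, excluding ramp times)")
```

Under these assumptions the hold times of Program 1 add up to roughly 20 minutes; the real run time will be longer because of ramping.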
Sample QC Results
The ability of this method to determine sequences of target genes is based on the quality of the gDNA (Figure 2A) and the quality of the tagmentation step. If the tagmentation is not sufficient (Figure 2B, upper panel), the sequencing will not be satisfactory. As mentioned above, after the tagmentation purification, the gDNA should be tagmented into fragments from 150 bp to 1,000 bp with the majority of fragments around 300 bp (Figure 2B, lower panel).
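The acceptance criteria for the input material can be written down as simple checks. The sketch below uses the gDNA thresholds given in the Figure 2 legend (more than 5 ng/µl, 260/280 above 1.8, 260/230 above 2) and the 150 to 1,000 bp fragment range described above. The function names, the 90% in-range fraction, and the 200 to 400 bp window used to stand in for "the majority of fragments around 300 bp" are assumptions made for illustration, not values from the protocol.

```python
from statistics import median


def gdna_passes_qc(conc_ng_per_ul, a260_280, a260_230):
    """Apply the spectrophotometric thresholds quoted in the Figure 2 legend."""
    return conc_ng_per_ul > 5 and a260_280 > 1.8 and a260_230 > 2.0


def tagmentation_looks_ok(fragment_sizes_bp,
                          min_fraction_in_range=0.9,   # assumed cutoff
                          peak_window=(200, 400)):     # assumed window around 300 bp
    """Check that most fragments fall within 150-1,000 bp and centre near 300 bp."""
    if not fragment_sizes_bp:
        return False
    in_range = [s for s in fragment_sizes_bp if 150 <= s <= 1000]
    fraction = len(in_range) / len(fragment_sizes_bp)
    low, high = peak_window
    return fraction >= min_fraction_in_range and low <= median(fragment_sizes_bp) <= high


print(gdna_passes_qc(25.0, 1.85, 2.10))
print(tagmentation_looks_ok([180, 250, 300, 320, 400, 900]))
```

In practice the fragment sizes would come from the fragment analyzer export; the list used here is invented to exercise the function.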
At the end of library preparation, library quality is checked using a Bioanalyser. The profile obtained should be approximately the same as after tagmentation but with a higher number of fragments (Figure 2C). If the library profile does not match Figure 2C, sequencing should not be performed.
Once the run is launched on the sequencing device, the cluster density is available after 9 cycles and should be between 800,000 and 1,100,000 clusters / mm² (Figure 3A). If the density is lower, the sequencing will not be satisfactory, especially in terms of read depth. If it is higher, the images will be blurred (Figure 3B), and no analysis will be possible because individual clusters cannot be distinguished.
At the end of the run, it is important to check the quality score. When the protocol detailed above is followed, the quality score of the run should correspond to Figure 4. The quality of sequencing is represented by a Q-score (Q-30). To validate the experiment, at least 75% of sequences (clusters) should have a Q-score of 30 or higher.
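The two run-level criteria above (cluster density and share of clusters at Q30) can be combined into one check. This is a minimal sketch: `run_qc` is a hypothetical helper, not part of the instrument software, and the values must be read manually from the run screen. The density range is a parameter because the text quotes 800,000 to 1,100,000 clusters / mm² whereas the Figure 3 legend quotes an upper limit of 1,000 K/mm².

```python
def run_qc(cluster_density_per_mm2, pct_q30,
           density_range=(800_000, 1_100_000), min_pct_q30=75.0):
    """Return a list of problems found; an empty list means both criteria are met."""
    issues = []
    low, high = density_range
    if cluster_density_per_mm2 < low:
        issues.append("cluster density too low: read depth is likely to be insufficient")
    elif cluster_density_per_mm2 > high:
        issues.append("cluster density too high: clusters may not be distinguishable")
    if pct_q30 < min_pct_q30:
        issues.append("too few clusters reach a Q-score of 30")
    return issues


print(run_qc(950_000, 82.0))     # both criteria met
print(run_qc(1_300_000, 70.0))   # over-clustered run with poor quality
```

Returning a list of messages rather than a single boolean makes it easier to record why a run was rejected.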
Sequencing Results
This experiment was designed to study constitutional genetic abnormalities. In this setting, the genetic abnormality is present at 50% in the genome, so very high sequencing depth is not required. Herein, we prepared and simultaneously sequenced 2 libraries of 12 patients each for 11 complete genes. To obtain a higher sequencing depth, the number of multiplexed patients has to be decreased. As gDNA enrichment is based on capture by probes, the enrichment is not homogeneous (Figure 5). With our protocol, we generated about 6 Gb of reads, giving a mean coverage of 150 reads per base, with a minimum of 20 and a maximum of 330 reads. These read depths are in accordance with the use of NGS in clinical practice11. Despite the heterogeneity of read depths, which is probably due to gDNA enrichment by capture rather than to major gene rearrangements, it is important to note that the Q-score was always above 30 (Figure 5), thus validating the sequencing.
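The link between output, multiplexing, and depth can be checked with a short calculation. The sketch below first computes an upper bound on mean depth, assuming that "6 Gb" means 6 × 10⁹ sequenced bases, that every base maps to the roughly 1,000,300 bp target, and that the 24 patients share the output equally. It then estimates how likely a heterozygous variant (expected in 50% of reads) is to be supported by at least a given number of reads at the minimum observed depth of 20. The threshold of 5 supporting reads is an arbitrary illustration, and sequencing errors and capture bias are ignored.

```python
from math import comb


def mean_depth_upper_bound(total_bases, n_patients, target_bp):
    """Mean depth per patient if all sequenced bases were on target and evenly shared."""
    return total_bases / (n_patients * target_bp)


def prob_at_least_k_alt_reads(depth, k, allele_fraction=0.5):
    """Binomial probability of observing at least k variant reads at a given depth."""
    p_fewer = sum(
        comb(depth, i) * allele_fraction**i * (1 - allele_fraction) ** (depth - i)
        for i in range(k)
    )
    return 1 - p_fewer


print(round(mean_depth_upper_bound(6e9, 24, 1_000_300)))   # about 250
print(round(prob_at_least_k_alt_reads(20, 5), 4))          # about 0.994
```

The bound of about 250 reads per base is higher than the reported mean of 150, which is expected because not all reads fall on target. Halving the number of multiplexed patients doubles the bound, which illustrates why fewer patients must be pooled when higher depth is needed.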
Nevertheless, it is important to validate the technology by comparing its results with those obtained by Sanger sequencing. In the 17 patients sequenced by both Sanger sequencing (coding sequences of BRCA1 and BRCA2) and transposase-based technology, we detected the same 330 genetic variations (SNPs and mutations). As an example, a point mutation in the BRCA1 gene (Figure 6 left) and a deletion of 4 bases in the BRCA2 gene (Figure 6 right) were detected by both Sanger and transposase-based methods. As BRCA1 is on the reverse strand of Chromosome 17, the sequence obtained must be complemented (but not reversed), whereas for BRCA2 (on the forward strand of Chromosome 13), the sequence obtained does not need to be complemented. As indicated in Figure 5, NGS technology gives an estimated frequency of the genetic variation, whereas the Sanger method does not. Moreover, interpretation is easier with NGS technology, especially for indel variations. As shown in Figure 6 right, indel variations appear in Sanger data as a scrambled electropherogram, and both forward and reverse sequencing are needed to decipher the inserted or deleted sequence. With NGS analysis, the inserted or deleted sequence is determined directly, thus reducing the risk of misinterpretation. Finally, transposase-based technology allowed us to analyze a greater number of target genes, and we found many new genetic variations in sequences that were not covered by Sanger sequencing. These results will be published elsewhere.
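The difference between complementing a sequence and reverse-complementing it is easy to get wrong, so a small illustration may help. The sketch below is generic string handling and is not taken from the analysis software used in this protocol; the function names and the example sequence are illustrative.

```python
# Translation table for the four bases plus N, in upper and lower case.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def complement(seq):
    """Complement each base while keeping the order of the sequence unchanged."""
    return seq.translate(_COMPLEMENT)


def reverse_complement(seq):
    """Complement each base and reverse the order (shown here only for contrast)."""
    return complement(seq)[::-1]


read = "ATGC"
print(complement(read))          # TACG: complemented, not reversed
print(reverse_complement(read))  # GCAT: complemented and reversed
```

For a gene on the minus strand, the text above calls for the first operation only.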
Figure 1. Schematic workflow of the procedure.
Figure 2. Quality controls before and after library preparation. A. Spectrophotometric profile of gDNA that can be safely used for tagmentation. The DNA concentration must be higher than 5 ng/µl, and the 260/280 and 260/230 ratios must be above 1.8 and 2, respectively. B. Fragment analyzer profiles of tagmented gDNA. The upper panel shows insufficiently tagmented gDNA. The lower panel shows a perfectly tagmented sample. C. Fragment analyzer profile of the prepared library just before sequencing is launched. Fragment size is the same as for tagmented gDNA (B, lower panel) but with an amplified amount.
Figure 3. Checking cluster generation. A. Screenshot of the sequencing device during the run. Cluster density should be between 800 and 1,000 K/mm². B. The images correspond to low density (upper panel), perfect density (middle panel), and high density (lower panel).
Figure 4. Checking the Q-score. At the end of the run, validated sequencing should have at least 75% of generated clusters with a Q-score above 30.
Figure 5. Coverage data and their corresponding Q-scores. The read depth is heterogeneous all along the covered gene. Nevertheless, the Q-score of covered regions is always above Q-30.
Figure 6. Representative results obtained with Sanger sequencing and with transposase-based technology. All genetic variations observed with Sanger sequencing were detected with the transposase-based technology. Whereas point mutations are easy to interpret, indel alterations are sometimes quite difficult to study with the Sanger method. With transposase-based technology associated with a medium throughput device, indel alterations are simple to identify. Nevertheless, it is important to note that sequences must be complemented (but not reversed) when the target gene is located on the minus strand (here the BRCA1 gene).
The widespread use of NGS devices and technologies has provided new opportunities in the study of cancer and genetic disorders. In addition to whole genome sequencing or RNA sequencing, the analysis of a large number of selected gDNA sequences in numerous patients simultaneously offers great prospects in diagnosis. Here, we developed a specific design (available on demand) using Nextera technology to study 11 complete genes in 24 patients simultaneously with a medium throughput sequencing device (Table of Materials/Equipment). This protocol generates data quickly, which enables a rapid response to patients’ concerns with a low risk of error. As illustrated in Figure 6, all genetic variations detected with Sanger sequencing were also detected using the transposase-based preparation kit. This method is reliable and easy to interpret, especially for complex indel alterations, which are analyzed directly. Nevertheless, it is important to note that for genes located on the minus strand, nucleotides must be complemented (without reversing). The present work was carried out with a design for the analysis of 11 complete genes, but the protocol is the same whatever the design chosen, because the algorithm used for the design (developed by the manufacturer and available as a website tool, see Table of Materials/Equipment) is specifically dedicated to this protocol. Transposase-based technology can also be used for library preparation from long-range PCR products12. In addition to transposase-based technology, two other methods of DNA fragmentation before library preparation are available: mechanical fragmentation and enzymatic fragmentation. Mechanical DNA fragmentation is reproducible but needs an ultrasonicator, and library preparation is more expensive and more time consuming. Enzymatic DNA fragmentation often causes problems for DNA capture because of the location of restriction enzyme sites, resulting in a lack of target sequence coverage.
Each strategy for DNA fragmentation has advantages and disadvantages, but all seem to give quite similar results, at least for the fragmentation of long-range PCR products13. The use of a new enzyme cocktail seemed to provide the same results as those obtained with mechanical DNA fragmentation14. Transposase-based technology needs high quality DNA and is therefore not applicable to DNA extracted from FFPE tissues. Moreover, transposase activity requires fragments of more than 300 bp, so the fragments of interest must be longer than this. This requirement explains the need for high-quality, unfragmented DNA.
Up to now, genetic variations of BRCA1 and BRCA2 have been studied in familial breast and ovarian cancers. However, the involvement of BRCA genes, and especially BRCA2, is now suspected in other cancers, particularly pancreatic15, prostate16, and testicular17 cancers. Genetic alteration of PALB2, a partner of BRCA genes, is also associated with pancreatic cancer18. Moreover, it recently appeared that pancreatic cancers in patients harboring BRCA mutations, and to a lesser extent PALB2 mutations, showed a better response to PARP inhibitors19. As BRCA1, BRCA2, and PALB2 are associated with a familial risk of cancer, and possibly with susceptibility to specific treatments, it appears important to explore the BRCA1 and BRCA2 partners when screening for familial cancer risk and for cancer treatment response.
From these data, the analysis of DNA break repair-related genes seems to be important not only for susceptibility screening but also as a putative indicator of treatment response. Moreover, the complete gene analysis proposed with this protocol could cover all alterations that could be congenitally present (and present in cancer cells), such as mutations in the promoter, splicing sites, or regulatory splicing sites located in introns. To date, alterations in non-coding sequences of genes have not been studied in depth, but the development of genome-wide association studies could highlight important functions of these sequences.
In conclusion, the transposase-based design developed in this paper is a useful way to explore genetic abnormalities in cancer susceptibility genes involved in DNA damage repair. The ability to sequence 11 complete genes for 24 patients simultaneously is an important advantage over Sanger sequencing, which is a rather cumbersome technique for screening.
The authors have nothing to disclose.
We thank the Ligue contre le Cancer de Côte d’Or and the Centre Georges-François Leclerc for their financial support. We thank Philip Bastable for the editing of the manuscript.
MiSeq | Illumina | SY-410-1001 | Sequencing/medium throughput device |
Nextera Enrichment kit | Illumina | FC-123-1208 | Transposase based technology |
300 cycle cartridge | Illumina | 15033624 | |
AMPure beads | Beckman Coulter | A63881 | Magnetic purification beads |
Magnetic stand | Alpaqua | A32782 | |
96-well plates | Life Technologies | 4306737 | |
MIDI 96-well plates | Biorad | AB0859 | |
Microseal A | Biorad | MSA-5001 | This seal is necessary only for PCR amplification. Other standard seals can be used throughout the experiment |
MiSeq | Illumina | Provided with the sequencing device | Experiment Manager software |
| Illumina | Internet address: http://designstudio.illumina.com/NexteraRc/project/new | Manufacturer website tool |