DNA extraction from saliva can provide a readily available source of high molecular weight DNA, with little to no degradation/fragmentation. This protocol provides optimized parameters for saliva collection/storage and DNA extraction so that the resulting DNA is of sufficient quality and quantity for downstream DNA assays with high quality requirements.
The preferred source of DNA in human genetics research is blood, or cell lines derived from blood, as these sources yield large quantities of high quality DNA. DNA extracted from saliva, however, can also be of high quality, with little to no degradation/fragmentation, and is suitable for a variety of DNA assays. Saliva can be collected without the expense of a phlebotomist and can even be obtained through the mail. At present, no saliva DNA collection/extraction protocols for next generation sequencing have been presented in the literature. This protocol optimizes parameters of saliva collection/storage and DNA extraction so that the resulting DNA is of sufficient quality and quantity for DNA assays with the highest standards, including microarray genotyping and next generation sequencing.
Obtaining high quality DNA for human genetic studies is essential in the disease gene discovery process. Blood, though requiring an invasive procedure and also being more expensive than saliva collection, is favored for creating immortalized cell lines as an infinite source of DNA, or iPSCs for functional studies, and sometimes blood DNA is used when cell lines are not available. However, obtaining blood requires a trained phlebotomist and blood has a shorter half-life than saliva1. DNA from saliva is less expensive and easier to obtain, since it can be collected and sent through the mail without the need for a phlebotomist, thereby increasing potential subject pools well beyond the catchment area of hospitals and laboratories2. Study enrollment may be improved when subjects have the option of giving a saliva sample instead of blood3, 4. Concerns about the quantity and quality of DNA from saliva may have limited its widespread use, despite numerous recent studies showing the suitability of whole saliva, with an average of 4.3 × 10^5 cells per milliliter, for DNA testing over the older buccal swab methods that did not obtain significant amounts of saliva2, 3, 4, 5, 6. While a modest literature exists showing the suitability of whole saliva derived DNA for genotyping applications, including microarray-based methods8, 9, 10, no studies have examined next generation sequencing (NGS). The goal for optimizing this whole saliva DNA extraction protocol was to maximize quantity and quality for genetics applications in a cost effective way that is easily implemented in laboratories with common reagents and consumables.
DNA extraction from saliva requires several procedures: 1) collection and storage, 2) cell lysis, 3) RNase treatment, 4) protein precipitation, 5) ethanol precipitation, 6) DNA rehydration. The DNA Stabilization Buffer solution, described previously2, functions adequately without alteration. No attempt to optimize the RNase treatment and DNA rehydration steps was made. For each remaining step, several variables that could affect yield were identified. Each variable was manipulated individually and improvement in yield and quality was assessed statistically. For variables that were shown to improve yield and/or DNA quality, the optimal values were included in the final protocol.
NOTE: Prior to providing saliva samples all subjects gave informed consent conforming to the guidelines for treatment of human subjects at Nationwide Children’s Hospital.
1. Saliva Collection and Storage
2. Initial Preparations and Cell Lysis
3. RNA Removal
4. Protein and Lipid Removal
5. Isolation and Purification of gDNA
6. Rehydration of gDNA
To determine optimal parameters for DNA extraction, a series of paired DNA extractions was performed. A single saliva sample was split and each portion tested with one of two possible values for a given variable. At least eight replicates of each paired test were performed (e.g., a single saliva sample was aliquoted to test extraction both with and without initial 50 °C incubation). Optimization was based on four standard metrics: total DNA yield, the 260/280 value, the 260/230 value, and visual inspection of electrophoresed DNA to assess fragmentation. Not all possible combinations of the variables (N=169 combinations) were assessed for statistical interactions; instead, the marginal effect of each variable was assessed individually. Effects were tested using a multi-way repeated-measures ANOVA and estimated effects were derived from the equivalent regression equation. All significant effects are summarized in Table 1, shown as average change in yield (ng/µl per ml of saliva input) or DNA quality (260/280 and 260/230).
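The split-sample design described above can be illustrated with a minimal sketch. It assumes matched aliquots of the same saliva sample and uses a simple paired t statistic in place of the multi-way repeated-measures ANOVA actually used in this work; the function name and all yield values are hypothetical and are not data from the study.

```python
from math import sqrt
from statistics import mean, stdev


def paired_summary(baseline, treated):
    """Return (mean percent change, paired t statistic) for matched aliquots.

    Simplified stand-in for the repeated-measures ANOVA described in the text.
    Assumes the paired differences are not all identical (non-zero spread).
    """
    if len(baseline) != len(treated) or len(baseline) < 2:
        raise ValueError("need at least two matched pairs")
    diffs = [t - b for b, t in zip(baseline, treated)]
    pct = [100.0 * (t - b) / b for b, t in zip(baseline, treated)]
    # Standard error of the mean difference across the paired aliquots.
    std_err = stdev(diffs) / sqrt(len(diffs))
    return mean(pct), mean(diffs) / std_err


# Hypothetical yields (ng/µl per ml of saliva input) for eight split samples.
short_lysis = [100.0, 120.0, 90.0, 110.0, 105.0, 95.0, 130.0, 115.0]
long_lysis = [104.0, 123.0, 94.0, 113.0, 109.0, 99.0, 133.0, 120.0]

pct_change, t_stat = paired_summary(short_lysis, long_lysis)
print(f"mean change: {pct_change:.1f}%, paired t: {t_stat:.1f}")
```

Because each sample serves as its own control, between-subject variation in saliva DNA content drops out of the comparison, which is the reason for splitting a single collection rather than comparing separate donors.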
Cell lysis (step 2) was optimized by assessing: 1) the presence/absence of a 50 °C incubation (1 hr) prior to cell lysis to ensure that Proteinase K degradation and cell lysis mediated by the storage buffer went to completion, 2) presence/absence of a homogenization by vortexing step (medium speed, 15 sec), and 3) lysis solution incubation time (5 versus 30 min). The 30 min cell lysis incubation increased yield by an average of 3.5% (p<.01) but no other cell lysis variable had a significant effect on yield. Vortexing decreased the 260/280 ratio by a statistically significant (p<.001) but practically small 0.03.
Protein precipitation (step 4) is preceded by Proteinase K digestion to disrupt amino acid chains, improving protein precipitation efficiency and releasing captured DNA. The amount of Proteinase K was varied ten-fold. Centrifugation temperature was reduced from 20 °C to 4 °C. Increasing the amount of Proteinase K caused a statistically significant decrease in yield (8.7%) and also slightly improved both the 260/280 and 260/230 ratios.
Ethanol precipitation (step 5) was the last stage of the protocol examined. The amount of glycogen carrier (0, 8 µl) was varied, as was the total centrifugation time (5 vs. 30 min11). Only centrifugation time significantly affected yield, with an average increase of 290%. The longer spin also decreased the 260/280 ratio slightly (0.05). No significant effect of glycogen on yield was observed during the experiments, though the total quantity of DNA in these extractions was sufficiently large that glycogen would not typically be used. Despite the lack of effect in these samples, it is still recommended to use glycogen to minimize the risk of reduced yields whenever saliva input volume is lower than given here or if there is any other reason to believe yield will be low.
Visual inspection of the representative DNA samples (Figure 1) indicated that the extracted DNA was not greatly fragmented for any saliva DNA extraction procedure, but rather showed an appropriate high molecular weight band without the smearing indicative of degraded DNA. After RNase A digestion, the protocol produced an average 260/280 of 1.74.
Figure 1. Quality of saliva derived DNA. Four extraction procedures were applied to the same saliva collection. (A) Samples were electrophoresed on a 0.8% agarose gel (250 ng DNA). All variations of the saliva DNA extraction protocols result in high molecular weight (>20 kb) DNA, with no evidence of degradation. Lanes: 1 DNA ladder, 2 & 3 Oragene prepIT L2P Protocol samples, 4 & 5 Gentra Puregene Body Fluids Protocol, 6 & 7 the optimized protocol without the RNA removal step, 8 & 9 the optimized protocol with the RNA removal step. Lanes 2-7 are directly analogous protocols on the same saliva samples. Lanes 8 & 9 show that the RNA removal step does not introduce DNA degradation. (B) Samples were electrophoresed on a 2% agarose gel (150 ng DNA). A slight RNA peak is observable near the bottom of the gel in lanes 2 through 7 (conventions as above). Lanes 8 & 9 show the effectiveness of the RNA removal step.
The RNA Removal Step (step 3 with RNase A) is critical for accurate quantification of DNA. During testing, consistently high RNA content was observed, as determined by the ratio of double stranded DNA to RNA measured by a Qubit 2.0 Fluorometer. On average, nucleic acid content from samples without RNase A treatment consisted of 46.6% (±0.4) RNA. Samples that underwent the RNA Removal Step read as “<20 ng/ml”, which is the lowest possible reading for the Qubit’s RNA detection.
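The RNA share of total nucleic acid reported above follows directly from the paired fluorometric readings. The sketch below shows the arithmetic only; the function name and the example readings are hypothetical, chosen so that the result matches the 46.6% average reported in the text, and are not measurements from the study.

```python
def rna_percent(dsdna_conc, rna_conc):
    """Percent of total nucleic acid that is RNA.

    Both concentrations must be in the same units (e.g., ng/ml).
    """
    total = dsdna_conc + rna_conc
    if total <= 0:
        raise ValueError("total nucleic acid concentration must be positive")
    return 100.0 * rna_conc / total


# Hypothetical readings for a sample that was not treated with RNase A.
untreated = rna_percent(dsdna_conc=534.0, rna_conc=466.0)
print(f"RNA content: {untreated:.1f}%")
```

A sample with this composition would be overestimated almost two-fold by any quantification method that does not distinguish DNA from RNA, which is why the RNA removal step matters for accurate DNA quantification.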
The DNA obtained through this optimized protocol was of sufficient quality for high throughput sequencing when the additional RNase A step was applied. To attain targeted resequencing data, a custom Agilent SureSelect Target Enrichment kit was applied to 24 samples, targeting 2.6 Mb of sequence. High throughput sequencing was conducted on 12 barcoded (indexed) samples per lane. Sequence reads were aligned to the hg19 reference genome with BWA12. GATK13 base quality score recalibration, indel realignment, duplicate removal, and SNP discovery and genotyping were then performed simultaneously across all 24 samples using the best practice hard filtering parameter values14. All 24 samples yielded high quality NGS data (Table 2). Of reads that passed Illumina’s standard filters and had Q>20, 91.4% aligned to the sequence enrichment target regions, providing an average on-target coverage depth of >30x at Q>100, well within the necessary limits for rare SNP discovery in each sample. The average strand balance was 49.9%. Comparing variant calls with Illumina microarray genotypes yielded a concordance of 98.9%.
Candidate Variable | ng/µl | 260/280 | 260/230 |
Vortex | n.s. | -0.03*** | n.s. |
30 min Cell Lysis Incubation | 3.5%** | n.s. | n.s. |
Proteinase K x10 | -8.7%* | -0.05*** | -0.03*** |
30 min spin | 290.2%*** | -0.05*** | n.s. |
Glycogen | n.s. | n.s. | -0.37*** |
Table 1. Effect Size of Optimized Variable on Quantity/Quality Metrics. All effect sizes are in the units listed in the column header. p-values from ANOVA: *p<.05; **p<.01; ***p<.001; n.s. not significant
Quality Metric | Value |
Target length | 1,708 kb |
Target Covered > 30x | 85.22% |
SNPs in dbSNP | 92.55% |
Array agreement | 98.99% |
Table 2. Quality of High Throughput Sequence from Target Enrichment.
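Two of the metrics in Table 2, the share of target positions covered above 30x and the agreement with array genotypes, can be computed as sketched below. This is an illustrative calculation only, not the pipeline used in this work: the function names, the per-position depth list, and the genotype dictionaries keyed by site are all assumptions made for the example.

```python
def percent_covered(depths, min_depth=30):
    """Percent of target positions with read depth strictly above min_depth."""
    if not depths:
        raise ValueError("no target positions supplied")
    return 100.0 * sum(1 for d in depths if d > min_depth) / len(depths)


def genotype_concordance(seq_calls, array_calls):
    """Percent of sites genotyped on both platforms with matching genotypes.

    Genotypes are two-letter strings; allele order is ignored ("AG" == "GA").
    """
    shared = seq_calls.keys() & array_calls.keys()
    if not shared:
        raise ValueError("no sites in common")
    matches = sum(
        1 for site in shared
        if sorted(seq_calls[site]) == sorted(array_calls[site])
    )
    return 100.0 * matches / len(shared)


# Hypothetical inputs, for illustration only.
depths = [12, 31, 45, 30, 60]
seq_calls = {"site1": "AG", "site2": "CC", "site3": "TT", "site4": "AG"}
array_calls = {"site1": "GA", "site2": "CC", "site3": "CT",
               "site4": "AG", "site5": "AA"}

covered = percent_covered(depths)
agreement = genotype_concordance(seq_calls, array_calls)
print(f"covered >30x: {covered:.1f}%, array agreement: {agreement:.1f}%")
```

Restricting the comparison to sites present in both call sets keeps array-only sites from being counted as disagreements.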
New Optimized Protocol
Item | Distributor | Catalog # | Purchasing Unit | Cost/Unit | Cost/Collection |
15 ml Centrifuge Tubes | Fisher | 12-565-268 | 500 Tubes | $233.50 | $3.2690 | |
Cell Lysis Solution | Qiagen | 158908 | 1 L | $401.00 | $3.2080 | |
Proteinase K | Sigma | P6556 | 1 g | $713.00 | $1.5686 | |
Protein Precipitation Solution | Qiagen | 158912 | 350 ml | $350.00 | $2.7200 | |
Isopropanol | Fisher | A416-4 | Case of 4 x 4 L | $486.71 | $0.2434 | |
Glycogen Solution (20 mg/ml) | EZ-BioResearch | S1003 | 1 ml | $51.00 | $0.8160 | |
70% Ethanol | Fisher | 04-355-305 | Case of 4 x 1 gal. | $123.03 | $0.0325 | |
Tris-EDTA (TE) | Fisher | BP2473-1 | 1 L | $68.67 | $0.0412 | |
NaCl | Fisher | AC194090010 | 1 kg | $34.65 | $0.000004 | |
Tris HCl | Fisher | BP1757-100 | 100 ml | $57.84 | $0.0116 | |
EDTA(0.5 M) Solution | Fisher | 03-500-506 | 100 ml | $33.60 | $0.0134 | |
Sodium Dodecyl Sulfate | Fisher | BP166-100 | 100 g | $59.95 | $0.0060 | |
Total Cost | $11.93 | |||||
Oragene
Item | Distributor | Catalog # | Purchasing Unit | Cost/Unit | Cost/Collection |
15 ml Centrifuge Tubes | Fisher | 12-565-268 | 500 Tubes | $233.50 | $0.9340 | |
100% Ethanol | Fisher | BP2818-100 | 100 ml | $34.29 | $1.6459 | |
70% Ethanol | Fisher | 04-355-305 | Case of 4 x 1 gal. | $123.03 | $0.0081 | |
Tris-EDTA (TE) | Fisher | BP2473-1 | 1 L | $68.67 | $0.0687 | |
1.5 ml tube | Genesee | 22-281A | 500 Tubes | $22.85 | $0.0457 | |
Oragene Collection KIT | Oragene | OG-500 | 1 | $25.00 | $25.00 | |
Total Cost | $27.70 | |||||
Puregene
Item | Distributor | Catalog # | Purchasing Unit | Cost/Unit | Cost/Collection |
15 ml Centrifuge Tubes | Fisher | 12-565-268 | 500 Tubes | $233.50 | $0.9340 | |
Cell Lysis Solution | Qiagen | 158908 | 1 L | $401.00 | $4.0100 | |
Proteinase K | Qiagen | 158918 | 650 ml | $73.10 | $7.8723 | |
Protein Precipitation Solution | Qiagen | 158912 | 350 ml | $350.00 | $4.0000 | |
Isopropanol | Fisher | A4164 | Case of 4 x 4 L | $486.71 | $0.3650 | |
Glycogen Solution | Qiagen | 158930 | 500 ml | $64.30 | $2.5720 | |
70% Ethanol | Fisher | 04-355-305 | Case of 4 x 1 gal. | $123.03 | $0.0975 | |
DNA Hydration Solution | Qiagen | 158914 | 100 ml | $69.80 | $0.1396 | |
NaCl | Fisher | AC19409-0010 | 1 kg | $34.65 | $0.000004 | |
Tris HCl | Fisher | BP1757-100 | 100 ml | $57.84 | $0.0116 | |
EDTA(0.5 M) Solution | Fisher | 03-500-506 | 100 ml | $33.60 | $0.0134 | |
Sodium Dodecyl Sulfate | Fisher | BP166-100 | 100 g | $59.95 | $0.0060 | |
Total Cost | $19.99 |
Table 3. Cost comparison of the optimized protocol to extract DNA from 2 ml of whole saliva with other commercially available protocols. All reagents and consumables required for DNA extraction have been assessed using standard list prices available on the internet as of September 30, 2013. Note that this protocol is written to extract DNA from 1.25 ml of whole saliva (i.e., 2.5 ml of saliva and buffer, see step 2.3) as this value represents half of the total volume in the saliva collection tube. For the cost comparison, 2 ml was chosen as this is the amount associated with a common commercially available kit (Oragene).
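The per-collection total for the optimized protocol in Table 3 can be reproduced by summing the Cost/Collection column, as in the sketch below. The per-item values are copied from the table; the dictionary layout and variable names are choices made for this example, and the totals for the other two protocols are taken as listed rather than recomputed.

```python
# Cost/Collection values (USD) for the optimized protocol, from Table 3.
OPTIMIZED_COSTS = {
    "15 ml Centrifuge Tubes": 3.2690,
    "Cell Lysis Solution": 3.2080,
    "Proteinase K": 1.5686,
    "Protein Precipitation Solution": 2.7200,
    "Isopropanol": 0.2434,
    "Glycogen Solution (20 mg/ml)": 0.8160,
    "70% Ethanol": 0.0325,
    "Tris-EDTA (TE)": 0.0412,
    "NaCl": 0.000004,
    "Tris HCl": 0.0116,
    "EDTA (0.5 M) Solution": 0.0134,
    "Sodium Dodecyl Sulfate": 0.0060,
}

# Totals for the comparison protocols, as listed in Table 3.
LISTED_TOTALS = {"Oragene": 27.70, "Puregene": 19.99}

optimized_total = round(sum(OPTIMIZED_COSTS.values()), 2)
print(f"Optimized protocol: ${optimized_total:.2f} per collection")

for name, total in LISTED_TOTALS.items():
    saving = 100.0 * (total - optimized_total) / total
    print(f"Saving relative to {name}: {saving:.1f}%")
```

Keeping the per-item costs in one mapping makes it straightforward to substitute negotiated or bulk prices for the list prices used in the table.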
DNA Stabilization Buffer (250 ml) | ||
Component | Volume (ml) | [Final] |
NaCl (solid) | 1.461 g | 0.1 M |
Tris HCl | 2.5 | 0.01 M |
0.5M EDTA | 5 | 0.01 M |
10% SDS | 12.5 | 0.014 M |
Proteinase K Solution (20 mg/ml) | 2.5 | 6.92 × 10^-6 M |
ddH2O | 227.5 | |
Proteinase K Solution (20 mg/ml) | ||
Component | Volume (ml) | [Final] |
Proteinase K powder | 500 mg | 6.92 × 10^-4 M |
ddH2O | 25 | |
70% Ethanol (500 ml) | ||
Component | Volume (ml) | [Final] |
Ethanol 95% | 368.5 | 12.63 M |
ddH2O | 131.5 |
Table 4. Recipes for Reagents.
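Several entries in Table 4 can be checked, or rescaled to a different batch size, with the standard dilution relation C1·V1 = C2·V2 and the definition of molarity. The sketch below is a convenience calculation under those assumptions only; the function names are hypothetical, and the NaCl molar mass (58.44 g/mol) is a standard value that does not appear in the table.

```python
NACL_G_PER_MOL = 58.44  # standard molar mass of NaCl; not listed in Table 4


def stock_volume_ml(stock_conc, final_conc, final_volume_ml):
    """Volume of stock needed, from C1 * V1 = C2 * V2 (same concentration units)."""
    if stock_conc <= 0 or final_conc > stock_conc:
        raise ValueError("stock must be positive and at least as concentrated as the target")
    return final_conc * final_volume_ml / stock_conc


def molarity(mass_g, g_per_mol, volume_ml):
    """Molar concentration of a solid dissolved to the given final volume."""
    return mass_g / g_per_mol / (volume_ml / 1000.0)


# 70% ethanol (500 ml) prepared from 95% ethanol.
ethanol_95_ml = stock_volume_ml(95.0, 70.0, 500.0)
# 0.01 M EDTA in 250 ml of buffer, from a 0.5 M stock.
edta_ml = stock_volume_ml(0.5, 0.01, 250.0)
# 1.461 g NaCl dissolved to 250 ml.
nacl_molar = molarity(1.461, NACL_G_PER_MOL, 250.0)

print(f"95% ethanol: {ethanol_95_ml:.1f} ml, water: {500.0 - ethanol_95_ml:.1f} ml")
print(f"0.5 M EDTA: {edta_ml:.1f} ml, NaCl: {nacl_molar:.3f} M")
```

The calculated ethanol volume (about 368.4 ml) agrees with the 368.5 ml listed in Table 4 to within rounding.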
The present procedure is an optimized DNA extraction protocol that has considerably improved yield of high molecular weight DNA compared to standard methods, without compromising DNA quality. The critical step, with the biggest effect on yield, was step 5.2, which includes a longer centrifugation step during ethanol precipitation than any published protocol reviewed here, except one that was not widely distributed11. No changes in DNA quality associated with this longer centrifugation were detected, indicating that most of the available DNA from whole saliva collection is of high molecular weight and not degraded.
Whole saliva collection has limitations in sample quality, such as the potential for foreign contaminants, which needs to be minimized at the collection stage, and the presence of excessive protein in the sample, which can be a sign of an underlying infection. Large amounts of protein or foreign contaminants can remain in the final extracted DNA, thereby making quantification inaccurate. If residual proteins or contaminants are suspected after rehydration (step 6), a sample clean up can be performed by starting at the protein precipitation step with reagent volumes scaled to reflect the sample input volume. Another limitation of the protocol is the length of time required to perform the steps. Multiple samples can be run in parallel; however, it is recommended that no more than 24 extractions be run simultaneously. This is particularly important for the protein precipitation step, where the sample must remain cold to ensure a tight pellet and running more than 24 samples may allow pellets time to re-dissolve.
The protocol presented here is the most cost effective method considered (see Table 3). While this protocol does use reagents from the Puregene extraction kit, a smaller volume than recommended in the Puregene protocol is used without compromising extraction yield, and it is this reduction in reagents that drives the cost savings relative to the Puregene protocol. Note that the calculated cost for extraction uses the list price for each item and does not reflect any discounts. The cost per extraction with the optimized protocol can be reduced further with bulk orders or discounts through company sales representatives.
Evidence has also been provided that this protocol is suitable for use with next generation sequencing. The data obtained provide further evidence of the utility of saliva samples for human genetics research in many diseases where blood is not routinely available. While blood and cell line derived DNA continues to be the preferred source of genetic material for testing, whole saliva collection is a viable alternative when such sources are not available, when patient enrollment is affected by their collection, or when phlebotomy is unavailable or impractical.
The authors have nothing to disclose.
This work was funded by a National Institutes of Health R01 (DC009453 support to CWB).
15ml Centrifuge Tubes | Fisher | 12-565-268 |
Cell Lysis Solution | Qiagen | 158908 |
Proteinase K | Sigma | P6556 |
Protein Precipitation Solution | Qiagen | 158912 |
Isopropanol | Fisher | A416-4 |
Glycogen | EZ-BioResearch | S1003 |
70% Ethanol | Fisher | 04-355-305 |
Tris-EDTA (TE) | Fisher | BP2473-1 |
NaCl | Fisher | AC194090010 |
Tris HCl | Fisher | BP1757-100 |
EDTA(0.5M) Solution | Fisher | 03-500-506 |
Sodium Dodecyl Sulfate | Fisher | BP166-100 |
Equipment | ||
Name of Equipment | Distributor | Catalog# |
Analog Vortex Mixer | Fisher | 02-215-365 |
Centrifuge 5810R | Eppendorf | 5811 000.010 |