This manuscript describes a detailed, standardized protocol for high-throughput 16S rRNA-amplicon sequencing. It presents an integrated, uniform, feasible, and inexpensive workflow that runs from fecal sample collection through data analysis. The protocol enables the analysis of large numbers of samples with rigorous standards and several controls.
The human intestinal microbiome plays a central role in protecting cells from injury, in processing energy and nutrients, and in promoting immunity. Deviations from what is considered a healthy microbiota composition (dysbiosis) may impair vital functions leading to pathologic conditions. Recent and ongoing research efforts have been directed toward the characterization of associations between microbial composition and human health and disease.
Advances in high-throughput sequencing technologies enable characterization of the gut microbial composition. These methods include 16S rRNA-amplicon sequencing and shotgun sequencing. 16S rRNA-amplicon sequencing is used to profile taxonomical composition, while shotgun sequencing provides additional information about gene predictions and functional annotation. An advantage in using a targeted sequencing method of the 16S rRNA gene variable region is its substantially lower cost compared to shotgun sequencing. Sequence differences in the 16S rRNA gene are used as a microbial fingerprint to identify and quantify different taxa within an individual sample.
Major international efforts have established standards for 16S rRNA-amplicon sequencing. However, several studies report a common source of variation caused by batch effects. To minimize this effect, uniform protocols for sample collection, processing, and sequencing must be implemented. This protocol integrates broadly used methods, starting from fecal sample collection and ending with data analysis. It includes a column-free, direct-PCR approach that enables simultaneous handling and DNA extraction of large numbers of fecal samples, along with PCR amplification of the V4 region. In addition, the protocol describes the analysis pipeline and provides a script using the latest version of QIIME (QIIME 2 version 2017.7.0 and DADA2). This step-by-step protocol is intended to guide those interested in adopting 16S rRNA-amplicon sequencing in a robust, reproducible, easy-to-use, and detailed way.
Concentrated efforts have been made to better understand microbiome diversity and abundance, as another way of capturing differences and similarities between individuals in healthy and pathological conditions. Age2,3, geography4, lifestyle5,6, and illness5 have been shown to be associated with the composition of the gut microbiome, but many conditions and populations have not yet been fully characterized. Recently, it has been reported that the microbiome can be modified for therapeutic applications7,8,9. Therefore, additional insight into the relationship between various physiological conditions and microbial composition is the first step toward optimizing potential future modifications.
Traditional microbial culture methods are limited by low yields10,11 and treat presence as a binary state, in which a bacterium is either present in the gut or not. High-throughput DNA-based sequencing has revolutionized microbial ecology by enabling the capture of all members of the microbial community. However, sequence read length and quality remain significant barriers to accurate taxonomy assignment12. Furthermore, high-throughput experiments may suffer from batch effects, where measurements are affected by non-biological or non-scientific variables13.
In recent years, several programs have been established to study the human microbiome, including the American Gut project, the United States (US) Human Microbiome Project, and the United Kingdom (UK) MetaHIT project. These initiatives have generated vast amounts of data that are not easily comparable due to a lack of consistency in their approaches. A variety of international projects, such as the International Human Microbiome Consortium, the International Human Microbiome Standards project, and the National Institute of Standards and Technology (NIST), have attempted to address some of these issues14 and have developed standards for microbiome measurements that should enable reliable, reproducible results.
Described here is an integrated protocol of several broadly used methods15,16 for 16S rRNA high-throughput sequencing (16S-seq), starting from fecal sample collection and continuing through data analysis. The protocol describes a column-free PCR approach, originally designed for direct extraction of plant DNA16, that enables the simultaneous handling of large numbers of fecal samples in a relatively short time and yields high-quality amplified DNA for targeted sequencing of the microbial V4 variable region on a common sequencing platform. This protocol aims to guide scientists interested in adopting 16S rRNA-amplicon sequencing in a robust, reproducible, easy-to-use, and detailed way, using important controls.
A guided and detailed step-by-step protocol may minimize batch effects and thus allow more comparable sequencing results between labs.
Ethical approval for the study was granted by the Sheba Local Research Ethics Committee, and all methods were performed in accordance with the relevant guidelines and regulations. The protocol received a patient consent exception from the local Ethical Review Board, since the fecal material used had already been submitted to the microbiology core as part of the clinical workup and carried no identifiable patient information other than age, gender, and microbial results. Written, informed consent was obtained from healthy volunteers, and the Institutional Review Board approved the study. Some of those samples have already been included in a previous analysis1.
1. Sample Handling
2. DNA Extraction
3. PCR and Library Preparation
For steps 3.1 and 3.2, work in a PCR workstation that provides a clean, template-free and amplicon-free environment.
4. Library Quantification and Cleaning
5. Sequencing
6. Data Processing
A schematic illustration of the protocol is shown in Figure 1.
We prospectively collected stool samples from hospitalized patients with suspected infectious diarrhea. Those samples were submitted to the Clinical Microbiology Lab at the Sheba Medical Center between February and May 2015, as previously described1. Stool samples were subjected in parallel to conventional microbiological culture, performed at the Clinical Microbiology Lab, and to broad-range high-throughput 16S-seq. In addition, stool samples from healthy adults were sequenced for comparison.
This analysis focuses on quality control of the data and shows representative results of the output obtained from QIIME 2 (ref. 17). Initially, the sequencing results of 6 negative control samples were reviewed for quality control. These controls comprised: i) PCR without template, ii) PCR on clean and sterile swabs in extraction and dilution solution, and iii) PCR on mixed extraction and dilution solutions. Table 2 shows that all negative control samples had ≤118 total reads (average of 29 sequences), primarily from Mycoplasma taxa, while all other samples analyzed had a mean of 10622.57 total reads (±1211.34 standard error), and no Mycoplasma read passed the quality control of the main samples.
Sixteen stool samples, each processed through two separate library preparations with different reverse barcoded primers, were used as positive controls. Those 16 samples included 8 with positive cultures for Campylobacter, Salmonella, or Shigella, as reported by the clinical microbiology lab, and 8 that were reported as having negative cultures. An area plot at the phylum level is shown in Figure 2. Samples that originated from the same stool sample but went through different library preparations with different reverse barcoded primers show very consistent relative abundance (Mantel test between duplicates of phylum-level taxonomic relative abundance values; simulated p-value = 1 × 10⁻⁴ based on 9999 replicates).
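The reported Mantel p-value of 1 × 10⁻⁴ from 9999 replicates is the smallest value such a permutation test can return when the p-value is computed as (hits + 1) / (replicates + 1). The following is a minimal sketch of a one-sided Mantel permutation test using only the Python standard library. It is not the authors' analysis script: the function names, the choice of Pearson correlation, and the toy distance matrices are all assumptions made for illustration.

```python
import itertools
import math
import random


def line_distances(positions):
    """Build a symmetric distance matrix from 1-D positions (toy data helper)."""
    return [[abs(p - q) for q in positions] for p in positions]


def upper_triangle(matrix, order):
    """Return the strictly upper-triangular entries of `matrix`, reindexed by `order`."""
    return [matrix[order[i]][order[j]]
            for i, j in itertools.combinations(range(len(order)), 2)]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mantel(dist_a, dist_b, permutations=9999, seed=0):
    """One-sided Mantel test: shuffle the sample labels of dist_b and count
    permutations whose correlation is at least as large as the observed one."""
    identity = list(range(len(dist_a)))
    flat_a = upper_triangle(dist_a, identity)
    observed = pearson(flat_a, upper_triangle(dist_b, identity))
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)
        if pearson(flat_a, upper_triangle(dist_b, order)) >= observed:
            hits += 1
    # Adding 1 to both counts means the p-value can never fall below
    # 1 / (permutations + 1), i.e. 1e-4 for 9999 replicates.
    return observed, (hits + 1) / (permutations + 1)


# Toy data (made up): two noisy measurements of the same 8 samples.
dist_a = line_distances([0, 1, 3, 6, 10, 15, 21, 28])
dist_b = line_distances([0.2, 1.1, 2.7, 6.5, 9.6, 15.3, 20.8, 28.4])
r, p = mantel(dist_a, dist_b)
print(f"r = {r:.3f}, p = {p:.4f}")
```

Because of this floor, a simulated p-value of 1 × 10⁻⁴ should be read as "no permutation matched or exceeded the observed correlation" rather than as an exact probability.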
Principal coordinates analysis (PCoA), which reduces the dimensionality of the input data, was applied to the UniFrac distance matrix to visually explore sample separation and similarity. In Figure 3A, sequenced samples that originated from the same stool sample but went through different library preparations are shown in the same color. As expected, those samples are located relatively close to each other in the PCoA plot (Mantel test between the PC1 and PC2 values of duplicates; simulated p-value = 1 × 10⁻⁴ based on 9999 replicates). In addition, the average unweighted UniFrac beta diversity distance between samples in this study is 0.706, while the average distance between two duplicates is 0.235 (Wilcoxon rank sum test p-value = 4.205 × 10⁻¹¹). Interestingly, samples that were culture positive for the enteropathogens Campylobacter, Salmonella, or Shigella showed significantly different PC1 values (Wilcoxon rank sum test p-value = 0.011) compared with samples with negative culture results. Figure 3B shows the same PCoA plot with samples colored by the relative abundance of Proteobacteria. PC1 values were significantly correlated with Proteobacteria abundance (Spearman correlation r = -0.533, p-value = 0.000975). These results are consistent with a previous analysis of these data, in which hospitalized patients had profound increases in taxa from the Proteobacteria phylum1. Here, PC1 is correlated with Proteobacteria abundance, with culture-positive samples clustering mostly on one side of the PC1 axis and culture-negative samples clustering mostly on the other side, together with the healthy samples.
Figure 1: Flow chart from fecal samples to relative microbial abundance. 1) Sample handling: Left, fecal samples are collected in a sterile swab tube. Right, the swab stick is cut to a length that allows it to be closed inside a 2 mL tube for storage. 2) DNA extraction: DNA is extracted by a column-free, direct-PCR approach. 3) Library preparation: Left, 4 μL of extracted DNA is amplified with forward and reverse-index primers. Each reverse primer contains a 12-base barcode. Each sample is amplified in triplicate, so a 96-well plate contains 32 different barcodes. Right, the triplicates of each barcode are combined into a single tube. A 96-well plate contains 96 different barcodes. 4) Library quantification: Upper panel, screening for positive samples; amplicons are detected on an ethidium bromide-stained 1% agarose gel. Lower panel, quantification of the positive amplicons so that an equimolar amount of 500 ng amplicon from each sample is combined into a single library pool. 5) Library purification: Upper panel, the pooled library is gel extracted for size selection and purification. Lower panel, the exact size and concentration of the library are measured. 6) Sequencing: high-throughput sequencing of the pooled library. 7) The sequencing data are analyzed by a bioinformatic pipeline for 16S rRNA-amplicon microbial ecology.
Figure 2: Consistent relative abundance between samples that originated from the same stool sample but went through different library preparations. Taxonomic relative abundance at the phylum level is shown per sample, as indicated. Relative abundance is shown for 16 fecal samples obtained from hospitalized patients with suspected infectious diarrhea, each of which went through two different library preparations (Duplicate 1 and 2) with different reverse barcoded primers, and for 3 healthy controls.
Figure 3: Phylogenetic diversity shows consistent results for samples that originated from the same stool sample but went through different library preparations. Principal coordinates analysis (PCoA), which reduces the dimensionality of the input data, was applied to the UniFrac distance matrix to visually explore sample separation and similarity. The distance between two samples represents the difference in their microbiome composition. (A) Unweighted UniFrac PCoA plot. Each point represents a single sequenced sample. Samples that originated from the same stool sample are shown in the same color. Samples from hospitalized patients with a positive stool culture reported by the clinical microbiology lab1 are marked by squares, hospitalized patients with a negative culture1 by triangles, and healthy adults from a separate sequencing run1 by filled circles. (B) Same PCoA as in A, but with samples colored by their relative abundance of Proteobacteria, as indicated.
Primer name | Primer sequence | Primer description |
Forward primer | AATGATACGGCGACCACCGAGATCTACAC – TATGGTAATT – GT – GTGCCAGCMGCCGCGGTAA | 5' adapter sequence – Forward primer pad – Forward primer linker – Forward primer |
Reverse index primer | CAAGCAGAAGACGGCATACGAGAT – XXXXXXXXXXXX – AGTCAGTCAG – CC – GGACTACHVGGGTWTCTAAT | Reverse complement of 3' adapter sequence – Golay barcode – Reverse primer pad – Reverse primer linker – Reverse primer |
Read 1 sequencing primer | TATGGTAATT – GT – GTGCCAGCMGCCGCGGTAA | Forward primer pad - Forward primer linker - Forward primer |
Read 2 sequencing primer | AGTCAGTCAG – CC – GGACTACHVGGGTWTCTAAT | Reverse primer pad - Reverse primer linker - Reverse primer |
Index sequence primer | ATTAGAWACCCBDGTAGTCC – GG – CTGACTGACT | Reverse complement of reverse primer - Reverse complement of reverse primer linker - Reverse complement of reverse primer pad |
Table 1: Primer list.
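The index sequence primer in Table 1 is described as the reverse complement of the reverse primer, linker, and pad. The short sketch below checks that relationship for the Read 2 and index primer sequences listed in the table, taking the degenerate IUPAC bases (M, H, V, W, B, D) into account. The function and variable names are illustrative assumptions, not part of the published protocol.

```python
# IUPAC nucleotide codes and their complements (degenerate codes included).
IUPAC_COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A",
    "M": "K", "K": "M", "R": "Y", "Y": "R",
    "W": "W", "S": "S",
    "B": "V", "V": "B", "D": "H", "H": "D",
    "N": "N",
}


def reverse_complement(seq):
    """Reverse-complement a DNA sequence that may contain IUPAC ambiguity codes."""
    return "".join(IUPAC_COMPLEMENT[base] for base in reversed(seq.upper()))


# Segments copied from Table 1 (pad, linker, primer).
READ2_PRIMER = ["AGTCAGTCAG", "CC", "GGACTACHVGGGTWTCTAAT"]
# Segments copied from Table 1 (primer RC, linker RC, pad RC).
INDEX_PRIMER = ["ATTAGAWACCCBDGTAGTCC", "GG", "CTGACTGACT"]

read2 = "".join(READ2_PRIMER)
index = "".join(INDEX_PRIMER)

if reverse_complement(read2) == index:
    print("Index primer is the reverse complement of the Read 2 primer.")
else:
    print("Mismatch between index primer and Read 2 primer.")
```

A check of this kind is a quick way to catch transcription errors before ordering custom sequencing primers.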
Sample | Number of reads post initial quality control in QIIME |
PCR on clean swabs in extraction and dilution solution -1 | 118 |
PCR on clean swabs in extraction and dilution solution -2 | 39 |
PCR without template -1 | 0 |
PCR without template -2 | 0 |
PCR on extraction and dilution solutions mix -1 | 16 |
PCR on extraction and dilution solutions mix -2 | 0 |
Samples (mean ± standard error) | 10622.57 ± 1211.34 |
Table 2: Number of reads for negative controls and for fecal samples.
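The summary figures quoted for the negative controls can be reproduced directly from Table 2. The sketch below recomputes the total, the average, and the ratio of the highest negative control to the mean sample depth. The variable names are illustrative; the read counts are copied from the table.

```python
from statistics import mean

# Read counts after initial quality control, copied from Table 2.
negative_control_reads = {
    "PCR on clean swabs in extraction and dilution solution -1": 118,
    "PCR on clean swabs in extraction and dilution solution -2": 39,
    "PCR without template -1": 0,
    "PCR without template -2": 0,
    "PCR on extraction and dilution solutions mix -1": 16,
    "PCR on extraction and dilution solutions mix -2": 0,
}
SAMPLE_MEAN_READS = 10622.57  # mean reads per fecal sample, from Table 2

counts = list(negative_control_reads.values())
total_reads = sum(counts)                      # 173
average_reads = mean(counts)                   # about 28.8, reported as 29
highest_control = max(counts)                  # 118
fraction_of_sample_depth = highest_control / SAMPLE_MEAN_READS

print(f"Total negative-control reads: {total_reads}")
print(f"Average per control: {average_reads:.1f}")
print(f"Highest control as a fraction of mean sample depth: {fraction_of_sample_depth:.2%}")
```

Even the highest negative control therefore carries only about 1% of the reads of a typical fecal sample.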
Supplementary Material 1: Data processing script.
Supplementary Material 2: Mapping file.
Supplementary Material 3: Primer scheme.
16S rRNA-amplicon and metagenomic shotgun sequencing have gained popularity in clinical microbiology applications21,22,23. These techniques are advantageous in their increased ability to capture culturable and non-culturable taxa, in providing data about the relative abundance of the pathogenic inoculum, and in their ability to identify a polymicrobial infectious fingerprint more precisely24. Advances in the field of microbiome research have generated vast amounts of data that are not easily comparable due to a lack of consistency in the various approaches. In recent years, several consortia have attempted to address some of these issues. This protocol follows a library preparation similar to that of the Earth Microbiome Project (EMP), and adds a detailed step-by-step procedure from DNA extraction of fecal samples through the analysis pipeline.
Previously, 16S rRNA sequencing has been used to characterize the microbial composition patterns of fecal samples from healthy non-hospitalized subjects and from hospitalized patients with suspected infectious diarrhea, and to compare them with traditional culture results. The results from that analysis showed that hospitalized patients have profound increases in taxa from the Proteobacteria phylum1. Described here is an integrated protocol of broadly used methods for 16S rRNA gene amplicon sequencing using the column-free, direct-PCR approach that was originally designed for extraction of plant DNA16. This approach can efficiently and quickly produce good-quality amplified DNA libraries targeting the V4 variable region of the 16S rRNA gene from a large number of samples, suitable for high-throughput 16S rRNA sequencing.
As this is a DNA-based sequencing method, data on both the aerobic and anaerobic microbial communities present in an individual fecal sample are obtained. The protocol adopts primers that were originally designed15 against the V4 region of the 16S rRNA gene. Importantly, the reverse amplification primer contains a twelve-base barcode sequence that supports pooling of up to 2,167 different samples in each lane15. The forward amplification primer includes nine extra bases in the adapter region that support paired-end sequencing on the sequencing platform25 (Table 1 and Supplementary Material 3). This approach enabled handling of a library composed of 250 fecal samples that were sequenced on one lane, with a depth of 10622.57 ± 1211.34 (mean ± standard error) reads per sample, at a cost of ~30 dollars per sample.
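A twelve-base barcode allows 4¹² = 16,777,216 possible sequences, yet only a subset is used for pooling, which leaves room for barcodes that remain distinguishable when a base is misread. The sketch below illustrates that idea with a simple Hamming-distance check and a tolerant barcode lookup. It is an assumption-laden illustration: the four barcodes are made up and are not taken from the Golay barcode set, and the lookup is a plain nearest-match rule, not Golay error correction or the demultiplexing method used by QIIME 2.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes):
    """Smallest Hamming distance between any two barcodes in the set."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


def closest_barcode(read_barcode, barcodes, max_mismatches=1):
    """Return the single known barcode within `max_mismatches` of the
    observed barcode, or None if there is no match or the match is ambiguous."""
    matches = [b for b in barcodes if hamming(read_barcode, b) <= max_mismatches]
    return matches[0] if len(matches) == 1 else None


# Hypothetical 12-base barcodes, for illustration only.
BARCODES = ["AAAACCCCGGGG", "TTTTGGGGCCCC", "ACGTACGTACGT", "GATCGATCGATC"]

print("Possible 12-base barcodes:", 4 ** 12)
print("Minimum pairwise distance:", min_pairwise_distance(BARCODES))
print("Read with one error maps to:", closest_barcode("AAAACCCCGGGT", BARCODES))
print("Unrelated read maps to:", closest_barcode("TTTTTTTTTTTT", BARCODES))
```

The larger the minimum pairwise distance within a barcode set, the more sequencing errors can be tolerated before a read is assigned to the wrong sample.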
To ensure quality control, we suggest using the following negative controls: i) PCR without DNA template, ii) PCR on DNA extracted from a clean sterile swab, and iii) PCR on mixed extraction and dilution solution. As positive controls, we recommend including samples that originated from the same stool sample but went through different library preparations with different reverse barcoded primers, as well as samples from previous runs as an internal quality control. Of note, the microbiome measurement standards provided by NIST are other optional positive controls. The average read coverage for the negative controls was markedly lower than for the actual stool samples (Table 2) and was composed primarily of Mycoplasma taxa. We further showed the consistency of the sequencing results obtained using samples that each originated from the same fecal material but went through different library preparations (Figure 3). This result also indicates that, when using our integrated method, there is no cross-contamination between samples.
This work introduces an integrated, uniform, and feasible protocol for analyzing large numbers of samples. It aims to guide scientists interested in adopting 16S rRNA-amplicon sequencing in a robust, reproducible, easy-to-use, inexpensive, and detailed way, from fecal collection to data analysis, using important controls. Using this standardized, detailed, guided protocol may minimize batch effects and allow more comparable sequencing results between different labs.
The authors have nothing to disclose.
This work was supported in part by the I-CORE program (grants No. 41/11), the Israel Science Foundation (grant No. 908/15), and the European Crohn's and Colitis Organization (ECCO).
Name | Company | Catalog Number | Comments |
Primers | Integrated DNA Technologies (IDT) | | |
Extraction solution | Sigma-Aldrich | E7526 | |
Dilution solution | Sigma-Aldrich | D5688 | |
Kapa HiFi HotStart ReadyMix PCR Kit | Kapa Biosystems | KK2601 | PCR master mix |
Quant-iT PicoGreen dsDNA Reagent kit | Invitrogen | P7589 | dsDNA quantification reagent |
MinElute Gel extraction kit | Qiagen | 28606 | |
Agarose | Amresco | 0710-250G | |
Ultra Pure Water DNase and RNase Free | Biological Industries | 01-866-1A | |
Qubit dsDNA HS assay kit | Molecular Probes | Q32854 | dsDNA detection kit |
High Sensitivity D1000 Screen Tape Assay | Agilent Technologies | Screen Tape 5067-5582; Reagents 5067-5583 | separation and analysis for DNA libraries |
PhiX Control v3 | Illumina | 15017666 | control library |
MiSeq Reagent Kit v2 (500 cycle) | Illumina | MS-102-2003 | |
Ethidium Bromide | Amresco | E406-10mL-TAM | |
2 mL collection tubes | SARSTEDT | 72.695.400 | Safe Seal collection tubes |
Plastic stick swab in PP test tube | STERILE INTERIOR | 23117 | |
Equipment | | | |
PCR Machine | Applied Biosystems | 2720 Thermal Cycler | |
Sequencing Machine | Illumina | MiSeq | |
PCR workstation | Biosan | UV-cleaner | |
Scissors | | | |
Vortexer | Scientific Industries | Vortex-Genie 2 | |