Microbiota Analysis Using Two-step PCR and Next-generation 16S rRNA Gene Sequencing

Shailesh K. Shahi; Kasra Zarei; Natalya V. Guseva; Ashutosh K. Mangalam

doi:10.3791/59980

Biology

Published: October 15, 2019 doi: 10.3791/59980

Shailesh K. Shahi¹, Kasra Zarei², Natalya V. Guseva¹, Ashutosh K. Mangalam¹^,²^,³^,⁴

¹Department of Pathology, University of Iowa, ²Medical Scientist Training Program, University of Iowa, ³Graduate Program in Immunology, University of Iowa, ⁴Graduate Program in Molecular Medicine, University of Iowa

Summary

Described here is a simplified standard operating procedure for microbiome profiling using 16S rRNA metagenomic sequencing and analysis using freely available tools. This protocol will help researchers who are new to the microbiome field as well as those requiring updates on methods to achieve bacterial profiling at a higher resolution.

Abstract

The human gut is colonized by trillions of bacteria that support physiologic functions such as food metabolism, energy harvesting, and regulation of the immune system. Perturbation of the healthy gut microbiome has been suggested to play a role in the development of inflammatory diseases, including multiple sclerosis (MS). Environmental and genetic factors can influence the composition of the microbiome; therefore, identification of microbial communities linked with a disease phenotype has become the first step towards defining the microbiome’s role in health and disease. Use of 16S rRNA metagenomic sequencing for profiling bacterial communities has helped in advancing microbiome research. Despite its wide use, there is no uniform protocol for 16S rRNA-based taxonomic profiling analysis. Another limitation is the low resolution of taxonomic assignment due to technical difficulties such as shorter sequencing reads, as well as use of only forward (R1) reads in the final analysis due to low quality of reverse (R2) reads. There is a need for a simplified method with high resolution to characterize bacterial diversity in a given biospecimen. Advancements in sequencing technology with the ability to sequence longer reads at high resolution have helped to overcome some of these challenges. Present sequencing technology combined with a publicly available metagenomic analysis pipeline such as the R-based Divisive Amplicon Denoising Algorithm-2 (DADA2) has helped advance microbial profiling at high resolution, as DADA2 can assign sequences at the genus and species levels. Described here is a guide for performing bacterial profiling using two-step amplification of the V3-V4 region of the 16S rRNA gene, followed by analysis using freely available analysis tools (i.e., DADA2, Phyloseq, and METAGENassist). It is believed that this simple and complete workflow will serve as an excellent tool for researchers interested in performing microbiome profiling studies.

Introduction

Microbiota refers to a collection of microorganisms (bacteria, viruses, archaea, bacteriophages, and fungi) living in a particular environment, and the microbiome refers to the collective genome of resident microorganisms. As bacteria are one of the most abundant microbes in humans and mice, this study is focused only on bacterial profiling. The human gut is colonized by trillions of bacteria and hundreds of bacterial strains¹. The normal gut microbiota plays a vital role in maintaining a healthy state in the host by regulating functions (i.e., maintenance of an intact intestinal barrier, food metabolism, energy homeostasis, inhibition of colonization by pathogenic organisms, and regulation of immune responses)²^,³^,⁴^,⁵. Compositional perturbations of the gut microbiota (gut dysbiosis) have been linked to a number of human diseases, including gastrointestinal disorders⁶, obesity⁷^,⁸, stroke⁹, cancer¹⁰, diabetes⁸^,¹¹, rheumatoid arthritis¹², allergies¹³, and central nervous system-related diseases such as multiple sclerosis (MS)¹⁴^,¹⁵ and Alzheimer's disease (AD)⁸^,¹⁶. Therefore, in recent years, there has been growing interest in tools for identifying bacterial composition at different body sites. A reliable method should have characteristics such as being high-throughput and easy-to-use, having the ability to classify bacterial microbiota with high resolution, and being low-cost.

Culture-based microbiological techniques are not sensitive enough to identify and characterize the complex gut microbiome due to the failure of several gut bacteria to grow in culture. The advent of the sequencing-based technology, especially 16S rRNA-based metagenomic sequencing, has overcome some of these challenges and transformed microbiome research¹⁷. Advanced 16S rRNA-based sequencing technology has helped in establishing a critical role for the gut microbiome in human health. The Human Microbiome Project, a National Institutes of Health initiative¹⁸, and the MetaHIT project (a European initiative)¹⁹ have both helped in establishing a basic framework for microbiome analysis. These initiatives helped kick-start multiple studies to determine the role of the gut microbiome in human health and disease.

A number of groups have shown gut dysbiosis in patients with inflammatory diseases¹²^,¹⁴^,¹⁵^,²⁰^,²¹^,²². Although 16S rRNA-based sequencing is widely used for taxonomic profiling because it can be multiplexed at low cost, there are no uniform protocols for it. Another limitation is the low resolution of taxonomic assignment owing to shorter sequencing reads (150 bp or 250 bp) and use of only the forward sequencing read (R1) due to low-quality reverse sequencing reads (R2). However, advances in sequencing technology, such as the ability to sequence longer paired-end reads (e.g., Illumina MiSeq 2x300bp), have helped to overcome some of these challenges.

The present sequencing technology can sequence 600 bp good quality reads, which allows merging of R1 and R2 reads. These merged longer R1 and R2 reads allow better taxonomic assignments, especially with the open-access R-based Divisive Amplicon Denoising Algorithm-2 (DADA2) platform. DADA2 utilizes amplicon sequence variant (ASV)-based assignments instead of operational taxonomic unit (OTU) assignments based on 97% similarity utilized by QIIME²³. ASV matches result in an exact sequence match in the database within 1–2 nucleotides, which leads to assignment at genus and species levels. Thus, the combination of longer, good quality paired-end reads and better taxonomic assignment tools (such as DADA2) has transformed microbiome studies.

Provided here is a step-by-step guide for performing bacterial profiling using two-step amplification of the V3–V4 region of 16S rRNA and data analysis using DADA2, Phyloseq, and METAGENassist pipelines. For this study, human leukocyte antigen (HLA) class II transgenic mice are used, as certain HLA class II alleles are linked with a predisposition to autoimmune diseases such as MS²⁰^,²⁴^,²⁵. However, the importance of HLA class II genes in regulating the composition of gut microbiota is unknown. It is hypothesized that the HLA class II molecule will influence gut microbial community by selecting for specific bacteria. Major histocompatibility complex (MHC) class II knockout mice (AE.KO) or mice expressing human HLA-DQ8 molecules (HLA-DQ8)²⁴^,²⁵^,²⁶ were used in order to understand the importance of HLA class II molecules in shaping the gut microbial community. It is believed that this complete and simplified workflow with R-based data analysis will serve as an excellent tool for researchers interested in performing microbiome profiling studies.

The generation of mice lacking endogenous murine MHC class II genes (AE.KO) and AE^-/-.HLA-DQA1*0103, DQB1*0302 (HLA-DQ8) transgenic mice with a C57BL/6J background has been described previously²⁶. Fecal samples are collected from mice of both sexes (8–12 weeks of age). Mice were previously bred and maintained in the University of Iowa animal facility as per the NIH and institutional guidelines. Contamination control strategies such as weaning of the mice inside a laminar flow cabinet, changing of gloves between different strains of mice, and proper maintenance of mice are critical steps for profiling of gut microbiome.

Proper personal protective equipment (PPE) is highly recommended during the entire procedure. Appropriate negative controls should be included when performing DNA isolation, PCR1 and PCR2 amplification, and sequencing steps. Use of sterile, DNase-free, RNase-free, and pyrogen-free supplies is recommended. A designated pipettor for microbiome work and filtered pipette tips should be used throughout the protocol. Microbiota analysis consists of seven steps: 1) fecal sample collection and processing; 2) extraction of DNA; 3) 16S rRNA gene amplification; 4) DNA library construction using indexed PCR; 5) clean-up and quantification of indexed PCR (library); 6) MiSeq sequencing; and 7) data processing and sequence analysis. A schematic diagram of all protocol steps is shown in Figure 1.

Protocol

The protocol was approved by the Institutional Animal Care and Use Committee of the University of Iowa.

1. Fecal Sample Collection and Handling

Sterilize the divider boxes (see Table of Materials, Supplementary Figure 1) with 70% ethanol.
Pre-label microcentrifuge tubes (one per mouse) with the sample ID and treatment group (if applicable).
Place the mice in sterilized divider boxes and allow them to defecate normally for up to 1 h.
Collect the fecal pellets in an empty, pre-labeled 1.5 mL microcentrifuge tube using sterile forceps and close the tube securely. Sterilize the forceps after collecting from each mouse.
Place the microcentrifuge tube containing fecal pellets in a -80 °C freezer until further processing.
NOTE: The divider boxes are advantageous because they allow simultaneous collection of fecal samples from multiple mice (up to 12) at one time.

2. Extraction of DNA

Remove the fecal samples (mouse or human) from the freezer and thaw at room temperature (RT).
NOTE: It is advisable to thaw human stool samples overnight at 4 °C as needed to collect 200 mg or the required amount from stock samples.
Use 200 mg of starting materials and elute DNA to a final volume of 50 µL.
Include a DNA isolation kit blank in which no fecal sample is added but is processed through all DNA extraction steps.
NOTE: A specific DNA isolation kit (see Table of Materials) was used, as it contains specific reagents to remove inhibiting materials such as biosolids, undigested plant material, and heme compounds from lysed red blood cells present in human and mouse stool samples.
Place the bead tube into a homogenizer (see Table of Materials) and homogenize the samples for 45 s at RT and a speed of 4.5 m/s.
NOTE: A bead-beating homogenizer from any manufacturer can be used. However, when using a homogenizer from another vendor, it is recommended to standardize the method, specifically the speed and duration of homogenization.
Isolate DNA from individual mouse fecal samples using a bacterial DNA isolation kit following the manufacturer’s protocol (see Table of Materials) with minor modifications. Quantify the isolated DNA by loading 1 μL of the DNA on a fluorimeter or on the high sensitivity electrophoresis chip (see Table of Materials).
NOTE: The expected yield of DNA can range from 500–2,500 ng when starting with 200 mg of the fecal sample.
Adjust the concentration of DNA to 4–20 ng/μL using elution buffer. Requantify the DNA (as done in step 2.5) before proceeding to 16S rRNA gene amplification (PCR1), if PCR1 is not performed on the same day.
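The concentration adjustment above follows the usual C1 × V1 = C2 × V2 relationship. The sketch below is only an illustration of that arithmetic: the function name and the chosen working volume (20 μL) are hypothetical, while the 50 μL elution volume, the 500–2,500 ng yield range, and the 4–20 ng/μL target window come from the protocol.

```python
def dilution_volumes(stock_ng_per_ul, target_ng_per_ul, final_volume_ul):
    """Return (stock volume, elution buffer volume) in uL using C1*V1 = C2*V2."""
    if target_ng_per_ul > stock_ng_per_ul:
        raise ValueError("Target exceeds stock concentration; dilution cannot concentrate DNA.")
    stock_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return stock_ul, final_volume_ul - stock_ul


# Expected yield of 500-2,500 ng eluted in 50 uL corresponds to 10-50 ng/uL.
low_stock = 500 / 50     # 10 ng/uL
high_stock = 2500 / 50   # 50 ng/uL

# Hypothetical example: bring a 50 ng/uL extract to 20 ng/uL in a 20 uL working aliquot.
stock_ul, buffer_ul = dilution_volumes(high_stock, 20.0, 20.0)
print(f"DNA: {stock_ul:.1f} uL, elution buffer: {buffer_ul:.1f} uL")
```

An extract at the low end of the yield range (10 ng/μL) already falls inside the 4–20 ng/μL window and needs no dilution.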

3. 16S rRNA Gene Amplification (PCR1)

Set up 16S rRNA gene amplification (PCR1) in a 96 well PCR plate using a 25 μL reaction volume.
Using a multichannel pipette, add the following to each well: 12.5 μL of 2x high-fidelity polymerase enzyme mix including buffer, in addition to dNTPs (see Table of Materials); 40 ng of DNA in up to 10.5 μL total volume (adjust the total volume with PCR grade water); and 1 μL (each) of forward and reverse primers at 1 μM concentration.
NOTE: Sequences of the primers are as follows:
forward primer = 5'-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTACGGGNGGCWGCAG-3'
reverse primer = 5'-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACTACHVGGGTATCTAATCC-3'
Include a kit blank from step 2.3 (kit reagent control) in the PCR plate.
Seal the PCR plate and centrifuge at 1,000 x g at 20 °C for 1 min in a tabletop plate centrifuge (see Table of Materials) and perform PCR in a thermal cycler programmed for: 95 °C for 3 min; 25 cycles of 1) 95 °C for 30 s, 2) 55 °C for 30 s, 3) 72 °C for 30 s; final extension at 72 °C for 5 min; and hold at 4 °C.
NOTE: Although this 16S rRNA gene amplification method should work with different types of biospecimens, it is advised to standardize the number of amplification cycles when starting a new project.
Confirm the size of the PCR1 product by loading 1 μL of the DNA on a high sensitivity electrophoresis chip. Alternatively, run 5 µL of 16S rRNA-amplified product on a 1.5% agarose gel to confirm the ~550 bp size of the PCR1 product.
NOTE: Clean-up of 16S rRNA amplified product is optional and depends on the DNA isolation kit/method being used. If using an in-house DNA isolation method, PCR1 product can be cleaned utilizing a microbiome DNA purification kit as per the manufacturer’s protocol (see Table of Materials). The DNA isolation kit used here yields an ultrapure DNA and does not require clean-up of the PCR1 product.
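The per-well volumes for PCR1 can be checked with a short calculation. The following is a minimal sketch, not part of the published workflow: the function name and dictionary keys are hypothetical, and the volumes (25 μL reaction, 12.5 μL enzyme mix, 1 μL of each primer, 40 ng of template in up to 10.5 μL) are taken from the steps above.

```python
def pcr1_mix(dna_ng_per_ul, dna_input_ng=40.0, total_ul=25.0,
             master_mix_ul=12.5, primer_ul=1.0):
    """Return per-well PCR1 volumes (uL) for a template of known concentration."""
    # Volume left for template plus water after enzyme mix and both primers.
    template_budget_ul = total_ul - master_mix_ul - 2 * primer_ul  # 10.5 uL
    dna_ul = dna_input_ng / dna_ng_per_ul
    if dna_ul > template_budget_ul:
        raise ValueError("DNA is too dilute to deliver the input mass in the available volume.")
    return {
        "enzyme_mix": master_mix_ul,
        "dna": dna_ul,
        "water": template_budget_ul - dna_ul,
        "forward_primer": primer_ul,
        "reverse_primer": primer_ul,
    }


# Lower and upper ends of the 4-20 ng/uL window from section 2.
mix_low = pcr1_mix(4.0)    # 10 uL DNA + 0.5 uL water
mix_high = pcr1_mix(20.0)  # 2 uL DNA + 8.5 uL water
for label, mix in (("4 ng/uL", mix_low), ("20 ng/uL", mix_high)):
    print(label, mix, "total =", sum(mix.values()))
```

The calculation also shows why the lower bound of the concentration window is about 4 ng/μL: 40 ng in 10.5 μL requires at least ~3.8 ng/μL.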

4. DNA Library Construction Using Indexed PCR (PCR2)

Place the Index 1 and Index 2 barcoded primers in a special rack (see Table of Materials) for 96 libraries.
1. Arrange Index 1 primer in columns 1–12 and Index 2 primer in rows A–H of the special rack.
2. Add 2.5 μL of Index 1 in columns 1–12 and Index 2 primer in rows A–H using multichannel pipettes. Place the new cap (see Table of Materials) on Index 1 and Index 2 adapter primers and store them in a -20 °C freezer.
Using a multichannel pipette, add 12.5 μL of 2x high fidelity polymerase enzyme mix containing a buffer, in addition to dNTPs (see Table of Materials); 5 μL of PCR grade water and 2.5 μL of 16S rRNA-amplified product.
NOTE: Add unique indices to each sample for multiplexing of more than 96 libraries in a single run, as described in Kiernan et al.²⁷. The present protocol uses adapters from a commercial kit (e.g., Nextera XT Index Kit) as per the manufacturer’s instruction provided in the 16S metagenomics sequencing method (e.g., Illumina).
Seal and centrifuge the Indexed PCR plate at 1,000 x g at 20 °C for 1 min and perform PCR in a thermal cycler programmed for: 95 °C for 3 min; 8 cycles of 1) 95 °C for 30 s, 2) 55 °C for 30 s, 3) 72 °C for 30 s; and final extension at 72 °C for 5 min.
Confirm the ~630 bp size of the indexed PCR product by loading 1 μL of DNA on a high sensitivity electrophoresis chip. Alternatively, run 5 µL of indexed PCR product on a 1.5% agarose gel to confirm the size and intensity of the product.
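Arranging 12 Index 1 primers by column and 8 Index 2 primers by row gives each of the 96 wells a unique index pair. The sketch below illustrates that layout and totals the PCR2 reaction volume; the index labels are hypothetical placeholders rather than the names used in any commercial kit, and should be replaced with the identifiers printed on the index tubes.

```python
from string import ascii_uppercase


def dual_index_layout(n_columns=12, n_rows=8):
    """Map each well (e.g., 'A1') to its (Index 1, Index 2) pair."""
    layout = {}
    for row_letter in ascii_uppercase[:n_rows]:
        for column in range(1, n_columns + 1):
            well = f"{row_letter}{column}"
            # Index 1 varies by column, Index 2 varies by row (placeholder names).
            layout[well] = (f"index1_col{column:02d}", f"index2_row{row_letter}")
    return layout


layout = dual_index_layout()

# PCR2 volumes per well (uL), as listed in the steps above.
pcr2_components_ul = {
    "enzyme_mix": 12.5,
    "water": 5.0,
    "pcr1_product": 2.5,
    "index1": 2.5,
    "index2": 2.5,
}
pcr2_total_ul = sum(pcr2_components_ul.values())

print(len(layout), "wells;", len(set(layout.values())), "unique index pairs")
print("A1 ->", layout["A1"], "| H12 ->", layout["H12"])
print("PCR2 reaction volume:", pcr2_total_ul, "uL")
```

The same mapping can be exported as the barcode columns of the sample sheet used for demultiplexing.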

5. Clean-up of Indexed PCR (PCR2) and Quantification

Using multichannel pipettes, pool 5 μL of PCR2 amplicon from each well into a multichannel reservoir tray free of detectable DNase, RNase, human DNA, and pyrogenic bacteria (see Table of Materials).
Transfer the pooled product from the multichannel reservoir tray into an empty, pre-labeled 1.5 mL microcentrifuge tube and vortex to mix.
Purify PCR2 product using standard magnetic beads kit (see Table of Materials) as per the manufacturer’s instruction. Seal and store the remaining 20 μL of PCR2 in the same plate at -80 °C for further use, if needed.
Prepare fresh 80% ethanol by adding 4 mL of 100% ethanol to 1 mL of PCR grade water.
Equilibrate the magnetic bead to RT and vortex for 30 s to disperse the beads evenly.
Briefly vortex and spin down the pooled PCR2 amplicon samples.
Add 80 μL of magnetic beads into a pre-labeled, sterile 1.5 mL microcentrifuge tube with 80 μL of pooled PCR products, then vortex and spin down briefly to evenly resuspend the magnetic beads.
Incubate the contents at RT without disturbing the tubes for 15 min.
Place the tube with the DNA and magnetic beads on a magnetic stand (see Table of Materials) for 5 min.
Carefully remove and discard 150 μL of the supernatant.
Add 200 μL of freshly prepared 80% ethanol without disturbing the beads and incubate for 30 s.
Carefully remove, and discard all the supernatant.
Repeat steps 5.11–5.12. Remove the remaining volume with a P10 pipette. Allow the beads to air-dry for 15 min, with the index PCR tube remaining on the magnetic stand.
Remove from the magnetic stand. Add 33 μL of elution buffer (elution buffer of DNA kit is acceptable). Vortex well, and perform a quick spin to remove any remaining liquid on the side. Incubate for 2 min and place on the magnetic stand for 5 min.
Transfer 30 μL of the supernatant (clean PCR products) to a pre-labeled 1.5 mL microcentrifuge tube.
Quantify the purified pool by loading 1 μL of the purified pool on a fluorimeter or high sensitivity electrophoresis chip, as this will be required during sequencing. Perform MiSeq as detailed below.
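Because the pooled library is diluted to a molar concentration before sequencing, the fluorimeter reading (ng/μL) has to be converted to nM. The sketch below uses the common approximation of 660 g/mol per base pair of double-stranded DNA together with the ~630 bp library size from section 4. The example reading of 10 ng/μL and the 20 μL final volume are hypothetical.

```python
AVG_BP_MASS_G_PER_MOL = 660.0  # approximate average mass of one dsDNA base pair


def ng_per_ul_to_nM(conc_ng_per_ul, fragment_bp):
    """Convert a dsDNA mass concentration to nM for a given fragment length."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS_G_PER_MOL * fragment_bp)


def volumes_for_target(stock_nM, target_nM, final_ul):
    """Return (library volume, diluent volume) in uL to reach target_nM."""
    if stock_nM < target_nM:
        raise ValueError("Library is below the target concentration.")
    stock_ul = target_nM * final_ul / stock_nM
    return stock_ul, final_ul - stock_ul


# Hypothetical reading: 10 ng/uL for the purified ~630 bp indexed library.
pool_nM = ng_per_ul_to_nM(10.0, 630)
pool_ul, diluent_ul = volumes_for_target(pool_nM, 4.0, 20.0)
print(f"Pool: {pool_nM:.2f} nM; use {pool_ul:.2f} uL pool + {diluent_ul:.2f} uL diluent for 4 nM")
```

If an electrophoresis chip reports a different average fragment size, that value should replace 630 in the conversion.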

6. MiSeq Sequencing

Create a sample sheet containing sample-specific barcode information for metagenomics workflow and demultiplexing on the MiSeq instrument (see Table of Materials). Upload this sample sheet to the software (e.g., Illumina Experiment Manager).
Dilute the pooled libraries from step 5.15 to 4 nM.
Denature pooled libraries by combining 5 μL of the 4 nM library pool with 5 μL of freshly prepared 0.2 N NaOH in a 1.5 mL microcentrifuge tube. Vortex briefly to mix, centrifuge briefly, and incubate for 5 min at RT.
Add 990 μL of ice-cold hybridization buffer (HB buffer) and pipette gently to mix. This will yield a 20 pM library.
Combine 2 μL of the 10 nM control library with 3 μL of EBT buffer (10 mM Tris-HCl, pH = 8.5, with 0.1% Tween 20) to yield a 4 nM control library. Add 5 µL of freshly prepared 0.2 N NaOH and vortex briefly to mix. Incubate for 5 min at RT.
Add 990 μL ice-cold hybridization buffer (HB buffer) and pipette gently to mix. This will yield 20 pM control libraries.
NOTE: Denatured 20 pM control libraries can be stored at -20 °C for up to 4 weeks. After 4 weeks, cluster numbers tend to decrease.
1. Combine 210 µL of the 20 pM library with 40 µL of the 20 pM control library (final concentration = ~18%), and add 350 µL of HB buffer. Load the library at a final concentration of 7 pM.
  NOTE: Input details should be adjusted as per run performance.
Incubate samples for 2 min at 96 °C. Put on ice for 5 min. Load 600 µL of the final pool into the appropriate well of the MiSeq cartridges.
NOTE: Section 6 above can be performed at a genomic/DNA core facility.
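The concentrations quoted in section 6 follow from successive C1 × V1 = C2 × V2 dilutions. The sketch below retraces them with the volumes given in the steps; the function and variable names are hypothetical. By this simple volume ratio, the control library makes up 16% of the 250 µL of denatured DNA in the loading mix, which is in the neighborhood of the ~18% stated above.

```python
def diluted_concentration(stock_conc, stock_volume_ul, final_volume_ul):
    """Concentration after bringing stock_volume_ul up to final_volume_ul."""
    return stock_conc * stock_volume_ul / final_volume_ul


# Denaturation: 5 uL of 4 nM library + 5 uL of NaOH.
denatured_nM = diluted_concentration(4.0, 5.0, 5.0 + 5.0)

# Add 990 uL of HB buffer to the 10 uL of denatured library (nM -> pM is x1000).
library_pM = diluted_concentration(denatured_nM * 1000.0, 10.0, 10.0 + 990.0)

# Control library: 2 uL of 10 nM stock + 3 uL of EBT buffer.
control_nM = diluted_concentration(10.0, 2.0, 2.0 + 3.0)

# Loading mix: 210 uL library + 40 uL control + 350 uL HB buffer.
library_ul, control_ul, hb_ul = 210.0, 40.0, 350.0
final_ul = library_ul + control_ul + hb_ul
loaded_library_pM = diluted_concentration(library_pM, library_ul, final_ul)
loaded_control_pM = diluted_concentration(20.0, control_ul, final_ul)
control_share = control_ul / (library_ul + control_ul)

print(f"Denatured library: {denatured_nM} nM -> {library_pM} pM after HB buffer")
print(f"Control library: {control_nM} nM before denaturation")
print(f"Loaded: {loaded_library_pM:.2f} pM library + {loaded_control_pM:.2f} pM control "
      f"in {final_ul:.0f} uL (control share {control_share:.0%})")
```

Adjusting `library_ul` and `hb_ul` while keeping the 600 µL total is one way to tune the loading concentration as run performance dictates.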

7. Data Processing and Sequence Analysis

Use R software (version 3.5) for DADA2 data processing and analyses. For steps 7.1–7.4, use the open-access software as outlined in the previously developed DADA2 online tutorial found at <https://benjjneb.github.io/dada2/tutorial.html>.
NOTE: A readily usable R script has been attached as Supplemental File 2. Users must change the names and locations of the sequencing files (e.g., SAMPLENAME_R1_001.fastq and SAMPLENAME_R2_001.fastq).
Visualize the quality profiles of the forward and reverse reads using the plotQualityProfile command.
Trim nucleotides from forward and reverse reads based on the quality plot. These parameters are specified by the truncLen parameter in DADA2.
NOTE: Here, truncLen was set to 280 for forward reads and 260 for reverse reads: reads are truncated at these positions, and reads shorter than the threshold are discarded.
Process the raw 16S data as fastq files by the DADA2 pipeline as outlined in the online tutorial (found at <https://benjjneb.github.io/dada2/tutorial.html>) to merge R1 and R2 reads and form amplicon sequence variants (ASVs), which are then used to assign taxa with the Silva reference database²³.
NOTE: A sample amplicon sequence variant table generated from the DADA2 pipeline is included as Supplemental File 3. Either the Greengenes or Silva reference database can be used, as no differences in bacterial classification were found using either of these databases.
Generate a user-defined mapping file that contains the metadata (i.e., genotype, gender, treatment, etc.) for each sample. A sample metadata file has been included as Supplemental File 4.
Calculate alpha diversity (Shannon index) and beta diversity using principal coordinates analysis (PCoA) based on the rarefied OTU counts using Phyloseq²⁸ as outlined in the online tutorial, found at <https://benjjneb.github.io/dada2/tutorial.html>.
Perform the following analysis in METAGENassist²⁹.
NOTE: Perform the differential abundance analysis using the Wilcoxon rank-sum test at the genus level. Heat maps and differentially abundant taxa are highlighted using METAGENassist²⁹, a publicly available and web-based analysis pipeline.
1. Upload the taxonomic abundance table (CSV format) and select Samples in the column.
2. Upload the mapping file (CSV format) and select Samples in a row.
3. Select Options to remove variables with over 50% zeroes, and exclude unassigned and unmapped reads.
4. Select Options to normalize rows by sum and log normalize columns.
5. Make a Volcano, PCA, or PLSDA plot by clicking the same in the left-hand column and click Remove samples name to make the graph.
6. Perform a t-test (if only two groups) or ANOVA (if greater than two groups) to visualize the features (bacteria) that differ among groups. Click Selected features to visualize specific bacteria that differ between groups.
7. Click Dendrogram or Heat map to create respective plots. Additional analysis, such as sample visualization by groups or t-test/ANOVA-based top 25 features, can be performed.
8. Click RandomForest to create graphs showing features that can be used for classification. Click the Variance tab on top to create graphs for the top features that differ between/among groups. Click Feature details to see a list of bacteria that differ among groups and click each bacterium to create a graphical summary of the same.
9. Click the Outlier tab on top to visualize samples that are outliers.
10. Finally, click Download and select either 1) a zip file containing all the analysis performed or 2) the desired features to download. This file should be saved as a unique name and will need to be unzipped before use.

NOTE: For detailed statistical tests performed during microbiome analysis, refer to the works of Chen et al. and Hugerth et al.¹⁴^,³⁰.
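The filtering and normalization options selected in METAGENassist (removing variables with over 50% zeroes, normalizing by sum, and log transformation) can be illustrated on a toy table. This is a sketch of the general idea only and is not METAGENassist's implementation: the taxon names and counts are invented, the table is assumed to hold one list of per-sample counts for each taxon, and the pseudocount added before the log transform is an assumption made to avoid taking the log of zero.

```python
import math


def filter_sparse_taxa(table, max_zero_fraction=0.5):
    """Drop taxa whose counts are zero in more than max_zero_fraction of samples."""
    kept = {}
    for taxon, counts in table.items():
        zero_fraction = sum(1 for c in counts if c == 0) / len(counts)
        if zero_fraction <= max_zero_fraction:
            kept[taxon] = counts
    return kept


def normalize_by_sample_sum(table):
    """Convert counts to proportions so that each sample sums to 1."""
    n_samples = len(next(iter(table.values())))
    totals = [sum(counts[i] for counts in table.values()) for i in range(n_samples)]
    if any(total == 0 for total in totals):
        raise ValueError("A sample has zero total counts and cannot be normalized.")
    return {taxon: [counts[i] / totals[i] for i in range(n_samples)]
            for taxon, counts in table.items()}


def log_transform(table, pseudocount=1e-6):
    """Log10-transform proportions; the pseudocount is an assumption of this sketch."""
    return {taxon: [math.log10(v + pseudocount) for v in values]
            for taxon, values in table.items()}


# Invented counts for three taxa across four samples.
toy_counts = {
    "taxon_A": [10, 0, 30, 20],
    "taxon_B": [0, 0, 0, 5],      # zero in 75% of samples, so it is removed
    "taxon_C": [90, 100, 70, 75],
}
filtered = filter_sparse_taxa(toy_counts)
proportions = normalize_by_sample_sum(filtered)
logged = log_transform(proportions)
print(sorted(filtered))
print(proportions["taxon_A"])
```

Filtering before normalization, as done here, means proportions are computed only over the retained taxa; the reverse order is an equally plausible reading of the options and would give slightly different values.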

Representative Results

As MHC class II molecules (HLA in humans) are central players in the adaptive immune response and show strong associations with a predisposition to MS²⁴^,²⁵^,²⁶, it was hypothesized that the HLA class II molecule would influence gut microbial composition. Therefore, mice lacking the MHC class II gene (AE.KO) or expressing human HLA-DQ8 gene (HLA-DQ8) were utilized to understand the importance of HLA class II molecules in shaping the gut microbial community.

Fecal samples were collected from AE.KO (n = 16) and HLA-DQ8 (n = 12) transgenic mice, bacterial DNA was extracted, and the V3-V4 region of the 16S rRNA gene was amplified. The amplicon size (550 bp) was confirmed by running the samples on a 1.5% agarose gel (Figure 2A, lanes 1–6). Further confirmation of 16S rRNA amplicon size (550 bp) was performed by loading 1 μL of the PCR1 product on a high sensitivity electrophoresis chip (Figure 2B, lanes 1–7).

An electropherogram was generated from the 16S rRNA PCR product, which showed peak regions with fragments sized ~550 bp (Figure 2C). Dual indices and sequencing adapters were attached using indexed PCR (PCR2), which assigned a unique identity to each sample and allowed multiplexing of many samples in a single MiSeq sequencing run. Confirmation of indexed PCR was performed by agarose gel electrophoresis (Figure 2A, lanes 7–12) and a high sensitivity electrophoresis chip (Figure 2B, lanes 8–12; Figure 2D). All the samples from PCR2 were pooled, purified, and loaded onto a next-generation sequencer that yielded forward R1 and reverse R2 reads of good quality (Figure 3). The median number of reads obtained after quality filtering and trimming was 88,125 (range of 9,597–111,848).

Community ecology analysis was performed using the DADA2 analysis pipeline and visualized with Phyloseq and METAGENassist to demonstrate differences in alpha diversity (Figure 4) and beta diversity (Figure 5), as well as differences at the genus and species levels between groups. DADA2 analysis generated an abundance table in comma-separated values (CSV) format, which was used for further downstream analysis using Phyloseq (an R package) and/or the web-based platform METAGENassist. The alpha and beta diversity analyses were performed based on user-defined categories listed in a mapping file.

Shannon diversity analysis revealed an overall lower alpha diversity for AE.KO mice compared to HLA-DQ8 transgenic mice (Figure 4). Ordination with principal coordinates analysis showed a distinct spatial clustering between AE.KO mice and HLA-DQ8 transgenic mice (Figure 5). An abundance table from DADA2 was also used to perform a comprehensive metagenomic analysis using the open-access software METAGENassist²⁹. Heat map-based clustering of bacterial abundance (genus level) (Figure 6A) and a box plot for specific bacteria showing the differences between two groups (Figure 6B) were generated utilizing the METAGENassist pipeline.

Heat map analysis showed that certain bacteria such as Allobaculum, Desulfovibrio, and Rikenella were more abundant in HLA-DQ8 transgenic mice. In contrast, Bilophila was more abundant in AE.KO mice (Figure 6B). Relative abundances of individual bacteria (Bilophila and Rikenella) are shown in a representative box plot (Figure 6B). Altogether, the data demonstrate that AE.KO mice possess a distinct microbial community compared to that of HLA-DQ8 transgenic mice, with an absence of specific bacteria in AE.KO mice. The data also suggest that MHC class II molecules play an important role in the abundance of certain bacteria. In summary, this simple and detailed protocol will help researchers who are new to the microbiome field as well as those who need updates on the methods for achieving higher taxonomic resolution.

Figure 1: Flow diagram of gut microbiome sequencing. All steps of the microbiome sequencing (sample collection to microbiome data analysis) are displayed.

Figure 2: 16S rRNA gene amplification and quality control analysis of V3–V4 region. (A) Representative agarose gel electrophoresis image of the 16S amplicon (PCR1, size = 550 bp) and indexed PCR (PCR2, size = 630 bp) from AE.KO mice (lanes 1–3 and 7–9) and HLA-DQ8 transgenic mice (lanes 4–6 and 10–12). (B) Representative gel image of the 16S amplicon (PCR1, size = 550 bp) and indexed PCR (PCR2, size = 630 bp) of AE.KO mice (lanes 1–3 and 8–10) and HLA-DQ8 transgenic mice (lanes 4–7 and 11–12) resolved by electrophoresis. (C) Representative electropherogram of the 16S amplicon (PCR1) showing a peak region containing fragments sized ~550 bp. (D) Representative electropherogram of indexed PCR (PCR2) showing a peak region comprising fragments sized ~630 bp.

Figure 3: Quality profile of forward reads (R1, top) for two representative samples and corresponding reverse reads (R2, bottom) for the same samples. The analysis was performed using the DADA2 pipeline, in which the x-axis shows read length (0–300 bases) and the y-axis shows quality of the reads. The green line represents the median quality score, whereas the orange line represents quartiles of the quality score distribution at each base position. Forward reads (R1) always showed better quality than reverse reads (R2).

Figure 4: Alpha diversity measures (Shannon diversity) of AE.KO mice and HLA-DQ8 transgenic mice. Each dot represents α-diversity (Shannon diversity) in a sample from a single mouse. Shannon diversity was overall lower for AE.KO mice compared to HLA-DQ8 transgenic mice.

Figure 5: Ordination with partial least squares discriminant analysis (PLS-DA) plot. The plot shows a clear separation between AE.KO mice and HLA-DQ8 transgenic mice. Each dot represents bacterial composition within a sample, and dotted ellipses indicate 80% confidence intervals. The PLS-DA plots were generated using METAGENassist.

Figure 6: AE.KO mice showing distinct microbial community compared to HLA-DQ8 transgenic mice, with an absence of specific bacteria in AE.KO mice. (A) Heat map combined with agglomerative hierarchical clustering showing the relative abundance of bacteria (genus level). (B) Box plot showing a normalized relative abundance of two representative bacteria (Bilophila and Rikenella) in AE.KO and HLA-DQ8 transgenic mice. Both plots were generated using METAGENassist.

Supplementary Figure 1: Representative picture of the divider box for the collection of fecal samples from multiple mice at a time.

Supplementary File 2: DADA2 R-script for generating abundance table from raw sequences.

Supplementary File 3: Representative genus abundance table (CSV format).

Supplementary File 4: Representative mapping file (CSV format).

Discussion

The described protocol is simple, with easy-to-follow steps to perform microbiome profiling using 16S rRNA metagenomic sequencing from a large number of biospecimens of interest. Next-generation sequencing has transformed microbial ecology studies, especially in human and pre-clinical disease models³¹^,³². The main advantage of this technique is its ability to successfully analyze complex microbial compositions (culturable and non-culturable microbes) in a given biospecimen at a high throughput level and at a low cost³². However, several factors (i.e., batch effects, selection of primers for the 16S rRNA gene, and sequence data analysis) remain major obstacles to the widespread use of this technology.

Advanced 16S rRNA-based MiSeq sequencing technologies (2x300bp)³³ allow sequencing of ~600 nucleotides out of the 1,500 nucleotide-long 16S rRNA gene of bacteria, which overcame the earlier bottleneck of short sequencing reads. Primers specific for different regions of the 16S rRNA gene, such as V1–V2, V3–V5, or V6–V9, can be used, with each region-specific primer set showing some bias for over- or under-detection of particular taxa³⁴^,³⁵. Some groups prefer the V1–V2 region²¹, which shows an increased bias for Clostridium but underdetection for certain Bacteroidetes species. In contrast, other groups prefer the V4, V3–V4, and V3–V5 regions, which demonstrate the least biased classification of bacterial taxa¹⁵^,²⁰^,³⁶^,³⁷.

The present protocol uses primers specific for V3–V4 regions of the 16S rRNA gene, as it covers the longer region of 16S rRNA with two hypervariable regions (V3 and V4) compared to V4 alone. Additionally, V3–V4 specific amplicon allows merging of both forward (R1) and reverse (R2) reads, leading to better resolution of bacterial classification. Although the V3–V5 region provides longer reads and covers more hypervariable regions (V3, V4, and V5), it is challenging to merge R1 and R2 reads due to no/little overlapping regions between R1 and R2 reads. Therefore, a number of studies have used data only from R1 reads for bacterial classification when performing 16S rRNA metagenomic sequencing using V3–V5 region¹⁴.
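Whether R1 and R2 can be merged depends on how much they overlap after truncation. The sketch below mimics the truncation rule described for truncLen (cut at a fixed position, discard shorter reads) and estimates the overlap for the truncation lengths used in this protocol (280 and 260). Several values are assumptions made for illustration: the sequenced insert length is approximated as the ~550 bp PCR1 product minus the 33 and 34 nt overhang portions of the primers listed in section 3, on the assumption that reads begin at the locus-specific primer, and the 12-base minimum overlap reflects the documented default of DADA2's mergePairs, which should be confirmed for the installed version. The 700 bp amplicon is purely hypothetical.

```python
def truncate_read(sequence, trunc_len):
    """Cut a read at trunc_len; return None if the read is shorter (discarded)."""
    if len(sequence) < trunc_len:
        return None
    return sequence[:trunc_len]


def expected_overlap(trunc_fwd, trunc_rev, insert_len):
    """Bases shared by R1 and R2 after truncation; negative means a gap."""
    return trunc_fwd + trunc_rev - insert_len


MIN_OVERLAP = 12                     # assumed merge threshold (see lead-in)
approx_insert_len = 550 - 33 - 34    # assumption: PCR1 product minus overhangs

full_read = "ACGT" * 75              # synthetic 300-base read
short_read = "ACGT" * 60             # synthetic 240-base read
kept = truncate_read(full_read, 280)
dropped = truncate_read(short_read, 280)

overlap_v3v4 = expected_overlap(280, 260, approx_insert_len)
overlap_long = expected_overlap(280, 260, 700)   # hypothetical longer amplicon

print("Kept read length:", len(kept), "| short read discarded:", dropped is None)
print("Estimated overlap:", overlap_v3v4, "bases; mergeable:", overlap_v3v4 >= MIN_OVERLAP)
print("Longer amplicon overlap:", overlap_long, "bases; mergeable:", overlap_long >= MIN_OVERLAP)
```

Truncating more aggressively to remove low-quality tails reduces the overlap by the same number of bases, so the two settings need to be chosen together.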

Proper biospecimen storage and handling are critical for microbiome analysis to prevent degradation of bacterial DNA or environmental contamination³⁸. Long-term storage of biospecimens at room temperature (RT) or 4 °C can lead to overgrowth of certain bacteria or fungi. Samples can be either frozen immediately or transported at 4 °C (for 1–3 days) and then frozen. Biospecimens can also be stored directly in preservatives such as nucleic acid stabilization solution (e.g., RNA-later), 95% ethanol, or a commercial storage kit. The general consensus is that none of these storage methods causes significant differences in bacterial community profiles³⁹,⁴⁰. Although a solution with preservatives allows for storage and transportation at RT, these samples cannot be used for RNA-based assays, metabolite analyses, or fecal transplant experiments. These issues have been discussed in detail elsewhere³⁹,⁴⁰.

One of the critical steps in this protocol is the use of a bead-beating-based method for the mechanical disruption of gram-positive bacteria and Archaea. An earlier study found the highest bacterial diversity with the bead-beating-based method compared to other methods of cell lysis⁴¹. Incomplete bacterial lysis or contamination during DNA extraction can introduce bias in gut microbiome data⁴². Another important point to consider is contamination from laboratory reagents included in kits, collectively called the "kitome"⁴³,⁴⁴,⁴⁵. Samples with large biomass and rich bacterial diversity, such as soil or feces, are less affected by these problems than samples with lower biomass, such as skin. Therefore, a water extraction control should be included with each extraction and processed alongside the other samples to identify potential contamination introduced during DNA extraction⁴³,⁴⁴,⁴⁵.
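One simple way to use the water extraction control is to ask what fraction of each sample's reads belongs to taxa that were also detected in the control. The Python sketch below is a minimal illustration of that idea and not part of the described protocol. The function name, the read-count threshold, and all counts are hypothetical; the genera are used only as examples.

```python
def control_taxa_fraction(sample_counts, control_counts, min_control_reads=1):
    """Fraction of a sample's reads assigned to taxa also seen in the control.

    Both arguments map taxon name -> read count. A taxon counts as present
    in the control when it has at least `min_control_reads` reads there.
    """
    control_taxa = {t for t, n in control_counts.items() if n >= min_control_reads}
    total = sum(sample_counts.values())
    if total == 0:
        return 0.0
    shared = sum(n for t, n in sample_counts.items() if t in control_taxa)
    return shared / total


# Hypothetical read counts (made up for illustration).
water_control = {"Ralstonia": 40, "Bradyrhizobium": 10}
fecal_sample = {"Bacteroides": 6000, "Prevotella": 3900, "Ralstonia": 100}
skin_sample = {"Staphylococcus": 300, "Ralstonia": 200}

for name, sample in [("feces", fecal_sample), ("skin", skin_sample)]:
    frac = control_taxa_fraction(sample, water_control)
    print(f"{name}: {frac:.1%} of reads match taxa found in the water control")
```

In this made-up example, the same control-associated taxon contributes a small share of the high-biomass fecal sample but a large share of the low-biomass skin sample, which mirrors the point made above. A high fraction does not prove contamination; it only flags samples and taxa that deserve a closer look.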

In microbiome analysis, diversity within a sample is measured by alpha (α) diversity, a measure of species richness, whereas beta (β) diversity estimates dissimilarities between samples or groups. Popular methods for measuring β-diversity include UniFrac (weighted and unweighted) and Bray-Curtis dissimilarity, typically coupled with multivariate ordination techniques such as principal coordinate analysis (PCoA). While UniFrac is based on phylogenetic distances, Bray-Curtis dissimilarity uses bacterial abundances. In-depth descriptions of α- and β-diversity are available elsewhere³⁰,⁴⁶,⁴⁷. A number of statistical methods can be used to compare differences in microbial communities between groups¹⁴,⁴⁸. It is advisable to use adjusted p-values instead of raw p-values to correct for multiple testing¹⁴.
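To make these quantities concrete, the following Python sketch computes one α-diversity metric (the Shannon index), one β-diversity metric (Bray-Curtis dissimilarity), and adjusted p-values. It is a minimal illustration with made-up counts, not the analysis code used in this protocol. The choice of the Shannon index and of the Benjamini-Hochberg procedure is an assumption made for the example, since the text does not specify which metric or correction method to use.

```python
import math


def shannon_index(counts):
    """Shannon alpha diversity (natural log) from taxon -> read count."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    props = [n / total for n in counts.values() if n > 0]
    return -sum(p * math.log(p) for p in props)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples (0 = identical, 1 = disjoint)."""
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        raise ValueError("both samples are empty")
    taxa = set(a) | set(b)
    return sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa) / total


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that monotonicity is enforced.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts and p-values (made up for illustration).
sample_a = {"Bacteroides": 50, "Prevotella": 50}
sample_b = {"Bacteroides": 50, "Prevotella": 30, "Lactobacillus": 20}
raw_p = [0.01, 0.04, 0.03, 0.005]

print("Shannon (A):", round(shannon_index(sample_a), 3))
print("Bray-Curtis (A vs B):", round(bray_curtis(sample_a, sample_b), 3))
print("Adjusted p-values:", [round(p, 3) for p in benjamini_hochberg(raw_p)])
```

Bray-Curtis is sensitive to sequencing depth, so counts are usually rarefied or converted to relative abundances before samples are compared. UniFrac is not shown because it additionally requires a phylogenetic tree.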

A variety of bioinformatics software is available to analyze targeted sequencing data independently⁴⁹. The proposed protocol uses R-based open-source software packages, which allow user-friendly and fast profiling of bacterial taxa through the R-based DADA2 pipeline. The abundance table generated by DADA2 can be used for downstream analysis with phyloseq and METAGENassist. The DADA2 pipeline is advantageous over QIIME because it does not require special setup (e.g., installation of virtual machines or Docker containers), which demands relatively large computational resources and special technical expertise. For beginners in 16S analysis especially, R is appealing because it is free and allows users to take advantage of accessible online tutorials and analysis scripts that are easy to execute. Importantly, these analysis tools require relatively small computational resources and can be run on a PC, Macintosh, or Linux-based platform. Additionally, METAGENassist can use abundance tables generated by the DADA2 pipeline as well as biological observation matrix (BIOM) files generated by QIIME/MG-RAST to perform analyses such as PCA, partial least squares discriminant analysis (PLS-DA), volcano plots, t-tests (comparing two groups), ANOVA (comparing three or more groups), heat map plots, and random forest analysis. METAGENassist was found to be very user-friendly.

In summary, this protocol describes a simple 16S rRNA metagenomic profiling pipeline, with a detailed guide on sample collection, DNA extraction, metagenomic library preparation, sequencing on Illumina MiSeq, and user-friendly data analysis using freely available platforms (i.e., DADA2, phyloseq, and METAGENassist). Although 16S rRNA metagenomic-based taxonomic profiling is reliable for characterization of bacteria present in particular biospecimens, shotgun metagenomic sequencing may be a better approach for detailed metabolic pathway analysis and strain-specific bacterial identification.

Disclosures

A. M. received royalties from Mayo Clinic (paid by Evelo Biosciences) as one of the inventors of a technology claiming the use of Prevotella histicola for the treatment of autoimmune diseases.

Acknowledgments

The authors acknowledge funding from the NIAID/NIH (1R01AI137075-01), the Carver Trust Medical Research Initiative Grant, and the University of Iowa Environmental Health Sciences Research Center, NIEHS/NIH (P30 ES005605).

Materials

Name	Company	Catalog Number	Comments
1.5 mL Natural Microcentrifuge Tube	USA Scientific	1615-5500	Fecal collection
3M hand applicator squeegee PA1-G	3M, MN, US	7100038651	Squeegee for proper sealing of PCR plate
Agencourt AMPure XP	Beckman Coulter, IN, USA	A63880	PCR Purification, NGS Clean-up, PCR clean-up
Agilent DNA 1000 REAGENT	Agilent Technologies, CA, USA	5067-1504	DNA quantification and quality control
Bioanalyzer DNA 1000 chip	Agilent Technologies, CA, USA	5067-1504	DNA quantification and quality control
Index Adapter Replacement Caps	Illumina, Inc., CA, USA	15026762	New caps for Index 1 and 2 adapter primers
DNeasy PowerLyzer PowerSoil Kit	MoBio now part of QIAGEN, Valencia, CA, USA	12855-100	DNA isolation
KAPA HiFi HotStart ReadyMix (2x)	Kapa Biosystems, MA, USA	KK2602	PCR ready mix for Amplicon PCR1 and Indexed PCR2
Lewis Divider Boxes	Lewis Bins, WI, US	ND03080	Fecal collection
Magnetic stand	New England BioLabs, MA, USA	S1509S	For PCR clean-up
MicroAmp Fast Optical 96-Well Reaction Plate	Applied Biosystems, Thermo Fisher Scientific, CA, USA	4346906	PCR Plate
MicroAmp Optical Adhesive Film	Applied Biosystems, Thermo Fisher Scientific, CA, USA	4311971	PCR Plate Sealer
Microfuge 20 Centrifuge	Beckman Coulter, IN, USA	B30154	Centrifuge used for DNA isolation
MiSeq Reagent Kit (600 cycles)v.3	Illumina, Inc., CA, USA	MS-102-3003	For MiSeq Sequencing
Nextera XT DNA Library Preparation Kit	Illumina, Inc., CA, USA	FC-131-1001	16S rRNA DNA Library Preparation
Reagent Reservoirs Multichannel Trays	ASI, FL, USA	RS71-1	For pooling of PCR2 product
Plate Centrifuge	Thermo Fisher Scientific, CA, USA	75004393	For PCR reagent mixing and removing air bubbles from plate
PhiX Control	Illumina, Inc., CA, USA	FC-110-3001	For MiSeq Sequencing control
Microbiome DNA Purification Kit	Thermo Fisher Scientific, CA, USA	A29789	For purification of PCR1 product
PowerLyzer 24 Homogenizer	Omni International, GA, USA	19-001	Bead beater for DNA Isolation
Qubit dsDNA HS (High Sensitivity) assay kit	Thermo Fisher Scientific, CA, USA	Q32854	DNA quantification
TruSeq Index Plate Fixture	Illumina, Inc., CA, USA	FC-130-1005	For Arranging of the index primers
Vertical Dividers (large)	Lewis Bins, WI, US	DV-2280	Fecal collection
Vertical Dividers (small)	Lewis Bins, WI, US	DV-1780	Fecal collection