Gel-seq enables researchers to simultaneously prepare libraries for both DNA- and RNA-seq at negligible added cost starting from 100 – 1000 cells using a simple hydrogel device. This paper presents a detailed approach for the fabrication of the device as well as the biological protocol to generate paired libraries.
The ability to amplify and sequence either DNA or RNA from small starting samples has only been achieved in the last five years. Unfortunately, the standard protocols for generating genomic or transcriptomic libraries are incompatible and researchers must choose whether to sequence DNA or RNA for a particular sample. Gel-seq solves this problem by enabling researchers to simultaneously prepare libraries for both DNA and RNA starting with 100 – 1000 cells using a simple hydrogel device. This paper presents a detailed approach for the fabrication of the device as well as the biological protocol to generate paired libraries. We designed Gel-seq so that it could be easily implemented by other researchers; many genetics labs already have the necessary equipment to reproduce the Gel-seq device fabrication. Our protocol employs commonly-used kits for both whole-transcript amplification (WTA) and library preparation, which are also likely to be familiar to researchers already versed in generating genomic and transcriptomic libraries. Our approach allows researchers to bring to bear the power of both DNA and RNA sequencing on a single sample without splitting and with negligible added cost.
Next generation sequencing (NGS) has had a profound impact on the way genetics research is conducted. Where researchers once focused on sequencing the genome of an entire species, it is now possible to sequence the genome of a single tumor or even a single cell in one experiment.1 NGS has also made it cost effective to sequence the RNA transcripts found within a cell, a collection of data known as the transcriptome. The ability to amplify and sequence either DNA or RNA from small starting samples has only been achieved in the last five years.2,3,4 Unfortunately, standard protocols are incompatible and researchers must choose whether to sequence DNA or RNA for a given sample. When a starting sample is large enough, it can be split in half. At smaller scales, however, loss of material due to splitting samples can affect library quality, and pooling of samples can average out interesting variations between cells.5 Furthermore, researchers are increasingly interested in examining samples that cannot be split, such as single cells or small heterogeneous tumor biopsies.6
To address this problem, three protocols have recently been developed to sequence both DNA and RNA from the same starting sample: Gel-seq7, G&T-seq8, and DR-seq9. This article presents a detailed protocol for Gel-seq, which can be used to simultaneously generate DNA and RNA libraries from as few as 100 cells at negligible added cost. The core innovation of Gel-seq is the physical separation of DNA from RNA based exclusively on size, using low-cost hydrogel matrices. This separation is achieved electrophoretically using a combination of polyacrylamide membranes that take advantage of the size differences between these molecules. To put these size differences in context, consider how DNA and RNA are imaged: while DNA exists on the micron scale and can be viewed using traditional microscopes, RNA exists on the nanometer scale and must be imaged using complex techniques such as cryo-electron microscopy.10
The approach to separating DNA and RNA in this protocol is shown in Figure 1. The left panel shows DNA and RNA free floating in solution near a membrane. When an electric field is applied, as shown in the right panel, DNA and RNA experience an electrophoretic force that induces migration through the membrane. By tuning the membrane properties, we have created a semi-permeable membrane that separates DNA from RNA. The DNA molecules are pushed against the membrane, but become entangled at the edge because of their large size. Small RNA molecules, on the other hand, can reconfigure and weave their way through the membrane. This process, known as reptation, is similar to the way a snake moves through grass. Eventually these RNA molecules are stopped by a second, high-density membrane that is too difficult for even smaller polymers (>200 base pairs) to wriggle through. Once physically separated, DNA and RNA can be recovered and processed to generate information about both the genome and transcriptome. While we can separate DNA and RNA, we have found better results are obtained if the RNA is reverse transcribed to cDNA before separation. The cDNA/RNA hybrids are more stable than RNA alone and can still pass through the low-density membrane.
Figure 1. Gel-seq Operating Principle. The underlying principle used to physically separate DNA and RNA. In an applied electric field, small RNA molecules migrate through the low-density membrane but large DNA molecules are trapped at the surface. This figure was reproduced from Ref. 7 with permission from the Royal Society of Chemistry.
This paper describes in detail both the fabrication of the Gel-seq device and the biological protocol to generate paired DNA and RNA libraries. An overview of both is shown in Figure 2. The device is fabricated by layering three different density polyacrylamide gels on top of each other in a process similar to creating standard stacking gels.11 The biological protocol starts with 100 – 1000 cells suspended in PBS. The cells are lysed and the RNA is converted into cDNA before the device is used to separate the genomic DNA from the cDNA/RNA hybrids. After separation and recovery, genomic and transcriptomic libraries are prepared using a process that closely follows the standard whole-genome library preparation kit protocol. Further detail about the development and validation of Gel-seq can be read in the Lab on a Chip publication "Gel-seq: whole-genome and transcriptome sequencing by simultaneous low-input DNA and RNA library preparation using semi-permeable hydrogel barriers."7
Figure 2. Gel-seq Protocol. An overview of the steps to fabricate the Gel-seq device and the protocol to generate paired DNA and RNA libraries. Portions of this figure were reproduced from Ref. 7 with permission from the Royal Society of Chemistry.
To generate DNA and RNA libraries from single cells, researchers should consider using either G&T-seq or DR-seq. G&T-seq, like Gel-seq, relies on a physical separation of RNA from genomic DNA. This approach uses the 3′ polyadenylated tail of messenger RNA (mRNA) as a pull-down target. The mRNA is captured on a magnetic bead using a biotinylated oligo-dT primer. Once the mRNA has been captured, the beads are held in place with a magnet and the supernatant containing the genomic DNA can be removed and transferred to another tube. After this physical separation is complete, separate libraries can be generated from the mRNA and DNA.8 This approach works well if the RNA of interest is polyadenylated; however, it cannot be used to study non-polyadenylated transcripts, such as ribosomal RNA, tRNA, or RNA from prokaryotes.
DR-seq relies on a pre-amplification step where both DNA and cDNA derived from RNA are amplified in the same tube. The sample is then split in two and processed in parallel to prepare DNA- and RNA-seq libraries. To distinguish between genomic DNA and the cDNA derived from RNA, DR-seq takes a computational approach. Sequences where only exons are present are computationally suppressed in the genomic DNA data, as those could have originated from either DNA or RNA.9 An advantage of this approach is that the DNA and cDNA/RNA need not be physically separated as is done in Gel-seq and G&T-seq. The drawback, however, is that DR-seq requires a priori knowledge of the genome and transcriptome (i.e., exons versus introns), and might not be ideal for applications such as sequencing of nuclei, in which many transcripts are not yet fully spliced and still contain introns.12
The novel aspect of Gel-seq is the ability to separate DNA and RNA in hundreds of cells based exclusively on size. This method requires no a priori knowledge of the genome or transcriptome, is robust against incomplete splicing, and is not limited to polyadenylated transcripts. For applications where a researcher can start with at least 100 cells, Gel-seq provides a straightforward approach using cheap and widely available materials.
1. Chemical Solution Preparation
NOTE: The following steps are for preparing chemical solutions required in later steps. These can be made in bulk and stored for several months.
2. Gel-seq Cassette Fabrication
NOTE: Gel-seq was originally developed with upright cassettes (see Table of Materials for more information); however, this protocol can be adapted to work with any standard gel electrophoresis cassette.
Filler Gel Precursor | High Density Gel Precursor | Low Density Gel Precursor | |||
40%T, 3.3%C Acrylamide Bisacrylamide Solution | 1.6 mL | 50%T, 5%C Acrylamide Bisacrylamide Solution | 2.4 mL | 40%T, 3.3%C Acrylamide Bisacrylamide Solution | 0.6 mL |
Deionized Water | 10.2 mL | Deionized Water | 1.0 mL | Deionized Water | 4.8 mL |
Sucrose Solution (50% w/v) | 2.6 mL | Sucrose Solution (50% w/v) | 0.6 mL | ||
10X Tris-Borate-EDTA | 1.6 mL | 10X Tris-Borate-EDTA | 0.6 mL | ||
Ammonium persulfate (10% w/v) | 104.0 µL | Ammonium persulfate (10% w/v) | 50.0 µL | Ammonium persulfate (10% w/v) | 39.0 µL |
TEMED | 6.0 µL | TEMED | 1.0 µL | TEMED | 2.2 µL |
Total Volume | 16.1 mL | Total Volume | 4.1 mL | Total Volume | 6.0 mL |
Table 1. Gel Synthesis Reagents. Polyacrylamide gel precursor reagents sufficient for fabrication of 2 cassettes.
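As a quick sanity check on Table 1, the final acrylamide concentration (%T) of each precursor can be estimated from the stock concentration and the volumes listed. The sketch below is only illustrative: the dictionary keys and the function name are hypothetical, and the total volumes are the rounded values from the table, so the results are approximate.

```python
# Approximate final acrylamide concentration (%T) of each gel precursor,
# using the stock concentrations and volumes listed in Table 1.
# Names below are illustrative and not part of the published protocol.
precursors = {
    "filler":       {"stock_pct_T": 40.0, "stock_mL": 1.6, "total_mL": 16.1},
    "high_density": {"stock_pct_T": 50.0, "stock_mL": 2.4, "total_mL": 4.1},
    "low_density":  {"stock_pct_T": 40.0, "stock_mL": 0.6, "total_mL": 6.0},
}


def final_pct_T(stock_pct_T, stock_mL, total_mL):
    """Simple dilution: final %T = stock %T * (stock volume / total volume)."""
    return stock_pct_T * stock_mL / total_mL


final = {name: final_pct_T(**p) for name, p in precursors.items()}

for name, pct in final.items():
    print(f"{name}: about {pct:.1f} %T")
```

Under these assumptions the filler and low-density precursors both come out near 4 %T and the high-density precursor near 29 to 30 %T, which is consistent with the low-density/high-density naming used throughout the protocol.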
3. Sample Preparation and Reverse Transcription
4. Gel Separation and Sample Recovery
5. gDNA Library Preparation
6. cDNA Library Preparation
7. Library Preparation with Half Volume Reactions
8. Solid Phase Reversible Immobilization Bead Library Cleaning
The physical separation of gDNA and cDNA/RNA hybrids in the Gel-seq device can be visualized through fluorescent gel imaging; a representative result is shown in Figure 3. Panel A shows the fabricated Gel-seq device; false color has been added to distinguish the different gel regions. Panel B shows a close-up of four different separations used for validation. The third lane, a negative control, represents background and shows that there is no autofluorescence of the gel at the interfaces. We loaded the first and second lanes with DNA ladders. These lanes show only a dark band at the interface between the low- and high-density membranes, revealing that small fragments can pass through the low-density gel. The fourth lane shows the behavior of a biological sample of interest: 500 PC3 cells. We loaded lane four as described in step 3 of the protocol. The image shows separation of genomic DNA and cDNA/RNA hybrids. A dark band at the top of the low-density membrane is megabase-scale genomic DNA, while the cDNA/RNA hybrids are stacked at the interface of the low- and high-density regions. Unlike the lanes loaded with ladder, there are also several bands present within the high-density region of the gel. These fragments, smaller than 100 bp, are off-target products generated from primer oligonucleotides during reverse transcription. Panel C shows a representative image of the entire Gel-seq device from a successful experiment. Lanes labeled RNA/cDNA were processed with the Gel-seq protocol, while lanes labeled RNA show a separation of just gDNA and RNA. Panel D shows a failed experiment with black bands at the top of every lane of the low-density membrane. This was caused by electrophoresis buffer contaminated with fragmented DNA. At this step, researchers should be looking for both clean negative controls and two distinct black bands indicating the presence of separated gDNA and RNA/cDNA hybrids.
Figure 3. Gel-Seq Separation Results. The Gel-seq device (A) and a fluorescent image showing separation of DNA and RNA/cDNA hybrids (B). False color has been added to more easily distinguish between the different regions of density within the gel. (C and D) Representative fluorescent images of the entire Gel-seq device from a successful (C) and failed (D) experiment. NTC = No Template Control. Portions of this figure were reproduced from Ref. 7 with permission from the Royal Society of Chemistry.
Once the DNA and RNA/cDNA hybrids have been separated and the remainder of the Gel-seq protocol completed, it is possible to generate sequencing libraries. To validate the prepared libraries, we either run a standard gel electrophoresis experiment (Figure 4) or use a bioanalyzer. The results in Figure 4 show libraries generated from 500 PC3 cells and 750 HeLa cells. The figure shows the fragment distributions for matched libraries generated from Gel-seq (labeled 'Gel') compared to unmatched samples generated with standard protocols (labeled 'Tube'). The fragment sizes for Gel-seq appear between 200 and 800 basepairs, as expected when preparing libraries using the standard whole-genome library preparation kit. If the library fragments do not appear in the correct size range in this step, library preparation has failed.
Figure 4. Library Fragment Size Comparison. A fluorescent gel electrophoresis image comparing library size distribution between Gel-seq (Gel) and standard controls (Tube). The left lane contains a low mass DNA ladder with fragment sizes 100, 200, 400, 800, 1200, and 2000 basepairs. The fragment sizes for all libraries fall between 200 and 800 basepairs, as expected for libraries prepared with this kit.
The ultimate validation of the Gel-seq protocol is based on the analysis of sequencing results. We selected PC3 cells for our validation experiments, as these cells have homogeneous expression profiles that allow samples to be split and processed using both Gel-seq and traditional methods; see Figure 5. A comparison between genomic DNA for PC3 cells is shown in Figures 5A and 5B. Figure 5A shows a comparison of genome-wide copy number variation (CNV) profiles generated from PC3 using either Gel-seq or a standard whole-genome library prep reaction (tube control). Each point is a mean normalized bin count; bins are defined from reference genome data such that each bin has an equal expected count in a healthy diploid cell, i.e., a flat line, representing equal copies for each region of all autosomal (excluding X and Y) chromosomes. PC3 contains multiple copies of the same regions, which show up as spikes above a background copy number of two. Gel-seq yields a CNV profile that is qualitatively similar to that of the standard tube reaction. Agreement between the two plots can be assessed quantitatively by linear regression, as shown in panel B. A Pearson correlation of R = 0.90 indicates that the genomic data gathered from the two methods are functionally equivalent.
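The comparison described here (mean-normalized bin counts, log2 transformation, Pearson correlation) can be sketched in a few lines. This is a minimal illustration rather than the analysis pipeline used for Figure 5; the function names and the bin counts are hypothetical and chosen only to make the example self-contained.

```python
import math


def mean_normalize(counts):
    """Divide each bin count by the mean count across all bins."""
    mean = sum(counts) / len(counts)
    return [c / mean for c in counts]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical read counts for the same genomic bins in two libraries.
gel_counts = [100, 110, 210, 95, 400, 105]
tube_counts = [98, 120, 190, 100, 380, 110]

# Compare the libraries on log2 mean-normalized bin counts.
gel_log2 = [math.log2(v) for v in mean_normalize(gel_counts)]
tube_log2 = [math.log2(v) for v in mean_normalize(tube_counts)]

r = pearson_r(gel_log2, tube_log2)
print(f"Pearson R = {r:.2f}")
```

Mean normalization removes differences in sequencing depth between the two libraries, so the correlation reflects agreement in relative copy number rather than total read count.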
Figure 5. Library Validation. Gel-seq validation for the genomic (A and B) and transcriptomic (C, D, and E) data generated from PC3 and HeLa cells. Panel A shows a comparison of genome-wide CNV profiles generated from PC3 using either Gel-seq (Gel-seq) or a standard reaction (tube). MAPD = Median Absolute Pairwise Difference. Panel B is a linear regression between the two samples in Panel A, with R = 0.90 indicating the genomic data are functionally equivalent. The axes show the log2 normalized bin counts. Panels C and D compare transcriptomic data from PC3 cells, with each point showing a count in transcripts per million (TPM). The axes show the comparison between two samples as log2 normalized transcript counts. Panel C shows a comparison of technical replicates generated using Gel-seq, and Panel D shows a comparison between Gel-seq and traditional RNA-seq. Panel E shows that Gel-seq can resolve cell type based on RNA expression using a principal component analysis. The x axis shows that the first principal component accounts for 91.6% of the variance between the samples, while the y axis shows that the second principal component accounts for only 6.5% of the variance. Portions of this figure were reproduced from Ref. 7 with permission from the Royal Society of Chemistry.
Similarly, we compared transcriptomic data from our Gel-seq protocol to the standard in-tube Smart-Seq protocol. Figure 5 shows the correlation both between Gel-seq technical replicates (Figure 5C) and between Gel-seq and the standard method (Figure 5D). Each point is a count in transcripts per million (TPM) for a gene detected at TPM > 5 in both datasets. The linear regressions are shown as red lines, and the Pearson correlation coefficient is shown in the upper left corner. Technical replicates from Gel-seq agree with each other (R ∼ 0.8), but Gel-seq correlates less well with the standard method (R < 0.7). This suggests that Gel-seq introduces a bias in gene counts. Fortunately, this bias is systematic and, as can be seen by the principal component analysis in Figure 5E, meaningful conclusions can still be drawn between different biological samples.
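The gene selection step described above (keep only genes with TPM > 5 in both datasets, then compare on a log2 scale) might look like the following sketch. The function name, gene names, and TPM values are all hypothetical placeholders, not data from this study.

```python
import math


def paired_log2_tpm(tpm_a, tpm_b, threshold=5.0):
    """Return {gene: (log2 TPM in a, log2 TPM in b)} for genes whose TPM
    exceeds the threshold in both datasets."""
    shared = tpm_a.keys() & tpm_b.keys()
    return {
        gene: (math.log2(tpm_a[gene]), math.log2(tpm_b[gene]))
        for gene in sorted(shared)
        if tpm_a[gene] > threshold and tpm_b[gene] > threshold
    }


# Hypothetical TPM values keyed by placeholder gene names.
gel_seq = {"geneA": 64.0, "geneB": 3.0, "geneC": 16.0, "geneD": 8.0}
tube = {"geneA": 32.0, "geneB": 40.0, "geneC": 4.0, "geneD": 8.0}

pairs = paired_log2_tpm(gel_seq, tube)
for gene, (a, b) in pairs.items():
    print(gene, a, b)
```

Requiring the threshold in both datasets excludes genes that are near the detection limit in either library, where noise would otherwise dominate the scatter plot.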
There are several critical steps associated with the Gel-seq device fabrication as well as the protocol itself. During fabrication, we recommend starting with the prescribed layer thicknesses for the various regions of the gel. We spent significant time testing different fabrication options and the protocol described here produces the best devices for the cassettes listed in the Table of Materials and Reagents. If researchers use an alternative cassette system, they may find it necessary to tweak the volumes used when creating the devices. The major challenge in fabrication is that if the high-density gel region is too large it can delaminate from the edges of the cassette and create air pockets on the interior of the cassette that will disrupt electrophoresis. By casting several cassettes with several different layer volumes, researchers should be able to quickly determine the optimal configuration for their specific hardware.
The Gel-seq protocol also has several critical steps that can be validated before the protocol is complete. One potential failure point is the separation of gDNA and RNA/cDNA hybrids. This can be validated by imaging the Gel-seq device after separation (see Figure 3B). In one set of experiments, we found that our lab supply of buffer had become contaminated with DNA and was causing substantial background fluorescence in our device (see Figure 3D). This made it difficult to determine if the separation had taken place. Fluorescent imaging helped us identify and correct this problem before using any costly reagents to generate sequencing libraries.
Another critical point is step 6.2, the qPCR amplification of cDNA after separation. Researchers should take care not to overamplify in this step, as it will reduce the quality of the RNA-seq data. This consideration is not unique to Gel-seq, but is a common aspect of low-input RNA-seq library preparation. PCR amplification during sequencing library preparation is often necessary, but it can introduce sequence errors and biases. The required number of cycles for PCR depends on sample quantity and complexity. It is generally advisable to limit PCR cycle number to the bare minimum required to yield sufficient clustering when the libraries are sequenced. In theory, a protocol can be optimized to determine the exact cycle number that yields sufficient copy number without introducing excessive artifacts. In practice, however, inconsistencies in sample quality, loading, or handling early in the protocol can dramatically affect the distribution of molecular templates available for library prep PCR, which in turn affects the optimal PCR cycle number. The most general solution we have found is to monitor the progress of the amplification reactions using a fluorescent dye, run the reactions on a real-time PCR thermocycler, and stop the reactions in the exponential phase (linear on a log scale versus cycle number). In our experience, real-time monitoring is especially relevant when developing, adapting, or adopting a new protocol.
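One way to think about the stopping decision is as a simple rule applied to the fluorescence trace: stop once the baseline-subtracted signal is clearly detectable, and confirm that the signal is still growing by a roughly constant factor per cycle. The sketch below is a heuristic illustration only. The function name, the threshold, and the minimum fold change are assumptions that would need to be tuned for a given instrument and dye; they are not values from this protocol.

```python
def find_stop_cycle(fluorescence, baseline_cycles=3, threshold=100.0, min_fold=1.5):
    """Return (cycle, still_exponential) for the first cycle whose
    baseline-subtracted signal reaches the threshold.

    cycle is 1-based; (None, False) is returned if the threshold is never reached.
    still_exponential is True when the signal grew by at least min_fold
    relative to the previous cycle, i.e. the reaction has not begun to plateau.
    All default values are placeholders, not protocol parameters.
    """
    baseline = sum(fluorescence[:baseline_cycles]) / baseline_cycles
    signal = [f - baseline for f in fluorescence]
    for i in range(baseline_cycles, len(signal)):
        if signal[i] >= threshold:
            previous = signal[i - 1]
            still_exponential = previous > 0 and signal[i] / previous >= min_fold
            return i + 1, still_exponential
    return None, False


# Hypothetical readings: flat baseline, roughly doubling signal, then a plateau.
readings = [50, 50, 50, 52, 54, 58, 66, 82, 114, 178, 290, 420, 480, 500]

cycle, exponential = find_stop_cycle(readings)
print(cycle, exponential)
```

In a real-time setting, a rule like this would be evaluated on the readings collected so far after each cycle. If the threshold is only reached after growth has slowed (the second return value is False), the reaction has likely been run past the exponential phase.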
The last critical step is generating genomic and transcriptomic libraries. The key to this step is to set the starting sample concentration for the DNA library prep reaction as close to 0.2 ng/µL (0.5 ng total) as possible. This is relatively straightforward for the qPCR-amplified cDNA, as there is usually an excess of cDNA, but it can be more challenging for the gDNA samples. We found that careful attention to the vacufuge step was required while the samples were being concentrated. As expected, in experiments with 1000 cells, the vacufuge step could be stopped much sooner than in experiments with 100 cells. The number of samples in the vacufuge also affected the evaporation rate in our experiments. We found that using a fluorometer to validate DNA content midway through the concentration step could be helpful when performing the protocol with unfamiliar samples. Fortunately, if researchers overconcentrate a sample, nuclease-free water can be added to dilute it. Theoretically, it is possible to use the vacufuge to dry the DNA and then resuspend it in the desired volume; however, we suggest avoiding complete evaporation.
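If a sample has been overconcentrated, the amount of nuclease-free water needed to bring it back to the 0.2 ng/µL target follows from conservation of mass (C1·V1 = C2·V2). The helper below is a hypothetical convenience function, and the fluorometer reading in the example is invented for illustration.

```python
TARGET_NG_PER_UL = 0.2  # target input concentration stated in the text


def water_to_add_ul(measured_ng_per_ul, sample_volume_ul,
                    target_ng_per_ul=TARGET_NG_PER_UL):
    """Volume of water (in µL) that dilutes the sample to the target concentration.

    Uses C1 * V1 = C2 * V2, so the final volume is C1 * V1 / C2.
    """
    if measured_ng_per_ul < target_ng_per_ul:
        raise ValueError("sample is below target; concentrate further instead of diluting")
    final_volume_ul = measured_ng_per_ul * sample_volume_ul / target_ng_per_ul
    return final_volume_ul - sample_volume_ul


# Hypothetical fluorometer reading: 0.5 ng/µL in a 4 µL sample.
added = water_to_add_ul(0.5, 4.0)
print(f"add about {added:.1f} µL of nuclease-free water")
```

At the target concentration, the 0.5 ng total input mentioned above corresponds to 2.5 µL of sample.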
We view the three current methods for generating simultaneous DNA and RNA libraries, Gel-seq7, G&T-seq8, and DR-seq9, as complementary. Gel-seq is ideal for samples in the 100 – 1000 cell range and requires no pull-down targets or a priori knowledge of the genome. The other two methods are better suited for single cell applications. One of our goals in developing Gel-seq was to create a protocol that could be easily implemented by other researchers. We therefore decided to fabricate devices within the standard form factor of a polyacrylamide gel cassette. While the technique we used to define our different membranes is novel, most genetics labs already have all the necessary equipment to fabricate the Gel-seq device. Furthermore, the cost of the device is trivial – just $5.25 for a device that can process 12 samples. That said, as with any library preparation protocol using commercial reagents, the overall cost for generating libraries remains high. Our reagent cost per sample was $50 for whole-transcript amplification and $28 for library preparation for both DNA and RNA. Fortunately, the Gel-seq device itself is protocol-agnostic. For example, during development we successfully tested the device using cells from culture and an older RNA library amplification protocol19, although we found it was not suitable for tissue samples from mice. Looking towards the future, as cheaper alternatives for library preparation are developed, our protocol can be adapted to work with these new techniques. We believe researchers will find it straightforward to implement Gel-seq in their own labs. We hope this will facilitate the rapid adoption of the technology.
The authors have nothing to disclose.
Funding for this work was provided by the University of San Diego, the National Science Foundation Graduate Research Fellowship Program, NIH grant R01-HG007836, and by the Korean Ministry of Science, ICT and Future Planning.
Earlier versions of several figures were first published in “Hoople, G. D. et al. Gel-seq: whole-genome and transcriptome sequencing by simultaneous low-input DNA and RNA library preparation using semi-permeable hydrogel barriers. Lab on a Chip 17, 2619-2630, doi:10.1039/c7lc00430c (2017).” Lab on a Chip has sanctioned the reuse of figures in this publication.
Reagents | |||
Acrylamide Monomer | Sigma Aldrich | A8887-100G | |
Ammonium Persulfate | Sigma Aldrich | A3678-25G | |
Ampure XP Beads | Beckman Coulter | A63880 | Referred to in the text as solid phase reversible immobilization (SPRI) beads |
DNA Gel Loading Dye (6x) | ThermoFisher Scientific | R0611 | Referred to in the text as 6X loading dye |
Ethyl alcohol | Sigma Aldrich | E7023-500ML | |
KAPA SYBR FAST One-Step qRT-PCR Kits | Kapa BioSystems | 7959613001 | Referred to in the text as 2X qPCR mix |
N,N′-Methylenebis(acrylamide) | Sigma Aldrich | 146072-100G | Also known as bis-acrylamide |
NexteraXT DNA Library Preparation Kit (referred to in the text as library preparation kit) | Illumina | FC-131-1024 | Includes: TD (referred to in the text as transposase buffer), ATM (referred to in the text as transposase), NT (referred to in the text as transposase stop buffer), and NPM (Referred to in the text as library prep PCR mix) |
Nuclease Free Water | Millipore | 3098 | |
Protease | Qiagen | 19155 | |
SMART-Seq v4 Kit (referred to in the text as whole-transcript amplification (WTA) kit) | Takara/Clontech | 634888 | Includes: Lysis buffer, RNase inhibitor, 3’ SMART-Seq CDS Primer II A (referred to in the text as RT primer), 5X Ultra Low First Strand Buffer (referred to in the text as first strand buffer), SMART-Seq v4 Oligonucleotide (referred to in the text as template switch oligonucleotide (TSO)), SMART-Scribe Reverse Transcriptase (referred to in the text as reverse transcriptase), and PCR Primer II A (referred to in the text as cDNA PCR primer) |
Random hexamer with WTA adapter | IDT | n/a | 5′-AAGCAGTGGTATCAACGCAGAGTAC-NNNNNN-3′ |
Sucrose | Sigma Aldrich | S0389-500G | |
TEMED | Sigma Aldrich | T9281-25ML | |
DNA AWAY Surface Decontaminant | ThermoFisher Scientific | 7010PK | Referred to in the text as DNA removal product |
Tris-Borate-EDTA buffer (10X concentration) | Sigma Aldrich | T4415-1L | |
SYBR Gold Nucleic Acid Gel Stain (10,000X Concentrate in DMSO) | ThermoFisher Scientific | S11494 | Referred to in the text as gel stain |
Equipment | |||
BD Precisionglide syringe needles, gauge 18 | Sigma Aldrich | Z192554 | Any equivalent hardware is acceptable |
Branson CPX series ultrasonic bath | Sigma Aldrich | Z769363 | Any equivalent hardware is acceptable |
Empty Gel Cassettes, mini, 1.0 mm | ThermoFisher Scientific | NC2010 | Any equivalent hardware is acceptable |
Mesh Filter Plate – Corning HTS Transwell 96 well permeable supports – 8.0 µm pore size | Sigma Aldrich | CLS3374 | Referred to in the text as 8 um mesh filter plate |
PowerPac HC Power Supply | Bio-Rad | 1645052 | Any equivalent hardware is acceptable |
Qubit Fluorometer | ThermoFisher Scientific | Q33216 | Any equivalent hardware is acceptable |
Vacufuge Concentrator | Eppendorf | 22822993 | Any equivalent hardware is acceptable |
XCell SureLock Mini-Cell system | ThermoFisher Scientific | EI0001 | Any equivalent hardware is acceptable |
Bio-Rad CFX96 Touch Real-Time PCR Detection System | Bio-Rad | 1855195 | Any equivalent hardware is acceptable |
Amersham UVC 500 Ultraviolet Crosslinker | GE Healthcare Life Sciences | UVC500-115V | Discontinued, any equivalent hardware is acceptable |
Gel Doc XR+ Gel Documentation System | Bio-Rad | 1708195 | Referred to in the text as gel imager |
Dark Reader Transilluminator | Clare Chemical Research | DR89 | Referred to in the text as UV transilluminator |
Ultrasonic Bath | Bransonic | 1207K35 | Any equivalent ultrasonic bath is acceptable. |