RNA-Seq analyses are becoming increasingly important for identifying the molecular underpinnings of adaptive traits in non-model organisms. Here, a protocol to identify differentially expressed genes between diapause and non-diapause Aedes albopictus mosquitoes is described, from mosquito rearing, to RNA sequencing and bioinformatics analyses of RNA-Seq data.
Photoperiodic diapause is an important adaptation that allows individuals to escape harsh seasonal environments via a series of physiological changes, most notably developmental arrest and reduced metabolism. Global gene expression profiling via RNA-Seq can provide important insights into the transcriptional mechanisms of photoperiodic diapause. The Asian tiger mosquito, Aedes albopictus, is an outstanding organism for studying the transcriptional bases of diapause due to its ease of rearing, easily induced diapause, and the genomic resources available. This manuscript presents a general experimental workflow for identifying diapause-induced transcriptional differences in A. albopictus. Rearing techniques, conditions necessary to induce diapause and non-diapause development, methods to estimate percent diapause in a population, and RNA extraction and integrity assessment for mosquitoes are documented. A workflow to process RNA-Seq data from Illumina sequencers culminates in a list of differentially expressed genes. The representative results demonstrate that this protocol can be used to effectively identify genes differentially regulated at the transcriptional level in A. albopictus due to photoperiodic differences. With modest adjustments, this workflow can be readily adapted to study the transcriptional bases of diapause or other important life history traits in other mosquitoes.
Rapid advances in next-generation sequencing (NGS) technologies are providing exciting opportunities to probe the molecular underpinnings of a wide range of genetically complex ecological adaptations in a broad diversity of non-model organisms1–3. This approach is extremely powerful because it establishes a basis for population and functional genomics studies of organisms with an especially interesting and/or well-described ecology or evolutionary history, as well as organisms of practical concern, such as agricultural pests and disease vectors. Thus, NGS technologies are leading to rapid advances in the fields of ecology and have the potential to address problems such as understanding the mechanistic bases of biological responses to rapid contemporary climate change4, the spread of invasive species5, and host-pathogen interactions6,7.
The extraordinary potential of NGS technologies for addressing basic and applied questions in ecology and evolutionary biology is in part due to the fact that these approaches can be applied to any organism at a moderate cost that is feasible for most research laboratories. Furthermore, these approaches provide genome-wide information without the requirement of a priori genetic resources such as a microarray chip or complete genome sequence. Nevertheless, maximizing the productivity of NGS experiments requires careful consideration of experimental design, including issues such as the developmental timing and tissue-specificity of RNA sampling. Furthermore, the technical skills required to analyze the massive amounts of data produced by these experiments, often up to several hundred million DNA sequence reads, have posed a particular challenge and have limited the widespread implementation of NGS approaches.
Recent RNA-Seq studies on the transcriptional bases of diapause in the invasive and medically important mosquito Aedes albopictus provide a useful example of some of the experimental protocols that can be employed to successfully apply NGS technology to studying the molecular basis of a complex ecological adaptation in a non-model organism8–10. A. albopictus is a highly invasive species that is native to Asia but has recently invaded North America, South America, Europe, and Africa11,12. Like many temperate insects, temperate populations of A. albopictus survive through winter by entering a type of dormancy referred to as photoperiodic diapause. In A. albopictus, exposure of pupal and adult females to short (autumnal) day lengths leads to the production of diapause eggs in which embryological development is completed, but the pharate larva inside the chorion of the egg enters a developmental arrest that renders the egg refractory to hatching stimulus15–17. Diapause eggs are more desiccation resistant5,18 and contain more total lipids19 than non-diapause eggs. Photoperiodic diapause in A. albopictus is thus a maternally controlled, adaptive phenotypic plasticity that is essential for surviving the harsh conditions of winter in temperate environments. Despite the well-understood ecological significance of photoperiodic diapause in a wide range of insects20,21, the molecular basis of this crucial adaptation is not well characterized in any insect22. In organisms such as A. albopictus that undergo an embryonic diapause at the pharate larval stage, it remains a particularly compelling challenge to understand how the photoperiodic signal received by the mother is passed to the offspring and persists through the course of embryonic development to cause arrest at the pharate larval stage.
This protocol describes mosquito rearing, experimental design and bioinformatics analyses for NGS experiments (transcriptome sequencing) performed to elucidate transcriptional components of photoperiodic diapause in A. albopictus. This protocol can be used for additional studies of diapause in A. albopictus, can be adapted to investigate diapause in other closely related species such as other aedine mosquitoes that undergo egg diapause23, and is also more generally relevant to employing NGS approaches to study the transcriptional bases of any complex adaptation in any insect.
1. Larval Rearing of Two A. albopictus Groups to Adulthood
2. Maintenance of Adults to Allow Mating and Egg Production
3. Blood-feeding
4. Stimulate Oviposition
5. Collect and Store Eggs
6. Measure Diapause Incidence
7. RNA Extraction from Eggs/pharate Larvae
NOTE: Use Trizol in a laminar flow hood.
8. RNA Sequencing
9. Illumina Read Cleaning
NOTE: Figure 2 summarizes the bioinformatics portion of this protocol. For a full list of all programs and resources used in the bioinformatics section of this protocol, refer to Table 1. In addition, Supplemental File 1 contains command line examples for each of the following bioinformatics protocol steps.
10. Digital Normalization
11. De Novo Transcriptome Assembly
12. Assembly Evaluation
13. Annotation of the Assembled Transcriptome
14. Map Reads to the Assembly Using RSEM34 (Table 1)
15. Differential Expression Analysis
Fluorometry of two representative RNA samples showed two bands at approximately 2,000 nt (Figure 1A, B). The insect 28S ribosomal RNA is comprised of two polynucleotide chains held together by hydrogen bonds, which are easily disrupted by brief heating or agents that disrupt hydrogen bonds35. The resulting two components are approximately the same size as the 18S ribosomal RNA. The second RNA sample showed high levels of degradation (Figure 1B).
Photoperiodic treatment of a representative group of A. albopictus mosquitoes resulted in high diapause incidence in short-day-reared mosquitoes, and low diapause incidence in long-day-reared mosquitoes, although there was some variation among replicates (Table 2). For example, replicate SD2 shows lower (80%) diapause incidence than the remaining replicates (87.18% – 97.67%). This replicate also had the smallest sample size, so it is recommended to set aside a sufficient number of eggs (>150) for the diapause measurement in order to obtain an accurate result.
Post-sequencing read cleaning of one representative library from adult A. albopictus females removed a substantial number of reads (from 83,853,322 to 52,736,065 total reads). Digital normalization further reduced the number of total reads to 41,435,934. A Trinity assembly of these reads generated 76,377 contigs, with an N50 of 1,879, mean contig length of 1,023.1, and a maximum contig length of 20,892 (Figure 3). Differential expression analyses from a similar workflow of embryos reared under diapause-inducing conditions at 11 and 21 days post-oviposition revealed 3,128 differentially expressed genes between these two time periods (Figure 4).
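To put the read counts above in proportion, the following minimal Python sketch computes how much of the library survives each step. The three counts are taken from the text; the function name and table layout are illustrative assumptions and are not part of the published workflow.

```python
# Read counts reported in the text for one representative library.
READ_COUNTS = [
    ("raw", 83_853_322),
    ("after read cleaning", 52_736_065),
    ("after digital normalization", 41_435_934),
]


def retention_table(counts):
    """Return (stage, reads, % of previous stage, % of raw reads) per stage.

    `counts` is an ordered list of (stage name, read count) pairs, with the
    raw library first. This helper is hypothetical, for illustration only.
    """
    raw = counts[0][1]
    previous = raw
    rows = []
    for stage, reads in counts:
        rows.append((stage, reads, 100.0 * reads / previous, 100.0 * reads / raw))
        previous = reads
    return rows


if __name__ == "__main__":
    print(f"{'stage':<30}{'reads':>14}{'% prev':>9}{'% raw':>9}")
    for stage, reads, pct_prev, pct_raw in retention_table(READ_COUNTS):
        print(f"{stage:<30}{reads:>14,}{pct_prev:>9.2f}{pct_raw:>9.2f}")
```

For this library, roughly 63% of raw reads pass cleaning, and roughly half of the raw reads remain after digital normalization.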
Figure 1: Fluorometry profiles of example high-quality (A) and low-quality (B) RNA extractions from A. albopictus. The x-axis represents the sizes of the nucleotide fragments, and the y-axis represents the fluorescent readings. Note the difference in the y-axis scale between panels (A) and (B). Arrows mark the positions of the different ribosomal RNAs. The apparent bands close to the green marker band indicate degradation.
Figure 2: Summary of the bioinformatics workflow from read preparation to differential expression. Each box represents a step in the bioinformatics section of this protocol, accompanied by the corresponding number of each protocol step.
Figure 3: Histogram of contig lengths from a Trinity de novo transcriptome assembly. The average contig length is 1,023.1. Note that the distribution of contig lengths is heavily skewed towards shorter contigs; this is typical of de novo transcriptome assembly.
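The assembly statistics reported for Figure 3 (number of contigs, N50, mean and maximum contig length) can be illustrated with a short Python sketch. It uses the common definition of N50 (the contig length at which contigs of that length or longer contain at least half of all assembled bases). The function names and toy lengths are assumptions for illustration; this is not the evaluation script listed in Table 1.

```python
def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("no contig lengths given")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest contig downward until half of the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


def summarize(lengths):
    """Basic assembly summary of the kind quoted in the representative results."""
    return {
        "contigs": len(lengths),
        "n50": n50(lengths),
        "mean": sum(lengths) / len(lengths),
        "max": max(lengths),
    }


# Toy contig lengths (hypothetical, not data from this study).
toy_lengths = [200, 300, 400, 500, 600, 700, 800, 900, 1000]

if __name__ == "__main__":
    print(summarize(toy_lengths))
```

Because the length distribution is skewed towards short contigs, the N50 is typically larger than the mean, as in the assembly reported here (1,879 vs. 1,023.1).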
Figure 4: Log2-fold-change vs. abundance of TMM-normalized gene expression of diapause pharate larvae at 11 days vs. 21 days post-oviposition. Each point designates a unigene; differentially expressed unigenes are in red. Unigenes with higher expression at 11 days post-oviposition have positive fold-change values, whereas unigenes with higher expression 21 days post-oviposition have negative fold-change values.
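The sign convention in Figure 4 can be made concrete with a minimal Python sketch that computes a log2 fold-change and an average log2 abundance for one unigene from two normalized expression values. This is only an illustration of the axes: EdgeR derives its own fold-change and abundance estimates from a statistical model, and the pseudocount and function name below are assumptions, not values from the protocol.

```python
import math

# Assumption: a small pseudocount so that zero expression does not give log2(0).
PSEUDOCOUNT = 0.5


def fold_change_and_abundance(expr_11d, expr_21d, pseudocount=PSEUDOCOUNT):
    """Return (log2 fold-change, mean log2 abundance) for one unigene.

    Positive fold-change: higher expression at 11 days post-oviposition.
    Negative fold-change: higher expression at 21 days post-oviposition.
    """
    log_11d = math.log2(expr_11d + pseudocount)
    log_21d = math.log2(expr_21d + pseudocount)
    return log_11d - log_21d, (log_11d + log_21d) / 2


if __name__ == "__main__":
    # Hypothetical normalized expression values for two unigenes.
    print(fold_change_and_abundance(15.5, 3.5))  # higher at 11 days
    print(fold_change_and_abundance(3.5, 15.5))  # higher at 21 days
```

Swapping the two inputs flips the sign of the fold-change but leaves the abundance unchanged, which is why up- and down-regulated unigenes mirror each other around zero on the plot.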
Program/Resource | Website URL (accessed 1/13/2014) |
perl | http://www.perl.org/get.html |
python | http://www.python.org/download/ |
ssaha2 | http://www.sanger.ac.uk/resources/software/ssaha2/ |
NCBI UniVec Core | ftp://ftp.ncbi.nih.gov/pub/UniVec/UniVec_Core |
SolexaQA | http://solexaqa.sourceforge.net/ |
FastQC | http://www.bioinformatics.babraham.ac.uk/projects/fastqc/ |
khmer | https://github.com/ged-lab/khmer |
Trinity | http://trinityrnaseq.sourceforge.net/ |
Trinity normalization script | http://trinityrnaseq.sourceforge.net/trinity_insilico_normalization.html |
Assemblathon 2 evaluation scripts | https://github.com/ucdavis-bioinformatics/assemblathon2-analysis |
BLAST+ (which includes makeblastdb and blastx) | ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ |
linebreak | https://code.google.com/p/linebreak/ |
RSEM | http://deweylab.biostat.wisc.edu/rsem/ |
EdgeR User's Guide | http://www.bioconductor.org/packages/2.13/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf |
R | http://www.r-project.org/ |
EdgeR | http://www.bioconductor.org/packages/2.13/bioc/html/edgeR.html |
Table 1: Programs and resources used for the bioinformatics procedures in this protocol. URLs are listed to easily access each of the resources needed in this protocol.
Treatment | Replicate | No. of eggs from 1st hatch | No. of eggs from 2nd hatch | No. embryonated, unhatched eggs | % Diapause |
SD | 1 | 10 | 0 | 68 | 87.18 |
SD | 2 | 3 | 0 | 12 | 80.00 |
SD | 3 | 1 | 0 | 42 | 97.67 |
SD | 4 | 5 | 0 | 46 | 90.20 |
SD | 5 | 6 | 0 | 79 | 92.94 |
LD | 1 | 79 | 4 | 9 | 9.78 |
LD | 2 | 28 | 0 | 4 | 12.50 |
LD | 3 | 92 | 1 | 6 | 6.06 |
LD | 4 | 30 | 0 | 5 | 14.29 |
LD | 5 | 43 | 0 | 3 | 6.52 |
Table 2: Diapause incidence calculations. Results from five replicates per photoperiod of diapause incidence calculations. Numbers of hatched larvae from two separate hatchings are included, as are the number of un-hatched, embryonated eggs, all of which are necessary to calculate diapause incidence.
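The diapause incidence values in Table 2 can be reproduced with a short Python sketch. The formula used here (embryonated, unhatched eggs divided by the sum of larvae from both hatches plus embryonated, unhatched eggs) is inferred from the table, where it matches every row; the function and variable names are illustrative.

```python
# Rows of Table 2: treatment, replicate, larvae from 1st hatch, larvae from
# 2nd hatch, embryonated unhatched eggs, reported % diapause.
TABLE_2 = [
    ("SD", 1, 10, 0, 68, 87.18),
    ("SD", 2, 3, 0, 12, 80.00),
    ("SD", 3, 1, 0, 42, 97.67),
    ("SD", 4, 5, 0, 46, 90.20),
    ("SD", 5, 6, 0, 79, 92.94),
    ("LD", 1, 79, 4, 9, 9.78),
    ("LD", 2, 28, 0, 4, 12.50),
    ("LD", 3, 92, 1, 6, 6.06),
    ("LD", 4, 30, 0, 5, 14.29),
    ("LD", 5, 43, 0, 3, 6.52),
]


def percent_diapause(first_hatch, second_hatch, embryonated_unhatched):
    """Percent of viable eggs that remained unhatched (i.e., in diapause)."""
    viable = first_hatch + second_hatch + embryonated_unhatched
    if viable == 0:
        raise ValueError("no viable eggs counted")
    return 100.0 * embryonated_unhatched / viable


if __name__ == "__main__":
    for treatment, rep, h1, h2, unhatched, reported in TABLE_2:
        calc = percent_diapause(h1, h2, unhatched)
        total = h1 + h2 + unhatched
        print(f"{treatment}{rep}: n={total:>3}  calculated={calc:6.2f}  reported={reported:6.2f}")
```

Printing the total number of viable eggs alongside each value makes small samples easy to spot: replicate SD2 rests on only 15 eggs, which is why a larger egg sample is recommended for this measurement.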
This protocol presents methods to discover differentially expressed genes due to photoperiodically induced diapause in A. albopictus. The protocol is significant in that it uniquely combines mosquito rearing and bioinformatics techniques to make all experimental aspects of a molecular physiology program accessible to novice users — in particular for those focusing on the photoperiodic diapause response. Existing methods, to our knowledge, do not provide as much detail in the rearing protocol — which is often necessary to identify rearing mistakes — nor do they provide insight on experimental design during the rearing stage that will enable successful bioinformatic analysis downstream. The methods presented here have been optimized for A. albopictus, especially the rearing methods, which generally take six weeks from one laboratory generation to the next. However, in future applications this method could be adapted with modest adjustments to other mosquito species that exhibit photoperiodic diapause23. Furthermore, the general experimental design and bioinformatics workflow are applicable to the study of other polyphenisms.
Several points not detailed in the protocol should be considered when rearing A. albopictus larvae. First, A. albopictus can be found in a wide variety of natural and artificial container habitats as described in previous papers36, 37. Used tire lots are a common source of larvae for establishing laboratory colonies. Populations collected above 32°N latitude in North America can be expected to exhibit a strong diapause response13. The A. albopictus strain used in this protocol was collected from Manassas, VA, and was reared in a laboratory setting for more than eight generations prior to experimental manipulation. Second, lighting in the photoperiod cabinets should be chosen with care. Bulbs in cabinets with built-in lighting functions can cause temperature spikes within the cabinet when the lighting turns on or off. Anecdotal observation suggests these temperature spikes can disrupt the diapause response. To prevent this, built-in lighting functions should be disabled and cabinets should be equipped with a 4-watt cool-fluorescent bulb. Third, larvae are sensitive to water quality and food abundance; over-feeding in particular may lead to bacterial accumulation and larval mortality. Fourth, there are alternative methods to blood-feed adult female mosquitoes. Glass membranes are an alternative artificial membrane system38, 39, although the HemoTek system performs better in the authors’ experience. Live animals (usually chicken or rodent) can also be used38; in this case, it is essential to first obtain appropriate certification from your Institutional Animal Care and Use Committee (IACUC). Fifth, although there is no clear published evidence that eggs are photosensitive15, anecdotal observations suggest that eggs from an SD photoperiod treatment exhibit slightly reduced diapause incidence when exposed to an LD photoperiod within 10 days of oviposition. Thus, store both SD and LD eggs under SD conditions to produce a maximal diapause response in the SD eggs and avoid any confounding effect of photoperiod (SD vs. LD) during egg storage.
High RNA quality is essential for generating high quality RNA-Seq data. Great care should be taken during the RNA extraction to avoid any nuclease contamination. Low quality RNA samples, such as the one shown in Figure 1B, are not appropriate for sequencing, so assessing RNA quality before sending samples for sequencing is imperative. Characteristic bands of RNA molecules might be visible for different types of insect tissue used for RNA extraction, such as the four bands smaller than 18S shown in the high quality RNA electropherogram in Figure 1A. Consistent patterns of RNA bands other than the two bands at 18S across samples under distinct biological treatments strongly suggest that these bands do not result from degradation, but instead reflect the biological composition of the RNA molecules in the specific tissue types chosen in the experimental design.
The bioinformatics workflow outlined here allows a user with some command-line and scripting skills to obtain a list of differentially expressed genes from Illumina sequencing data generated from replicated RNA libraries from two contrasting experimental conditions. While this example concerns genes differentially expressed due to photoperiod, this workflow can be applied to any experimental design with two or more treatments, in any organism. There are many other ways to arrive at a list of differentially expressed genes; however, this protocol is likely to be the most straightforward approach for the novice user. More experienced bioinformaticians may want to take extra measures to improve the contiguity and reduce the redundancy of their assembly. Biologists with little to no bioinformatics experience may also complete at least part of this pipeline within the iPlant40 Discovery Environment, which is a free analysis environment driven by a graphical user interface. iPlant’s functionality is likely to expand in the future to accommodate full RNA-Seq pipelines from de novo transcriptome assemblies. Finally, note that the excellent EdgeR User's Guide (Table 1) thoroughly discusses the many ways to use EdgeR41 for differential expression analysis.
In some cases, mis-assemblies can generate chimeric contigs. There are several methods that can help to identify these mis-assemblies, for example, Uchime42. However, from past experience, the number of detected chimeras is exceedingly low (< 0.1%); therefore, employing a chimera detection program may not be worth the extra effort.
Processing high-throughput, next-generation sequencing data requires the ability to 1) store large amounts of data (for a single project, >500 GB); 2) manipulate large data files that cannot be opened in traditional word processors or spreadsheet programs; 3) perform analyses that require large amounts of RAM, e.g., for de novo assembly; and 4) analyze large datasets, either through programs driven by a command-line interface (which requires the ability to install these programs, which is often non-trivial), or through analysis suites with graphical user interfaces (e.g., Galaxy43 or iPlant40). Researchers with some proficiency in the Unix command line and a scripting language will gain the most benefit from access to a local computing cluster, whether University-owned, available via a collaborator, or purchased for their own laboratory. For example, the above workflow was accomplished using a laboratory-owned Macintosh (12 cores, 64 GB RAM, 1 TB hard drive), and a University-owned computer cluster for the Trinity assembly. If similar resources are not available, researchers can still turn to iPlant to perform large-scale analyses at no cost, and with a relatively lower investment in training due to the graphical interface environment. However, those performing and interpreting the analyses still need to understand the assumptions of each program used.
The authors have nothing to disclose.
This work was supported by the National Institutes of Health grant 5R21AI081041-02 and Georgetown University.
Name | Company | Catalog Number | Comments |
Incubator – Model 818 | Thermo-Scientific | 3751 | 120V |
Controlled environment room | Thermax Scientific | N/A | Walk-in controlled environment room built to custom specifications by Thermax Scientific Products. A larger alternative to an incubator. http://thermmax.com/ |
Cool Fluorescent bulb | Philips | 392183 | 4 Watt |
Petri Dish 100mm x 20mm | Fisher | 08-772-E | |
Filter Paper 20.5cm | Fisher | 09-803-6J | |
9.5L Bucket | Plastican | Bway Products | http://www.bwayproducts.com/sites/portal/plastic-products/plastic-open-head-pails/117 |
Utility Fabric-Mosquito Netting White | Joann | 10173292 | http://www.joann.com/utility-fabric-mosquito-netting-white/10173292.html |
Orthopedic stockings | Albahealth | 23650-040 | product no. 081420 |
Organic Raisins | Newman's Own | UPC: 884284040255 | |
Oviposition cups (brown) | Fisher Scientific | 03-007-52 | The product is an amber 125 mL bottle with the top sawed off. |
Recycled Paper Towels | Seventh Generation | 30BPT120 | |
Modular Mates Square Tupperware Set | Tupperware | http://order.tupperware.com/pls/htprod_www/coe$www.add_items | |
Glass Grinder | Corning Incorporated | 7727-2 | These Tenbroeck tissue grinders break the eggs and release RNA into the TRI Reagent. |
TRI Reagent | Sigma Aldrich | T9424 | Apply 1ml TRI Reagent per 50-100mg of tissue. Caution – this reagent is toxic. |
TURBO DNA-free | Ambion/Life Technologies | AM1907 | This kit generates greater yield than traditional DNase treatment followed by phenol/chloroform cleanup, and it is simpler to use. |
RNaseZap | Ambion/Life Technologies | AM9782 | Apply liberally on the bench surfaces and any equipment that might be in contact with the RNA samples. The solution is slightly alkaline/corrosive, can cause irritation and is harmful when swallowed. |
2100 Bioanalyzer | Agilent Technologies | G2939AA | Place up to 12 RNA samples on one chip. |
Hemotek Membrane Feeder | Hemotek | 5W1 | This system provides 5 feeding stations that can be used simultaneously. Includes PS5 Power Unit and Power cord; 5 FUI Feeders + Meal Reservoirs and O-rings; Plastic Plugs, Hemotek collagen feeding membrane; Temperature setting tool; and Plug extracting tool. The company's mailing address is: Hemotek Ltd; Unit 5 Union Court; Alan Ramsbottom Way; Great Harwood; Lancashire, UK; BB6 7FD; tel: +44 1254 889 307. |
Digital Thermometer and Probe | Hemotek | MT3KFU | MicroT3 thermometer and KFU probe. This is used to set the temperature of each FUI feeding unit. |
Chicken Whole Blood, non-sterile with Sodium Citrate | Pel-Freez Biologicals | 33130-1 | The 500 ml of blood were frozen and stored in 20 ml aliquots at -80 degrees C for up to 1 year. Thaw blood at room temperature for at least 1 h before using. |